I need to do some multiplication of two arrays and I'm trying to learn some NEON assembly.
I have two arrays of int16_t elements. Each array has 4 elements (a[0]-a[3] and b[0]-b[3]).
I need to produce a result array c with 4 int16_t values as:
    c[0] = a[0] * b[0]
    c[1] = a[0] * b[1] + a[1] * b[1]
    c[2] = a[0] * b[2] + a[1] * b[2] + a[2] * b[2]
    c[3] = a[0] * b[3] + a[1] * b[3] + a[2] * b[3] + a[3] * b[3]
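For checking any NEON version against known-good values, the formulas above can be written as a plain scalar reference in C. This is only a sketch: the function name `mul_prefix_ref` is made up for illustration, and it assumes results are simply truncated to 16 bits, which is what a 16-bit multiply-accumulate would leave in each lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for the formulas above:
 *   c[k] = a[0]*b[k] + a[1]*b[k] + ... + a[k]*b[k]
 * The sum is accumulated in 32 bits and then narrowed to int16_t.
 * Narrowing an out-of-range value is implementation-defined in C;
 * common compilers keep the low 16 bits. */
void mul_prefix_ref(const int16_t a[4], const int16_t b[4], int16_t c[4])
{
    assert(a != NULL && b != NULL && c != NULL);
    for (int k = 0; k < 4; ++k) {
        int32_t acc = 0;
        for (int i = 0; i <= k; ++i) {
            acc += (int32_t)a[i] * b[k];
        }
        c[k] = (int16_t)acc;
    }
}
```

With a = {1, 2, 3, 4} and b = {5, 6, 7, 8} this gives c = {5, 18, 42, 80}, which makes a handy test vector for the assembly.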
I'm sure that something like that should be trivial in NEON but I have no idea how to get it working.
My approach is like this:
    vmov.32 d0, #0            // (destination array c)
    // load arrays a and b into d1 and d2:
    vld1.16 d1, [r0]
    vld1.16 d2, [r1]
    vmla.s16 d0, d1, d2[0]    // 1st column
    // ? TODO... rotate
    vmla.s16 d0, d1, d2[1]    // 2nd column
    vmla.s16 d0, d1, d2[2]
    vmla.s16 d0, d1, d2[3]
Basically, at the place of my TODO I want to shift elements of array b so that b becomes:
{b[0], b[1], b[2], b[3]} -> {0, b[0], b[1], b[2]}
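To pin down exactly what lane movement is meant, here is a portable C model of that shift. The name `shift_lanes_up` is hypothetical and this is not NEON code. As far as I can tell, on NEON the same arrangement should come out of VEXT against a zeroed register (something like `vext.16 d2, d3, d2, #3` with d3 cleared) or a 64-bit left shift by 16 (`vshl.i64 d2, d2, #16`), but I have not verified either on hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the lane shift:
 *   {v0, v1, v2, v3} -> {0, v0, v1, v2}
 * A temporary is used so that in and out may point to the same array. */
void shift_lanes_up(const int16_t in[4], int16_t out[4])
{
    assert(in != NULL && out != NULL);
    const int16_t tmp[4] = {0, in[0], in[1], in[2]};
    for (int i = 0; i < 4; ++i) {
        out[i] = tmp[i];
    }
}
```

The top lane (b[3]) is dropped and a zero enters at lane 0, so applying the shift four times leaves an all-zero vector.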
Is my approach correct, or is this not possible with ARM NEON?
PS. I tried using intrinsics with an evaluation version of RVDS, but it doesn't seem to work: the generated asm is empty and contains none of these instructions!
Thanks.
















