NEON non-vector 32 x 32 += 64 MAC?
Posted 25 May 2012 - 01:35 AM
Is there a NEON instruction to compute just a single 32 x 32 multiplication adding to a 64 bit accumulator? If so, can it issue every cycle? Is there a stall between back-to-back MACs of this type accumulating to the same register?
Thanks for your help!
Posted 25 May 2012 - 08:57 AM
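No, NEON only has vector forms of the long multiply-accumulate, but the ARM instruction set has a scalar one (SMLAL for signed, UMLAL for unsigned operands) - see http://infocenter.ar...b/CIHBJEHG.html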
> Is there a stall between back-to-back MACs of this type accumulating to the same register
I'm not 100% sure about this specific case, but typically there are no stalls. Back-to-back MAC instructions are common, so the multiplier is designed to pipeline them without bubbles, even when they accumulate into the same register.
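If you'd rather stay in C than write assembly, here is a rough sketch of the 32 x 32 += 64 pattern. The function names mac64 and dot_mac64 are made up for illustration. Compilers targeting ARM will usually turn the widening multiply-add into a single SMLAL, but that depends on the compiler and flags, so treat it as an assumption and check the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One signed 32 x 32 multiply added to a 64-bit accumulator.
 * Casting one operand to int64_t before the multiply keeps the full
 * 64-bit product; this is the shape a compiler can map to SMLAL
 * (assumption: verify in the generated assembly).
 * Note: overflow of the 64-bit sum is undefined behaviour in C. */
static int64_t mac64(int64_t acc, int32_t a, int32_t b)
{
    return acc + (int64_t)a * b;
}

/* Back-to-back MACs into the same accumulator, i.e. the dependency
 * chain asked about in the question. */
static int64_t dot_mac64(const int32_t *a, const int32_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));

    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc = mac64(acc, a[i], b[i]);
    }
    return acc;
}
```

Calling dot_mac64 on two int32_t arrays gives you the 64-bit sum of products. To see whether the loop really issues one MAC per cycle, you would have to time it on your target core.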