From the Cortex-A8 and Cortex-A9 NEON manuals, it appears that the NEON unit can compute a pair of 32 x 32 multiplications, adding to a pair of 64-bit accumulators, every other cycle.
Is there a NEON instruction that computes just a single 32 x 32 multiplication, adding to a 64-bit accumulator? If so, can it issue every cycle? Is there a stall between back-to-back MACs of this type accumulating to the same register?
Thanks for your help!
NEON non-vector 32 x 32 += 64 MAC ?
#2
Posted 25 May 2012 - 08:57 AM
> Is there a NEON instruction to compute just a single 32 x 32 multiplication adding to a 64 bit accumulator?
No, but the ARM instruction set has one (the long multiply-accumulate instructions, SMLAL and UMLAL) - see http://infocenter.ar...b/CIHBJEHG.html
> Is there a stall between back-to-back MACs of this type accumulating to the same register
I'm not 100% sure about this specific case, but typically there are no stalls. Back-to-back MAC instructions are common, so the hardware is designed to pipeline them with no bubbles.
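For illustration, here is a minimal C sketch of the scalar 32 x 32 += 64 pattern discussed above. The function name dot_s32 is hypothetical and not from any ARM library. On ARM targets a compiler will typically map the widening multiply-add in the loop onto a long multiply-accumulate instruction, but that is an assumption about the compiler and optimization level, so check the generated assembly before relying on it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dot product of two int32 arrays with a 64-bit
 * accumulator. Each iteration is a 32 x 32 -> 64 multiply added into the
 * same accumulator, i.e. a chain of back-to-back dependent MACs. */
int64_t dot_s32(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Widen one operand first so the product is computed in 64 bits. */
        acc += (int64_t)a[i] * b[i];
    }
    return acc;
}
```

Because every iteration accumulates into the same register, this loop is a direct way to measure whether dependent MACs stall on a given core: time it against a variant that uses two or more independent accumulators.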
When optimizing software, consider that the quickest code to run is the bit you removed from the call path.