Cortex M4 FPU against fixed point math
#1
Posted 01 March 2012 - 10:03 AM
I'm working with three Cortex parts: an STM32F027 (Cortex-M3), a TI Cortex-M4 and an Infineon Cortex-M4.
I would like to move from a TI C2000 TMS320F2810 (fixed-point 32-bit core) to an M4 to control a three-phase power bridge.
My algorithms currently work in fixed-point math (IQ22) and consist 98% of simple multiplications plus some sine/cosine calculations: PI, PID, PLL, low-pass filter, notch, ...
I ported the algorithm to the Cortex mainly by redefining IQmpy (multiplication) and IQsin (sine calculation), first in fixed point and then in floating point.
I was expecting a speed improvement in floating point, because every multiplication in fixed-point math requires a shift while in floating point I don't need the shift. Instead I'm experiencing a dramatic slowdown of the algorithm when it runs in floating point.
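For readers following along, the difference between the two multiplies can be sketched as below. This is only an illustration under assumptions: the names `iq22_t`, `iq22_mpy` and `f32_mpy` are hypothetical and are not the TI IQmath or IAR definitions used in the original project.

```c
#include <stdint.h>

/* Hypothetical type name: signed 32-bit value with 22 fractional bits. */
typedef int32_t iq22_t;
#define IQ22_SHIFT 22

/* Fixed-point multiply: widen to 64 bits, multiply, then shift right by 22
 * to bring the result back to IQ22 scaling. Right-shifting a negative
 * value is implementation-defined in C; common ARM compilers use an
 * arithmetic shift. */
static inline iq22_t iq22_mpy(iq22_t a, iq22_t b)
{
    return (iq22_t)(((int64_t)a * (int64_t)b) >> IQ22_SHIFT);
}

/* Floating-point multiply: no rescaling step is needed. */
static inline float f32_mpy(float a, float b)
{
    return a * b;
}
```

The extra shift is the only arithmetic cost the fixed-point version adds, so any large slowdown in the float version is likely to come from somewhere else (conversions, register moves or library calls).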
I'm doing my test in IAR.
I checked the assembler output and verified that the compiler is using the floating-point unit.
My only explanation is that the FPU doesn't have, as far as I know, direct access to the CPU registers, so every multiplication in the FPU requires two loads into the FPU registers and another move to bring the result back to a CPU register.
Can anybody confirm that?
Thank you very much
michele
#2
Posted 01 March 2012 - 10:56 AM
Without looking at the code I cannot be sure what is happening, but switching between IQ22 and single precision may be the main issue. It is not just a matter of copying the data from an integer register to a floating-point register and back: you also need to add the exponent and sign bit, and the IEEE 754 single-precision format uses 23 fraction bits rather than 22, so you might have additional shift operations there.
Can you change all the operations to single precision floating point?
regards,
Joseph
#3
Posted 01 March 2012 - 01:38 PM
I'll post a couple of examples with the assembler output.
In my case I'm switching all my code from IQ22 to float, and I verified that it is using the floating-point unit.
Let me ask a "simple" question: is it true that the FPU doesn't have direct access to the core registers, so that to perform an operation in the FPU I have to move the data from the CPU registers to the FPU and back?
I checked the manual but I'm not sure I understood it correctly: in the assembler output I seem to see some load operations.
Thank you very much for your help
michele
#4
Posted 01 March 2012 - 03:16 PM
Yes, you are correct.
The floating-point instructions operate on the floating-point register bank. There are instructions to transfer floating-point data to and from memory, so in theory floating-point data does not have to go through the integer register bank at all. But when mixing with IQ22 or other fixed-point data, which (I assume) is processed in the integer registers, the data has to be transferred and converted between the two register banks. Instructions to convert between floating point and fixed point are available, so even if conversion is needed it shouldn't be much worse.
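A minimal sketch of what such a conversion looks like in C, assuming IQ22 values are stored in an `int32_t` with 22 fractional bits. The function names are hypothetical. Whether a given compiler maps these to a single fixed-point VCVT instruction or to a convert followed by a multiply depends on the toolchain and options, so check the generated assembler.

```c
#include <stdint.h>

/* 2^22, the scale factor between an IQ22 integer and its real value. */
#define IQ22_ONE 4194304.0f

/* IQ22 -> float: convert the integer, then scale down by 2^22. */
static inline float iq22_to_float(int32_t q)
{
    return (float)q * (1.0f / IQ22_ONE);
}

/* float -> IQ22: scale up by 2^22, then truncate toward zero.
 * No saturation is done here; out-of-range inputs are not handled. */
static inline int32_t float_to_iq22(float f)
{
    return (int32_t)(f * IQ22_ONE);
}
```

Each conversion costs a few instructions, so it is cheap when done once at the boundary of the algorithm but adds up quickly if it happens around every multiplication.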
The instruction set of the Cortex-M4 floating point unit can be found in this pdf document:
http://infocenter.ar...tex_m4_dgug.pdf
or from ARM Infocenter:
http://infocenter.ar.../help/index.jsp
-> Developer Guides and Articles
-> Software Development
-> Cortex-M4 Devices Generic User Guide
There are potentially other areas that can make the performance worse:
- accidentally used double precision data/functions
- Compiler/run-time library setting (e.g. hard VFP vs soft VFP)
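The first point is easy to hit by accident in C, because an unsuffixed constant such as `0.25` has type `double`. The Cortex-M4 FPU is single precision only, so double arithmetic falls back to software routines. The sketch below uses a hypothetical low-pass filter step to show the two variants; the function names are assumptions for illustration only. The same applies to library calls: `sin()` takes and returns `double`, while `sinf()` works in `float`.

```c
/* Accidental double precision: 0.25 is a double constant, so the whole
 * expression is promoted to double and only converted back on return. */
static inline float lp_step_double(float y, float x)
{
    return (float)(y + 0.25 * (x - y));
}

/* Single precision throughout: the f suffix keeps the constant, and
 * therefore the arithmetic, in float. */
static inline float lp_step_float(float y, float x)
{
    return y + 0.25f * (x - y);
}
```

Both functions return the same value for inputs like these, which is why the problem is usually only visible in the assembler output or in the cycle count.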
regards,
Joseph
#5
Posted 02 March 2012 - 11:00 AM
I found that there was a cast error in my algorithms when moving from IQ math to float: now the floating-point code runs 10-20% slower than the IQ one.
I notice some vmov and vstr operations; I guess that explains what I said before.
I downloaded the ARM DSP library: is there a speed report comparing IQ and float operations?
Thanks a lot for your support
Michele
#6
Posted 02 March 2012 - 01:03 PM
vmov can move data between the FPU and integer registers (FPU <-> integer), as well as between FPU registers (FPU <-> FPU).
Regarding speed, do you mean instruction timing? This is documented in the Technical Reference Manual (TRM):
http://infocenter.ar...439c/index.html (table 7.1)
If you are referring to the speed of the DSP functions, I don't have this information.
(As far as I know, we use Q15, Q31 and single-precision floating point in the DSP library.)
regards,
Joseph
#7
Posted 05 March 2012 - 03:21 PM
Thanks
Michele
#8
Posted 09 March 2012 - 09:08 PM
The information available in the public domain is limited.
There is some information available, for example:
http://www.emcu.it/S...journal_1_1.pdf
http://www.embedded-...Johnson_ARM.pdf
I know that this might not be exactly what you want, but you can generate the data yourself using the instruction set simulator in Keil MDK if needed.
regards,
Joseph














