ARM Community: Cortex M4 FPU against fixed point math

#1 tuttoaldoc

  • Member
  • Group: Members
  • Posts: 4
  • Joined: 01-March 12

Posted 01 March 2012 - 10:03 AM

Hi,
I'm working on three Cortex devices: an STM32F027 (Cortex-M3), a TI Cortex-M4 and an Infineon Cortex-M4.
I would like to move from a TI C2000 TMS320F2810 (fixed-point 32-bit core) to an M4 to control a three-phase power bridge.
My algorithms currently work in fixed-point math (IQ22) and are 98% based on simple multiplications plus some sine/cosine calculations: PI, PID, PLL, low-pass filter, notch, ...
I ported the algorithms to the Cortex mainly by redefining IQmpy (multiplication) and IQsin (sine calculation), first in fixed point and then in floating point.
I was expecting a speed improvement running in floating point, because every multiplication in fixed-point math requires a shift while in floating point I don't need the shift, but instead I'm experiencing a dramatic slowdown of the algorithm running in floating point.
I'm doing my tests in IAR.
I checked the assembler output and verified that the compiler is using the floating-point unit.
My only explanation is that the FPU doesn't have, as far as I know, direct access to the CPU registers, so every multiplication in the FPU requires two loads into the FPU registers and another transfer to move the result back to a CPU register.
Can anybody confirm that?
Thank you very much
michele
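To make the shift Michele mentions concrete, here is a minimal C sketch of an IQ22 multiply next to its single-precision counterpart. It assumes a signed 32-bit IQ22 format (real value = raw / 2^22); the names `iq22_t`, `iq22_mpy` and `f32_mpy` are hypothetical and are not TI's IQmath API. The fixed-point version needs a widening multiply plus a shift; the float version is a plain multiply, but it only stays cheap if the operands already live in FPU registers.

```c
#include <stdint.h>

/* Hypothetical IQ22 type: signed 32-bit, real value = raw / 2^22
   (range roughly [-512, 512)). Illustrative only, not TI's IQmath. */
typedef int32_t iq22_t;

#define IQ22_SHIFT 22
#define IQ22_ONE   ((iq22_t)1 << IQ22_SHIFT)

/* Fixed-point multiply: widen to 64 bits (Q44 product), then shift right
   by 22 to get back to Q22. Right-shifting a negative int64_t is
   implementation-defined in C; common compilers use an arithmetic shift,
   which truncates toward minus infinity. On Cortex-M3/M4 the widening
   multiply typically maps to SMULL, followed by the shift instructions. */
static inline iq22_t iq22_mpy(iq22_t a, iq22_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    return (iq22_t)(product >> IQ22_SHIFT);
}

/* Single-precision multiply: no explicit shift, the FPU normalises the
   result. This is a single VMUL.F32 only if a and b are already in FPU
   (S) registers; otherwise VMOV transfers are added around it. */
static inline float f32_mpy(float a, float b)
{
    return a * b;
}
```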

#2 Joseph Yiu

  • Regular Contributor
  • Group: Members
  • Posts: 217
  • Joined: 01-March 10

Posted 01 March 2012 - 10:56 AM

Hi Michele,

Without looking at the code I cannot be sure what is happening, but the switching between IQ22 and single precision may well be the main issue. It is not just a matter of copying the data from an integer register to a floating-point register and back: you also need to add the exponent and sign bit, and the IEEE 754 single-precision format uses a 23-bit fraction rather than 22 bits, so you might have additional shift operations there.
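The layout Joseph refers to can be seen by pulling a float apart into its fields. The sketch below (the `f32_fields` struct and `f32_decompose` helper are hypothetical, for illustration only) shows why an IQ22 integer cannot simply be copied into an FPU register: a float stores a sign bit, an 8-bit biased exponent and a 23-bit fraction, so the value has to be normalised on the way in. It assumes `float` is the 32-bit IEEE 754 single type, as it is on Cortex-M4.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the three fields of an IEEE 754 single. */
typedef struct {
    uint32_t sign;      /* 1 bit */
    int32_t  exponent;  /* 8-bit field with the bias of 127 removed */
    uint32_t fraction;  /* 23 stored bits; the leading 1 is implicit */
} f32_fields;

/* Valid for normal numbers. Zero, subnormals, Inf and NaN use the
   reserved exponent fields 0 and 255 and need separate handling. */
static f32_fields f32_decompose(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* portable way to read the bit pattern */

    f32_fields f;
    f.sign     = bits >> 31;
    f.exponent = (int32_t)((bits >> 23) & 0xFFu) - 127;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}
```

For example, 1.5 in IQ22 is the integer 0x00600000, while the single-precision bit pattern of 1.5 is 0x3FC00000 (sign 0, exponent 0, fraction 0x400000).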

Can you change all the operations to single precision floating point?

regards,
Joseph

#3 tuttoaldoc

  • Member
  • Group: Members
  • Posts: 4
  • Joined: 01-March 12

Posted 01 March 2012 - 01:38 PM

Dear Joseph,
I'll post a couple of examples with the assembler output.
In my case I'm completely switching all my code from IQ22 to float, and I verified it is using the floating-point unit.
Let me ask a "simple" question: is it true that the FPU doesn't have direct access to the core registers, so to perform an operation in the FPU I have to load the data from the CPU registers into the FPU and back?
I checked the manual but I'm not sure I understood it correctly: in the assembler output I seem to see some load operations.
Thank you very much for your help
michele

#4 Joseph Yiu

  • Regular Contributor
  • Group: Members
  • Posts: 217
  • Joined: 01-March 10

Posted 01 March 2012 - 03:16 PM

Hi Michele,

Yes, you are correct.

The floating-point instructions operate on the floating-point register bank. There are instructions to transfer floating-point data to/from memory, so in theory the floating-point data do not have to go through the integer register bank at all. But when mixing with IQ22 or other fixed-point code, which (presumably) is processed in the integer registers, the data has to be transferred and converted between the two register banks. Instructions to convert between floating point and fixed point are available, so even if the conversion is needed it shouldn't be much worse.
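The conversions Joseph mentions can be written portably in C. A minimal sketch, assuming the same signed 32-bit IQ22 format (value = raw / 2^22) discussed in this thread; `iq22_to_f32` and `f32_to_iq22` are hypothetical helper names. Whether a compiler turns them into a single fixed-point `VCVT` is compiler-dependent, so it is worth checking the generated assembler.

```c
#include <stdint.h>

#define IQ22_SCALE 4194304.0f   /* 2^22, exactly representable as float */

/* IQ22 -> float. Exact for |q| <= 2^24 (|value| <= 4); larger values are
   rounded to the 24-bit float significand, so low-order bits may be lost. */
static inline float iq22_to_f32(int32_t q)
{
    return (float)q * (1.0f / IQ22_SCALE);
}

/* float -> IQ22, truncating toward zero like a C cast. Values outside the
   IQ22 range [-512, 512) saturate instead of overflowing; NaN maps to 0. */
static inline int32_t f32_to_iq22(float x)
{
    float scaled = x * IQ22_SCALE;
    if (scaled != scaled)        return 0;          /* NaN */
    if (scaled >= 2147483648.0f) return INT32_MAX;  /* >= 2^31 */
    if (scaled < -2147483648.0f) return INT32_MIN;  /* < -2^31 */
    return (int32_t)scaled;
}
```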

The instruction set of the Cortex-M4 floating-point unit can be found in this PDF document:
http://infocenter.ar...tex_m4_dgug.pdf
or from ARM Infocenter:
http://infocenter.ar.../help/index.jsp
-> Developer Guides and Articles
-> Software Development
-> Cortex-M4 Devices Generic User Guide

Potentially there are other areas that can make the performance worse:
- accidentally using double-precision data/functions
- compiler/run-time library settings (e.g. hard VFP vs soft VFP)
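Both points in the list above are easy to hit by accident in C. In this minimal sketch (the function names are hypothetical), an unsuffixed literal such as `0.1` is a `double`, so the whole expression is evaluated in double precision, which the single-precision Cortex-M4 FPU cannot do in hardware; the compiler then calls software routines. The same applies to calling `sin()` instead of `sinf()`. Writing every literal as `0.1f` keeps the code in single precision.

```c
/* Hypothetical gain stage, written two ways. */
static float gain_double(float x)
{
    return 0.1 * x;    /* 0.1 is a double: x is promoted, the multiply is
                          done in double (software on Cortex-M4), and the
                          result is converted back to float */
}

static float gain_single(float x)
{
    return 0.1f * x;   /* stays in single precision */
}

/* First-order low-pass step y[n] = y[n-1] + k * (x[n] - y[n-1]),
   kept entirely in float. */
static float lpf_step(float y_prev, float x, float k)
{
    return y_prev + k * (x - y_prev);
}
```

With a GCC-based toolchain, `-Wdouble-promotion` warns about implicit float-to-double promotions, and `-mfpu=fpv4-sp-d16 -mfloat-abi=hard` selects the FPU with the hard-float calling convention (`-mfloat-abi=softfp` also uses the FPU but passes floating-point arguments in integer registers, which adds `vmov` transfers at call boundaries). IAR has its own equivalent project options.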

regards,
Joseph

#5 tuttoaldoc

  • Member
  • Group: Members
  • Posts: 4
  • Joined: 01-March 12

Posted 02 March 2012 - 11:00 AM

Dear Joseph,
I found that there was a cast error in my algorithms when moving from IQ math to float: now the floating-point code runs 10-20% slower than the IQ one.
I notice some vmov and vstr operations... I guess that explains what I said before.
I downloaded the ARM DSP library: is there a speed report comparing IQ and float operations?
Thanks a lot for your support
Michele

#6 Joseph Yiu

  • Regular Contributor
  • Group: Members
  • Posts: 217
  • Joined: 01-March 10

Posted 02 March 2012 - 01:03 PM

Hi Michele,

vmov can move data between FPU <-> integer registers, as well as FPU <-> FPU.

Regarding speed, do you mean instruction timing? This is documented in the Technical Reference Manual (TRM):
http://infocenter.ar...439c/index.html (table 7.1)

If you are referring to the speed of the DSP functions, I don't have this information.
(As far as I know we use Q15, Q31 and single-precision floating point in the DSP library.)
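For comparison with the IQ22 code, this is what a Q15 multiply amounts to. A minimal sketch of a saturating Q15 x Q15 multiply; the `q15` type and `q15_mul` function are hypothetical, and CMSIS-DSP defines its own `q15_t` type and functions, which are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical Q15 type: signed 16-bit, value = raw / 2^15, range [-1, 1). */
typedef int16_t q15;

/* Q15 x Q15 -> Q15 with saturation. The 32-bit product is Q30; shifting
   right by 15 returns to Q15. Only (-1) * (-1) overflows: the exact result
   +1 is not representable and is clamped to 0x7FFF. A compiler may use
   SMULBB and SSAT on Cortex-M4 for this, but that is up to the compiler. */
static inline q15 q15_mul(q15 a, q15 b)
{
    int32_t p = ((int32_t)a * b) >> 15;   /* arithmetic shift assumed */
    if (p > INT16_MAX) p = INT16_MAX;
    if (p < INT16_MIN) p = INT16_MIN;
    return (q15)p;
}
```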
regards,
Joseph

#7 tuttoaldoc

  • Member
  • Group: Members
  • Posts: 4
  • Joined: 01-March 12

Posted 05 March 2012 - 03:21 PM

Yes, I know ARM provides Q15, Q31 and single-precision floating-point libraries. I mean: is there any comparison of the execution time of those libraries in Q15, Q31 and floating point, for example on a sine-wave calculation or a PID?
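In the absence of published numbers, one option is to time the same small kernel in each format on the target itself. A minimal single-precision PI step that could serve as such a kernel; the `pi_ctrl` struct and `pi_step` function are hypothetical, and the clamp-based anti-windup is just one simple choice for illustration. An IQ22 or Q31 version would replace each multiply with the corresponding fixed-point helper.

```c
/* Hypothetical single-precision PI controller state. */
typedef struct {
    float kp;               /* proportional gain */
    float ki;               /* integral gain, assumed pre-multiplied by the sample period */
    float integ;            /* integrator state */
    float out_min, out_max; /* output limits */
} pi_ctrl;

/* One controller step. The integrator is only updated while the output
   is inside its limits (simple anti-windup). */
static float pi_step(pi_ctrl *c, float ref, float meas)
{
    float err   = ref - meas;
    float integ = c->integ + c->ki * err;
    float out   = c->kp * err + integ;

    if (out > c->out_max) {
        out = c->out_max;          /* saturate, keep old integrator */
    } else if (out < c->out_min) {
        out = c->out_min;
    } else {
        c->integ = integ;          /* commit integrator when unsaturated */
    }
    return out;
}
```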
Thanks
Michele

#8 Joseph Yiu

  • Regular Contributor
  • Group: Members
  • Posts: 217
  • Joined: 01-March 10

Posted 09 March 2012 - 09:08 PM

Hi Michele,

The information available in the public domain is limited.
There is some information available, for example:

http://www.emcu.it/S...journal_1_1.pdf

http://www.embedded-...Johnson_ARM.pdf

I know this might not be exactly what you want, but you can generate the data using the instruction set simulator in Keil MDK if needed.
regards,
Joseph