ARM Community: NEON vdiv.f32 syntax - ARM Community


#1 Microcan

  • Member
  • Group: Members
  • Posts: 5
  • Joined: 17-April 12

Posted 17 April 2012 - 09:35 AM

I am (re)coding a 3D math library with inline NEON assembly for iOS using the Apple LLVM compiler 3.1.

I get an error message on the following instruction:

"vdiv.f32 q0, q1, q2 \n\t"

VFP single or double precision register expected -- `vdiv.f32 q0,q1,q2'

According to the 'Assembler Reference' page 4-76 you should specify a single precision register. The following code works:

"vdiv.f32 s0, s4, s8 \n\t"
"vdiv.f32 s1, s5, s9 \n\t"
"vdiv.f32 s2, s6, s10 \n\t"

I am confused because now the divide is not computed in parallel, which was the reason to use inline assembly.

Also the following instructions work as expected:

// component wise add
"vadd.f32 q0, q1, q2 \n\t"

// component wise subtract
"vsub.f32 q0, q1, q2 \n\t"

// component wise multiply
"vmul.f32 q0, q1, q2 \n\t"


Why do I get an error message on the vdiv and not on the vadd, vsub and vmul? Is this a compiler error?






#2 webshaker

  • Regular Contributor
  • Group: Members
  • Posts: 220
  • Joined: 07-October 10

Posted 17 April 2012 - 10:10 AM

There is no NEON VDIV instruction!

For most NEON/VFP instructions, the register type determines which unit is used:

Vxxx.f32 s0, s1, s2 	// VFP, 1 float operation
Vxxx.f64 d0, d1, d2 	// VFP, 1 double operation
Vxxx.f32 d0, d1, d2 	// NEON, 2 float operations
Vxxx.f32 q0, q1, q2 	// NEON, 4 float operations

But there is no NEON division.

Have a look at VRECPE.
This instruction returns an estimate of the reciprocal value. The precision of the result is 8 bits.
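
To make the 8-bit precision concrete, here is a minimal C sketch of dividing by multiplying with a coarse reciprocal. The names `recip_estimate_8bit` and `approx_div` are hypothetical, and the estimate is only a stand-in that rounds 1/x to 8 fraction bits. The real VRECPE uses a hardware lookup table and will return different bit patterns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for VRECPE (NOT the real lookup table):
   computes 1/x and rounds the mantissa to 8 fraction bits. */
static float recip_estimate_8bit(float x)
{
    assert(x != 0.0f);                /* 1/0 is out of scope for this sketch */
    float r = 1.0f / x;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);   /* reinterpret the float as raw bits */
    bits += 0x4000u;                  /* add half of the range about to be dropped */
    bits &= ~0x7FFFu;                 /* clear the low 15 of the 23 fraction bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* a / b approximated as a * (1/b), the way a NEON kernel would do it per lane. */
static float approx_div(float a, float b)
{
    return a * recip_estimate_8bit(b);
}
```

Under these assumptions `approx_div(1.0f, 3.0f)` comes out near 0.3330 rather than 0.3333, which may be acceptable for some graphics work but not for general math.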

Etienne.

This post has been edited by webshaker: 17 April 2012 - 12:04 PM

When you have eliminated the impossible, whatever remains, however improbable, must be the truth

#3 Microcan

Posted 17 April 2012 - 10:38 AM

Do you mean that these instructions don't exist:
VDIV.f32 d0, d1, d2 // NEON 2 float operation
VDIV.f32 q0, q1, q2 // NEON 4 float operation

And these instructions exist:


VADD.f32 d0, d1, d2 // NEON 2 float operation
VADD.f32 q0, q1, q2 // NEON 4 float operation



Why is it not mentioned in the documentation? Does the divider use too much space on chip?


So you have to trade speed for accuracy?



#4 webshaker

Posted 17 April 2012 - 12:04 PM

Microcan, on 17 April 2012 - 10:38 AM, said:

Do you mean that these instructions don't exist:
VDIV.f32 d0, d1, d2 // NEON 2 float operation
VDIV.f32 q0, q1, q2 // NEON 4 float operation


Yes, that's exactly what I mean!

Microcan, on 17 April 2012 - 10:38 AM, said:

And these instructions exist:
VADD.f32 d0, d1, d2 // NEON 2 float operation
VADD.f32 q0, q1, q2 // NEON 4 float operation


Yes, that's correct!

Microcan, on 17 April 2012 - 10:38 AM, said:

Why is it not mentioned in the documentation? Does the divider use too much space on chip?


The documentation is clear on this.
To be exact, nowhere does it say that NEON has a VDIV instruction!

Microcan, on 17 April 2012 - 10:38 AM, said:

So you have to trade speed for accuracy?


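In fact, yes and no.
There is a short code sequence that gives you a very accurate reciprocal, which you then multiply by the numerator to get the division.

vrecpe.f32         	d1, d5      // d1 = rough estimate of 1/d5
vrecps.f32         	d2, d1, d5  // d2 = 2 - d1*d5 (Newton-Raphson correction)
vmul.f32           	d1, d1, d2  // d1 = refined estimate
vrecps.f32         	d2, d1, d5  // d2 = second correction factor
vmul.f32           	d5, d1, d2  // d5 = refined 1/d5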



You can then decide whether you want a very fast division or a very accurate one!
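
The sequence above is a Newton-Raphson iteration: VRECPS computes 2 - a*b, so each vrecps/vmul pair replaces the estimate x with x*(2 - d*x) and roughly doubles the number of correct bits. Here is a scalar C model of it as a rough sketch. All function names are hypothetical, and `vrecpe_model` is a stand-in seed, not the real VRECPE table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for vrecpe.f32 (NOT the real table): 1/d rounded to 8 fraction bits. */
static float vrecpe_model(float d)
{
    float r = 1.0f / d;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits = (bits + 0x4000u) & ~0x7FFFu;   /* keep 8 of the 23 fraction bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Models vrecps.f32: the Newton-Raphson correction factor 2 - a*b. */
static float vrecps_model(float a, float b)
{
    return 2.0f - a * b;
}

/* With steps == 2 this mirrors the five-instruction sequence. */
static float refined_recip(float d, int steps)
{
    assert(d != 0.0f && steps >= 0);
    float x = vrecpe_model(d);            /* vrecpe.f32 */
    for (int i = 0; i < steps; ++i)
        x = x * vrecps_model(x, d);       /* vrecps.f32 followed by vmul.f32 */
    return x;
}

/* The final multiply by the numerator turns the reciprocal into a quotient. */
static float neon_style_div(float n, float d)
{
    return n * refined_recip(d, 2);
}
```

Passing `steps` as 0, 1 or 2 is the speed versus accuracy choice: in this model zero steps keeps only the coarse seed, while two steps are limited mainly by float rounding.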

Etienne

This post has been edited by webshaker: 17 April 2012 - 12:05 PM


#5 Microcan

Posted 17 April 2012 - 01:20 PM

Thanks,


I still don't see any difference in the documentation, however. VADD, VSUB and VDIV are mentioned on the same page.
It does not mention that you can use quad registers. Am I using the wrong documentation? Are NEON instructions not the same as VFP instructions?

See documentation

The following instructions compile and work on the iPhone and iPad hardware.

// component wise add
"vadd.f32 q0, q1, q2 \n\t"

// component wise subtract
"vsub.f32 q0, q1, q2 \n\t"




#6 webshaker

Posted 17 April 2012 - 01:37 PM

I see the problem

Use this PDF documentation instead
http://infocenter.ar...406c/index.html

See chapter A8.8.312.
It says "Encoding T1/A1 VFPv2, VFPv3, VFPv4".

VDIV is not a NEON instruction.

VFP and NEON are not the same computing unit.
ARM decided to unify the instruction syntax, but the two units are very different!
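
Because VDIV exists only on the VFP side, a four-lane divide on NEON has to be assembled from the reciprocal instructions. Below is a hedged sketch that uses the `arm_neon.h` intrinsics instead of inline assembly, which is a swap from the approach used in this thread. `div4` is a hypothetical helper name, and the scalar branch is only there so the code also builds on machines without NEON.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* out[i] = num[i] / den[i] for 4 lanes, built from NEON reciprocal steps. */
static void div4(float out[4], const float num[4], const float den[4])
{
    assert(out != NULL && num != NULL && den != NULL);
    float32x4_t n = vld1q_f32(num);
    float32x4_t d = vld1q_f32(den);
    float32x4_t r = vrecpeq_f32(d);           /* rough estimate of 1/d */
    r = vmulq_f32(r, vrecpsq_f32(r, d));      /* r *= (2 - r*d), first refinement */
    r = vmulq_f32(r, vrecpsq_f32(r, d));      /* second refinement */
    vst1q_f32(out, vmulq_f32(n, r));          /* num * (1/den) */
}
#else
/* Fallback for targets without NEON: plain scalar division per lane. */
static void div4(float out[4], const float num[4], const float den[4])
{
    assert(out != NULL && num != NULL && den != NULL);
    for (size_t i = 0; i < 4; ++i)
        out[i] = num[i] / den[i];
}
#endif
```

On the NEON path the result is an approximation and may differ from a true division in the last bits, so callers that need exactly rounded quotients would still use the VFP `vdiv.f32` on single registers.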

Etienne

#7 Microcan

Posted 17 April 2012 - 01:51 PM

Thanks again. I just started with NEON yesterday, and it's twenty years since I last used assembly. Using inline NEON assembly for simple vector math already pays off. I also saw some very interesting instructions to code out complete loops and make it even faster. I'm thinking of moving some code from the GPU to NEON so they work in parallel.

#8 webshaker

Posted 17 April 2012 - 03:21 PM

Did you have an Acorn Archimedes?

#9 Microcan

Posted 17 April 2012 - 05:03 PM

webshaker, on 17 April 2012 - 03:21 PM, said:

Did you have an Acorn Archimedes?



No, I used to program microcontrollers in assembly to control all kinds of machines: safety software for boilers, etc. Those little guys with a minimal amount of ROM/RAM. To clear the program memory you had to give them a UV sunbath. In those days you had to program your own 16-bit multiply and divide, etc. I did a good job, because they still manufacture thousands of boilers with the majority of the code 15 years old.


