NEON vdiv.f32 syntax
#1
Posted 17 April 2012 - 09:35 AM
I get an error message on the following instruction:
"vdiv.f32 q0, q1, q2 \n\t"
VFP single or double precision register expected -- `vdiv.f32 q0,q1,q2'
According to the 'Assembler Reference' page 4-76 you should specify a single precision register. The following code works:
"vdiv.f32 s0, s4, s8 \n\t"
"vdiv.f32 s1, s5, s9 \n\t"
"vdiv.f32 s2, s6, s10 \n\t"
I am confused because now the divide is not computed in parallel, which was the reason to use inline assembly.
Also the following instructions work as expected:
// component wise add
"vadd.f32 q0, q1, q2 \n\t"
// component wise subtract
"vsub.f32 q0, q1, q2 \n\t"
// component wise multiply
"vmul.f32 q0, q1, q2 \n\t"
Why do I get an error message on the vdiv and not on the vadd, vsub and vmul? Is this a compiler error?
#2
Posted 17 April 2012 - 10:10 AM
For most NEON/VFP instructions, the register type determines which unit is used:
Vxxx.f32 s0, s1, s2 // VFP, 1 float operation
Vxxx.f64 d0, d1, d2 // VFP, 1 double operation
Vxxx.f32 d0, d1, d2 // NEON, 2 float operations
Vxxx.f32 q0, q1, q2 // NEON, 4 float operations
But there is no NEON division.
Have a look at VRECPE.
This instruction returns an estimate of the reciprocal value. The precision of the result is 8 bits.
Etienne.
This post has been edited by webshaker: 17 April 2012 - 12:04 PM
#3
Posted 17 April 2012 - 10:38 AM
VDIV.f32 d0, d1, d2 // NEON 2 float operation
VDIV.f32 q0, q1, q2 // NEON 4 float operation
And these instructions exist:
VADD.f32 d0, d1, d2 // NEON 2 float operation
VADD.f32 q0, q1, q2 // NEON 4 float operation
Why is it not mentioned in the documentation? Does the divider use too much space on chip?
So you have to trade speed for accuracy?
#4
Posted 17 April 2012 - 12:04 PM
Microcan, on 17 April 2012 - 10:38 AM, said:
VDIV.f32 d0, d1, d2 // NEON 2 float operation
VDIV.f32 q0, q1, q2 // NEON 4 float operation
Yes that's exactly what I mean !
Microcan, on 17 April 2012 - 10:38 AM, said:
VADD.f32 d0, d1, d2 // NEON 2 float operation
VADD.f32 q0, q1, q2 // NEON 4 float operation
Yes that's correct !
Microcan, on 17 April 2012 - 10:38 AM, said:
This is clearly mentioned in the documentation.
To be exact, the documentation does not say that NEON has a VDIV instruction!
Microcan, on 17 April 2012 - 10:38 AM, said:
In fact Yes and No.
There is a short code sequence that lets you compute a very accurate division (it refines the reciprocal of d5, which you then multiply by the numerator).
vrecpe.f32 d1, d5
vrecps.f32 d2, d1, d5
vmul.f32 d1, d1, d2
vrecps.f32 d2, d1, d5
vmul.f32 d5, d1, d2
You can then decide if you want a very fast division, or a very accurate one !
Etienne
This post has been edited by webshaker: 17 April 2012 - 12:05 PM
#5
Posted 17 April 2012 - 01:20 PM
I still don't see any difference in the documentation, however. VADD, VSUB and VDIV are mentioned on the same page.
It does not mention you can use quad registers. Am I using the wrong documentation? Is NEON != VFP instructions?
See documentation
The following instructions compile and work on the iPhone and iPad hardware.
// component wise add
"vadd.f32 q0, q1, q2 \n\t"
// component wise subtract
"vsub.f32 q0, q1, q2 \n\t"
#6
Posted 17 April 2012 - 01:37 PM
Use this PDF documentation instead
http://infocenter.ar...406c/index.html
chapter A8.8.312
It says "Encoding T1/A1 VFPv2, VFPv3, VFPv4".
VDIV is not a NEON instruction.
VFP and NEON are not the same computing unit.
ARM decided to unify the instruction syntax, but the two units are very different!
Etienne
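Editor's sketch: since the thread is about dividing four floats at once, here is one way this could look in C with NEON intrinsics instead of inline assembly. That is a swapped-in technique, not what the posters used. The helper name div4 is hypothetical. The intrinsics vld1q_f32, vrecpeq_f32, vrecpsq_f32, vmulq_f32 and vst1q_f32 come from arm_neon.h. When NEON is not available, the code falls back to plain scalar division, one element at a time. The NEON path is an approximation and may differ from a true division in the last bits.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = num[i] / den[i] for exactly 4 floats. */
void div4(const float *num, const float *den, float *out) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t n = vld1q_f32(num);
    float32x4_t d = vld1q_f32(den);
    float32x4_t r = vrecpeq_f32(d);        /* VRECPE: rough estimate of 1/d */
    r = vmulq_f32(r, vrecpsq_f32(r, d));   /* VRECPS + VMUL: first refinement */
    r = vmulq_f32(r, vrecpsq_f32(r, d));   /* second refinement */
    vst1q_f32(out, vmulq_f32(n, r));       /* n * (1/d) */
#else
    /* No NEON: ordinary scalar division, element by element. */
    for (size_t i = 0; i < 4; ++i) {
        out[i] = num[i] / den[i];
    }
#endif
}
```

Calling div4(n, d, out) with three arrays of four floats fills out with the quotients. Whether the NEON path is actually faster than three or four scalar vdiv instructions depends on the core, so it is worth measuring.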
#7
Posted 17 April 2012 - 01:51 PM
#8
Posted 17 April 2012 - 03:21 PM
#9
Posted 17 April 2012 - 05:03 PM
webshaker, on 17 April 2012 - 03:21 PM, said:
No, I used to program microcontrollers in assembly to control all kinds of machines: safety software for boilers, etc. Those little guys with a minimal amount of ROM / RAM. To clean the program memory you had to give them a UV sunbath. In those days you had to program your own 16-bit multiply and divide, etc. I did a good job, because they still manufacture thousands of boilers with the majority of the code 15 years old.