Differences between the VLD1 variants
#1
Posted 30 October 2012 - 12:53 PM
Hi Everybody!
There are three VLD1 instructions in armeabi-v7a:
- VLD1 (multiple single elements) on page A8-898
- VLD1 (single element to one lane) on page A8-900
- VLD1 (single element to all lanes) on page A8-902
Does anybody know what the differences between them are?
Also, how does the compiler choose which type of VLD1 to use, given that the syntax seems identical?
Thanks in advance.
#2
Posted 30 October 2012 - 04:30 PM
VLD1 (multiple single elements) performs 1-4 sequential 64-bit loads to 1-4 64-bit NEON registers. It's like a normal load multiple instruction.
VLD1 (single element to one lane) loads a single 8, 16, or 32-bit value to one lane of a vector. A lane is one element.
VLD1 (single element to all lanes) is like the above, but it copies the loaded value into every lane, so the entire vector is updated.
The syntax isn't really the same, because you use different notations for the registers in the register list. To update the entire vector with a vector load you use the vector name, like d0. To update one lane in the vector with a scalar load you subscript the lane number in the vector, like d0[1]. To update every lane with one scalar load you use the index notation without an index number, like d0[].
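To make the three behaviours concrete, here is a minimal sketch in portable C that models a single D register as eight 8-bit lanes. It is only an illustration: the type `dreg8` and the three function names are made up for this example and are not part of any ARM header. Each function returns the advanced pointer to mimic the `!` writeback.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 64-bit D register viewed as eight 8-bit lanes. */
typedef struct { uint8_t lane[8]; } dreg8;

/* Models: vld1.8 { d }, [ r ]!   (multiple single elements, one register)
 * Fills all eight lanes from eight consecutive bytes. */
static const uint8_t *load_multiple(dreg8 *d, const uint8_t *addr) {
    memcpy(d->lane, addr, 8);
    return addr + 8;            /* writeback: advance by the 8 bytes read */
}

/* Models: vld1.8 { d[i] }, [ r ]!   (single element to one lane)
 * Overwrites lane i only; the other seven lanes keep their old values. */
static const uint8_t *load_one_lane(dreg8 *d, int i, const uint8_t *addr) {
    d->lane[i] = *addr;
    return addr + 1;            /* writeback: advance by the 1 byte read */
}

/* Models: vld1.8 { d[] }, [ r ]!   (single element to all lanes)
 * Reads one byte and replicates it into every lane. */
static const uint8_t *load_all_lanes(dreg8 *d, const uint8_t *addr) {
    memset(d->lane, *addr, 8);
    return addr + 1;            /* writeback: advance by the 1 byte read */
}
```

On the compiler side, code written with the `arm_neon.h` intrinsics picks the form explicitly: `vld1_u8`, `vld1_lane_u8` and `vld1_dup_u8` correspond to the whole-vector, one-lane and all-lanes loads respectively.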
Let's say that the address you're loading from contains the following, and register r0 points to it (is set to 0x0):
0x0: 0x01
0x1: 0x23
0x2: 0x45
0x3: 0x67
0x4: 0x89
0x5: 0xAB
0x6: 0xCD
0x7: 0xEF
So this is what the code would do:
// r0 = r1 = 0x0
mov r1, r0
vld1.8 { d0 }, [ r0 ]!
// d0 as an 8x8 vector = [ 0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF ]
// r0 = 0x8
vld1.u8 { d0[5] }, [ r1 ]!
// d0 as an 8x8 vector = [ 0x01, 0x23, 0x45, 0x67, 0x89, 0x01, 0xCD, 0xEF ]
// r1 = 0x1
vld1.u8 { d0[] }, [ r1 ]!
// d0 as an 8x8 vector = [ 0x23, 0x23, 0x23, 0x23, 0x23, 0x23, 0x23, 0x23 ]
// r1 = 0x2
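The same pattern applies to the wider element sizes. As a rough sketch (again a portable C model, not real NEON code), here is how the one-lane and all-lanes forms would behave with 16-bit elements. The type `dreg16` and the helper names are hypothetical, and the model assumes little-endian data, so the bytes 0x01 0x23 in memory form the element 0x2301. Note that the writeback advances the pointer by two bytes here, since that is how much each load transfers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one 64-bit D register viewed as four 16-bit lanes. */
typedef struct { uint16_t lane[4]; } dreg16;

/* Assemble a 16-bit element from two bytes (little-endian is an assumption). */
static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Models: vld1.16 { d[i] }, [ r ]!   (single element to one lane) */
static const uint8_t *load_one_lane16(dreg16 *d, int i, const uint8_t *addr) {
    d->lane[i] = read_le16(addr);
    return addr + 2;            /* writeback: advance by the 2 bytes read */
}

/* Models: vld1.16 { d[] }, [ r ]!   (single element to all lanes) */
static const uint8_t *load_all_lanes16(dreg16 *d, const uint8_t *addr) {
    uint16_t v = read_le16(addr);
    for (size_t i = 0; i < 4; i++) {
        d->lane[i] = v;
    }
    return addr + 2;            /* writeback: advance by the 2 bytes read */
}
```

With the memory contents from the example above, loading lane 2 from address 0x0 would put 0x2301 in that lane and leave the pointer at 0x2; an all-lanes load from there would then fill the vector with 0x6745.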