ARM Community: VLD1 differences between each other - ARM Community


#1 Green

Posted 30 October 2012 - 12:53 PM

Hi Everybody!

There are three VLD1 instructions in armeabi-v7a:
- VLD1 (multiple single elements) on page A8-898
- VLD1 (single element to one lane) on page A8-900
- VLD1 (single element to all lanes) on page A8-902

Does anybody know what the differences between them are?
Also, how does the compiler choose which type of VLD1 to use, given that the syntax seems identical?

Thanks in advance.

#2 Exophase

Posted 30 October 2012 - 04:30 PM

VLD1 (multiple single elements) performs 1-4 sequential 64-bit loads to 1-4 64-bit NEON registers. It's like a normal load multiple instruction.
VLD1 (single element to one lane) loads a single 8-, 16-, or 32-bit value into one lane of a vector. A lane is one element; the other lanes are left unchanged.
VLD1 (single element to all lanes) is like the above, but it copies the loaded value into all of the lanes, so the entire vector is updated.

The syntax isn't really the same, because you use different notations for the registers in the register list. To update the entire vector with a vector load you use the vector name, like d0. To update one lane in the vector with a scalar load you subscript the lane number in the vector, like d0[1]. To update every lane with one scalar load you use the index notation without an index number, like d0[].
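
As a rough illustration of that rule, the sketch below classifies a single register-list entry by its notation alone. It is not how any real assembler is implemented; the helper name vld1_form and the regular expression are hypothetical, and it only handles D registers.

```python
import re

# Hypothetical pattern: 'd' + register number, optionally followed by
# '[index]' (one lane) or '[]' (all lanes).
_ENTRY = re.compile(r"^[dD](\d+)(?:\[(\d*)\])?$")


def vld1_form(entry):
    """Return (form, register, lane) for an entry like 'd0', 'd0[1]' or 'd0[]'."""
    match = _ENTRY.match(entry.strip())
    if match is None:
        raise ValueError(f"not a D-register entry: {entry!r}")
    register = int(match.group(1))
    index = match.group(2)  # None: no brackets, '': empty brackets
    if index is None:
        return ("multiple single elements", register, None)
    if index == "":
        return ("single element to all lanes", register, None)
    return ("single element to one lane", register, int(index))


if __name__ == "__main__":
    for text in ("d0", "d0[1]", "d0[]"):
        print(text, "->", vld1_form(text))
```

The point is only that the three encodings are distinguishable from the operand text, so no extra mnemonic is needed.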

Let's say that the address you're loading from contains the following, and register r0 points to it (is set to 0x0):

0x0: 0x01
0x1: 0x23
0x2: 0x45
0x3: 0x67
0x4: 0x89
0x5: 0xAB
0x6: 0xCD
0x7: 0xEF



So this is what the code would do:

// r0 = r1 = 0x0
mov r1, r0

vld1.u8 { d0 }, [ r0 ]!
// d0 as an 8x8 vector = [ 0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF ]
// r0 = 0x8

vld1.u8 { d0[5] }, [ r1 ]!

// d0 as an 8x8 vector = [ 0x01, 0x23, 0x45, 0x67, 0x89, 0x01, 0xCD, 0xEF ]
// r1 = 0x1

vld1.u8 { d0[] }, [ r1 ]!

// d0 as an 8x8 vector = [ 0x23, 0x23, 0x23, 0x23, 0x23, 0x23, 0x23, 0x23 ]
// r1 = 0x2
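
For readers who want to check the trace above without NEON hardware, here is a small model of the three forms. It is a sketch under simplifying assumptions: a D register is modelled as a list of eight byte lanes, memory as a bytes object, and the function names are made up for illustration.

```python
# Illustrative model only; names are hypothetical, not part of any toolchain.

def vld1_multiple(mem, addr, nregs=1):
    """VLD1 (multiple single elements): fill nregs whole 64-bit registers."""
    regs = [list(mem[addr + 8 * i: addr + 8 * (i + 1)]) for i in range(nregs)]
    return regs, addr + 8 * nregs  # '!' writeback advances by the bytes loaded


def vld1_one_lane(mem, addr, reg, lane):
    """VLD1 (single element to one lane): replace one byte lane, keep the rest."""
    updated = list(reg)
    updated[lane] = mem[addr]
    return updated, addr + 1


def vld1_all_lanes(mem, addr, lanes=8):
    """VLD1 (single element to all lanes): replicate one byte into every lane."""
    return [mem[addr]] * lanes, addr + 1


def replay(mem):
    """Run the three loads from the post and return (d0, pointer) after each."""
    r0 = r1 = 0
    (d0,), r0 = vld1_multiple(mem, r0)
    step1 = (d0, r0)
    d0, r1 = vld1_one_lane(mem, r1, d0, 5)
    step2 = (d0, r1)
    d0, r1 = vld1_all_lanes(mem, r1)
    step3 = (d0, r1)
    return step1, step2, step3


MEM = bytes([0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF])

if __name__ == "__main__":
    for d0, pointer in replay(MEM):
        print([f"0x{b:02X}" for b in d0], f"pointer = 0x{pointer:X}")
```

Each printed line should correspond to one of the commented register states in the post, with the pointer column showing the post-increment from the "!" writeback.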



#3 Green

Posted 31 October 2012 - 11:07 AM

Exophase!

Thanks again for your fantastic answer!