ARM Community: Vectorizing Compiler - ARM Community

Vectorizing Compiler CortexA8

#1 User is offline   Dave1024 

  • Member
  • Pip
  • Group: Members
  • Posts: 25
  • Joined: 30-April 10

Posted 29 June 2010 - 06:38 AM

Hi,

Please see the following toolchain settings:

CPP=arm-none-linux-gnueabi-gcc
SWS=-march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -flax-vector-conversions
The target is a BeagleBoard.

How can I disable vectorization?
With the settings above, will the compiler generate vectorized code by default for the given C source?
If I write NEON C intrinsics, will the compiler override its own optimization and follow the programmer's NEON code?

Please help me resolve these doubts.


#2 User is offline   hitlin37 

  • Member
  • Pip
  • Group: Members
  • Posts: 15
  • Joined: 18-December 09

Posted 30 June 2010 - 04:10 AM

Well, just remove -vector or -flax-vector-conversions, because unless you tell your compiler to vectorize, by default it will never vectorize your code (as far as I know).


#3 User is offline   scott 

  • Regular Contributor
  • PipPipPip
  • Group: Members.
  • Posts: 207
  • Joined: 05-October 06

Posted 30 June 2010 - 11:07 AM

View PostDave1024, on Jun 29 2010, 06:38 AM, said:

How can I disable vectorization?


I can't tell what version of gcc you are using from the information above (gcc --version), but in recent versions, using '-O3' implies '-ftree-vectorize'. Are you using '-O3'?

If you want to disable vectorization then you probably want to use '-fno-tree-vectorize'.

I'm curious: why do you want to disable vectorization?
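As a minimal sketch of what these flags act on, a simple counted loop over contiguous arrays is a typical candidate for the auto-vectorizer. The function name `brighten` is hypothetical and is not taken from the application discussed in this thread; whether a particular gcc version actually vectorizes it has to be checked in the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: add a constant to every pixel, clamping at 255.
 * A plain loop like this is the kind of code that -ftree-vectorize
 * (implied by -O3 in recent gcc versions) may turn into vector code. */
void brighten(uint8_t *dst, const uint8_t *src, size_t n, uint8_t delta)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; i++) {
        unsigned sum = (unsigned)src[i] + delta;
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Compiling this once with '-O3 -march=armv7-a -mfpu=neon -mfloat-abi=softfp' and once with '-fno-tree-vectorize' added, then comparing the two disassemblies, should show whether the vectorizer had any effect.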

#4 User is offline   Dave1024 

  • Member
  • Pip
  • Group: Members
  • Posts: 25
  • Joined: 30-April 10

Posted 01 July 2010 - 11:43 AM

Dear scott,

I am looking for a solution to the following problem.

I am trying to develop an image viewer application to view RGB and bitmap images.

My target is a BeagleBoard running Angstrom Linux.

I have two sets of source code (all versions are in fixed point).

Version 1: Pure ANSI C

Only the C code is used.
The makefile is given below.

OBJFILES = # objfiles.o
INCLUDE = -I./../Header
ABC=arm-none-linux-gnueabi-gcc
PQR=-march=armv7-a -mtune=cortex-a8
CFLAGS = -O3 -Wall $(INCLUDE) $(PQR)
HOME = IMViewer.so
$(HOME) : $(OBJFILES)
	$(ABC) -o $@ $^ $(CFLAGS) -fPIC -L. -shared
	$(ABC) -o IMView $(OBJFILES) -ldl -L. -lIMViewer
	install $(HOME) ../lib/
	mv IMView ../lib/
	rm -rf IMViewer.so
	@echo "C Version completed..."
%.o : %.c
	$(ABC) -c $(CFLAGS) $< -o $@

Version 2: C + NEON intrinsics

In this version I use NEON intrinsics wherever applicable,
so the resulting source is a mix of C and NEON intrinsics.
The makefile used for compiling it is given below.

OBJFILES = # objfiles.o
INCLUDE = -I./../Header
ABC=arm-none-linux-gnueabi-gcc
PQR=-march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -flax-vector-conversions
CFLAGS = -O3 -Wall $(INCLUDE) $(PQR)
HOME = IMViewer.so
$(HOME) : $(OBJFILES)
	$(ABC) -o $@ $^ $(CFLAGS) -fPIC -L. -shared
	$(ABC) -o IMView $(OBJFILES) -ldl -L. -lIMViewer
	install $(HOME) ../lib/
	mv IMView ../lib/
	rm -rf IMViewer.so
	@echo "C Neon Version completed..."
%.o : %.c
	$(ABC) -c $(CFLAGS) $< -o $@

I hope this makes my setup clear.

In my IMViewer application I then measure the performance of both versions.
The code fragment is given below.

#include <stdio.h>
#include <sys/time.h>

long st = 0, et = 0;
struct timeval First, Last;

void main(int argc, char **argv)
{
    gettimeofday(&First, NULL);
    st = (First.tv_sec * 1000) + (First.tv_usec / 1000); /* time in milliseconds */

    IMViewer();

    gettimeofday(&Last, NULL);
    et = (Last.tv_sec * 1000) + (Last.tv_usec / 1000); /* time in milliseconds */
    printf("The effective time in milliseconds is %ld\n", (et - st)); /* %ld: et and st are long */
}

This code fragment is common to both versions and measures the time each takes to complete.

But sadly the performance of version 2 is not good. It is close to that of the C version. I can't spot
what the problem is!

I have checked the following, and both are OK:

1. The OS kernel is NEON-enabled (OMAP 3530)
2. The assembly files generated from the NEON code contain the NEON instructions for the intrinsics

The following doubts still remain:

1. Can I configure the L1 and L2 cache sizes from the OS kernel?
2. Is any hand-written assembly needed to enable the NEON unit on the BeagleBoard?
3. My gcc version is Red Hat 3.4.4-2

Kindly look into my issue and suggest a solution!

Rgds
Dave

#5 User is offline   scott 

  • Regular Contributor
  • PipPipPip
  • Group: Members.
  • Posts: 207
  • Joined: 05-October 06

Posted 01 July 2010 - 01:29 PM

View PostDave1024, on Jul 1 2010, 11:43 AM, said:

[...]
In my IMViewer application I then measure the performance of both versions.
The code fragment is given below.

[...]
void main(int argc, char**argv)
{
	 gettimeofday(&First, NULL);
	 [...]
}


I'd suggest using 'times()' or 'getrusage(RUSAGE_SELF, ...)' instead of 'gettimeofday()', since gettimeofday() measures elapsed wall-clock time, which includes time spent running other processes, not just yours. The other functions should be less susceptible to interference from outside sources and give you more consistent numbers. But it may not make much difference on a quiet system.

And the pedant in me says, that should be 'int main() { ... return 0; }' -- 'void main() { ... }' isn't really legal. But that's not causing any timing difference.
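A minimal sketch of the getrusage() approach, assuming a POSIX system; the helper names `timeval_to_ms` and `cpu_time_ms` are made up for illustration and are not part of the original application.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to whole milliseconds. */
static long timeval_to_ms(const struct timeval *tv)
{
    assert(tv != NULL);
    return (long)tv->tv_sec * 1000L + (long)(tv->tv_usec / 1000);
}

/* CPU time (user + system) consumed so far by this process, in ms.
 * Returns -1 if getrusage() fails. */
long cpu_time_ms(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return timeval_to_ms(&ru.ru_utime) + timeval_to_ms(&ru.ru_stime);
}
```

Calling `cpu_time_ms()` before and after `IMViewer()` and printing the difference with `%ld` would report only the CPU time charged to this process, rather than elapsed wall-clock time.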

View PostDave1024, on Jul 1 2010, 11:43 AM, said:

But sadly the performance of version 2 is not good. It is close to that of the C version. I can't spot
what the problem is!


Since you're specifying -O3 for the C version, gcc may be doing vectorization. You can add -ftree-vectorizer-verbose=2 and look for 'LOOP VECTORIZED' in gcc's messages. Or you can run 'arm-...-objdump -d' on the .o file (or even the executable?) and look for the vector instructions.
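For reference when reading the disassembly, here is a sketch of a hand-written intrinsics kernel next to its scalar equivalent. The function `add_pixels_sat` is hypothetical and not from the application in this thread; the `HAVE_NEON` macro is likewise an illustrative name, set from the compiler's predefined NEON macros so the same file also builds without NEON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical kernel: saturating add of two 8-bit pixel buffers. */
void add_pixels_sat(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;

    assert(dst != NULL && a != NULL && b != NULL);
#if HAVE_NEON
    /* 16 pixels per iteration using 128-bit NEON registers. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(dst + i, vqaddq_u8(va, vb));
    }
#endif
    /* Scalar tail (and the whole loop when NEON is unavailable). */
    for (; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

When the NEON path is compiled in, the 'objdump -d' output for this function should contain instructions such as vld1.8, vqadd.u8 and vst1.8. If the plain C build shows similar vector instructions for its loops, that would be consistent with both versions running at about the same speed.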


View PostDave1024, on Jul 1 2010, 11:43 AM, said:

The following doubts still remain:

1. Can I configure the L1 and L2 cache sizes from the OS kernel?

No, the kernel should enable and deal with the caches -- that's part of its job.

View PostDave1024, on Jul 1 2010, 11:43 AM, said:

2. Is any hand-written assembly needed to enable the NEON unit on the BeagleBoard?

That's also the kernel's job. If you executed a NEON instruction with a kernel that had NEON disabled, I'd expect your process to be killed by SIGILL. 'uname -a' will tell us the kernel version number.
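One way to check from user space, sketched here under the assumption that the kernel lists CPU capabilities on a "Features" line in /proc/cpuinfo (as ARM Linux kernels typically do). The function name `cpuinfo_has_neon` is made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the given cpuinfo-style file lists "neon" on a
 * "Features" line, 0 if not, and -1 if the file cannot be opened. */
int cpuinfo_has_neon(const char *path)
{
    char line[1024];
    int found = 0;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "Features", 8) == 0 &&
            strstr(line, " neon") != NULL) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

On the target, `cpuinfo_has_neon("/proc/cpuinfo")` returning 1 would suggest the kernel exposes NEON to user processes. Taking the path as a parameter keeps the parser easy to try against a saved copy of the file.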

View PostDave1024, on Jul 1 2010, 11:43 AM, said:

3. My gcc version is Red Hat 3.4.4-2

That looks like the host compiler. I should have said 'arm-none-linux-gnueabi-gcc --version'

#6 User is offline   Dave1024 

  • Member
  • Pip
  • Group: Members
  • Posts: 25
  • Joined: 30-April 10

Posted 07 July 2010 - 08:58 AM

Dear scott,

The toolchain version is given below:

(2007q3-51) 4.2.1

Will I get NEON performance with this version of the toolchain?

I have one doubt: can my code make use of the cache memory?

Do the OS's critical modules use the cache all the time?

Dave

#7 User is offline   scott 

  • Regular Contributor
  • PipPipPip
  • Group: Members.
  • Posts: 207
  • Joined: 05-October 06

Posted 12 July 2010 - 12:22 PM

View PostDave1024, on Jul 7 2010, 09:58 AM, said:

The toolchain version is given below:

(2007q3-51) 4.2.1

Will I get NEON performance with this version of the toolchain?


I expect that if you are using '-O3 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp', 2007q3-51 will try to vectorize. You can use objdump to find out how well it is doing. You should probably consider using 2010q1, as it's 2.5 years newer.

View PostDave1024, on Jul 7 2010, 09:58 AM, said:

I have one doubt: can my code make use of the cache memory?

Do the OS's critical modules use the cache all the time?


Your code will share the cache with other processes and the OS. If the OS and other processes aren't executing much then your code should stay in the cache (if it fits).