ARM Community: DS-5 OR RVDS, which one to use for profiling code - ARM Community

DS-5 OR RVDS, which one to use for profiling code

#1 User is offline   AmrNeonCoder 

  • Member
  • Pip
  • Group: Members
  • Posts: 5
  • Joined: 02-June 12

Posted 13 June 2012 - 09:19 PM

I used RVDS in the past and it was great for profiling test code. I was able to see how many cycles each instruction takes, data hazards etc. I was very satisfied with it. BUT, RVDS was pretty buggy in that regard: there was no way to profile code that uses unaligned memory access (it just hangs, there was no reply from arm at all if there was a way to fix it), and profiling for neon code was non-existent (every neon instruction takes 1 cycle in RVDS profiler).

I tried the DS-5 trial and CE versions and couldn't even figure out how to do any profiling at all. Debugging ... I must be dreaming, it was a world of pain to get anything working and I don't think it was working properly (I followed all kinds of guides, the ones pinned at the top of the forum for example). It's nothing even close to the experience I had with RVDS: I had profiling results within 20 minutes after I registered for the trial. If it matters, even for android my primary dev environment is VS2009 and I debug native code on windows mobile devices if I need to, all that clunky eclipse feels like ... **censored** :)


THE QUESTION:

Should I keep on wasting time trying to get DS-5 profiling working (I would like to be able to profile on an emulator or on a real device), or will Streamline be useless for me: does it show the same detail as the profiler that comes with RVDS or not? For some reason I think that Streamline is more like the profiler that comes with XCode and the iPhone SDK: it shows sampled usage of the full app but not opcode-level profiling info like RVDS (e.g. I could see each instruction, how many cycles it took and any register waits if there were any).

If DS-5 isn't good for that, maybe somebody can recommend an alternative solution? My main target is the android phone, although I build my code for almost all devices that run on ARM.
Basically, what's the best tool for profiling ARM code? I would prefer some RTSMs so I could profile for different CPUs (like with RVDS), but if there is no good alternative I could just as well buy a development board or anything else that could give me opcode-level profiling info. Please advise, anybody! Thanks

Ideally, I would like something similar to RVDS but fully working: 1) unaligned memory access fixed, preferably running some kind of OS so that 2) I could use files that I use for testing (I had 250MB input files that I passed for my test runs and in RVDS I had to embed all that data to final executable, which was really annoying compared to all platforms where I run my code and where I was able to use files one way or the other). 3) Normal neon profiling info, and not that 1cpi nonsense that RVDS profiler shows. Something similar to ARM Cortex-A8 cycle counter online tool

#2 User is offline   AmrNeonCoder 

  • Member
  • Pip
  • Group: Members
  • Posts: 5
  • Joined: 02-June 12

Posted 13 June 2012 - 10:27 PM

On top of that I'd like to add... maybe I'm just being dense, but I find that "Setting up an Android target" from "ARM DS-5 Using ARM Streamline" is the dumbest guide ever. It reminds me of my friends who call for help and then tell me what they see on their screen and talk to me as if I had their screen in front of my eyes.

Quote

In the kernel configuration menu, use the arrow keys to navigate to the required submenu and press Enter


WTF IS THAT BS?! Seriously, I'm trying to press arrow keys, but all I see is the web page moving. Where the hell am I supposed to press arrow keys??? That ridiculous mention of the location of the gator source... WTF IS THAT??? In the older version it mentioned installdir/arm... now it's something else, but I still don't get where the hell it's supposed to be! That's the install dir of DS-5, right? What about ds5-ce then!
To be able to use Streamline on any of the devices on my desk (I have like 50 phones lying around), do I need to build android myself and reflash the phones???!?!? Is that what that guide says??...
I've never built android or any kernel modules, but it's strange to assume that somebody who simply wants to use a profiler can just rebuild the kernel with no freaking info, like it's a helloworld task that everybody knows by heart... No wonder there isn't a single clue on the web about how to set it up, use it and get any results from it... At least I'm not able to find anything at all!

I'm very sorry, perhaps that guide forgot to mention that a mind reading class was a prerequisite.

#3 User is offline   SamEllis 

  • Contributor
  • PipPip
  • Group: Members.
  • Posts: 52
  • Joined: 28-November 11

Posted 14 June 2012 - 09:08 PM

Thank you for your feedback on the DS-5 product and its documentation. We'll take a look and see if any improvements can be made.

As to your questions about profiling, your guess that Streamline is sample based is correct. Streamline uses hardware performance monitors in the ARM CPU and in the rest of the system to count events in the system over a period of time.

The closest we have to the RVDS profiling functionality within DS-5 is the Trace view. This requires a hardware target that is capable of generating trace data, as well as a DSTREAM unit to allow the debugger to control the hardware. Here are some links to the Trace functionality that is provided within DS-5:

http://infocenter.ar...j/BABJCFCH.html
http://infocenter.ar...j/CHDHCGFH.html

From the screenshot you will see that the Trace view provides a heat map showing where execution time has been spent. This heat map is based on instruction counts rather than cycles, so may not provide as much detail as you would like. The debugger does allow for cycle accurate trace on some hardware targets, and in this case the table in the bottom of the view contains an additional Cycles column showing the number of cycles associated with each instruction. Sorry, the documentation and screenshot do not currently show that.

The RTSMs that are supplied with DS-5 simulate the processor at the instruction level rather than the micro-architectural level. This allows the model to run very fast, but at the cost of not being able to provide accurate timing information. It is not advised to try using these for profiling purposes, and you would be better to use real hardware instead.

#4 User is offline   SamEllis 

  • Contributor
  • PipPip
  • Group: Members.
  • Posts: 52
  • Joined: 28-November 11

Posted 14 June 2012 - 09:52 PM

You are developing applications for Android. That makes a difference and I need to correct my previous reply. Tracing instructions within an operating system that performs task switching requires tracking of which tasks are executing at any given time (so that the debugger can know which traced instructions belong to which task). DS-5 Debugger does not currently support process-specific tracing, and so unfortunately it may not solve your problem. Tracing works well for bare-metal systems (no operating system) or for the operating system itself. Within DS-5, Streamline is the best tool we have for profiling Android applications.
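
To illustrate why a trace tool needs to know which task is running, here is a minimal C sketch that attributes traced instructions to tasks by following context-switch markers. The event layout, `MAX_TASKS` and the function name are hypothetical and invented for this illustration; they are not a DS-5 or trace-hardware API, and real trace streams are far more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified trace record (not a real trace format). */
enum trace_kind { TRACE_INSN, TRACE_CONTEXT_SWITCH };

struct trace_event {
    enum trace_kind kind;
    uint32_t value; /* instruction address, or the id of the task switched to */
};

#define MAX_TASKS 8u

/* Count executed instructions per task. Without the context-switch events
 * every instruction would be charged to a single task, which is the
 * problem described for tracing under a task-switching OS. */
void count_insns_per_task(const struct trace_event *ev, size_t n,
                          uint32_t counts[MAX_TASKS])
{
    uint32_t current = 0; /* assumption: task 0 is running when trace starts */

    for (size_t i = 0; i < MAX_TASKS; i++)
        counts[i] = 0;

    for (size_t i = 0; i < n; i++) {
        if (ev[i].kind == TRACE_CONTEXT_SWITCH) {
            assert(ev[i].value < MAX_TASKS);
            current = ev[i].value;
        } else {
            counts[current]++;
        }
    }
}
```

If the context-switch events are missing from the stream, the per-task totals cannot be reconstructed from the instruction addresses alone, because different processes can reuse the same virtual addresses.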

#5 User is offline   scott 

  • Regular Contributor
  • PipPipPip
  • Group: Members.
  • Posts: 210
  • Joined: 05-October 06

Posted 15 June 2012 - 08:19 AM

As Sam says, Streamline is a sampling-based profiler. Trace-based profiling (as the RVDS profiler used) doesn’t scale up well to multi-core GHz+ usage and very few development boards have high-speed external trace ports. Streamline isn’t usually used on models because the models of advanced cores don’t have an accurate sense of time (especially memory access) or accurate performance counters. Streamline is based on the Linux kernel profiling hooks. It requires a Linux target that has the kernel configured correctly and the gator driver (kernel module) and daemon installed.

The documentation does rather assume that you are familiar with configuring and building a Linux kernel (this also applies to Android). It could be improved by pointing at some kernel building instructions (perhaps <http://infocenter.ar...aqs/ka4134.html>). Rebuilding the kernel requires a Linux host.

The documentation link you posted goes to an obsolete 5.6 version of the documents. The latest version <http://infocenter.ar...h/BABECIDJ.html> is not much different but does describe using Help > ARM Extras... to find the supplied files.

If you have a target that is supported, then using a Linaro Android build would probably be the easiest way to go since they already have gator built in (http://www.linaro.org/downloads/1205). There's also a possibly useful blog entry about using Streamline on a Galaxy Nexus http://www.linaro.or...ing-aosp-4-0-4/. Using Streamline with an arbitrary production Android phone would require rebuilding and reflashing and is not trivial.

Streamline will show you down to the instruction level where the samples are being taken (where the time is being spent). It doesn’t show register interlocks explicitly but it does show where the time is being spent, which depends on the interlocks and on much else that is probably more important.

While Streamline is open source it’s not trivial to port to other OSes or bare-metal usage.

The RVDS profiler is no longer under development in part because of the issues mentioned above.

#6 User is offline   AmrNeonCoder 

  • Member
  • Pip
  • Group: Members
  • Posts: 5
  • Joined: 02-June 12

Posted 15 June 2012 - 08:20 PM

Sam, Ellis, thank you very much for replies.


Quote

You are developing applications for Android. That makes a difference


I develop code that runs on almost all major mobile OSs (including other obscure targets like some set-top boxes etc). I know which parts of the code take CPU (from a sampling-based profiler). This way I know what I need to work on and I write simple test apps that take some test input files and run that CPU-intensive code on the data. This way I'm able to run that same test on every device, including in the RVDS profiler. In RVDS I can see good stats about instructions and cycles. I know that the cycle info isn't very accurate, but it's more or less indicative for some parts of the code. It is very useful to me at least. I remember a case where RVDS showed me some badly generated code with extreme register inter-dependencies in a very performance-critical loop. I had to manually add temporary variables for intermediates and that gave me like a 3-5% boost overall on the entire encoder simply by changing the C code.
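
The kind of source-level change described here can be sketched in C. This is only an illustration with made-up function names, not the encoder code in question: the first loop funnels every addition through one accumulator, the second uses two independent temporaries so that consecutive additions do not wait on each other. Whether this helps on a given core and compiler is an assumption that has to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Single accumulator: each addition depends on the result of the previous one. */
uint32_t sum_chained(const uint16_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Two temporaries: the two additions in the loop body are independent,
 * which shortens the dependency chain the pipeline has to wait on. */
uint32_t sum_split(const uint16_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    uint32_t acc0 = 0, acc1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += v[i];
        acc1 += v[i + 1];
    }
    if (i < n)          /* odd element left over */
        acc0 += v[i];
    return acc0 + acc1;
}
```

Both functions return the same value for integer inputs; with floating-point data the reordering could change rounding, so the same trick would need more care there.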


Quote

The documentation does rather assume that you are familiar with configuring and building a Linux kernel (this also applies to Android)


I've built linux or bsd kernels, but I've never built android. I'm using a windows workstation, so it's probably a world of pain to build android on windows. From the documentation it wasn't at all clear whether I need to rebuild the kernel, rebuild the entire image and re-flash a phone, or whether I needed the src simply to build that required module so that I could add it to an existing phone. Now I understand that I need a full rebuild to get it running. Is that correct? Or can I simply build a kernel, copy it to the device and boot my kernel instead of the original one?


Quote

This requires a hardware target that is capable of generating trace data, as well as a DSTREAM unit to allow the debugger to control the hardware


Can you please give some suggestions on capable hw and what's the price of that DSTREAM unit (my guess it's like a few thousands, right?).


Also, I have a question about RTSMs. I downloaded eval version of FastModels and built myself Cortex-a8 example model using VS2008. The model that I built and models that come with RVDS have something in common:
I ABSOLUTELY can't find a way to load unaligned memory, e.g. this code will never work (it won't load unaligned int, and it won't load that using old-style unaligned load either):
__asm main(){ ldr r0, [sp, #2] }


When accessing unaligned memory with RTSMs, it jumps to PC 0x00000010 and after that executes all these junk instructions, showing millions of exceptions. Setting the unaligned access and unaligned trap bits in cp15 doesn't make any difference. That seems like a bug in FastModels. I tried to run that unaligned access example in the profiler (where it just starts showing millions of exceptions and nothing happens), and I tried to run that code in the 2 available debuggers (one ghetto-looking debugger that comes with RVDS and the other, better one that comes with FastModels), and in these debuggers reading unaligned memory jumps to pc=0x10 or something like that.


Actually, that problem made me look for alternatives to RVDS: I'm tired of writing junk code to avoid that alignment issue only for the RVDS profiler; also, I can't work at all with code that actually needs unaligned access for performance reasons (basically, it's faster to read two 16-bit shorts from an unaligned address with one load and then use the top and bottom parts than to load two registers and use them).
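
The access pattern described here can be sketched in portable C. The function names are hypothetical, and the code assumes a little-endian target (typical for ARM Android builds, but an assumption nonetheless). Using `memcpy` keeps the code well defined on targets that fault on unaligned loads; on cores that permit unaligned access a compiler can turn it into a single load, though that depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit word from an address that may not be 4-byte aligned. */
uint32_t load_u32_unaligned(const void *p)
{
    assert(p != NULL);
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Fetch two adjacent 16-bit values with one 32-bit read and split them.
 * Assumes little-endian byte order: the first short sits in the bottom
 * half of the word and the second short in the top half. */
void load_two_u16(const void *p, uint16_t *first, uint16_t *second)
{
    uint32_t w = load_u32_unaligned(p);
    *first  = (uint16_t)(w & 0xFFFFu);
    *second = (uint16_t)(w >> 16);
}
```

On a model or core that traps unaligned word loads, this form avoids the abort without hand-written workarounds, at the possible cost of byte-wise loads being generated.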


Quote

There's also a possibly useful blog entry about using Streamline on a Galaxy Nexus http://www.linaro.or...ing-aosp-4-0-4/


Thanks for the link, seems like it's the best way for me to get running on Galaxy Nexus. I'll try that when I have some free time.
If it's not trivial to get Streamline on regular phone, what's the recommended HW to work with Streamline? I guess, it would be best if I could use some android phone for that so I could profile entire app instead of limited test apps. Is there any phone that has compatible gpu with streamline?

#7 User is offline   AmrNeonCoder 

  • Member
  • Pip
  • Group: Members
  • Posts: 5
  • Joined: 02-June 12

Posted 15 June 2012 - 08:33 PM

Shouldn't armcc be the best tool to provide that kind of info? Internally, it needs to weigh alternative instruction sequences based on their execution speeds and on the availability of dependent data coming from previous instructions. On top of that, armcc does all that based on the configured CPU or architecture.

That would be nice to have some kind of switch so that it could add extra instruction analysis in generated asm listing, or process asm file and generate similar info. Here I put a simple asm example; armcc should be able to give similar info!

#8 User is offline   scott 

  • Regular Contributor
  • PipPipPip
  • Group: Members.
  • Posts: 210
  • Joined: 05-October 06

Posted 18 June 2012 - 08:53 AM

AmrNeonCoder, on 15 June 2012 - 08:33 PM, said:

That would be nice to have some kind of switch so that it could add extra instruction analysis in generated asm listing, or process asm file and generate similar info.


Armasm has a configurable message about interlocks, see http://infocenter.ar...g/CIAGIDIH.html

#9 User is offline   scott 

  • Regular Contributor
  • PipPipPip
  • Group: Members.
  • Posts: 210
  • Joined: 05-October 06

Posted 18 June 2012 - 09:14 AM

AmrNeonCoder, on 15 June 2012 - 08:20 PM, said:

Now I understand that I need a full rebuild to get it running. Is that correct? Or can I simply build a kernel, copy it to the device and boot my kernel instead of the original one?
...
If it's not trivial to get Streamline on regular phone, what's the recommended HW to work with Streamline? I guess, it would be best if I could use some android phone for that so I could profile entire app instead of limited test apps.


It's definitely non-trivial. It's possible that some production phone has a kernel that is correctly configured, but rebuilding and replacing the kernel with one correctly configured for gator (the target part of Streamline) is probably the only way to be sure. Also building gator requires matching kernel headers that may be difficult to find for a production phone. Installing/running gator requires root access. The details of how to copy the kernel, etc. to the phone will vary from phone to phone.

It's almost certainly easier to use Android on some development board. Linaro has Android for at least i.MX53, Pandaboard, Snowball and Origen. There are probably many other boards with Android support that I'm not as familiar with.

AmrNeonCoder, on 15 June 2012 - 08:20 PM, said:

Is there any phone that has compatible gpu with streamline?


They are not phones but all of those boards I mention above have a GPU. With the correct drivers, Streamline can do GPU profiling on Mali GPUs (for example, Snowball and Origen). [But I'm getting a bit "outside my area of expertise" here. (That's something my father used to say when he didn't know what the hell he was talking about.)]

#10 User is offline   scott 

  • Regular Contributor
  • PipPipPip
  • Group: Members.
  • Posts: 210
  • Joined: 05-October 06

Posted 18 June 2012 - 09:20 AM

AmrNeonCoder, on 15 June 2012 - 08:20 PM, said:

Also, I have a question about RTSMs.[...]


It's probably best to ask this in a separate thread (or ask support-sw@arm.com). My first impression is that 0x10 is the Data Abort exception vector and the MMU can be configured to cause data aborts on unaligned accesses.
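
For reference, the reading of 0x10 as the Data Abort vector can be checked against the classic ARM exception vector layout. The C sketch below is a small lookup helper with a hypothetical name; it assumes the low vector base at 0x00000000 (not high vectors or a relocated vector base) and is not part of any ARM tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exception vector offsets for ARMv7-A/R with the vector base at 0x00000000. */
static const char *const vector_names[] = {
    "Reset",                 /* 0x00 */
    "Undefined Instruction", /* 0x04 */
    "Supervisor Call (SVC)", /* 0x08 */
    "Prefetch Abort",        /* 0x0C */
    "Data Abort",            /* 0x10 */
    "Reserved",              /* 0x14 */
    "IRQ",                   /* 0x18 */
    "FIQ",                   /* 0x1C */
};

/* Return the vector name for a PC value, or NULL if the PC is not
 * exactly on one of the eight vector entries. */
const char *arm_vector_name(uint32_t pc)
{
    assert(sizeof vector_names / sizeof vector_names[0] == 8);
    if (pc >= 0x20u || (pc & 3u) != 0)
        return NULL;
    return vector_names[pc >> 2];
}
```

Calling `arm_vector_name(0x10)` yields "Data Abort", which is consistent with an unaligned load raising an alignment fault and the core then running whatever happens to be stored at the vector addresses when no handler is installed.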