Could someone help me explain the following behavior?
I execute a sequence of 4096 instructions of this form (target is TMS570/Cortex-R4F):
str r0, [r8, #0]
str r1, [r8, #4]
str r3, [r8, #8]
When "dual-issue" mode is enabled (bits 28-31 of the Auxiliary Control Register and bits 18-20 of the Secondary Auxiliary Control Register are cleared), this code, plus the few instructions surrounding it, executes in 5162 clock cycles.
When "dual-issue" mode is disabled (the same bits are set), the same code executes in 4146 clock cycles!
I observe this in both ARM and Thumb-2 modes.
So when "dual-issue" mode is enabled, it seems that one pipeline stage sometimes (about once every 4 instructions) waits for a pair of words so that it can process them together, which introduces extra wait states. I can't find any description of this behavior, though.
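For reference, here is the arithmetic behind my "once every 4" estimate, as a small host-side sketch. The function name is just something I made up for illustration, the cycle counts are the two measurements above, and the calculation ignores the few surrounding instructions, so treat the result as approximate.

```python
# Host-side arithmetic check of the two measurements quoted above.
# The function name is hypothetical; the numbers come from the measurements.

def extra_cycles_per_instruction(cycles_enabled, cycles_disabled, instruction_count):
    """Average number of extra cycles per instruction when dual-issue is enabled."""
    return (cycles_enabled - cycles_disabled) / instruction_count

STORES = 4096            # number of str instructions in the sequence
DUAL_ISSUE_ON = 5162     # cycles measured with dual-issue enabled
DUAL_ISSUE_OFF = 4146    # cycles measured with dual-issue disabled

penalty = extra_cycles_per_instruction(DUAL_ISSUE_ON, DUAL_ISSUE_OFF, STORES)
print(f"extra cycles: {DUAL_ISSUE_ON - DUAL_ISSUE_OFF}")   # extra cycles: 1016
print(f"extra cycles per store: {penalty:.3f}")            # extra cycles per store: 0.248
```

1016 extra cycles is very close to 4096 / 4 = 1024, which is why I suspect roughly one extra cycle for every 4 stores.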
Could someone help me understand this, please? It is quite important for me, because I have to produce highly deterministic real-time software, and this kind of behavior is hard to model.
Thanks for any help.
This post has been edited by Christophe31: 01 August 2011 - 08:55 AM