Cycle penalty
#1
Posted 10 December 2010 - 04:29 PM
I am wondering where I can find information about cycle penalties for cache misses.
For instance, if an instruction that is about to be executed is not in the cache, how many more cycles will it take to execute? (I'm assuming its execution time will be the cycle count given in the reference manual for that instruction, plus the cache-miss penalty.)
Further, for an instruction like "str r3, [fp, #-16]", if the memory location ([fp, #-16]) is not in the cache, how many more cycles will that add to the execution time?
Sorry, I didn't find this information in the datasheet.
Thanks.
#2
Posted 10 December 2010 - 10:30 PM
I'm afraid this will have the immortal answer "it depends" ...
Memory latency depends on the internal bus topology of the device, the clock rate of the bus relative to the CPU, the type of memory connected to the device, bus contention, etc.
If it's not on the data sheet, I'd suggest writing some benchmarks to find out (if it is an A-profile core and you can run Linux then LMbench3 includes some good memory benchmarks which give latency and bandwidth).
Pete
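As a rough illustration of the "write some benchmarks" suggestion above, here is a minimal pointer-chasing sketch in C. It is not from the original posts and is not LMbench; the names build_chain, chase_ns_per_load and CHASE_DEMO_MAIN are made up for this example. It assumes a hosted C environment with malloc and clock() available, and that the largest working set (32 MiB here, an arbitrary choice) fits in RAM. Each load depends on the previous one, so once the working set outgrows a cache level the time per load rises toward the latency of the next level.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Build a random single-cycle permutation (Sattolo's algorithm), so that
   following next[i] visits every slot exactly once before returning to 0.
   The random order is meant to defeat simple hardware prefetchers. */
static size_t *build_chain(size_t n)
{
    assert(n >= 2);
    size_t *next = malloc(n * sizeof *next);
    if (!next) return NULL;
    for (size_t i = 0; i < n; i++) next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;      /* j in [0, i-1] */
        size_t t = next[i];
        next[i] = next[j];
        next[j] = t;
    }
    return next;
}

/* Average time per dependent load, in nanoseconds, over a working set of
   `bytes` bytes. Returns a negative value on bad arguments or allocation
   failure. Loop overhead is included in the result. */
static double chase_ns_per_load(size_t bytes, size_t steps)
{
    static volatile size_t sink;            /* keeps the loop from being optimized away */
    size_t n = bytes / sizeof(size_t);
    if (n < 2 || steps == 0) return -1.0;

    size_t *next = build_chain(n);
    if (!next) return -1.0;

    size_t p = 0;
    clock_t t0 = clock();
    for (size_t s = 0; s < steps; s++)
        p = next[p];                        /* each load depends on the previous one */
    clock_t t1 = clock();
    sink = p;

    free(next);
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}

#ifdef CHASE_DEMO_MAIN
/* Sweep the working set from 4 KiB to 32 MiB and print ns per load. */
int main(void)
{
    const size_t steps = (size_t)1 << 22;
    for (size_t bytes = 4096; bytes <= ((size_t)32 << 20); bytes *= 2) {
        double ns = chase_ns_per_load(bytes, steps);
        printf("%8zu KiB  %7.2f ns/load\n", bytes / 1024, ns);
    }
    return 0;
}
#endif
```

Build with -DCHASE_DEMO_MAIN to get the sweep. Steps in the printed times should roughly line up with the cache sizes of the part; multiplying the nanoseconds by the CPU clock in GHz gives an approximate cycle count. Note this only times dependent loads. It says nothing direct about instruction fetch misses or about a store such as the "str" in the question, and at large working sets TLB misses are mixed into the number as well.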
When optimizing software, consider that the quickest code to run is the bit you removed from the call path.