Meaning of ACTLR.smp
#1
Posted 20 December 2011 - 07:09 PM
Is it really important to have the SMP bit in ACTLR set to 1 (Cortex-A9 MPCore)? What is the purpose of this bit? The ARM website says that this bit indicates whether the processor takes part in cache coherency or not. I thought the SCU was responsible for cache coherence (I read that it uses the MESI protocol for that). So will the caches be coherent if ACTLR.SMP = 0?
Thanks
#2
Posted 21 December 2011 - 08:24 AM
To use the hardware coherency support, you need to do several things:
* Enable the SCU through the SCU Control Register (this can be done by any one of the CPUs)
* Enable coherency management in the CPU through the ACTLR.SMP bit (this must be done by EACH CPU you want coherency for)
* Enable the MMU on the CPU (again, do this on each CPU)
* Mark the appropriate address regions as Normal, WB/WA, shareable
* Optionally set the ACTLR.FW bit (broadcast of cache and TLB maintenance operations)
The story is a little different on the Cortex-A15.
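As a rough illustration of the register values behind the steps above, here is a minimal C sketch that only computes the values, so it can be checked on a host. It is not code from the thread: the helper names are hypothetical, and the bit positions (ACTLR.SMP = bit 6, ACTLR.FW = bit 0, SCU enable = bit 0 of the SCU Control Register at offset 0x00 from the SCU base) are taken from the Cortex-A9 TRMs and should be checked against your core revision. The actual MRC/MCR accesses and the SCU MMIO write are shown only in comments, and enabling the MMU is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cortex-A9 ACTLR bits (check against the TRM for your revision). */
#define A9_ACTLR_FW  (1u << 0)   /* cache and TLB maintenance broadcast */
#define A9_ACTLR_SMP (1u << 6)   /* take part in coherency via the SCU */

/* SCU Control Register: offset 0x00 from the SCU base (PERIPHBASE), bit 0 = enable. */
#define SCU_CTRL_ENABLE (1u << 0)

/* ARMv7-A short-descriptor 1MB section fields. */
#define SECT_TYPE   (2u << 0)              /* bits[1:0] = 0b10: section */
#define SECT_B      (1u << 2)
#define SECT_C      (1u << 3)
#define SECT_AP_RW  (3u << 10)             /* AP[1:0] = 0b11: full access (example choice) */
#define SECT_TEX(x) ((uint32_t)(x) << 12)
#define SECT_S      (1u << 16)             /* shareable */

/* On EACH CPU: new value for ACTLR.
   On target: MRC p15, 0, r0, c1, c0, 1 ; modify ; MCR p15, 0, r0, c1, c0, 1 */
uint32_t actlr_enable_coherency(uint32_t actlr, bool set_fw)
{
    actlr |= A9_ACTLR_SMP;
    if (set_fw)
        actlr |= A9_ACTLR_FW;
    return actlr;
}

/* Once, on any one CPU: new value for the SCU Control Register. */
uint32_t scu_ctrl_enable(uint32_t scu_ctrl)
{
    return scu_ctrl | SCU_CTRL_ENABLE;
}

/* Section entry for Normal, inner/outer Write-Back Write-Allocate
   (TEX = 0b001, C = 1, B = 1), shareable memory. */
uint32_t section_normal_wbwa_shared(uint32_t phys_base)
{
    assert((phys_base & 0xFFFFFu) == 0);   /* sections must be 1MB aligned */
    return phys_base | SECT_TEX(1) | SECT_C | SECT_B | SECT_S | SECT_AP_RW | SECT_TYPE;
}
```

For example, `section_normal_wbwa_shared(0x80000000u)` gives `0x80011C0E`, and `actlr_enable_coherency(0, true)` gives `0x41`.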
#3
Posted 21 December 2011 - 09:40 AM
#4
Posted 21 December 2011 - 12:30 PM
If you have a multi-core A9, it is usual for the secure bootstrap to either:
(1) enable SMP for all cores before handing over to the non-secure world, leaving the SMP bit read-only in the non-secure world. This means you cannot run AMP, but that's fairly uncommon.
(2) set the NSACR.NS_SMP bit to 1 and let the non-secure world decide how to use SMP/AMP across the cores. But any secure software that is running has to cope with SMP/AMP changing beneath its feet.
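A minimal sketch of the NSACR side of these two options, again computing values only. The bit position used here (NS_SMP = bit 18 of NSACR, r1p0 and later) is an assumption based on the Cortex-A9 TRM and should be confirmed for your part; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of NSACR.NS_SMP on Cortex-A9 r1p0 and later. */
#define NSACR_NS_SMP (1u << 18)

/* Option (2): secure boot code lets the non-secure world write ACTLR.SMP.
   On target: MRC p15, 0, r0, c1, c1, 2 ; modify ; MCR p15, 0, r0, c1, c1, 2 */
uint32_t nsacr_grant_ns_smp(uint32_t nsacr)
{
    return nsacr | NSACR_NS_SMP;
}

/* Option (1) leaves NS_SMP clear, so ACTLR.SMP is read-only to non-secure code. */
bool ns_may_write_actlr_smp(uint32_t nsacr)
{
    return (nsacr & NSACR_NS_SMP) != 0;
}
```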
#5
Posted 21 December 2011 - 12:54 PM
The NSACR.NS_SMP bit is only available if your part is based on the r1p0 (or later) A9.
The SCU registers can also be restricted to Secure access only, so you potentially have the same problem with the SCU enable as with the ACTLR.
#6
Posted 22 March 2012 - 06:35 AM
If I enable the SMP config in Linux, it will set ACTLR.SMP=1.
Is it correct to set ACTLR.SMP=1 on a single-core CA9 MPCore?
#8
Posted 22 March 2012 - 04:04 PM
However, the problem is that if I don't set ACTLR.SMP=1 in Linux SMP mode on a single-core CA9 MPCore, the kernel hangs in an atomic lock loop when doing LDREX/STREX operations.
Is this normal, or have I missed something?
#9
Posted 23 March 2012 - 02:38 PM
So what does setting the ACTLR.SMP bit actually do? Well (assuming the SCU is enabled), it configures the core as part of the inner-shareable domain. This affects all the regions you mark as Inner Write-Back/Write-Allocate cacheable + Shared in the translation tables.
* SCU enabled + ACTLR.SMP bit set
Inner WB/WA + Shared regions are treated as cacheable at L1; the SCU maintains coherency between the cores in the cluster
* SCU disabled and/or ACTLR.SMP bit not set
Inner WB/WA + Shared regions are treated as NON-CACHEABLE. This is the same behaviour as on the Cortex-A8.
The Shareable attribute tells the processor whether _other_ processors/masters access the region, or whether only this processor does. For cores without any coherency logic, marking a region as shareable means the region will NOT be cached by the integrated caches, regardless of the inner cache policy you set. This is because you've told the core that another master might modify the region, and the core has no way to detect this. So, to be "safe", it won't cache the shared region.
For the MPCores, the shared regions are cached because the coherency can be maintained with the other cores in the cluster.
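The rules above can be summarised as a small decision function. This is a simplified, hypothetical model of the behaviour described in this post for a Normal, Inner Write-Back/Write-Allocate region on a Cortex-A9 MPCore; it is not a hardware API, and other attributes (outer policy, L2 controller, etc.) are deliberately ignored.

```c
#include <stdbool.h>

/* Hypothetical model, Cortex-A9 MPCore: will the L1 data cache hold lines
   from a Normal, Inner Write-Back/Write-Allocate region? */
bool a9_l1_caches_region(bool scu_enabled, bool actlr_smp, bool shareable)
{
    if (!shareable)
        return true;                 /* only this core uses it: safe to cache */

    /* Shared: cacheable only when the core is in the coherency domain;
       otherwise treated as non-cacheable, as on a Cortex-A8. */
    return scu_enabled && actlr_smp;
}
```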
Why does this matter to you?
Well, I'm guessing that by setting CONFIG_SMP you are causing the kernel to mark cacheable memory as shared. That works fine as long as the ACTLR.SMP bit is set (which the kernel should do) and the other cores are suitably configured.
For mutexes/semaphores, you use the special LDREX/STREX instructions. These exist to allow you to implement mutex/semaphore lock functions. When you do a STREX, it "checks" whether the location has changed since you read it with an LDREX. This check can be done inside the core (Local Monitor) or in the memory system (Global Monitor). Any region that gets cached will only use the Local Monitor; regions that are not cached use the Global Monitor. Flipping the ACTLR.SMP bit therefore changes which Monitor you use for Inner WB/WA + Shared regions...
The problem is that not all chips actually have a Global Monitor, so STREX instructions that try to use the Global Monitor will just fail.
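The monitor selection described above can be written as a simplified, hypothetical model (the names are illustrative, not a real API). It follows the post's rule that cached regions use the Local Monitor and non-cached regions need a Global Monitor, which fits the single-core hang from post #8: with ACTLR.SMP clear, shared memory is non-cacheable, so if the chip has no Global Monitor the STREX can never succeed and the lock loop spins forever.

```c
#include <stdbool.h>

typedef enum { MONITOR_LOCAL, MONITOR_GLOBAL } exclusive_monitor;

/* Rule from the post: cached regions use the core's Local Monitor,
   non-cached regions go to a Global Monitor in the memory system. */
exclusive_monitor monitor_for_region(bool region_cached)
{
    return region_cached ? MONITOR_LOCAL : MONITOR_GLOBAL;
}

/* Simplified: a STREX can only succeed if the monitor it needs exists.
   If it never can, an LDREX/STREX retry loop never exits. */
bool strex_can_succeed(bool region_cached, bool chip_has_global_monitor)
{
    return monitor_for_region(region_cached) == MONITOR_LOCAL
        || chip_has_global_monitor;
}
```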
#10
Posted 23 March 2012 - 03:02 PM
One more question: by "Global Monitor", do you mean the ACP here, or some other common implementation?
#11
Posted 23 March 2012 - 04:12 PM
Imagine you had two processors (say an A9 and an R4) in one chip. They share some data, and you want to use a mutex to control which of the two processors can access the data at any one time. You need some hardware support to ensure that a STREX from one processor can detect whether the other got there first. That is the job of the Global Monitor. In my experience, the Global Monitor is usually part of the memory controller. That is, if you have one; not all chips do.
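The mutex idea can be sketched as a generic test-and-set spinlock using C11 `<stdatomic.h>`, so it runs on any host. This is not code from the thread, and `spinlock`, `SPINLOCK_INIT`, `spin_lock`, `spin_trylock` and `spin_unlock` are hypothetical names. On ARMv7-A, compilers typically lower the atomic test-and-set to an exclusive load/store (LDREX/STREX family) retry loop, so whether the loop can make progress depends on which monitor the lock's memory region ends up using.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_flag locked; } spinlock;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Spin until the flag was previously clear (i.e. we set it first). */
void spin_lock(spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;   /* another owner holds it: retry */
}

/* Single attempt: returns true if we acquired the lock. */
bool spin_trylock(spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Release: publish our writes before clearing the flag. */
void spin_unlock(spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Note that C11 atomics only describe threads of one program; a lock shared between two different processors, as in the A9 + R4 example, additionally needs both sides to map the lock with matching memory attributes and a Global Monitor that can see both masters.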
#12
Posted 02 April 2012 - 08:18 PM
"In the Cortex-A15 processor, the L1 data cache and L2 cache are always coherent, for shared or non-shared data, regardless of the value of the SMP bit."
Does this mean that the page-table shareability bit doesn't have any effect on data accesses in the A15 (with regard to coherency maintenance inside the A15 cluster)?
If that is the case, are all LDREX and STREX instructions checked against the local and global monitors irrespective of the page-table shareability bit and ACTLR.SMP bit settings?
#13
Posted 02 April 2012 - 09:51 PM
Hmm, there was a thread on this a few weeks back, and the shared bit did make a difference. I can't remember the details, but if you want data to be shared, mark it as shared in the MMU. Anything else is coding to the microarchitecture of the core, not to the "ARM Architecture"; when you move that kind of code to a different core it may well break in confusing and hard-to-debug ways, so you should always try to conform to the ARM ARM if possible.
This post has been edited by isogen74: 02 April 2012 - 09:52 PM
#14
Posted 19 April 2012 - 07:00 AM
Quote (ttfn @ 23 March 2012 - 04:12 PM)
Imagine you had two processors (say an A9 and an R4) in one chip. They share some data, and you want to use a mutex to control which of the two processors can access the data at any one time. You need some hardware support to ensure that a STREX from one processor can detect whether the other got there first. That is the job of the Global Monitor. In my experience, the Global Monitor is usually part of the memory controller. That is, if you have one; not all chips do.
Hi, do you happen to know whether the ARM Versatile Express A9x4 board has a global monitor? If I do atomic operations (LDREX/STREX) on non-cached, shared Normal memory regions in SMP mode, the code gets stuck forever on those instructions. If I remove the shared attribute, they succeed. However, the memory controller manual (PL341) claims that it has 2 exclusive access monitors. I suppose this should be the global monitor here?