ARM Community: ARM V7 memory barrier


#1 buyit

Posted 18 June 2011 - 02:09 AM

Hello everyone, I have some questions about the memory barriers implemented in Linux for ARMv7.

First, let's suppose we are using a dual-core ARM Cortex-A9 CPU as an example.

What is the exact meaning of the DMB instruction?

global int a = 0; global int b = 0;

CPU 0

str #0x1, a

DMB

str #0x1, b

CPU 1

WAIT(b==1) ; wait on flag

DMB

ldr r0, a

The result should be: CPU1: r0 == 0x1.

There are two DMBs, one on CPU0 and one on CPU1. I want to understand what the hardware actually does for each of these DMB instructions.
(1) For the DMB on CPU0, does ARM guarantee that "str #0x1, a" will be executed before "str #0x1, b"? As far as I know, this can be guaranteed by DSB but not by DMB, right?
(2) The actions performed by CPU0 are as follows. CPU0 executes "str #0x1, a". Suppose "a" is already in the cache, so the cached copy of "a" is updated. If an interrupt occurs at this moment, the cache line for "a" may be flushed to main memory; let's suppose this is a very slow operation. Then the DMB instruction is executed. What does this instruction do? Does DMB wait for the cache line flush for "a" to complete? Finally, CPU0 executes "str #0x1, b". Suppose "b" is already in the cache, so the cached copy of "b" is updated.
After that, is it possible that "a" is still in CPU0's write buffer and has not yet reached main memory, while "b" is in a cache line and is ready for use? Would CPU1 then have a chance to read the old content of "a" from main memory? As I understand from the ARM ARM, DMB gives no guarantee about the write buffer; that is handled by DSB.
(3) For CPU1, using DMB will not guarantee that the "WAIT(b==1)" instruction executes before "ldr r0, a", right? It only guarantees that the memory access made by "WAIT(b==1)" is ordered before that of "ldr r0, a", right? So if CPU1 executes "ldr r0, a" out of order, before "WAIT(b==1)", how can DMB make it read "a" only after it has read "b"? If out-of-order execution is allowed for CPU1 here, "ldr r0, a" would read "a" directly, because the DMB instruction has not even been issued yet.
(4) In Linux kernel 2.6.35, in the function bnx2_rx_int() of bnx2.c, there is a memory barrier as shown below:

hw_cons = bnx2_get_hw_rx_cons(bnapi);
sw_cons = rxr->rx_cons;
rmb();
while (sw_cons != hw_cons) {
.....
....

};

The rmb() is there to guarantee that the code inside the while loop is not speculatively prefetched by the CPU before we have read hw_cons and sw_cons, right?
Without this memory barrier, could the CPU touch data inside the while loop before the program has established that it is allowed to enter the loop (i.e. sw_cons == hw_cons, but we have already executed instructions inside the loop and touched a data structure that is protected by the sw_cons != hw_cons check)?

rmb() is a DSB on ARMv7, so I think DMB is not enough here because it can only guarantee memory access order, right?


#2 buyit

Posted 18 June 2011 - 02:16 AM

The formatting of my example code is not correct and I don't know why, so I will retry here.
It is just a very simple example from Barrier_Litmus_Tests_and_Cookbook_A08: CPU0 writes 0x1 to variable a, executes a DMB, and then writes 0x1 to variable b, which is a flag polled by CPU1. CPU1 polls b, executes a DMB, and finally reads the content of variable a.


#3 buyit

Posted 18 June 2011 - 02:19 AM

Sorry for the mistake, here it is again:

global int a = 0;
global int b = 0;

CPU0 :
str #0x1, a
DMB
str #0x1, b

CPU1 :
WAIT(b==1) ; wait on flag
DMB
ldr r0, a




#4 buyit

Posted 20 June 2011 - 02:05 AM

Any discussion is appreciated.

#5 isogen74

Posted 22 June 2011 - 12:52 PM

The DMB instruction just ensures ordering of memory transactions to memory types where there are not normally any guarantees.

Memory transactions before the DMB must be committed before those after the DMB. What committed means depends on the memory type.
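
To make the ordering rule concrete, here is a minimal sketch of the CPU0 side of the example, written in portable C11 rather than ARM assembly. It is an illustration and not code from the thread: the function name cpu0_publish is hypothetical, and it assumes a toolchain that lowers atomic_thread_fence(memory_order_release) to a DMB on ARMv7-A, which is what GCC and Clang typically do.

```c
#include <stdatomic.h>

/* Shared variables from the example: a is the payload, b is the flag.
 * File-scope objects are zero-initialised, matching "a = 0; b = 0". */
atomic_int a;
atomic_int b;

/* Hypothetical name: the CPU0 half of the example. */
void cpu0_publish(void)
{
    /* str #0x1, a : relaxed store, no ordering of its own */
    atomic_store_explicit(&a, 1, memory_order_relaxed);

    /* DMB : the store to a above is ordered before the store to b below,
     * as observed by other cores that also use a barrier */
    atomic_thread_fence(memory_order_release);

    /* str #0x1, b : publish the flag */
    atomic_store_explicit(&b, 1, memory_order_relaxed);
}
```

Without the fence, the two relaxed stores have no guaranteed order relative to each other as seen from another core, so an observer could see b == 1 while still reading a == 0.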

http://forums.arm.co...807-dmb-vs-dsb/

Quote

(2) Does DMB wait for cache line flush for "a" complete?


No. Once a write to a cacheable location has hit the cache, it has been "committed" as far as the memory system is concerned, and the processor can keep executing.

Quote

Then CPU1 will get chance to get the old content of "a" in main memory?


No. Provided that the memory is "inner shared" and the cores are running in SMP mode the hardware enforces the coherency, so they will get consistent data.

Quote

(3) For CPU1, use DMB will not guarantee execute " WAIT(b==1) " instruction before "ldr r0, a", right ?


Any real implementation of "WAIT(b == 1)" must involve a memory load in a loop, because you are testing a memory location written by another thread. The DMB therefore guarantees that the WAIT completes first (because it contains an LDR) before the "LDR r0, a" occurs.
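
A minimal sketch of what WAIT(b==1) followed by DMB might look like in C11, assuming the same shared variables a and b as the example. The helper wait_for_flag and the function cpu1_consume are hypothetical names, and the acquire fence stands in for the DMB (compilers typically emit a DMB for it on ARMv7-A).

```c
#include <stdatomic.h>

/* Shared variables from the example: a is the payload, b is the flag. */
atomic_int a;
atomic_int b;

/* Hypothetical helper: WAIT(flag == expected) as a polling loop.
 * Every iteration performs a real load from memory. */
static void wait_for_flag(atomic_int *flag, int expected)
{
    while (atomic_load_explicit(flag, memory_order_relaxed) != expected) {
        /* spin until another core writes the expected value */
    }
}

/* Hypothetical name: the CPU1 half of the example. */
int cpu1_consume(void)
{
    wait_for_flag(&b, 1);                        /* WAIT(b==1) */
    atomic_thread_fence(memory_order_acquire);   /* DMB */
    /* ldr r0, a : ordered after the load of b that ended the loop */
    return atomic_load_explicit(&a, memory_order_relaxed);
}
```

Because the loop contains a load of b, the fence has an earlier memory access to order against, so the load of a cannot be satisfied before the load of b that observed the flag.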

Quote

If CPU1 out-of-order execute "ldr r0, a" before "WAIT(b==1)"


Out-of-order execution in the hardware has to conform to the architectural requirements of any barrier instructions in the instruction stream.

Quote

(4) in linux kernel 2.6.35 bnx2.c, function bnx2_rx_int(), there is a memory barrier as below <snip>. The rmb() is to guarantee code inside while loop will not be speculative prefetched by cpu before we get hw_cons and sw_cons , right?


The rmb() ensures that reads before the barrier are complete before any code after the barrier makes a read. It has nothing to do with code prefetch as far as I can see, although I'm not a Linux expert.
Most importantly perhaps, the rmb() is also a compiler barrier, so it stops the compiler "being clever" and optimizing things away or reordering things when hardware is involved.
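
The read-ordering pattern can be sketched in user-space C11. This is only loosely modelled on the loop quoted in the question and is not the bnx2 driver code: struct rx_ring, ring_produce, ring_drain and read_barrier are hypothetical names, the producer function is a stand-in for the device, and read_barrier is an acquire fence used as an assumed analogue of rmb(), not the kernel's definition.

```c
#include <stdatomic.h>

#define RING_SIZE 8u

/* Hypothetical ring: hw_cons is advanced by the producer,
 * sw_cons is private to the consumer. */
struct rx_ring {
    atomic_uint hw_cons;
    unsigned    sw_cons;
    int         desc[RING_SIZE];
};

/* Assumed stand-in for rmb(): orders earlier loads before later loads
 * and also prevents the compiler from moving loads across it. */
static inline void read_barrier(void)
{
    atomic_thread_fence(memory_order_acquire);
}

/* Producer stand-in: fill a slot, then publish the new index.
 * No overflow check; the caller must not overrun the consumer. */
void ring_produce(struct rx_ring *r, int value)
{
    unsigned idx = atomic_load_explicit(&r->hw_cons, memory_order_relaxed);
    r->desc[idx % RING_SIZE] = value;
    atomic_store_explicit(&r->hw_cons, idx + 1, memory_order_release);
}

/* Consumer: read the indices, barrier, then read the slots. */
int ring_drain(struct rx_ring *r)
{
    unsigned hw_cons = atomic_load_explicit(&r->hw_cons, memory_order_relaxed);
    unsigned sw_cons = r->sw_cons;
    int sum = 0;

    read_barrier();  /* index loads complete before any slot load */
    while (sw_cons != hw_cons) {
        sum += r->desc[sw_cons % RING_SIZE];
        sw_cons++;
    }
    r->sw_cons = sw_cons;
    return sum;
}
```

If only the compiler needs restraining, C11 also provides atomic_signal_fence, which constrains compiler reordering without emitting a hardware barrier.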

#6 buyit

Posted 22 June 2011 - 02:11 PM

@isogen74

Thank you very much.


#7 wendyc

Posted 23 December 2011 - 06:56 AM

Quote

global int a = 0;
global int b = 0;

CPU0 :
str #0x1, a
DMB
str #0x1, b

CPU1 :
WAIT(b==1) ; wait on flag
DMB
ldr r0, a


(1) If CPU1 is an in-order execution CPU, is it necessary to add a DMB before ldr r0, a? I mean, if there's no DMB, r0 == 0x1 on CPU1 should still be guaranteed, right?

(2) So is the DMB before ldr r0, a added because CPU1 is an out-of-order execution CPU?



#8 buyit

Posted 23 December 2011 - 08:24 AM

wendyc, on 23 December 2011 - 06:56 AM, said:

(1) If CPU1 is an in-order execution CPU, is it necessary to add a DMB before ldr r0, a? I mean, if there's no DMB, r0 == 0x1 on CPU1 should still be guaranteed, right?

(2) So is the DMB before ldr r0, a added because CPU1 is an out-of-order execution CPU?





I use a dual-core Cortex-A9 CPU to test the memory barrier instructions. This CPU has both "out-of-order instruction execution" and a "weakly ordered memory interface". I think both of these features make DMB instructions necessary, not only out-of-order execution.