ARM Community: Bootloader problem in ARM926ejs - ARM Community


#1 User is offline   FGirault 

  • Member
  • Pip
  • Group: Members
  • Posts: 9
  • Joined: 24-January 11

Posted 24 June 2011 - 12:24 PM

Hi all,
I am experiencing a strange, fairly random issue while writing a bootloader for the ARM926EJ-S.

The bootloader works as follows:
- It allows the application to be updated via a serial port, writing the application code to flash.
(By using checksums, I am sure that the data written to flash is correct.)
- It loads the application from flash to RAM, then starts it by calling the reset handler.

The basic memory map in RAM (once the application has been loaded) looks like this:
0x80000000 - 0x80040000: bootloader
0x80140000 - 0x80200000: application
The application program's reset handler is located at 0x80141000.

The bootloader's reset handler and the application's reset handler perform the same work, as follows
(except that the addresses are different for the MMU table):
1. clean D cache
2. flush D cache
3. flush I cache
4. Disable the MMU
5. Initialize MMU (with the TTB) mainly:
-> Code section to Read-Only / Write Through (RO+WT)
-> Stacks/Heaps Read-Write / Write Back (RW+WB)
-> Hardware registers to Read-Write / NonCache-NonBuffered (RW+NCNB)
-> Free memory space to Read-Write / Write Back (RW+WB)
MMU Translation Table located at 0x80800000 in both cases
6. Enable MMU
7. Set CPU to Supervisor Mode
8. Initialize stack pointer for each CPU mode (Supervisor, Abort, Undefined, Fast Interrupt etc...)
9. Go to system mode
10. Do some lowlevel initialization, then start C main function
11. Then run the program (initialize peripherals etc...)

Steps 1-10 are performed in assembly.
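
To make the attribute combinations in step 5 concrete, here is a minimal sketch of how a first-level 1MB section descriptor could be assembled for an ARMv5 MMU such as the one in the ARM926EJ-S. The macro and function names are hypothetical (they are not taken from the poster's code), and the sketch covers section entries only, not coarse page tables.

```c
#include <stdint.h>

/* Hypothetical names for this sketch; bit positions follow the ARMv5
 * first-level section descriptor layout. */
#define SECTION_TYPE      0x2u                    /* bits [1:0] = 0b10: section entry */
#define SECTION_BIT4      (1u << 4)               /* written as 1 on ARMv5 */
#define ATTR_NCNB         0x0u                    /* C=0, B=0: non-cached, non-buffered */
#define ATTR_WT           (1u << 3)               /* C=1, B=0: write-through */
#define ATTR_WB           ((1u << 3) | (1u << 2)) /* C=1, B=1: write-back */
#define AP_RW             (0x3u << 10)            /* AP=0b11: read/write */
#define AP_RO_WITH_R_BIT  (0x0u << 10)            /* AP=0b00: read-only only when the
                                                     CP15 control register R bit is set */

/* Build a section descriptor for the 1MB block containing phys. */
static uint32_t section_descriptor(uint32_t phys, uint32_t domain,
                                   uint32_t ap, uint32_t attr)
{
    return (phys & 0xFFF00000u)       /* section base address, bits [31:20] */
         | ap                         /* access permissions, bits [11:10] */
         | ((domain & 0xFu) << 5)     /* domain number, bits [8:5] */
         | SECTION_BIT4
         | attr                       /* C and B bits */
         | SECTION_TYPE;
}
```

Note that the low 20 bits of the physical address are simply discarded, so a region that is not 1MB aligned silently maps a different range than intended.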

In order to start from the reset handler, I declare a function pointer as follows:
typedef void (*RESET_HANDLER)(void);
RESET_HANDLER reset = (RESET_HANDLER)0x80141000; /* my reset handler's first instruction is here */
then start:
reset();

The point of doing it this way is that the application program should be completely independent
of the boot program. It completely reinitializes the board, stacks, etc. for its own purposes.
The benefit is that if some low-level implementation is optimized later, we can update the
application easily via the bootloader, and the application will benefit from the low-level improvements
in the drivers, etc.
The bootloader is just a small program that allows the application program to be updated.

Also, I can compile and run the application or the bootloader just fine in a debugger with
exactly the same memory mapping as they have after loading from flash.
The application program's update via the bootloader is doing fine.

Now, the problem is that after writing the application to flash, the application aborts once it
has been loaded to RAM and started by the bootloader.
The application is loaded into RAM; I can confirm it with the debugger and the checksums.

When I got the abort exception I took a peek at the R14 register, and it was pointing to the
first instruction that initializes a hardware peripheral register. I don't really know if it
is relevant, or even accurate, since the board seems to be in a quite unstable state
when in abort mode (though the abort handler is just an infinite loop).

Also, I touched the initialization assembly in order to find a solution, and actually at first
I was not performing the following steps:
1. clean D cache
2. flush D cache
3. flush I cache

Then by adding that, AND by changing the code's section attributes in the MMU table:
-> Code section to Read-Write / Write Through (RW+WT)

in both the bootloader program and the application, I was able to start the application from the
Flash.

I am not sure that adding cache cleaning/flushing and changing the code section parameter really had
such a decisive impact (I'd like to verify it, but I couldn't check before posting).
Actually, at first I added some flushing/cleaning in the reset handler.
Then it seemed to work, and I was able to start the application from flash.
Then I recompiled everything to be sure, and the same problem occurred again.
Then I modified the flush/clean code, because I discovered that such code was already there, so I
took my own flushing/cleaning away and used the one I had been provided with.
When I used this new flush/clean code, I was not able to overwrite the application program
with the debugger in the same debug session ("error: Could not write at address 0x80147A94" or so...).
If I closed the debug program and restarted it, I could overwrite the program without problem.
Then I thought, without believing it: "maybe this is because the code section is read-only?"
So I changed the code section from read-only to read-write, and everything is working fine now.

I really don't understand, does someone have any idea of what is going on here?

I was also wondering if this could have something to do with an alignment problem?
All my stacks and sections are 8-byte aligned, since that seemed the easiest to handle with the
ARM926, and the whole code is at the same memory location when executing from flash or with the
debugger. But maybe there is something else I should care about?

Any ideas? Did I miss something?

(I tried to summarize the problem as much as I could, so I didn't give much detail.
I'll post some source tomorrow if someone wants to take a look.)

This post has been edited by FGirault: 25 June 2011 - 03:05 PM

0

#2 User is offline   FGirault 

  • Member
  • Pip
  • Group: Members
  • Posts: 9
  • Joined: 24-January 11

Posted 24 June 2011 - 03:04 PM

(Sorry, this is my first time using the forum, so I initially posted an empty message by mistake.)
0

#3 User is offline   scott 

  • Regular Contributor
  • PipPipPip
  • Group: Members.
  • Posts: 210
  • Joined: 05-October 06

Posted 25 June 2011 - 12:03 PM

Are the caches on when your bootloader is copying the application from Flash to RAM?

If yes, then you should clean the D cache (if it's write-back) and flush the I-cache before you jump to the application, because the copying happens as data. During the copying the application area needs to be writeable. Inconsistent cache problems can be hard or impossible to see in a debugger, because a debugger probably reads the code as data which might not match what the processor sees from the I cache.

Also, when the application starts, are the MMU and caches still on from the bootloader? If so, the application's "reset handler" may need to take that into account -- for instance when you disable the MMU there needs to be code at the same physical address as the virtual address it's just been executing from.

Is your abort exception a data abort or prefetch abort? You can only tell the difference by which exception vector was used. If the abort was a data abort caused by the MMU then R14 will point a couple instructions past the offending instruction (see the TRM or ARM ARM for exact details) and more information (such as the address being accessed) is available in the DFAR and DFSR (again see the docs).
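
As a small illustration of the R14 arithmetic, the sketch below computes the address of the faulting instruction from the abort-mode link register. It relies on the standard ARM exception-entry behaviour (R14 holds the faulting instruction address plus 8 for a data abort, plus 4 for a prefetch abort); the enum and function names are hypothetical and only serve this example.

```c
#include <stdint.h>

/* Which abort vector was taken (hypothetical enum for this sketch). */
typedef enum { ABORT_PREFETCH, ABORT_DATA } abort_kind_t;

/* Return the address of the instruction that caused the abort, given
 * the R14 (LR) value captured in abort mode. */
static uint32_t faulting_instruction(abort_kind_t kind, uint32_t lr_abt)
{
    /* Data abort:     LR_abt = faulting instruction + 8
     * Prefetch abort: LR_abt = faulting instruction + 4 */
    return lr_abt - (kind == ABORT_DATA ? 8u : 4u);
}
```

An abort handler that records this address (together with the fault status and fault address registers) before entering its infinite loop gives the debugger something stable to inspect.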
0

#4 User is offline   FGirault 

  • Member
  • Pip
  • Group: Members
  • Posts: 9
  • Joined: 24-January 11

Posted 25 June 2011 - 02:55 PM

Hi scott,
Thx for your reply.

The abort exception I get is a Data Abort exception, so I located the offending instruction by looking at the address pointed to by (R14-8).
It was a write access to a peripheral register.

I did some additional checks today, and I discovered that my MMU table is not always mapped as I expect it to be.
I fill up an MMU table by giving
[size], [logical address], [physical address], [attributes]
for each entry, and I was told only to be careful that the TTB base address should be 16kB aligned and the page addresses 4kB aligned. But after MMU table initialization, I took a peek at the entries in the TTB and Coarse TTB, and they are not always what I was expecting (for example, the entries concerning the stack memory do not appear in the TTB, though it should point to an entry in the CTTB; I have to double-check that).
In any case, even if all my entries are there in the table, I also discovered that sections with size > 1MB have to be 1MB aligned.
Since I was not aware of that before, it might have led to some problems (I'll review the memory map first so that this kind of problem can't occur).
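
A check along these lines could catch misaligned entries before the table is built. This is only a sketch: the struct mirrors the [size], [logical address], [physical address] fields described above, but its name and the helper names are hypothetical, and it assumes that any region of 1MB or more is mapped purely with 1MB sections while smaller regions use 4kB pages.

```c
#include <stdbool.h>
#include <stdint.h>

#define TTB_ALIGN      0x4000u    /* 16kB: first-level table base */
#define PAGE_ALIGN     0x1000u    /* 4kB: small page */
#define SECTION_ALIGN  0x100000u  /* 1MB: section */

/* Hypothetical region record for this sketch. */
typedef struct {
    uint32_t size;
    uint32_t logical;
    uint32_t physical;
} mmu_region_t;

/* align must be a power of two. */
static bool is_aligned(uint32_t value, uint32_t align)
{
    return (value & (align - 1u)) == 0u;
}

/* Assumption: regions >= 1MB are mapped with sections, others with pages. */
static bool region_ok(const mmu_region_t *r)
{
    uint32_t align = (r->size >= SECTION_ALIGN) ? SECTION_ALIGN : PAGE_ALIGN;
    return r->size != 0u
        && is_aligned(r->size, align)
        && is_aligned(r->logical, align)
        && is_aligned(r->physical, align);
}
```

Running every entry through such a check at start-up (or in a host-side unit test) turns a silent mis-mapping into an explicit failure.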

Concerning flushing the D cache, yes until now the caches were ON.
But today I tried to put the attributes "NCNB" on the Application Code destination area, but I still got the same problem.

The weirdest thing is that changing the Read-Only attribute of the code section to Read-Write seems to solve the issue (at least I could reproduce the issue by putting the code back to Read-Only).
What is weird is that this attribute applies to a memory area that is not the one where the offending abort exception seems to occur.
Also the peripheral registers are somewhere between 0x43F00000 - 0x7000000 and the code section is between 0x81400000 - 0x81800000 in RAM. It should be unrelated.
But it might not be so weird if the TTB and CTTB are wrongly set.
Still, that does not explain why I don't get any trouble when running from the debugger.

I will investigate more, and post back the results.

The truth should be in there....

PS: Please note that I modified the first post's boot sequence, I had inverted a few things. Now it reflects well what the boot sequence is doing. There is no access to the stacks in MMU initialization.

This post has been edited by FGirault: 25 June 2011 - 03:09 PM

0

#5 User is offline   isogen74 

  • Super Contributor
  • PipPipPipPip
  • Group: Members
  • Posts: 1098
  • Joined: 20-March 07

Posted 26 June 2011 - 10:54 AM

> Still, that does not explain why I don't get any trouble when running from the debugger.

Halting mode debug presents a "debug illusion" to the user which looks like it does the right thing, but the steps to get there are not always exactly what the processor would do if there was no debugger. There are normally a lot of hacks done by the ICE logic around cache flushing to make sure the debugger does the right thing, for example.

Because of these background operations needed to maintain the illusion, it tends to be rather intolerant of misconfiguration of the MMU / TLB settings.

This post has been edited by isogen74: 26 June 2011 - 10:55 AM

When optimizing software, consider that the quickest code to run is the bit you removed from the call path.
1

#6 User is offline   FGirault 

  • Member
  • Pip
  • Group: Members
  • Posts: 9
  • Joined: 24-January 11

Posted 27 June 2011 - 05:28 AM

I've been checking a little deeper today:

The offending instruction is:

LDR  	R12,[R3, #+0]


And the reason is that R3 contains 0x008021A9 where it should contain the address of a GPIO register.

When running from the debugger, at the same line, R3 contains 0x53FCC000 which is the expected value.



The following table:

static volatile struct gpio * gpioRegTbl[N_GPIO_CH] = { &GPIO1, &GPIO2, &GPIO3, &GPIO4};
is not properly initialized when starting from the starter. It is properly initialized when starting from the debugger.


I have attached my initialization code.

If you see any mistake, please comment.

Attached File(s)


1

#7 User is offline   FGirault 

  • Member
  • Pip
  • Group: Members
  • Posts: 9
  • Joined: 24-January 11

Posted 27 June 2011 - 09:14 AM

I am still gathering clues:
It seems that __iar_data_init2 in initarm.s79, which should be initializing the .data section, is not working properly (it didn't initialize anything).
Still searching for the reason.
I still couldn't confirm whether the MMU is completely properly set (but it seems OK now that all entries are 1MB aligned).
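
For context, .data initialization in a C startup sequence generally amounts to copying the stored initial values from the program image to the run-time location of the variables. The sketch below shows that idea only; it is not IAR's actual implementation of __iar_data_init2 or __iar_data_init3, and the record layout and function name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry per initialized section (hypothetical layout for this sketch). */
typedef struct {
    const void *load_addr;  /* where the initial values sit in the image */
    void       *run_addr;   /* where the variables live at run time */
    size_t      size;       /* number of bytes to copy */
} init_record_t;

/* Copy every initialized section from its load address to its run address. */
static void copy_initialized_data(const init_record_t *table, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        memcpy(table[i].run_addr, table[i].load_addr, table[i].size);
    }
}
```

If either the table or the stored initial values are wrong in the loaded image, a pointer table such as gpioRegTbl ends up holding garbage even though the copy routine itself is correct.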

This post has been edited by FGirault: 27 June 2011 - 09:17 AM

0

#8 User is offline   FGirault 

  • Member
  • Pip
  • Group: Members
  • Posts: 9
  • Joined: 24-January 11

Posted 27 June 2011 - 10:24 AM

I took a peek at the memory map to find that "__iar_data_init2" function, and all I found was "__iar_data_init3".
So I made 2 modifications:
1. Use __iar_data_init3
2. Modified the linker file to add .data_init and other initialization sections next to the code section. It worked once. So in order to confirm that both modifications were needed, I changed __iar_data_init3 back to __iar_data_init2 and it didn't work anymore.

Then with only (2) it didn't work either. So I put both back, and now it doesn't work.

But it worked once; I saw that the variable had been initialized by the __iar_data_init3 function.





This post has been edited by FGirault: 27 June 2011 - 10:41 AM

0

#9 User is offline   FGirault 

  • Member
  • Pip
  • Group: Members
  • Posts: 9
  • Joined: 24-January 11

Posted 28 June 2011 - 09:17 AM

Hi again.
The problem seems to be coming from a bug in the program that loads the application to flash. I tried writing the application another way, and it worked properly. I also tried loading the data directly to RAM from the serial port in the starter, and again it worked properly.
I am guessing that the bug does not visibly change all the data, but only occurs in particular cases, and without affecting the checksum. Maybe it is swapping bytes sometimes, and since the checksum is just a sum, that goes undetected.
Thank you for your answers; they helped to eliminate some questions.
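
The weakness of a plain additive checksum can be shown with a short sketch: reordering bytes leaves the sum unchanged, whereas a CRC changes. The function names are hypothetical, and CRC-32 is used here only as one example of a position-sensitive check; it is not the checksum used by the loader in question.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain additive checksum: insensitive to the order of the bytes. */
static uint32_t sum_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by zlib). */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            uint32_t mask = 0u - (crc & 1u);  /* all ones if the low bit is set */
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}
```

For example, the byte sequences 01 02 03 04 and 02 01 03 04 give the same additive sum but different CRC-32 values, so only the CRC would flag the swap.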
0

#10 User is offline   FGirault 

  • Member
  • Pip
  • Group: Members
  • Posts: 9
  • Joined: 24-January 11

Posted 29 June 2011 - 08:37 AM

And here it is: I had some misplaced code that was causing the trouble. The loader had a bug and didn't take into account cases where there was a small discontinuity in the addresses. That put some code in the wrong place, but since everything had been written to flash, the checksum was right (it doesn't care about the code location).
It was happening on just a few bytes of the code (fewer than 128 bytes), so it was hard to detect.
Case solved
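
For anyone hitting the same kind of bug, here is a minimal sketch of a loader step that honours each chunk's own target address instead of assuming the chunks are contiguous. The record format and function name are hypothetical; the actual loader in this thread was not posted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One received chunk of the application (hypothetical record format). */
typedef struct {
    uint32_t       address;  /* target address of this chunk */
    const uint8_t *data;
    size_t         len;
} record_t;

/* Place each record at (address - base) inside image.
 * Returns 0 on success, -1 if a record falls outside the image. */
static int place_records(uint8_t *image, size_t image_size, uint32_t base,
                         const record_t *recs, size_t count)
{
    memset(image, 0xFF, image_size);  /* gaps keep the erased-flash value */
    for (size_t i = 0; i < count; i++) {
        if (recs[i].address < base) {
            return -1;
        }
        size_t offset = recs[i].address - base;
        if (offset > image_size || recs[i].len > image_size - offset) {
            return -1;
        }
        memcpy(image + offset, recs[i].data, recs[i].len);
    }
    return 0;
}
```

Because the offset is recomputed from every record's address, a small gap between two records leaves a hole in the image rather than shifting the following code.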
1
