When developing an operating system on top of seL4 you do not have
the luxury of using a debugger such as gdb. Your best bet is a combination
of dprintf
and objdump
. If you are doing the project on a
Linux machine (using our linux toolchain) you can also use the backtrace function,
combined with a script for parsing objdump output to obtain a primitive stack trace.
Debugging Faults
All exceptions result in a context switch to the seL4 kernel. The kernel exports fault handling policies to user level by sending an IPC to the appropriate IPC endpoint (see Chapter 5 of the seL4 manual). For applications such as console_test, the assigned endpoint will usually be the IPC endpoint that SOS waits on in the syscall_loop. SOS on the other hand is initialised with no fault endpoint and hence is expected to ensure that it does not cause a fault.
SOS faults
Below is an example of code that triggers a page fault from the SOS main
function.
int main(void) { init_muslc(); /* bootinfo was set as an environment variable in _sel4_start */ char *bi_string = getenv("bootinfo"); ZF_LOGF_IF(!bi_string, "Could not parse bootinfo from env."); /* register the location of the unwind_tables -- this is required for * backtrace() to work */ __register_frame(&__eh_frame_start); seL4_BootInfo *boot_info; if (sscanf(bi_string, "%p", &boot_info) != 1) { ZF_LOGF("bootinfo environment value '%s' was not valid.", bi_string); } debug_print_bootinfo(boot_info); printf("\nSOS Starting...\n"); NAME_THREAD(seL4_CapInitThreadTCB, "SOS:root"); // cause a fault at 0xDEADBEEF *((char*)0xDEADBEEF) = 0;
When this code runs, the following output is displayed in the picocom
terminal:
SOS Starting... Caught cap fault in send phase at address (nil) while trying to handle: vm fault on data at address 0xdeadbeef with status 0x92000045 in thread 0xffffff807efd6400 "SOS:root" at address 0x407c50 With stack: 0x4e59c0: 0x0 0x4e59c8: 0x48966c 0x4e59d0: 0x4e5a78 0x4e59d8: 0x0 0x4e59e0: 0x0 0x4e59e8: 0x0 0x4e59f0: 0x5ea000 0x4e59f8: 0x4895ec 0x4e5a00: 0x4e5a78 0x4e5a08: 0x4ac025 0x4e5a10: 0x48965c 0x4e5a18: 0x0 0x4e5a20: 0x4e5af0 0x4e5a28: 0x0 0x4e5a30: 0x0 0x4e5a38: 0x0
What's going on here? First, a page fault is triggered by the
dereference of invalid address 0xDEADBEEF
. Then, seL4
attempts to send a page fault IPC to the fault handling endpoint of
SOS. However, SOS doesn't have a fault handling endpoint, as it
is set to the default value of 0, an empty slot in the cspace. As a
result, a double fault is sent, which means the thread is
stopped by the kernel, and in debug mode, the above output is
produced.
From the output, observe that SOS caused a vm fault with the following important values:
- The fault address, 0xdeadbeef.
- The exception status register, 0x92000045
- The address of the instruction that faulted, 0x407c50
The fault address
The fault address is the invalid address which caused a vm fault (a TLB translation error). This is 0xdeadbeef, the address we tried to write to. Note that invalid addresses will not always trigger a vm fault: if you are unlucky enough to accidentally derefence a mapped address no fault will occur.
The exception status register
The exception status register is referred to as far_el1
on 64-bit ARM architectures (aarch64) and DFSR
on 32-bit
ARM architectures (aarch32). To decode the value, look at section
D.10.4 of the ARMv8 Architecture reference
manual
The fault address register
The most useful address is the fault address register, which contains
the value of the instruction pointer at the time of the fault. This
value can then be looked up with the objdump
tool to to determine the context of fault.
Application faults
When an application thread is created, an appropriate fault handler IPC endpoint should be
set in the thread's TCB (see Chapter 5 of the seL4 manual). If the application causes a fault
and IPC will be sent to the assigned fault endpoint. Message registers provide important
information regarding the cause and location of the fault.
objdump
may then be used to determine the context of
fault.
Using objdump
Whenever you cause a fault, you should be able to see the current program
counter printed out. You can use objdump
to find where the offending code is.
Pass to objdump the location of the binary that faulted: in this case, it's the sos binary, located in the build directory:
aarch64-linux-gnu-objdump -Dlx build/projects/aos/sos/sos | less
You can now use the searching facility in less
to search
for the faulting address. In this case I find the following fragment
of output:
407c44: 97fff6dc bl 4057b4/home/alyons/aos-2019-test/build/../projects/aos/sos/src/main.c:516 407c48: d297dde0 mov x0, #0xbeef // #48879 407c4c: f2bbd5a0 movk x0, #0xdead, lsl #16 407c50: 3900001f strb wzr, [x0] /home/alyons/aos-2019-test/build/../projects/aos/sos/src/main.c:519 407c54: f9401ba1 ldr x1, [x29,#48]
We can see that the error has occurred on code at line 516 of main.c, exactly where we triggered a fault by trying to write to 0xDEADBEEF.
If the fault occurs in an application (such as console_test or sosh) rather than SOS, the following command should be used, with APP_NAME substituted for the name of the application you are debugging:
aarch64-linux-gnu-objdump -Dlx projects/aos/apps/APP_NAME/APP_NAME | less
More on objdump
objdump
is a very handy utility for working out exactly what is
where in an executable so you can work out what exactly is going wrong.
The two standard incantations for objdump are:
aarch64-linux-gnu-objdump -dl binary_file | less
and
aarch64-linux-gnu-objdump -lx binary_file | less
The first command (-dl) disassembles the text segment and shows you all the instructions and at what address. Using this information you can find out things such as:
-
What does my function compile to?
eg. Am I compiling the right function? Do the instructions make sense?
-
Why is this instruction crashing?
Look at the fsr for the type of fault, and use objdump to find the exact instruction causing the fault.
NB: In case it's not yet obvious, you will need to get up to speed on your ARM aarch64 assembly and not be afraid to get your hands dirty if you want to minimise the time you spend debugging.
The second objdump command (-lx) is useful for when addresses appear inside an object file but outside of the text segment. This is especially useful when debugging ELF loading. The -lx option displays section and symbol information. Further options can be added to dump data segments etc. man objdump is your friend.
Generating a Stack Trace
Warning: Backtrace only supported for the Linux toolchain
We have provided a function print_backtrace
in
SOS
which allows you to print a primitive stacktrace from
anywhere in SOS. Note that backtrace does not work across random
jumps, including from within muslc system calls.
You can call print_backtrace()
from within any arbitrary
place in your code. Backtrace will traverse the eh_frame
constructed by gcc
during compilation, which provides
stack offsets, and print out the addresses of each function in the
call stack (as long as gcc can recognise the function call) This can
be useful if you need to know what your code is doing.
Consider this example, where print_backtrace
is called in
the network_init
function. The output is as follows:
SOS Started! test_cspace@tests.c:53 Test cspace libsel4muslcsys: Error attempting syscall 233 run_tests@tests.c:116 Root CSpace test passed! test_cspace@tests.c:53 Test cspace run_tests@tests.c:124 Single level cspace test passed! test_cspace@tests.c:53 Test cspace libsel4muslcsys: Error attempting syscall 233 run_tests@tests.c:137 Double level cspace test passed! run_tests@tests.c:141 DMA test passed! Network init Backtracing stack PCs: 0x408fa4 0x409288 0x407a24 0x418fec
Usage of the backtrace script is shown below.
$ ./projects/aos/backtrace.py build/projects/aos/sos/sos Enter whitespace separated addresses, <Ctrl>+D to process and exit > > 0x408fa4 0x409288 0x407a24 0x418fec > > > > Processing.... -------------------------------------------------------------------------------- 0 408fa4 in print_backtrace 408f9c: 52800141 mov w1, #0xa // #10 408fa0: 94000868 bl 40b140408fa4: b9006ba0 str w0, [x29,#104] 408fa8: b9406ba0 ldr w0, [x29,#104] 408fac: 7100001f cmp w0, #0x0 -------------------------------------------------------------------------------- 1 409288 in network_init 409280: f9000fa0 str x0, [x29,#24] 409284: 97ffff38 bl 408f64 409288: d2a00022 mov x2, #0x10000 // #65536 40928c: d2b92821 mov x1, #0xc9410000 // #3376480256 409290: f9400fa0 ldr x0, [x29,#24] -------------------------------------------------------------------------------- 2 407a24 in main_continued 407a1c: 911f8000 add x0, x0, #0x7e0 407a20: 94000616 bl 409278 407a24: 900004a0 adrp x0, 49b000 407a28: 91112000 add x0, x0, #0x448 407a2c: 94021680 bl 48d42c -------------------------------------------------------------------------------- 3 418fec in utils_run_on_stack 418fe4: aa0203e0 mov x0, x2 418fe8: d63f0020 blr x1 418fec: 9100029f mov sp, x20 418ff0: aa0003e0 mov x0, x0 418ff4: f90027a0 str x0, [x29,#72]