This guide first introduces some low-level techniques for debugging an OS personality on the seL4 kernel. The second section describes the GDB support we have added to SOS to enable debugging of itself.
seL4 Low-level Debugging
When developing an operating system on top of seL4 you may not have
the luxury of using a debugger such as gdb. In this case, your best bet is a combination
of dprintf
and objdump
. If you are doing the project on a
Linux machine (using our linux toolchain) you can also use the backtrace function,
combined with a script for parsing objdump output to obtain a primitive stack trace.
Debugging Faults
All exceptions result in a context switch to the seL4 kernel. The kernel exports fault handling policies to user level by sending an IPC to the appropriate IPC endpoint (see Chapter 5 of the seL4 manual). For applications such as console_test, the assigned endpoint will usually be the IPC endpoint that SOS waits on in the syscall_loop. SOS on the other hand is initialised with no fault endpoint and hence is expected to ensure that it does not cause a fault.
SOS faults
Below is an example of code that triggers a page fault from the SOS main
function.
int main(void) { init_muslc(); /* bootinfo was set as an environment variable in _sel4_start */ char *bi_string = getenv("bootinfo"); ZF_LOGF_IF(!bi_string, "Could not parse bootinfo from env."); /* register the location of the unwind_tables -- this is required for * backtrace() to work */ __register_frame(&__eh_frame_start); seL4_BootInfo *boot_info; if (sscanf(bi_string, "%p", &boot_info) != 1) { ZF_LOGF("bootinfo environment value '%s' was not valid.", bi_string); } debug_print_bootinfo(boot_info); printf("\nSOS Starting...\n"); NAME_THREAD(seL4_CapInitThreadTCB, "SOS:root"); // cause a fault at 0xDEADBEEF *((char*)0xDEADBEEF) = 0;
When this code runs, the following output is displayed in the serial
terminal:
SOS Starting... Caught cap fault in send phase at address (nil) while trying to handle: vm fault on data at address 0xdeadbeef with status 0x92000045 in thread 0xffffff807efd6400 "SOS:root" at address 0x407c50 With stack: 0x4e59c0: 0x0 0x4e59c8: 0x48966c 0x4e59d0: 0x4e5a78 0x4e59d8: 0x0 0x4e59e0: 0x0 0x4e59e8: 0x0 0x4e59f0: 0x5ea000 0x4e59f8: 0x4895ec 0x4e5a00: 0x4e5a78 0x4e5a08: 0x4ac025 0x4e5a10: 0x48965c 0x4e5a18: 0x0 0x4e5a20: 0x4e5af0 0x4e5a28: 0x0 0x4e5a30: 0x0 0x4e5a38: 0x0
What's going on here? First, a page fault is triggered by the
dereference of invalid address 0xDEADBEEF
. Then, seL4
attempts to send a page fault IPC to the fault handling endpoint of
SOS. However, SOS doesn't have a fault handling endpoint, as it
is set to the default value of 0, an empty slot in the cspace. As a
result, a double fault is sent, which means the thread is
stopped by the kernel, and in debug mode, the above output is
produced.
From the output, observe that SOS caused a vm fault with the following important values:
- The fault address, 0xdeadbeef.
- The exception status register, 0x92000045
- The address of the instruction that faulted, 0x407c50
The fault address
The fault address is the invalid address which caused a vm fault (a TLB translation error). This is 0xdeadbeef, the address we tried to write to. Note that invalid addresses will not always trigger a vm fault: if you are unlucky enough to accidentally derefence a mapped address no fault will occur.
The exception status register
The exception status register is referred to as far_el1
on 64-bit ARM architectures (aarch64) and DFSR
on 32-bit
ARM architectures (aarch32). To decode the value, look at section
D.10.4 of the ARMv8 Architecture reference
manual
The fault address register
The most useful address is the fault address register, which contains
the value of the instruction pointer at the time of the fault. This
value can then be looked up with the objdump
tool to to determine the context of fault.
Application faults
When an application thread is created, an appropriate fault handler IPC endpoint should be
set in the thread's TCB (see Chapter 5 of the seL4 manual). If the application causes a fault
and IPC will be sent to the assigned fault endpoint. Message registers provide important
information regarding the cause and location of the fault.
objdump
may then be used to determine the context of
fault.
Using objdump
Whenever you cause a fault, you should be able to see the current program
counter printed out. You can use objdump
to find where the offending code is.
Pass to objdump the location of the binary that faulted: in this case, it's the sos binary, located in the build directory:
aarch64-linux-gnu-objdump -Dlx build/projects/aos/sos/sos | less
You can now use the searching facility in less
to search
for the faulting address. In this case I find the following fragment
of output:
407c44: 97fff6dc bl 4057b4/home/alyons/aos-2019-test/build/../projects/aos/sos/src/main.c:516 407c48: d297dde0 mov x0, #0xbeef // #48879 407c4c: f2bbd5a0 movk x0, #0xdead, lsl #16 407c50: 3900001f strb wzr, [x0] /home/alyons/aos-2019-test/build/../projects/aos/sos/src/main.c:519 407c54: f9401ba1 ldr x1, [x29,#48]
We can see that the error has occurred on code at line 516 of main.c, exactly where we triggered a fault by trying to write to 0xDEADBEEF.
If the fault occurs in an application (such as console_test or sosh) rather than SOS, the following command should be used, with APP_NAME substituted for the name of the application you are debugging:
aarch64-linux-gnu-objdump -Dlx projects/aos/apps/APP_NAME/APP_NAME | less
More on objdump
objdump
is a very handy utility for working out exactly what is
where in an executable so you can work out what exactly is going wrong.
The two standard incantations for objdump are:
aarch64-linux-gnu-objdump -dl binary_file | less
and
aarch64-linux-gnu-objdump -lx binary_file | less
The first command (-dl) disassembles the text segment and shows you all the instructions and at what address. Using this information you can find out things such as:
-
What does my function compile to?
eg. Am I compiling the right function? Do the instructions make sense?
-
Why is this instruction crashing?
Look at the fsr for the type of fault, and use objdump to find the exact instruction causing the fault.
NB: In case it's not yet obvious, you will need to get up to speed on your ARM aarch64 assembly and not be afraid to get your hands dirty if you want to minimise the time you spend debugging.
The second objdump command (-lx) is useful for when addresses appear inside an object file but outside of the text segment. This is especially useful when debugging ELF loading. The -lx option displays section and symbol information. Further options can be added to dump data segments etc. man objdump is your friend.
Generating a Stack Trace
Warning: Backtrace only supported for the Linux toolchain
We have provided a function print_backtrace
in
SOS
which allows you to print a primitive stacktrace from
anywhere in SOS. Note that backtrace does not work across random
jumps, including from within muslc system calls.
You can call print_backtrace()
from within any arbitrary
place in your code. Backtrace will traverse the eh_frame
constructed by gcc
during compilation, which provides
stack offsets, and print out the addresses of each function in the
call stack (as long as gcc can recognise the function call) This can
be useful if you need to know what your code is doing.
Consider this example, where print_backtrace
is called in
the network_init
function. The output is as follows:
SOS Started! test_cspace@tests.c:53 Test cspace libsel4muslcsys: Error attempting syscall 233 run_tests@tests.c:116 Root CSpace test passed! test_cspace@tests.c:53 Test cspace run_tests@tests.c:124 Single level cspace test passed! test_cspace@tests.c:53 Test cspace libsel4muslcsys: Error attempting syscall 233 run_tests@tests.c:137 Double level cspace test passed! run_tests@tests.c:141 DMA test passed! Network init Backtracing stack PCs: 0x408fa4 0x409288 0x407a24 0x418fec
Usage of the backtrace script is shown below.
$ ./projects/aos/backtrace.py build/projects/aos/sos/sos Enter whitespace separated addresses, <Ctrl>+D to process and exit > > 0x408fa4 0x409288 0x407a24 0x418fec > > > > Processing.... -------------------------------------------------------------------------------- 0 408fa4 in print_backtrace 408f9c: 52800141 mov w1, #0xa // #10 408fa0: 94000868 bl 40b140408fa4: b9006ba0 str w0, [x29,#104] 408fa8: b9406ba0 ldr w0, [x29,#104] 408fac: 7100001f cmp w0, #0x0 -------------------------------------------------------------------------------- 1 409288 in network_init 409280: f9000fa0 str x0, [x29,#24] 409284: 97ffff38 bl 408f64 409288: d2a00022 mov x2, #0x10000 // #65536 40928c: d2b92821 mov x1, #0xc9410000 // #3376480256 409290: f9400fa0 ldr x0, [x29,#24] -------------------------------------------------------------------------------- 2 407a24 in main_continued 407a1c: 911f8000 add x0, x0, #0x7e0 407a20: 94000616 bl 409278 407a24: 900004a0 adrp x0, 49b000 407a28: 91112000 add x0, x0, #0x448 407a2c: 94021680 bl 48d42c -------------------------------------------------------------------------------- 3 418fec in utils_run_on_stack 418fe4: aa0203e0 mov x0, x2 418fe8: d63f0020 blr x1 418fec: 9100029f mov sp, x20 418ff0: aa0003e0 mov x0, x0 418ff4: f90027a0 str x0, [x29,#72]
GDB-based SOS Debugging
We have built some support for GDB-based debugging of SOS itself. The following describes how to enable support and connect GDB to a gdbserver running within sos itself on the ODroid-C2. The later section describe a little of the software architecture used that might be relevant to your debugging.
Enabling and Connecting GDB
Building for GDB
The gdbserver support is compiled into the system itself. To include the support, ensure the following flag is ON
in settings.cmake
.
# Enable debugger set(SosGDBSupport ON CACHE BOOL "" FORCE)
In addition to the following variables remaining as supplied (and shown below).
set(CMAKE_BUILD_TYPE "Debug" CACHE STRING "" FORCE) set(KernelDebugBuild ON CACHE BOOL "" FORCE) set(HardwareDebugAPI ON CACHE BOOL "" FORCE)
The codebase is then built and run as per usual.
Running with GDB support
When GDB support is enabled, the SOS boot sequence pauses and waits for GDB to connect before completing the usual startup activities. An example of the boot sequence waiting for a GDB connection is shown below.
SOS Starting... SOS Started! test_cspace@tests.c:59 Test cspace run_tests@tests.c:173 Root CSpace test passed! test_cspace@tests.c:59 Test cspace run_tests@tests.c:181 Single level cspace test passed! test_cspace@tests.c:59 Test cspace cspace_alloc_slot@cspace.c:335 Cspace is full! run_tests@tests.c:194 Double level cspace test passed! run_tests@tests.c:198 DMA test passed! run_tests@tests.c:202 Frame table test passed! Network init dwmac.b0002000 Waiting for PHY auto negotiation to complete....... done Speed: 1000, full duplex Protocol ethernet registered (layer: 2). Protocol ipv4 registered (layer: 3). Protocol ipv6 registered (layer: 3). Protocol icmp4 registered (layer: 4). Protocol icmp6 registered (layer: 4). Protocol igmp registered (layer: 4). Protocol udp registered (layer: 4). Protocol tcp registered (layer: 4). Assigned ipv4 0.0.0.0 to device sos picotcp DHCP client: update link Assigned ipv4 192.168.168.44 to device sos picotcp DHCP client: renewal time (T1) 3163553 DHCP client: rebinding time (T2) 1241251 DHCP client: lease time 6327107 DHCP client: netmask 255.255.255.0 DHCP client: gateway 192.168.168.1 Awaiting GDB connection...
Connecting GDB
Start a new terminal window and change to your build directory, run the aarch64-linux-gnu-gdb
version of GDB, with projects/aos/sos/sos
as the executable to debug. Note: if you have your own development environment, you need version 15.1 of GDB and a version that supports aarch64. Older versions may not work correctly.
To connect GDB to the gdbserver on the Odroid-C2, enter target remote | ../odroid serial_raw
in GDB. GDB should connect to your SOS instance in a similar manner to the example shown below. Set the GDB scheduler-locking mode with set scheduler-locking step
. You can then enter c
to continue your system.
~/.../build$ aarch64-linux-gnu-gdb projects/aos/sos/sos GNU gdb (GDB) 15.1 Copyright (C) 2024 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "--host=x86_64-pc-linux-gnu --target=aarch64-none-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from projects/aos/sos/sos... (gdb) target remote | ../odroid serial_raw Remote debugging using | ../odroid serial_raw arm_sys_send_recv (sys=18446744073709551615, dest=903898, out_badge=0xc0063c78, info_arg=12326, out_info=0xc0063ca8, in_out_mr0=0xc0063ca0, in_out_mr1=0xc0063c98, in_out_mr2=0xc0063c90, in_out_mr3=0xc0063c88, reply=0) at ../kernel/libsel4/sel4_arch_include/aarch64/sel4/sel4_arch/syscalls.h:169 169 *out_info = info; (gdb) set scheduler-locking step (gdb)
A sample .gdbinit file
The following is a sample build/.gdbinit
file that simplifies startup. It simply allows you to type connect
each time.
define connect target remote | odroid serial_raw set scheduler-locking step end
GDB Connection Architecture
The following diagram illustrates the overall system architecture of a SOS being debugged with GDB.
Development Host
The dev. host is the machine hosting your work (e.g. a CSE machine or your personal laptop). It connects to one of the Odroid-C2s via aos-2
.
odroid serial
connects a one-way web socket to aos-2 that shows the serial output from an Odroid-C2- GDB's
target remote | odroid serial_raw
also connects to the same serial port. GDB filters and acts upon the various GDB related content in the serial stream.
aos-2
aos-2.cse.unsw.edu.au
is a host that authenticates and manages all the connections from students to Odroid-C2s in a cluster numbering approximately 60. It attempts to filter out gdb related content from the odroid serial
stream to make the stream more intelligible. Note that it is possible that some artifacts may appear occasionally.
aos-2
also provides a private NFS export and TFTP service to each Odroid-C2
Odroid-C2
The Odroid-C2 is where the action is. It boots a system image fromaos-2
containing seL4 and SOS.
- seL4 is a custom version for the course that adds two features:
- It makes the serial stream available to the gdbserver
- It listens for
reset
on the serial line and reboots the Odroid-C2 if seen.
-
When GDB is enabled, SOS (as provided) consists of two threads:
- The
main()
thread which runs your SOS OS personality. - A gdbserver thread which runs within the SOS task itself and supports debugging of the
main()
thread and any additional threads in SOS itself.
- The
Observations
The gdbserver and SOS debug printing share the same serial port. There is the potential to cross the streams resulting in unexpected output for intrepid Ghostbusters. That said, sharing seems to work well with GDB and the odroid mulitplexer keeping the streams separate.
The gdbserver and SOS share the same address space. The gdbserver has access to everything SOS has in order to debug SOS, however SOS can also corrupt the gdbserver. In practice, the gdbserver seems reliable, it's just not completely immune to SOS memory safety violations.
The gdbserver only has access to SOS. It cannot be used to debug clients of SOS, i.e. SOS processes.