Week 01 Lab Exercise

Sanitizers, Makefiles and C Revision

Objectives

In this lab, you will:

  • Get reacquainted with C programming
  • Practice programming with arrays and linked lists
  • Learn how to read and understand sanitizer error messages
  • Learn how to use the make command

Admin

Marks
5 (see the Assessment section for more details)
Demo
in the Week 1, 2 or 3 lab session
Submit
see the Submission section
Deadline to submit to give
12pm Monday of Week 2
Late penalty
0.2% per hour or part thereof deducted from the attained mark, submissions later than 5 days not accepted

Background

Welcome to COMP2521!

The aim of this lab exercise is to introduce you to the tooling used in COMP2521, as well as give you practice programming with arrays and linked lists, as many of the data structures we will be covering are built upon them.

Setting Up

To keep your files manageable, it's worth doing each lab exercise in a separate directory (folder). We suggest creating a subdirectory in your home directory called "COMP2521", "cs2521" or "2521", and then creating subdirectories under that called "lab01", "lab02", etc.

Create a directory for this lab, change into it, and run the following command:

unzip /web/cs2521/24T3/labs/week01/downloads/files.zip

If you're working at home, download files.zip by clicking on the above link and then unzip the downloaded file.

If you've done the above correctly, you should now have three subdirectories in your lab01 directory:

  • sanitizers, which is for Part 1
  • makefiles, which is for Part 2
  • revision, which is for Part 3

Part 1 - Sanitizers 💧️👏️

clang

In COMP1511/COMP1911, you used dcc, a compiler created at CSE which is intended to help novice programmers by providing easy-to-understand explanations.

In COMP2521, since you are no longer a novice programmer, we will be using clang, a real-world compiler! (Fun fact: dcc actually uses clang.)

When you used dcc, you always compiled your programs like this:

dcc -o prog prog.c

When using clang, there are three other flags which we always compile with: -Wall, -Werror and -g. -Wall causes the compiler to warn us about (almost) all possible syntax mistakes in our program, -Werror causes warnings to be treated like errors (which prevents the program from being compiled), and -g causes the compiler to generate debug information (such as line numbers and variable names), which can be used by debuggers such as gdb.

Sanitizers

In this course, we often compile with sanitizers, which are tools that insert extra code into the program to detect various errors at runtime. Each sanitizer detects different kinds of errors. We use the following sanitizers:

  • AddressSanitizer (ASan): detects invalid memory accesses
  • MemorySanitizer (MSan): detects uses of uninitialized memory
  • UndefinedBehaviourSanitizer (UBSan): detects various kinds of undefined behaviour (e.g., integer overflow during addition)
  • LeakSanitizer (LSan): detects memory leaks

To compile with sanitizers, we use the -fsanitize= flag followed by a comma-separated list of sanitizers. For example, to compile a program with AddressSanitizer and UndefinedBehaviourSanitizer, we would use the following command:

clang -Wall -Werror -g -fsanitize=address,undefined -o prog prog.c

While it would be nice if we could compile our program with all the sanitizers at once, not all the sanitizers are compatible with each other. For example, AddressSanitizer is not compatible with MemorySanitizer. Thus, to ensure our programs do not contain any errors detected by either sanitizer, we must compile and run our programs twice: once with AddressSanitizer, and once with MemorySanitizer.

Since the error messages produced by sanitizers are not as friendly or as easily comprehensible, the goal of this activity is to teach you the meaning of common error types and how to locate useful information in error messages.

Example

Sanitizers produce detailed error messages which describe the type of error, the line of code that triggered the error, and more.

Here is an example error message produced by AddressSanitizer (with important parts bolded):

=================================================================
==796003==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000018 at pc 0x0000004c877a bp 0x7ffc206af420 sp 0x7ffc206af418
READ of size 8 at 0x602000000018 thread T0
    #0 0x4c8779 in func /tmp/sanitizer-example/prog.c:49:15
    #1 0x4c8359 in main /tmp/sanitizer-example/prog.c:35:9
    #2 0x7fa437b12d09 in __libc_start_main csu/../csu/libc-start.c:308:16
    #3 0x41e2b9 in _start (/tmp/sanitizer-example/prog+0x41e2b9)

0x602000000018 is located 8 bytes inside of 16-byte region [0x602000000010,0x602000000020)
freed by thread T0 here:
    #0 0x4980ad in free (/tmp/sanitizer-example/prog+0x4980ad)
    #1 0x4c86cf in func /tmp/sanitizer-example/prog.c:48:2
    #2 0x4c8359 in main /tmp/sanitizer-example/prog.c:35:9
    #3 0x7fa437b12d09 in __libc_start_main csu/../csu/libc-start.c:308:16

previously allocated by thread T0 here:
    #0 0x49832d in malloc (/tmp/sanitizer-example/prog+0x49832d)
    #1 0x4c8066 in main /tmp/sanitizer-example/prog.c:21:20
    #2 0x7fa437b12d09 in __libc_start_main csu/../csu/libc-start.c:308:16

SUMMARY: AddressSanitizer: heap-use-after-free /tmp/sanitizer-example/prog.c:49:15 in func
Shadow bytes around the buggy address:
...
==796003==ABORTING

Even without the program's source code, there are a number of things we can deduce from this message:

  • The type of error is "heap-use-after-free". This means the program freed a block of memory, and then tried to access it afterwards. It is an error to access a memory block after it has been freed, because by freeing it, the program is declaring that it can be reused for other purposes, and it has potentially been re-allocated for another purpose and overwritten with other data already.
  • The error occurred when the program tried to read from a memory address.
  • The error occurred in the func function, on line 49 of prog.c. This function was called by the main function on line 35 of prog.c.
  • The memory block was 16 bytes, and was freed in the func function on line 48 of prog.c. This function was called by the main function on line 35 of prog.c.
  • The memory block was allocated by malloc, which was called from the main function on line 21 of prog.c.

Tip: The groups of lines starting with # are stack traces. Recall that when a program calls a function, a stack frame is created for it on the stack which remains until the function returns. A stack trace lists all the function calls that were on the stack at a particular point in time while the program was running, as well as the line that the program was up to in each function. Simply put, a stack trace is used to describe where a program was up to.

For example, consider the first stack trace in the above error message. This tells us that when the heap-use-after-free error occurred, the function calls on the stack (from earliest to latest) were:

  • _start
  • __libc_start_main
  • main (line 35)
  • func (line 49)

This tells us that the main function called the func function on line 35, and then the error occurred on line 49. You can ignore _start and __libc_start_main, as these are not part of the program's source code (these functions/procedures perform some initialisation for the program before the main function is actually called).

Sanitizers guide

The COMP2521 Sanitisers Guide (written by Franco Reyes, one of our awesome tutors) is a helpful guide that explains how to read error messages, and contains detailed explanations and examples for all common error types.

Your task

The sanitizers/ directory contains several buggy programs. For each of the programs below, your task is to:

  • Compile the program as instructed
  • Run the program until you get an error message from AddressSanitizer or MemorySanitizer
  • With help from the Sanitisers Guide, identify the following pieces of information from the error message:
    • The type of error, and what it means
    • The name of the function in which the error occurred, and the source file and line number of the instruction that triggered the error
    • The series of functions that were on the function call stack when the error occurred
    • The meaning of any other stack traces in the error message (these usually have a line of pink text above them)
  • Deduce why the error occurred
  • Deduce how to fix the program (but don't actually fix it)
  • In the file part-1-answers.txt (in the revision directory), under the heading for the program, paste the error message given by the program, briefly explain why the error occurred and briefly describe how it can be fixed. During lab marking, your tutor will ask you about some of the programs, so the notes you write in this file will be helpful.
    Note: The part-1-answers.txt file is in the revision directory because when you submit the lab, all the files you submit will need to be in the same directory.
Program 1: listAddStart.c

The purpose of this program is to repeatedly read values and insert them at the beginning of a linked list. Here is the intended behaviour of the program:

./listAddStart
Enter values:

[]
./listAddStart
Enter values: 3 1 4 1 5

[5, 1, 4, 1, 3]

Compile the program with the following command:

clang -Wall -Werror -g -fsanitize=address,leak,undefined -o listAddStart listAddStart.c
Reminder: Do not fix the programs - the purpose of this task is to learn how to interpret error messages. During lab marking, your tutor will ask you to reproduce the error messages, which won't be possible if you have fixed the programs.
Program 2: shuffleArray.c

The purpose of this program is to read values into an array and then shuffle the array. Here is the intended behaviour of the program:

./shuffleArray
Enter array size: 5
Enter array values: 3 1 4 1 5
Original array: [3, 1, 4, 1, 5]
Shuffled array: [1, 3, 1, 5, 4]

Note that the program randomly shuffles the array, so it is expected to produce different outputs each time it is run.

Compile the program with the following command:

clang -Wall -Werror -g -fsanitize=address,leak,undefined -o shuffleArray shuffleArray.c
Hint: Note that unlike Program 1, the error in this program will occur in a function that is not in the source code (as there is no function called scanf_common in shuffleArray.c). This suggests that the error occurred in a library function. In this case, you should look further down the function call stack until you find a function that exists in the source code.
Program 3: reverseArray.c

The purpose of this program is to read values into an array and then reverse the array. Here is the intended behaviour of the program:

./reverseArray
Enter array size: 5
Enter array values: 3 1 4 1 5
Original array: [3, 1, 4, 1, 5]
Reversed array: [5, 1, 4, 1, 3]

Compile the program with the following command:

clang -Wall -Werror -g -fsanitize=address,leak,undefined -o reverseArray reverseArray.c
Program 4: listPrepend.c

The purpose of this program is to repeatedly read values and insert them at the beginning of a linked list. Here is the intended behaviour of the program:

./listPrepend
Enter values: 

[]
./listPrepend
Enter values: 3 1 4 1 5

[5, 1, 4, 1, 3]

Compile the program with the following command:

clang -Wall -Werror -g -fsanitize=memory,undefined -fsanitize-memory-track-origins -o listPrepend listPrepend.c
Program 5: listAppend.c

The purpose of this program is to read values and insert them at the end of a list. Here is the intended behaviour of the program:

./listAppend
Enter values: 

[]
./listAppend
Enter values: 3 1 4 1 5

[3, 1, 4, 1, 5]

Compile the program with the following command:

clang -Wall -Werror -g -fsanitize=address,leak,undefined -o listAppend listAppend.c
Program 6: listDeleteFirst.c

The purpose of this program is to read values into a list and then delete the first value. Here is the intended behaviour of the program:

./listDeleteFirst
Enter values: 

Original list: []
List after deleting first value: []
./listDeleteFirst
Enter values: 3 1 4 1 5

Original list: [3, 1, 4, 1, 5]
List after deleting first value: [1, 4, 1, 5]

Compile the program with the following command:

clang -Wall -Werror -g -fsanitize=address,leak,undefined -o listDeleteFirst listDeleteFirst.c
Program 7: listNumEvens.c

The purpose of this program is to read values into a list and then count the number of even values. Here is the intended behaviour of the program:

./listNumEvens
Enter values: 

Number of even values: 0
./listNumEvens
Enter values: 3 1 4 1 5

Number of even values: 1

This program does not contain any errors that would be detected by AddressSanitizer, but it does contain errors that would be detected by MemorySanitizer. Compile the program with the following command:

clang -Wall -Werror -g -fsanitize=memory,undefined -fsanitize-memory-track-origins -o listNumEvens listNumEvens.c
Program 8: mostFrequentLetter.c

The purpose of this program is to read a line of text and then determine the most frequent letter. Here is the intended behaviour of the program:

./mostFrequentLetter
Enter some text: hello world
The most frequent letter is: l

Compile the program with the following command:

clang -Wall -Werror -g -fsanitize=memory,undefined -fsanitize-memory-track-origins -o mostFrequentLetter mostFrequentLetter.c
Reminder: Do not fix the programs - the purpose of this task is to learn how to interpret error messages. During lab marking, your tutor will ask you to reproduce the error messages, which won't be possible if you have fixed the programs.

Part 2 - Makefiles 🛠️

The purpose of this section is to explain how Makefiles will be used in this course and give you enough background on Makefiles for you to be able to use the make command. You won't be assessed on your understanding of Makefiles and make.

If you got tired of typing out the compilation commands in Part 1, you're not alone! In COMP2521, we will usually not type out the compilation commands to compile our programs; instead, we will use Makefiles and the make command.

A Makefile is a file that specifies the build products of a project, their dependencies (e.g., source code files) and instructions for building them (e.g., compilation commands). If a Makefile exists, the make command will use the Makefile to determine what needs to be built (or rebuilt), and then run the commands needed to build them.

The main benefit of using Makefiles is that make uses the modification timestamps of files to decide whether something needs to be rebuilt, and does not rebuild something if its dependencies have not been modified since it was last built. This means, for example, if we have only modified one file since we last ran make, and then we run make again, only parts of the project that depend on the file we modified will be rebuilt. This saves a lot of build time in larger projects.

In COMP2521, we will always provide Makefiles for any coding activity, including in labs, assignments and the exam, so you don't need to know how to write a Makefile. However, you should know how to run the make command.

Let's take a look at a simple Makefile. Here is the Makefile from the makefiles directory of this lab:


CC = clang
CFLAGS = -Wall -Wvla -Werror -g

########################################################################

asan: CFLAGS += -fsanitize=address,leak,undefined
asan: all

msan: CFLAGS += -fsanitize=memory,undefined -fsanitize-memory-track-origins
msan: all

nosan: all

########################################################################

all: concatStrings getMax

concatStrings: concatStrings.c
	$(CC) $(CFLAGS) -o concatStrings concatStrings.c

getMax: getMax.c
	$(CC) $(CFLAGS) -o getMax getMax.c

########################################################################

clean:
	rm -f concatStrings getMax

The important thing to note here are the targets, which are the words that appear before a colon. The targets in this Makefile are: asan, msan, nosan, all, concatStrings, getMax and clean. Note that some targets refer to actual files that will be produced (concatStrings and getMax), while the other targets don't refer to actual files. When we run make, we can either specify one or more targets to build, or not specify any targets, in which case make will simply build the first target in the Makefile.

The dependencies of a target are listed after the colon on the same line. For example, the asan target has one dependency, which is the all target. Meanwhile, the all target has two dependencies: concatStrings and getMax.

It's time to try out the make command! Change to the makefiles/ directory, then run make. You should get the following output:

make
clang -Wall -Wvla -Werror -g -fsanitize=address,leak,undefined -o concatStrings concatStrings.c
clang -Wall -Wvla -Werror -g -fsanitize=address,leak,undefined -o getMax getMax.c

You may be wondering why make compiled concatStrings and getMax when the first target in the Makefile is asan. That is because concatStrings and getMax are indirect dependencies of the asan target - the asan target depends on the all target, which depends on concatStrings and getMax.

Let's see what happens if we try to run make again. You should get the following output:

make
make: Nothing to be done for 'asan'.

Since the dependencies of asan have not been modified since we last ran make, make deduces that nothing needs to be rebuilt, and thus outputs "Nothing to be done".

As we mentioned before, in Makefiles, you can create targets which don't actually refer to real files. This can be useful because (1) you may want to automate tasks that don't involve any compilation, but are still related to the compilation process or (2) you may want to compile the same program in different ways. For example, in all of our Makefiles, we will provide a target called "clean" which removes all build products from the directory. To use it, specify the clean target. You should get the following output:

make clean
rm -f concatStrings getMax

We will also include a few targets to let you specify what sanitizers you want to compile with, as not all sanitizers are compatible with each other. Since AddressSanitizer is the most useful sanitizer, this is what make will compile with if you run it without any arguments (which you've seen above). However, you can explicitly state that you want to compile with AddressSanitizer like so:

make asan

If you want to compile with MemorySanitizer, then you must specify the msan target, like so:

make msan

If you wish to compile without any sanitizers, which you should only need to do if you want to use valgrind, then you must specify the nosan target like so:

make nosan

Note that if you build each of these targets one after the other without modifying any files in between, make will not rebuild the programs and instead output "Nothing to be done for [target]". This is because even though asan, msan and nosan are different targets, they all have the same base dependencies (concatStrings.c and getMax.c), which have not been modified. If you want to compile a program with AddressSanitizer, and then compile it again with MemorySanitizer, you will need to run make clean in between.

Your task

The makefiles/ directory contains two buggy programs. One of these programs contains an error that will be caught by AddressSanitizer, while the other program contains an error that will be caught by MemorySanitizer. Follow these steps:

  1. Use make to compile the programs with AddressSanitizer.
  2. Run each of the programs. Observe that one of the programs produces an error, while the other exits normally.
    • In the file part-2-answers.txt (in the revision directory), paste the error message under the AddressSanitizer heading.
  3. Now use make to compile the programs with MemorySanitizer.
  4. Run each of the programs again. Observe that the program that produced an error in step 2 now exits normally, while the other program now produces an error.
    • In the file part-2-answers.txt (in the revision directory), paste the error message under the MemorySanitizer heading.

Part 3 - Revision Exercises 💪️

The remainder of this lab exercise consists of some array and linked list exercises to give you some practice programming with them again. These exercises are in the revision/ directory.

Task 1 - Shifting an Array

Your task is to implement the following function in arrayShift.c:

void shift(int arr[], int size, int n);

This function shifts an array of the given size to the right \( n \) times. When an array is shifted to the right, each element is moved one index to the right (i.e., if an element was at index \(i\), it will now be at index \(i + 1\)), and the element at the end of the array (at index \(N - 1\) where \(N\) is the size of the array) is moved to the start (index \(0\)).

\(n\) can be assumed to be non-negative.

Testing

arrayShift.c contains a main function which allows you to test your shift function. The main function:

  • Reads the size of the array
  • Reads the values of the array
  • Displays the array
  • Reads the value of \(n\)
  • Calls shift
  • Displays the updated array

Here are some examples of its usage:

make
...
./arrayShift
Enter array size: 3
Enter array values: 16 5 2
Array: [16, 5, 2]
Enter shift: 1
Array after shifting 1 time(s): [2, 16, 5]
./arrayShift
Enter array size: 5
Enter array values: 16 5 2 -3 4
Array: [16, 5, 2, -3, 4]
Enter shift: 3
Array after shifting 3 time(s): [2, -3, 4, 16, 5]
./arrayShift
Enter array size: 5
Enter array values: 16 5 2 -3 4
Array: [16, 5, 2, -3, 4]
Enter shift: 10
Array after shifting 10 time(s): [16, 5, 2, -3, 4]

Task 2 - Shifting a Linked List

Your task is to implement the following function in listShift.c:

struct node *shift(struct node *list, int n);

This function shifts a linked list to the right \( n \) times and returns a pointer to the start of the updated list. When a linked list is shifted to the right, the node at the end of the list is moved to the start.

\(n\) can be assumed to be non-negative.

Constraints
  • You must not use malloc or any of its variants
  • You must not modify the integer values in any nodes
Testing

listShift.c contains a main function which allows you to test your shift function. The main function:

  • Reads the size of the list
  • Reads the values of the list
  • Displays the list
  • Reads the value of \(n\)
  • Calls shift
  • Displays the updated list

Here are some examples of its usage:

make
...
./listShift
Enter list size: 3
Enter list values: 16 5 2
List: [16, 5, 2]
Enter shift: 1
List after shifting 1 time(s): [2, 16, 5]
./listShift
Enter list size: 5
Enter list values: 16 5 2 -3 4
List: [16, 5, 2, -3, 4]
Enter shift: 3
List after shifting 3 time(s): [2, -3, 4, 16, 5]
./listShift
Enter list size: 5
Enter list values: 16 5 2 -3 4
List: [16, 5, 2, -3, 4]
Enter shift: 10
List after shifting 10 time(s): [16, 5, 2, -3, 4]

Task 3 - Inserting into an Ordered Array

Your task is to implement the following function in arrayInsertOrdered.c:

void insertOrdered(int arr[], int size, int value);

This function takes an array which is sorted in increasing order, and inserts the given value into the array such that the array remains ordered.

The function can assume that it can store an element at index size. For example, suppose that the inputs are:

Then the value should be inserted like so:

Testing

arrayInsertOrdered.c contains a main function which allows you to test your insertOrdered function. The main function:

  • Reads the size of the array
  • Reads the values of the array
  • Displays the array
  • Reads the value to be inserted
  • Calls insertOrdered
  • Displays the updated array

Here are some examples of its usage:

make
...
./arrayInsertOrdered
Enter array size: 4
Enter array values (must be in ascending order): 2 5 7 8
Array: [2, 5, 7, 8]
Enter value to insert: 6
Array after inserting 6: [2, 5, 6, 7, 8]
./arrayInsertOrdered
Enter array size: 5
Enter array values (must be in ascending order): -8 -3 -1 2 4
Array: [-8, -3, -1, 2, 4]
Enter value to insert: -9
Array after inserting -9: [-9, -8, -3, -1, 2, 4]
./arrayInsertOrdered
Enter array size: 6
Enter array values (must be in ascending order): -5 0 2 6 9 14
Array: [-5, 0, 2, 6, 9, 14]
Enter value to insert: 2
Array after inserting 2: [-5, 0, 2, 2, 6, 9, 14]

Task 4 - Inserting into an Ordered Linked List

Your task is to implement the following function in listInsertOrdered.c:

struct node *insertOrdered(struct node *list, int value);

This function takes a linked list which is sorted in increasing order, inserts the given value into the list such that the list remains ordered, and then returns a pointer to the start of the updated list.

Testing

listInsertOrdered.c contains a main function which allows you to test your insertOrdered function. The main function:

  • Reads the size of the list
  • Reads the values of the list
  • Displays the list
  • Reads the value to be inserted
  • Calls insertOrdered
  • Displays the updated list

Here are some examples of its usage:

make
...
./listInsertOrdered
Enter list size: 4
Enter list values (must be in ascending order): 2 5 7 8
List: [2, 5, 7, 8]
Enter value to insert: 6
List after inserting 6: [2, 5, 6, 7, 8]
./listInsertOrdered
Enter list size: 5
Enter list values (must be in ascending order): -8 -3 -1 2 4
List: [-8, -3, -1, 2, 4]
Enter value to insert: -9
List after inserting -9: [-9, -8, -3, -1, 2, 4]
./listInsertOrdered
Enter list size: 6
Enter list values (must be in ascending order): -5 0 2 6 9 14
List: [-5, 0, 2, 6, 9, 14]
Enter value to insert: 2
List after inserting 2: [-5, 0, 2, 2, 6, 9, 14]

Optional Challenge Task

Note: This task is optional. It is not marked.

In tasks 1 and 2, implement the shift function without using any nested loops (non-nested loops are fine) and without using malloc, calloc or any temporary arrays.

Submission

Submit via the command line using the give command (make sure you are in the revision directory first):

give cs2521 lab01

You can also submit via give's web interface. You can submit multiple times. Only your last submission will be marked. You can check the files you have submitted here.

Assessment

Parts 1 and 2 will be handmarked, and Part 3 will be automarked. Please note that handmarking must be completed in Weeks 1, 2 or 3; you cannot ask your tutor to handmark your lab01 work in Week 4 or later (they won't be able to enter your mark). The marking criteria are as follows:

Part Marks Criteria
Part 1 - Sanitizers 2 Your tutor will pick two programs from the list of programs in Part 1. For each program, they will ask you to run the program until AddressSanitizer or MemorySanitizer produces an error message, and then ask you identify the following:
  • The type of error, and what it means
  • The name of the function in which the error occurred, and the source file and line number of the instruction that triggered the error
  • The series of functions on the function call stack when the error occurred
  • Why the error occurred
  • How the program can be fixed
Part 2 - Makefiles 1 Your tutor will ask you to use the make command to compile the programs in the makefiles directory, once with AddressSanitizer, and once with MemorySanitizer. They will also check your answers in part-2-answers.txt.
Part 3 - Revision Exercises 2 0.5 marks per task. Automarking will be run after submissions have closed. After automarking is run you will be able to view your results here.