Programming Fundamentals

Information

  • This page contains extra challenge exercises for week 10.
  • These exercises are not compulsory, nor do they provide any marks in the course.
  • These exercises are intended for students that want more challenge in the course.
  • You cannot submit any of these exercises, however autotests are available for them (Command included at bottom of each exercise).

Exercise
(●◌◌)
:

Overflow

Disclaimer and warning

C is an amazing language, as you have very fine control over memory and computer operations that are not found in many other languages. This allows us to write more efficient programs. Incorrect use of the language can result in security vulnerabilities that can be exploited.

The most common method of exploitation of C programs is memory manipulation. If an attacker can overwrite parts of memory, they can cause the program to deviate from its expected behaviour, which allows exploitation.

This exercise is designed to teach you about the dangers of writing vulnerable C, with the goal of providing a deeper understand of the language and how to avoid such vulnerabilities.

Buffers

A buffer is a piece of memory that has been allocated to store some contiguous data. In C these are called arrays (or strings (char[n])). While you allocate a specific amount of memory, say n = 10, C will not stop you from writing more than 10 values to the buffer. If we try to write more space than we have allocated, we could possibly overwrite other values that are stored nearby, resulting in undefined behaviour. This is the basis of a buffer overflow vulnerability.

Dangerous functions

[DEPRECATED] char *gets(char *s)

char *gets() reads a line from stdin into the buffer (another name for an array) pointed to by s until either a terminating newline or EOF, which it replaces with a null byte.

Have a think about why this function is extremely dangerous and continue reading.

When we run man 3 gets in our terminal, we are opening the manual page for the libc library function gets(). Reading this description, we see "Never use this function."

Now you might be wondering, if this function is so dangerous, why is it still able to be used?

  • Old language, naive, backwards compatibility
  • Compilers such as gcc have default behaviour to warn against using it, as we will see later

Spoiler?

gets() is so dangerous because it will read characters continuously from stdin, placing them in the buffer we have provided, regardless of the buffer size. This means an input that is larger than the buffer will simply write over the memory after the buffer, creating unexpected behaviour. This will often result in what is called a "Segmentation Fault".

Here is how gets is implemented in libc (comments have been added for clarity):

char *gets(char *buf) {
	int c;
	char *s;

	// Don't need to understand what this does, but run `man FLOCKFILE` to find out more if you are curious
	FLOCKFILE(stdin);

	// Loop continuously, getting the next char from stdin until a newline or EOF is encountered
	for (s = buf; (c = getchar_unlocked()) != '\n';)
		if (c == EOF)
			if (s == buf) {
				FUNLOCKFILE(stdin);
				return (NULL);
			} else
				break;
		else
			/*
			Writing the character to the pointed to location, and update the pointer
			to point to the next char sized piece of memory.
			You can find more about pointer arithmetic here:
			https://en.wikipedia.org/wiki/Pointer_(computer_programming)
			*/
			*s++ = c;
	// Replace the last character, EOF or newline, with a null byte.
	*s = '\0';
	FUNLOCKFILE(stdin);
	return (buf);
}

Even functions you are using frequently can be vulnerable when used incorrectly. A well written blog post regarding scanf can be found here

Example

Now we will examine what a buffer overflow vulnerability looks like.

Here we have some code that is intended to read in a series of characters to a buffer. The program never updates int x, but if for some reason x becomes non-zero, it will print out an alternative output.

#include 

int main (void) {
    char buf[10];
    int x = 0;

    printf("Enter your payload: ");

    gets(buf);

	if (x != 0) {
		printf("How did you get in here!?\n");
    	printf("x = %d\n", x);
	} else {
		printf("HAHAHA you will never pass my if condition!\n");
	}

    return 0;
}

Try compiling this code yourself with gcc -fno-stack-protector example.c -o example (Note: make sure you use gcc) and see what output you get.

We need to use the option -fno-stack-protector as the compiler will add protection after the stack that will cause the program to crash if we overflow the buffer. This is good for normal applications (you should never turn this option on unless you know exactly why), but in our case we want to demonstrate a simple vulnerability.

The compilation output should look something like this:

gets.c: In function ‘main’:
gets.c:13:5: warning: implicit declaration of function ‘gets’; did you mean ‘fgets’? [-Wimplicit-function-declaration]
   13 |     gets(name);
      |     ^~~~
      |     fgets
gets.c:(.text+0x51): warning: the `gets' function is dangerous and should not be used.

Try running the program and playing around it, seeing if you can get the alternative output to print.

If you aren't able to print the alternative output and/or want an explanation of what is happening, keep reading.

Spoiler

If you disobeyed the initial output and entered more than 10 characters, you would have been able to enter the if statement. Entering 11 As gives the following output:

Enter your payload: AAAAAAAAAAA
How did you get in here!?
x = 65

We can see that x = 65, which is the decimal representation of the 'A' ASCII character. This is an example of a buffer overflow that allowed us to overwrite memory we weren't supposed to, giving us access to branches of code that the developer didn't intend.

Below is a visualisation of what the memory looks like after entering 11 As.

Memory Diagram

Now, edit the supplied code to use a safe method of reading in input, such as reading N - 1 individual characters. Don't forget to append a null byte! Recompile your edited code using gcc -fno-stack-protector example.c and see if the same buffer overflow exists.

Challenge

There are four challenge files that ramp up in difficulty, so please don't worry if you can't complete them all.

There are no autotests provided for this challenge.

For all 4 files your challenge is to get the program to tell you that "You are the best student ever". You should also fix the vulnerability, such that your previous input that solved the challenge no longer works. To recompile the binary use gcc -fno-stack-protector <filename>.c -o <filename>. Additionally, you should try to compile the binary without the flag and see if you can solve the challenge with the same input.

Keep reading for some hints for the challenges:

If you found this interesting, I suggest looking in to cybersecurity. This is just a little taster of the other amazing things you can learn about!

Exercise
(●●◌)
:

Command Line Words

Write a program called command_line_words.c that takes in comand line arguments and prints out the total number of words that appear in them.

A word is defined as any collection of characters that do not contain a space. For example, "Today I Slept" contains 3 words by that definition.

However, the twist with this exercise is that we are going to input command line arguments in a special way such that a single argument can contain spaces.

So far, we have seen the use of command line arguments as such:

./program Here are my arguments

We know that argc in this case is 5. We can also visualise the layout of argv like so:

However, there is actually a way to group words together into a single argument! This can be done by surrounding these words in double quotes. If we adjust the above example to instead be:

./program Here "are my" arguments

Then argc will now be 4. We can also visualise the new layout of argv like so:

It is important to see here that the number of command line arguments changes, but the number of words stays the same (4, excluding ./program)!

Here are some examples for how your program should work (Note that we ignore ./command_line_words in all outputs):

dcc command_line_words.c -o command_line_words
./command_line_words Here "are my" arguments
There are 3 command line arguments (Excluding program)!
There were 4 total words!
./command_line_words "All words in one argument"
There are 1 command line arguments (Excluding program)!
There were 5 total words!
./command_line_words "Mixture of" "words" in "Command line arguments"
There are 4 command line arguments (Excluding program)!
There were 7 total words!
./command_line_words "Empty Arguments" " " " " "End"
There are 4 command line arguments (Excluding program)!
There were 3 total words!

Assumptions/Restrictions/Clarifications

  • You will only be given letters, quotes and spaces as input
  • Autotests will use single quotes to group arguments. There is no fundamental difference in this exercise between using single and double quotes

Exercise
(●●●)
:

Valid C Brackets

Download valid_c_brackets.c here

Or, copy these file(s) to your CSE account using the following command:

1511 fetch-activity valid_c_brackets

Your task is to add code to these functions in valid_c_brackets.c:

// Given a string containing the contents of a C file, print out whether it has
// correct matching brackets. If it does not, print out which line that didn't
// have a correct matching bracket.
void valid_c_brackets(char *file_contents) {
    // TODO: COMPLETE THIS FUNCTION AND REMOVE THE PRINTF BELOW
    printf("valid_c_brackets() has not been implemented yet.\n");
}

In this program you will provide the name of a C file in the command line arguments and the program will print whether it has valid matching brackets.

When we compile our code, the compiler (dcc in our case) will check if your code is correct before it does so and prints errors if it cannot compile. One thing a compiler will check for is if all your brackets match properly.

This can get quite complex when you have brackets nested in other brackets (such as putting a while loop inside an if statement where we print out an array)

In this program, we have handled all file input and you have the write the provided function to test bracket matching. In the function, you will be provided the file contents as a string so that you do not need to worry about the file aspect

Some example files are provided for you below which you can download to use. Make sure they are in the same directory as valid_c_brackets.c when you are testing

Example files

Download basic_valid.c here, or copy it to your CSE account using the following command:

cp -n /import/adams/A/cs1511/public_html/24T3/activities/valid_c_brackets/files/basic_valid.c .

Download basic_invalid.c here, or copy it to your CSE account using the following command:

cp -n /import/adams/A/cs1511/public_html/24T3/activities/valid_c_brackets/files/basic_invalid.c .

Download medium_valid.c here, or copy it to your CSE account using the following command:

cp -n /import/adams/A/cs1511/public_html/24T3/activities/valid_c_brackets/files/medium_valid.c .

Download medium_invalid.c here, or copy it to your CSE account using the following command:

cp -n /import/adams/A/cs1511/public_html/24T3/activities/valid_c_brackets/files/medium_invalid.c .

Download complex_valid.c here, or copy it to your CSE account using the following command:

cp -n /import/adams/A/cs1511/public_html/24T3/activities/valid_c_brackets/files/complex_valid.c .

Download complex_invalid.c here, or copy it to your CSE account using the following command:

cp -n /import/adams/A/cs1511/public_html/24T3/activities/valid_c_brackets/files/complex_invalid.c .

Examples

./valid_c_brackets basic_valid.c
File has valid matching brackets!
./valid_c_brackets basic_invalid.c
Non-matching bracket found on line 5. Was expecting a ')' but got a '}'
./valid_c_brackets medium_valid.c
File has valid matching brackets!
./valid_c_brackets medium_invalid.c
There was a missing '}' bracket in this program

There are essentially 3 cases here:

  • File is valid - Print as such
  • Non-matching brackets - When searching for a match to an opening bracket, a non-matching closing bracket was found first
  • Not enough brackets - The end of the file was reached and the last seen, non-matched opening bracket was never matched

The key idea with this exercise is that you need to consider the most recent opening bracket and try to match it before matching other un-matched opening brackets before it.

Assumptions/Clarifications/Restrictions

  • How could you use a stack to model this problem?
  • You can assume that the only bracket pairs you need to match are:
    • ()
    • {}
    • []
  • You can assume these brackets will only appear in actual code, meaning they can't appear in comments or strings (such as printing the bracket)
  • You will need to keep track of the current line. This can simply be done by looking for each new line character in the given string
  • As a note from the above, there can still appear new line characters in the file such as in printfs, but this will appear as 2 characters in the string so you do not need to worry about it (since actual new lines are 1 character)