Programming Fundamentals

Information

  • This page contains extra challenge exercises for week 10.
  • These exercises are not compulsory, nor do they provide any marks in the course.
  • These exercises are intended for students that want more challenge in the course.
  • You cannot submit any of these exercises, however autotests are available for them (Command included at bottom of each exercise).

Exercise
(●◌◌)
:

Overflow

Disclaimer and warning

C is an amazing language, as you have very fine control over memory and computer operations that are not found in many other languages. This allows us to write more efficient programs. Incorrect use of the language can result in security vulnerabilities that can be exploited.

The most common method of exploitation of C programs is memory manipulation. If an attacker can overwrite parts of memory, they can cause the program to deviate from its expected behaviour, which allows exploitation.

This exercise is designed to teach you about the dangers of writing vulnerable C, with the goal of providing a deeper understand of the language and how to avoid such vulnerabilities.

Buffers

A buffer is a piece of memory that has been allocated to store some contiguous data. In C these are called arrays (or strings (char[n])). While you allocate a specific amount of memory, say n = 10, C will not stop you from writing more than 10 values to the buffer. If we try to write more space than we have allocated, we could possibly overwrite other values that are stored nearby, resulting in undefined behaviour. This is the basis of a buffer overflow vulnerability.

Dangerous functions

[DEPRECATED] char *gets(char *s)

char *gets() reads a line from stdin into the buffer (another name for an array) pointed to by s until either a terminating newline or EOF, which it replaces with a null byte.

Have a think about why this function is extremely dangerous and continue reading.

When we run man 3 gets in our terminal, we are opening the manual page for the libc library function gets(). Reading this description, we see "Never use this function."

Now you might be wondering, if this function is so dangerous, why is it still able to be used?

  • Old language, naive, backwards compatibility
  • Compilers such as gcc have default behaviour to warn against using it, as we will see later

Spoiler?

gets() is so dangerous because it will read characters continuously from stdin, placing them in the buffer we have provided, regardless of the buffer size. This means an input that is larger than the buffer will simply write over the memory after the buffer, creating unexpected behaviour. This will often result in what is called a "Segmentation Fault".

Here is how gets is implemented in libc (comments have been added for clarity):

char *gets(char *buf) {
	int c;
	char *s;

	// Don't need to understand what this does, but run `man FLOCKFILE` to find out more if you are curious
	FLOCKFILE(stdin);

	// Loop continuously, getting the next char from stdin until a newline or EOF is encountered
	for (s = buf; (c = getchar_unlocked()) != '\n';)
		if (c == EOF)
			if (s == buf) {
				FUNLOCKFILE(stdin);
				return (NULL);
			} else
				break;
		else
			/*
			Writing the character to the pointed to location, and update the pointer
			to point to the next char sized piece of memory.
			You can find more about pointer arithmetic here:
			https://en.wikipedia.org/wiki/Pointer_(computer_programming)
			*/
			*s++ = c;
	// Replace the last character, EOF or newline, with a null byte.
	*s = '\0';
	FUNLOCKFILE(stdin);
	return (buf);
}

Even functions you are using frequently can be vulnerable when used incorrectly. A well written blog post regarding scanf can be found here

Example

Now we will examine what a buffer overflow vulnerability looks like.

Here we have some code that is intended to read in a series of characters to a buffer. The program never updates int x, but if for some reason x becomes non-zero, it will print out an alternative output.

#include 

int main (void) {
    char buf[10];
    int x = 0;

    printf("Enter your payload: ");

    gets(buf);

	if (x != 0) {
		printf("How did you get in here!?\n");
    	printf("x = %d\n", x);
	} else {
		printf("HAHAHA you will never pass my if condition!\n");
	}

    return 0;
}

Try compiling this code yourself with gcc -fno-stack-protector example.c -o example (Note: make sure you use gcc) and see what output you get.

We need to use the option -fno-stack-protector as the compiler will add protection after the stack that will cause the program to crash if we overflow the buffer. This is good for normal applications (you should never turn this option on unless you know exactly why), but in our case we want to demonstrate a simple vulnerability.

The compilation output should look something like this:

gets.c: In function ‘main’:
gets.c:13:5: warning: implicit declaration of function ‘gets’; did you mean ‘fgets’? [-Wimplicit-function-declaration]
   13 |     gets(name);
      |     ^~~~
      |     fgets
gets.c:(.text+0x51): warning: the `gets' function is dangerous and should not be used.

Try running the program and playing around it, seeing if you can get the alternative output to print.

If you aren't able to print the alternative output and/or want an explanation of what is happening, keep reading.

Spoiler

If you disobeyed the initial output and entered more than 10 characters, you would have been able to enter the if statement. Entering 11 As gives the following output:

Enter your payload: AAAAAAAAAAA
How did you get in here!?
x = 65

We can see that x = 65, which is the decimal representation of the 'A' ASCII character. This is an example of a buffer overflow that allowed us to overwrite memory we weren't supposed to, giving us access to branches of code that the developer didn't intend.

Below is a visualisation of what the memory looks like after entering 11 As.

Memory Diagram

Now, edit the supplied code to use a safe method of reading in input, such as reading N - 1 individual characters. Don't forget to append a null byte! Recompile your edited code using gcc -fno-stack-protector example.c and see if the same buffer overflow exists.

Challenge

There are four challenge files that ramp up in difficulty, so please don't worry if you can't complete them all.

There are no autotests provided for this challenge.

For all 4 files your challenge is to get the program to tell you that "You are the best student ever". You should also fix the vulnerability, such that your previous input that solved the challenge no longer works. To recompile the binary use gcc -fno-stack-protector <FILENAME>.c -o <FILENAME>. Additionally, you should try to compile the binary without the flag and see if you can solve the challenge with the same input.

Keep reading for some hints for the challenges:

If you found this interesting, I suggest looking in to cybersecurity. This is just a little taster of the other amazing things you can learn about!

Exercise
(●●◌)
:

Command Line Words

Write a program called command_line_words.c that takes in comand line arguments and prints out the total number of words that appear in them.

A word is defined as any collection of characters that do not contain a space. For example, "Today I Slept" contains 3 words by that definition.

However, the twist with this exercise is that we are going to input command line arguments in a special way such that a single argument can contain spaces.

So far, we have seen the use of command line arguments as such:

./program Here are my arguments

We know that argc in this case is 5. We can also visualise the layout of argv like so:

However, there is actually a way to group words together into a single argument! This can be done by surrounding these words in double quotes. If we adjust the above example to instead be:

./program Here "are my" arguments

Then argc will now be 4. We can also visualise the new layout of argv like so:

It is important to see here that the number of command line arguments changes, but the number of words stays the same (4, excluding ./program)!

Here are some examples for how your program should work (Note that we ignore ./command_line_words in all outputs):

dcc command_line_words.c -o command_line_words
./command_line_words Here "are my" arguments
There are 3 command line arguments (Excluding program)!
There were 4 total words!
./command_line_words "All words in one argument"
There are 1 command line arguments (Excluding program)!
There were 5 total words!
./command_line_words "Mixture of" "words" in "Command line arguments"
There are 4 command line arguments (Excluding program)!
There were 7 total words!
./command_line_words "Empty Arguments" " " " " "End"
There are 4 command line arguments (Excluding program)!
There were 3 total words!

Assumptions/Restrictions/Clarifications

  • You will only be given letters, quotes and spaces as input
  • Autotests will use single quotes to group arguments. There is no fundamental difference in this exercise between using single and double quotes
Sample solution for command_line_words.c
//
// Program to count to total number of words in the command line arguments.
// Separate arguments can contain multiple words!
//
// Written by Rory Golledge (z5308772) on 25-03-2022
//

#include <stdio.h>

#define FALSE 0
#define TRUE 1

int main(int argc, char *argv[]) {
    int total_words = 0;

    int arg_index = 1;

    // Loops for each argument
    while (arg_index < argc) {
        int letter_index = 0;

        int expecting_space = FALSE;
        // Loops for each character in the current argument
        while (argv[arg_index][letter_index] != '\0') {
            // Word is found if a space is found and a space is expected
            if (argv[arg_index][letter_index] == ' ' && expecting_space) {
                total_words++;
                expecting_space = FALSE;
            // When a non-space is found, we are in a word, hence we are
            // now expecting a space to complete the word
            } else if (argv[arg_index][letter_index] != ' ') {
                expecting_space = TRUE;
            }

            letter_index++;
        }

        // Standard case when there is no space at end of argument, as a word
        // still needs to be added
        if (expecting_space) {
            total_words++;
        }

        arg_index++;
    }

    // Print out final results
    printf(
        "There are %d command line arguments (Excluding program)!\n", argc - 1
    );
    printf("There were %d total words!\n", total_words);
}

Exercise
(●●●)
:

Valid C Brackets

Download valid_c_brackets.c here

Or, copy these file(s) to your CSE account using the following command:

1511 fetch-activity valid_c_brackets

Your task is to add code to these functions in valid_c_brackets.c:

// Given a string containing the contents of a C file, print out whether it has
// correct matching brackets. If it does not, print out which line that didn't
// have a correct matching bracket.
void valid_c_brackets(char *file_contents) {
    // TODO: COMPLETE THIS FUNCTION AND REMOVE THE PRINTF BELOW
    printf("valid_c_brackets() has not been implemented yet.\n");
}

In this program you will provide the name of a C file in the command line arguments and the program will print whether it has valid matching brackets.

When we compile our code, the compiler (dcc in our case) will check if your code is correct before it does so and prints errors if it cannot compile. One thing a compiler will check for is if all your brackets match properly.

This can get quite complex when you have brackets nested in other brackets (such as putting a while loop inside an if statement where we print out an array)

In this program, we have handled all file input and you have the write the provided function to test bracket matching. In the function, you will be provided the file contents as a string so that you do not need to worry about the file aspect

Some example files are provided for you below which you can download to use. Make sure they are in the same directory as valid_c_brackets.c when you are testing

Example files

Download basic_valid.c here, or copy it to your CSE account using the following command:

cp -n /import/adams/A/cs1511/public_html/24T3/activities/valid_c_brackets/files/basic_valid.c .

Download basic_invalid.c here, or copy it to your CSE account using the following command:

cp -n /import/adams/A/cs1511/public_html/24T3/activities/valid_c_brackets/files/basic_invalid.c .

Download medium_valid.c here, or copy it to your CSE account using the following command:

cp -n /import/adams/A/cs1511/public_html/24T3/activities/valid_c_brackets/files/medium_valid.c .

Download medium_invalid.c here, or copy it to your CSE account using the following command:

cp -n /import/adams/A/cs1511/public_html/24T3/activities/valid_c_brackets/files/medium_invalid.c .

Download complex_valid.c here, or copy it to your CSE account using the following command:

cp -n /import/adams/A/cs1511/public_html/24T3/activities/valid_c_brackets/files/complex_valid.c .

Download complex_invalid.c here, or copy it to your CSE account using the following command:

cp -n /import/adams/A/cs1511/public_html/24T3/activities/valid_c_brackets/files/complex_invalid.c .

Examples

./valid_c_brackets basic_valid.c
File has valid matching brackets!
./valid_c_brackets basic_invalid.c
Non-matching bracket found on line 5. Was expecting a ')' but got a '}'
./valid_c_brackets medium_valid.c
File has valid matching brackets!
./valid_c_brackets medium_invalid.c
There was a missing '}' bracket in this program

There are essentially 3 cases here:

  • File is valid - Print as such
  • Non-matching brackets - When searching for a match to an opening bracket, a non-matching closing bracket was found first
  • Not enough brackets - The end of the file was reached and the last seen, non-matched opening bracket was never matched

The key idea with this exercise is that you need to consider the most recent opening bracket and try to match it before matching other un-matched opening brackets before it.

Assumptions/Clarifications/Restrictions

  • How could you use a stack to model this problem?
  • You can assume that the only bracket pairs you need to match are:
    • ()
    • {}
    • []
  • You can assume these brackets will only appear in actual code, meaning they can't appear in comments or strings (such as printing the bracket)
  • You will need to keep track of the current line. This can simply be done by looking for each new line character in the given string
  • As a note from the above, there can still appear new line characters in the file such as in printfs, but this will appear as 2 characters in the string so you do not need to worry about it (since actual new lines are 1 character)
Sample solution for valid_c_brackets.c
// Program to determine if a given C file has valid brackets in it.
// Written by Rory Golledge (z5308772) on 17-04-2022

#include <stdio.h>
#include <stdlib.h>

#define RETURN_INVALID_ARGUMENTS 1
#define RETURN_FILE_NOT_FOUND 2
#define RETURN_NOT_ENOUGH_MEMORY 3

struct node {
    char data;
    struct node *next;
};

struct stack {
    int length;
    struct node *nodes;
};

void valid_c_brackets(char *file_contents);

////////////////////////////////////////////////////////////////////////////////
/////// DO NOT CHANGE THIS MAIN FUNCTION. YOU DO NOT NEED TO UNDERSTAND ////////
///////                         WHAT IT DOES                            ////////
////////////////////////////////////////////////////////////////////////////////
int main(int argc, char *argv[]) {
    // Must provide a single argument specifying file
    if (argc != 2) {
        fprintf(stderr, "Usage: ./valid_c_brackets <file.c>\n");
        return RETURN_INVALID_ARGUMENTS;
    }

    // Open the file for reading
    FILE *file = fopen(argv[1], "rb");

    // File must exist to run the program
    if (file == NULL) {
        fprintf(stderr, "File %s not found\n", argv[1]);
        return RETURN_FILE_NOT_FOUND;
    }

    // Gets the length of the file
    fseek(file, 0, SEEK_END);
    long int file_length = ftell(file);

    // Allocate a string large enough to hold the file
    char *file_contents = malloc(sizeof(char) * (file_length + 1));

    if (file_contents == NULL) {
        fprintf(stderr, "There was not enough memory to store the file.");
        return RETURN_NOT_ENOUGH_MEMORY;
    }

    // Rewind back to the start of the file
    fseek(file, 0, SEEK_SET);

    // Read file into `file_contents` array
    fread(file_contents, 1, file_length, file);

    file_contents[file_length] = '\0';

    valid_c_brackets(file_contents);    

    fclose(file);
}

// Malloc a new stack and return a pointer to it
struct stack *create_stack() {
    struct stack *stack = malloc(sizeof(struct stack));
    stack->length = 0;
    stack->nodes = NULL;

    return stack;
}

// Mallocs a new node given the provided `data` and return a pointer to it
struct node *create_node(char data) {
    struct node *node = malloc(sizeof(struct node));
    node->data = data;
    node->next = NULL;

    return node;
}

// Given a `stack`, push a new node onto it with the given `data`
void push(struct stack *stack, char data) {
    struct node *new = create_node(data);
    new->next = stack->nodes;
    stack->nodes = new;

    stack->length++;
}

// Given a `stack`, pop the top of that stack and return the data inside of it
char pop(struct stack *stack) {
    if (stack->length == 0) {
        return '\0';
    }

    struct node *popped = stack->nodes;
    stack->nodes = stack->nodes->next;
    stack->length--;

    char return_data = popped->data;
    free(popped);

    return return_data;
}

// Returns whether the given `character` is an opening bracket or not
int is_opening_bracket(char character) {
    return character == '(' || character == '[' || character == '{';
}

// Returns whether the given `character` is an closing bracket or not
int is_closing_bracket(char character) {
    return character == ')' || character == ']' || character == '}';
}

// Given an opening bracket, returns the corresponding closing bracket
char corresponding_closing_bracket(char opening) {
    if (opening == '(') {
        return ')';
    }
    if (opening == '[') {
        return ']';
    }
    if (opening == '{') {
        return '}';
    }

    return '\0';
}

// Given a string containing the contents of a C file, print out whether it has
// correct matching brackets. If it does not, print out which line that didn't
// have a correct matching bracket.
void valid_c_brackets(char *file_contents) {
    struct stack *bracket_stack = create_stack();

    int line_number = 1;
    int valid_file = 1;

    // Data used in case an invalid bracket match it found
    char expected = '\0';
    char found = '\0';

    int i = 0;
    while (file_contents[i] != '\0' && valid_file) {
        if (is_opening_bracket(file_contents[i])) {
            push(bracket_stack, file_contents[i]);
        } else if (is_closing_bracket(file_contents[i])) {
            char bracket = pop(bracket_stack);
            char expected_closing = corresponding_closing_bracket(bracket);
            if (expected_closing != file_contents[i]) {
                valid_file = 0;
                expected = expected_closing;
                found = file_contents[i];
            }
        } else if (file_contents[i] == '\n') {
            line_number++;
        }
        i++;
    }

    // If there are still some brackets left to be matched, file is invalid.
    if (bracket_stack->length != 0) {
        valid_file = 0;
    }

    // Print outcome
    if (expected != '\0') {
        printf(
            "Non-matching bracket found on line %d. Was expecting a "
            "'%c' but got a '%c'\n", line_number, expected, found
        );
    } else if (!valid_file) {
        printf(
            "There was a missing '%c' bracket in this program\n",
            corresponding_closing_bracket(pop(bracket_stack))
        );
    } else {
        printf("File has valid matching brackets!\n");
    }
}