Week 08 Laboratory Exercises

Objectives

  • learning how to access files
  • learning how to work with binary data
  • learning how to use fseek

Preparation

Before the lab you should re-read the relevant lecture slides and their accompanying examples.

Getting Started

Set up for the lab by creating a new directory called lab08 and changing to this directory.
mkdir lab08
cd lab08

There are some provided files for this lab which you can fetch with this command:

1092 fetch lab08

If you're not working at CSE, you can download the provided files as a zip file or a tar file.

Exercise — individual:
Create a File of Integers

Write a C program, create_integers_file, which takes 3 arguments:

  1. a filename,
  2. the beginning of a range of integers, and
  3. the end of a range of integers;

and which creates a file of this name containing the specified integers, one per line. You can assume the range will always be strictly ascending. For example:

./create_integers_file fortytwo.txt 40 42
cat fortytwo.txt
40
41
42
./create_integers_file a.txt 1 5
cat a.txt
1
2
3
4
5
./create_integers_file 1000.txt 1 1000
wc 1000.txt
1000 1000 3893 1000.txt

Your program should print a suitable error message if given the wrong number of arguments, or if the file can not be created.
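
The file creation and error reporting described above could be sketched as follows. This is a minimal illustration, not a reference solution: the helper name write_integer_range and its return convention are assumptions made for this example, and perror(3) is only one reasonable way to produce a "suitable error message".

```c
#include <stdio.h>

// Hypothetical helper (name and return values are assumptions):
// writes the integers first..last, one per line, to the file at path.
// Returns 0 on success, 1 if the file could not be created or closed.
int write_integer_range(const char *path, int first, int last) {
    FILE *out = fopen(path, "w");
    if (out == NULL) {
        perror(path);   // prints the path followed by the reason fopen failed
        return 1;
    }

    // A long counter avoids overflow if last is the largest int.
    for (long i = first; i <= last; i++) {
        fprintf(out, "%ld\n", i);
    }

    if (fclose(out) != 0) {
        perror(path);
        return 1;
    }
    return 0;
}
```

A main function would typically check that argc is 4, convert argv[2] and argv[3] to integers (for example with strtol), and pass the results to a helper like this one.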

When you think your program is working, you can use autotest to run some simple automated tests:

1092 autotest create_integers_file 

When you are finished working on this exercise, you must submit your work by running give:

give dp1092 lab08_create_integers_file create_integers_file.c

You must run give before Monday 01 January 00:00 (midnight) (Saturday 01 January 00:00) to obtain the marks for this lab exercise. Note that this is an individual exercise; the work you submit with give must be entirely your own.

Exercise — individual:
Print the Bytes of a File

Write a C program, print_bytes, which takes one argument, a filename, and which should read the specified file and print one line for each byte of the file. The line should show the byte in decimal and hexadecimal. If the byte is an ASCII printable character, the character itself should also be printed, in single quotes.

Assume ASCII printable characters are those for which the ctype.h function isprint(3) returns a non-zero value.

Follow the format in this example exactly:

echo "Hello Andrew!" >hello.txt
./print_bytes hello.txt
byte    0:  72 0x48 'H'
byte    1: 101 0x65 'e'
byte    2: 108 0x6c 'l'
byte    3: 108 0x6c 'l'
byte    4: 111 0x6f 'o'
byte    5:  32 0x20 ' '
byte    6:  65 0x41 'A'
byte    7: 110 0x6e 'n'
byte    8: 100 0x64 'd'
byte    9: 114 0x72 'r'
byte   10: 101 0x65 'e'
byte   11: 119 0x77 'w'
byte   12:  33 0x21 '!'
byte   13:  10 0x0a
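
One way to produce a line in the format shown above is sketched below. The helper name format_byte_line is an assumption made for this example, and the printf widths were inferred from the sample output rather than taken from an official specification. The byte argument is expected to be a value in the range 0..255, as returned by fgetc(3) before end of file.

```c
#include <ctype.h>
#include <stdio.h>

// Hypothetical helper: formats the output line for one byte into buf.
// index is the byte's position in the file, byte is its value (0..255).
// Returns the number of characters the full line requires.
int format_byte_line(char *buf, size_t size, long index, int byte) {
    // Index right-aligned in 4 columns, decimal in 3, hex as 2 digits.
    int n = snprintf(buf, size, "byte %4ld: %3d 0x%02x", index, byte, byte);

    // Append the quoted character only when isprint says it is printable.
    if (isprint(byte) && n >= 0 && (size_t)n < size) {
        n += snprintf(buf + n, size - (size_t)n, " '%c'", byte);
    }
    return n;
}
```

Printing directly with printf works equally well; formatting into a buffer is used here only because it makes the result easy to check.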

When you think your program is working, you can use autotest to run some simple automated tests:

1092 autotest print_bytes 

When you are finished working on this exercise, you must submit your work by running give:

give dp1092 lab08_print_bytes print_bytes.c

You must run give before Monday 01 January 00:00 (midnight) (Saturday 01 January 00:00) to obtain the marks for this lab exercise. Note that this is an individual exercise; the work you submit with give must be entirely your own.

Exercise — individual:
Create a Binary File

Write a C program, create_binary_file, which takes at least one argument: a filename, and subsequently, integers in the range 0…255 inclusive specifying byte values. It should create a file of the specified name, containing the specified bytes. For example:

./create_binary_file hello.txt 72 101 108 108 111 33 10
cat hello.txt
Hello!
./create_binary_file count.binary 1 2 3 251 252 253 254 255
./print_bytes count.binary
byte    0:   1 0x01
byte    1:   2 0x02
byte    2:   3 0x03
byte    3: 251 0xfb
byte    4: 252 0xfc
byte    5: 253 0xfd
byte    6: 254 0xfe
byte    7: 255 0xff
./create_binary_file 4_bytes.binary 222 173 190 239
./print_bytes 4_bytes.binary
byte    0: 222 0xde
byte    1: 173 0xad
byte    2: 190 0xbe
byte    3: 239 0xef

Your program should print a suitable error message if given the wrong number of arguments, or if the file can not be created.
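
Each command-line argument after the filename has to be turned from text into a byte value. A sketch of one way to do that conversion follows. The helper name parse_byte is an assumption, and rejecting out-of-range or non-numeric arguments goes beyond what the exercise explicitly requires.

```c
#include <stdlib.h>

// Hypothetical helper: converts a decimal string to a byte value.
// Returns 0..255 on success, or -1 if text is empty, has trailing
// non-digit characters, or lies outside the range 0..255.
int parse_byte(const char *text) {
    char *end;
    long value = strtol(text, &end, 10);

    // end == text means no digits were consumed;
    // *end != '\0' means something followed the number.
    if (end == text || *end != '\0' || value < 0 || value > 255) {
        return -1;
    }
    return (int)value;
}
```

A valid result can then be written with fputc(3), which writes exactly one byte regardless of its value.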

When you think your program is working, you can use autotest to run some simple automated tests:

1092 autotest create_binary_file 

When you are finished working on this exercise, you must submit your work by running give:

give dp1092 lab08_create_binary_file create_binary_file.c

You must run give before Monday 01 January 00:00 (midnight) (Saturday 01 January 00:00) to obtain the marks for this lab exercise. Note that this is an individual exercise; the work you submit with give must be entirely your own.

Exercise — individual:
Create Borts File

Write a C program, create_borts_file, which takes 3 arguments:

  1. a filename,
  2. the beginning of a range of integers, and
  3. the end of a range of integers;

and which creates a file of this name containing the borts in the range provided. You can assume that the range will always be strictly ascending.

A bort is an unsigned two-byte big-endian integer (bort is a contraction of big-endian short).

The possible bort values are 0..65535, so each bort can be represented as a uint16_t, or unsigned short. As borts are big-endian, they need to be written to the output file with their most significant byte first, followed by the least significant byte.

This means for a number such as 0x1234, the first byte to be written should be 0x12, as it is the most significant byte, followed by 0x34, the least-significant.
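
The byte ordering described above can be sketched with a shift and a mask. The helper name write_bort is an assumption made for this example; the shift-and-mask approach is one common way to do this and does not depend on the byte order of the machine running the program.

```c
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: writes one bort to out in big-endian order.
// Returns 0 on success, EOF if either write fails.
int write_bort(FILE *out, uint16_t value) {
    // Most significant byte first: for 0x1234 this writes 0x12.
    if (fputc(value >> 8, out) == EOF) {
        return EOF;
    }
    // Then the least significant byte: for 0x1234 this writes 0x34.
    if (fputc(value & 0xff, out) == EOF) {
        return EOF;
    }
    return 0;
}
```

Writing the two bytes individually, rather than calling fwrite on the uint16_t itself, matters because fwrite would emit the bytes in the machine's native order, which is little-endian on most current desktop hardware.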

As a borts file is not a text file, we cannot use cat to inspect its contents. We can use the print_bytes program from the earlier exercise in this lab to print files containing borts.

For example:

dcc -o create_borts_file create_borts_file.c
./create_borts_file fortytwo.bort 40 42
ls -l fortytwo.bort
-rw-r--r-- 1 andrewt andrewt 6 Nov  1 15:21 fortytwo.bort
./print_bytes fortytwo.bort 
byte    0:   0 0x00
byte    1:  40 0x28 '('
byte    2:   0 0x00
byte    3:  41 0x29 ')'
byte    4:   0 0x00
byte    5:  42 0x2a '*'
The Linux utilities xxd and od are also good ways to print a binary file, as hexadecimal and octal respectively. Note that the zero byte is printed first for each number here, as it is the most significant byte in each bort.
xxd fortytwo.bort 
00000000: 0028 0029 002a                           .(.).*
od fortytwo.bort 
0000000 024000 024400 025000
0000006
Another example, creating a file containing the six biggest borts possible:
./create_borts_file biggest.bort 65530 65535
ls -l biggest.bort
-rw-r--r-- 1 andrewt andrewt 12 Nov  1 15:26 biggest.bort
./print_bytes biggest.bort
byte    0: 255 0xff
byte    1: 250 0xfa
byte    2: 255 0xff
byte    3: 251 0xfb
byte    4: 255 0xff
byte    5: 252 0xfc
byte    6: 255 0xff
byte    7: 253 0xfd
byte    8: 255 0xff
byte    9: 254 0xfe
byte   10: 255 0xff
byte   11: 255 0xff
We can give od command-line options to decode borts, for example:
od --endian=big -t u2 -A d -w2  biggest.bort
0000000 65530
0000002 65531
0000004 65532
0000006 65533
0000008 65534
0000010 65535
0000012

Your program should print a suitable error message if given the wrong number of arguments, or if the file can not be created.

When you think your program is working, you can use autotest to run some simple automated tests:

1092 autotest create_borts_file 

When you are finished working on this exercise, you must submit your work by running give:

give dp1092 lab08_create_borts_file create_borts_file.c

You must run give before Monday 01 January 00:00 (midnight) (Saturday 01 January 00:00) to obtain the marks for this lab exercise. Note that this is an individual exercise; the work you submit with give must be entirely your own.

Challenge Exercise — individual:
Extract ASCII from a Binary File

We are distributing programs as binaries, and would like to know whether the C compiler is leaving any confidential information in the binaries as ASCII strings.

Only 95 of the 256 possible byte values correspond to printable ASCII characters, so runs of several consecutive printable bytes should be rare in non-ASCII data. There is only about a 2% chance, (95/256)^4 ≈ 0.019, that four (independent, uniform) random byte values will all correspond to ASCII printable characters.

Write a C program, hidden_strings, which takes one argument, a filename; it should read that file, and print all sequences of length 4 or longer of consecutive byte values corresponding to printable ASCII characters. In other words, your program should read through the bytes of the file, and if it finds 4 bytes in a row containing printable characters, it should print those bytes, and any following bytes containing ASCII printable characters.

Print each sequence on a separate line.

Assume ASCII printable characters are those for which the ctype.h function isprint(3) returns a non-zero value.

Do not read the entire file into an array.

Use the create_binary_file program from the previous exercise to create simple test data. For example:

dcc hidden_strings.c -o hidden_strings
./create_binary_file test_file 72 101 108 108 111 255 255 65 110 100 114 101 119
./hidden_strings test_file
Hello
Andrew
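
The scan described above can be done without storing the file, by remembering only the first few bytes of the current run until it is long enough to print. The following is a sketch of that idea, not a reference solution: the function name print_hidden_strings, the MIN_RUN constant and the choice to take FILE pointers as parameters are all assumptions made for this example.

```c
#include <ctype.h>
#include <stdio.h>

#define MIN_RUN 4   // shortest run of printable bytes worth printing

// Hypothetical helper: copies every run of MIN_RUN or more printable
// bytes from in to out, one run per line, reading one byte at a time.
void print_hidden_strings(FILE *in, FILE *out) {
    char pending[MIN_RUN];  // start of a run that is still too short to print
    int run_length = 0;     // printable bytes seen so far, capped at MIN_RUN
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (!isprint(c)) {
            // A non-printable byte ends the run; finish the line if one was printed.
            if (run_length >= MIN_RUN) {
                fputc('\n', out);
            }
            run_length = 0;
            continue;
        }

        if (run_length < MIN_RUN - 1) {
            pending[run_length] = (char)c;      // not long enough yet, just remember it
        } else {
            if (run_length == MIN_RUN - 1) {
                fwrite(pending, 1, MIN_RUN - 1, out);   // run just reached MIN_RUN
            }
            fputc(c, out);
        }
        if (run_length < MIN_RUN) {
            run_length++;
        }
    }

    // The file may end in the middle of a printed run.
    if (run_length >= MIN_RUN) {
        fputc('\n', out);
    }
}
```

Memory use stays constant however large the file is, which is the point of the "do not read the entire file into an array" restriction.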

When you think your program is working, try extracting strings from a compiled binary. For example:

cat secret.c
#define secret_hash_define 1

// secret comment

int secret_global_variable;

int main(void) {
    int secret_local_variable;
    char *s = "secret string";
}

int secret_function_name() {
}
gcc secret.c -o binary1
gcc secret.c -g -o binary2
gcc secret.c -s -o binary3
./hidden_strings binary1
/lib64/ld-linux-x86-64.so.2
libc.so.6
__cxa_finalize
__libc_start_main
GLIBC_2.2.5
...
./hidden_strings binary1|grep secret
secret string
secret.c
secret_function_name
secret_global_variable
./hidden_strings binary2|grep secret
secret string
secret.c
secret.c
secret_global_variable
secret_function_name
secret_local_variable
secret.c
secret_function_name
secret_global_variable
./hidden_strings binary3|grep secret
secret string

The above example shows that, by default, gcc(1) leaves function names, global variable names and the filename in the binary.

If you specify the -g command line option, local variable names are also left in the binary. This is part of the information left for debuggers such as gdb(1) (which dcc uses). This information allows debuggers to print the current value of variables.

If you specify the -s command line option, all names are stripped from the binary but the string remains.

When you think your program is working, you can use autotest to run some simple automated tests:

1092 autotest hidden_strings 

When you are finished working on this exercise, you must submit your work by running give:

give dp1092 lab08_hidden_strings hidden_strings.c

You must run give before Monday 01 January 00:00 (midnight) (Saturday 01 January 00:00) to obtain the marks for this lab exercise. Note that this is an individual exercise; the work you submit with give must be entirely your own.

Challenge Exercise — individual:
Print the Last Line of Huge Files

Write a C program, last_line, which takes one argument, a filename, and which should print the last line of that file. For example:

dcc last_line.c -o last_line
echo -e 'hello\ngood bye' >hello_goodbye.txt
cat hello_goodbye.txt
hello
good bye
./last_line hello_goodbye.txt
good bye

Your program should not assume the last byte of the file is a newline character.

echo -n -e 'hello\ngoodbye' >no_last_newline.txt
cat no_last_newline.txt
hello
goodbye./last_line no_last_newline.txt
goodbye

Your program should handle extremely large files. It should not read the entire file. As this is a challenge exercise, marks will not be awarded for programs which read the entire file.

For example, it should be able to print the last line of a one-terabyte file:

echo -e 'Hello\nGood Bye'|dd status=none seek=1T bs=1 of=/tmp/gigantic_file$$
ls -l /tmp/gigantic_file$$
-rw-r--r-- 1 z5555555 z5555555 1099511627791 Oct 26 17:27 /tmp/gigantic_file12345
./last_line /tmp/gigantic_file$$
Good Bye
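
Avoiding a full read means starting at the end of the file and working backwards. The sketch below shows one way to do that with fseek(3) and ftell(3). The function name print_last_line is an assumption made for this example, and the code assumes long is 64 bits (as on CSE's x86-64 Linux machines) so that offsets in a one-terabyte file fit; fseeko and ftello are the POSIX alternatives where it is not.

```c
#include <stdio.h>

// Hypothetical helper: copies the last line of in to out by seeking
// backwards from the end of the file. Returns 0 on success, -1 on error.
int print_last_line(FILE *in, FILE *out) {
    if (fseek(in, 0, SEEK_END) != 0) {
        return -1;
    }
    long size = ftell(in);
    if (size < 0) {
        return -1;
    }

    // If the file ends with a newline, that newline belongs to the
    // last line, so the backwards search starts just before it.
    long end = size;
    if (end > 0) {
        if (fseek(in, end - 1, SEEK_SET) != 0) {
            return -1;
        }
        if (fgetc(in) == '\n') {
            end--;
        }
    }

    // Step backwards until the byte before start is a newline,
    // or the beginning of the file is reached.
    long start = end;
    while (start > 0) {
        if (fseek(in, start - 1, SEEK_SET) != 0) {
            return -1;
        }
        if (fgetc(in) == '\n') {
            break;
        }
        start--;
    }

    // Copy everything from start to the end of the file.
    if (fseek(in, start, SEEK_SET) != 0) {
        return -1;
    }
    int c;
    while ((c = fgetc(in)) != EOF) {
        fputc(c, out);
    }
    return 0;
}
```

Seeking before every single byte is simple but slow when the last line is long; reading backwards in larger blocks would be a natural refinement.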

The gigantic file created above is a sparse file, consisting almost entirely of zero bytes: it uses little actual disk space, but, to be safe, remove it when you finish the exercise.

The commands above create the sparse file in /tmp to avoid it accidentally being backed up or otherwise copied.

Sparse files can create problems if they are accidentally copied by a program which doesn't handle them specially — and most programs don't.

The $$ in the commands above is replaced by the shell's process ID. Because /tmp is shared, this gives a filename that is (more or less) unique.

When you think your program is working, you can use autotest to run some simple automated tests:

1092 autotest last_line 

When you are finished working on this exercise, you must submit your work by running give:

give dp1092 lab08_last_line last_line.c

You must run give before Monday 01 January 00:00 (midnight) (Saturday 01 January 00:00) to obtain the marks for this lab exercise. Note that this is an individual exercise; the work you submit with give must be entirely your own.

Submission

When you have finished each exercise, make sure you submit your work by running give.

You can run give multiple times. Only your last submission will be marked.

Don't submit any exercises you haven't attempted.

If you are working at home, you may find it more convenient to upload your work via give's web interface.

Remember you have until Saturday 01 January 00:00 to submit your work.

You cannot obtain marks by e-mailing your code to tutors or lecturers.

You can check the files you have submitted here.

Automarking will be run by the lecturer several days after the submission deadline, using test cases different to those autotest runs for you. (Hint: do your own testing as well as running autotest.)

After automarking is run by the lecturer you can view your results here. The resulting mark will also be available via give's web interface.

Lab Marks

When all components of a lab are automarked you should be able to view the marks via give's web interface or by running this command on a CSE machine:

1092 classrun -sturec