Week 08 Laboratory Exercises
Objectives
- learning how to access files
- learning how to work with binary data
- learning how to use fseek
Preparation
Before the lab you should re-read the relevant lecture slides and their accompanying examples.
Getting Started
Create a new directory for this lab called lab08 and change to this directory:
mkdir lab08
cd lab08
There are some provided files for this lab which you can fetch with this command:
1092 fetch lab08
If you're not working at CSE, you can download the provided files as a zip file or a tar file.
Exercise — individual:
Create a File of Integers
Write a C program, create_integers_file, which takes 3 arguments:
- a filename,
- the beginning of a range of integers, and
- the end of a range of integers;
and which creates a file of this name containing the specified integers. You can assume the range will always be strictly ascending. For example:
```
./create_integers_file fortytwo.txt 40 42
cat fortytwo.txt
40
41
42
./create_integers_file a.txt 1 5
cat a.txt
1
2
3
4
5
./create_integers_file 1000.txt 1 1000
wc 1000.txt
1000 1000 3893 1000.txt
```
Your program should print a suitable error message if given the wrong number of arguments, or if the file cannot be created.
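One common way to meet the error-handling requirement is to let perror(3) explain why fopen(3) failed. The sketch below assumes a helper named open_output, which is hypothetical (it is not part of the provided files); it only covers opening the file, not writing the integers.

```c
#include <stdio.h>

// Hypothetical helper: open `path` for writing.
// On failure, perror(3) prints the file name followed by the reason
// (for example "No such file or directory") to stderr, and NULL is returned.
FILE *open_output(const char *path) {
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        perror(path);
    }
    return f;
}
```

In main you would first check that argc is 4, then exit with a non-zero status if open_output returns NULL.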
When you think your program is working, you can use autotest to run some simple automated tests:
1092 autotest create_integers_file
When you are finished working on this exercise, you must submit your work by running give:
give dp1092 lab08_create_integers_file create_integers_file.c
You must run give before Tuesday 22 October 09:00 (2024-10-22 09:00:00) to obtain the marks for this lab exercise.
Note that this is an individual exercise; the work you submit with give must be entirely your own.
Exercise — individual:
Print the Bytes of A File
Write a C program, print_bytes, which takes one argument, a filename, and which should read the specified file and print one line for each byte of the file.
The line should show the byte in decimal and hexadecimal. If that byte is an ASCII printable character, the character itself should also be printed.
Assume ASCII printable characters are those for which the ctype.h function isprint(3) returns a non-zero value.
Follow the format in this example exactly:
```
echo "Hello Andrew!" >hello.txt
./print_bytes hello.txt
byte 0: 72 0x48 'H'
byte 1: 101 0x65 'e'
byte 2: 108 0x6c 'l'
byte 3: 108 0x6c 'l'
byte 4: 111 0x6f 'o'
byte 5: 32 0x20 ' '
byte 6: 65 0x41 'A'
byte 7: 110 0x6e 'n'
byte 8: 100 0x64 'd'
byte 9: 114 0x72 'r'
byte 10: 101 0x65 'e'
byte 11: 119 0x77 'w'
byte 12: 33 0x21 '!'
byte 13: 10 0x0a
```
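The output format above maps directly onto printf-style conversions: %d for decimal, 0x%02x for two-digit lowercase hexadecimal, and '%c' for the character. A minimal sketch, assuming a hypothetical helper format_byte_line that formats a single line into a buffer (the loop that reads the file is left out):

```c
#include <ctype.h>
#include <stdio.h>

// Hypothetical helper: format the print_bytes line for the byte `value`
// (0..255) found at position `index` of the file.
// The result is written into `buf` (at most `size` bytes, NUL-terminated).
void format_byte_line(char *buf, size_t size, long index, int value) {
    if (isprint(value)) {
        // printable: also show the character itself in single quotes
        snprintf(buf, size, "byte %ld: %d 0x%02x '%c'", index, value, value, value);
    } else {
        snprintf(buf, size, "byte %ld: %d 0x%02x", index, value, value);
    }
}
```

If the bytes are read with fgetc(3), each one arrives as an int in the range 0..255, which is a valid argument to isprint.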
When you think your program is working, you can use autotest to run some simple automated tests:
1092 autotest print_bytes
When you are finished working on this exercise, you must submit your work by running give:
give dp1092 lab08_print_bytes print_bytes.c
You must run give before Tuesday 22 October 09:00 (2024-10-22 09:00:00) to obtain the marks for this lab exercise.
Note that this is an individual exercise; the work you submit with give must be entirely your own.
Exercise — individual:
Create a Binary File
Write a C program, create_binary_file, which takes at least one argument: a filename, followed by integers in the range 0…255 inclusive specifying byte values. It should create a file of the specified name containing the specified bytes.
For example:
```
./create_binary_file hello.txt 72 101 108 108 111 33 10
cat hello.txt
Hello!
./create_binary_file count.binary 1 2 3 251 252 253 254 255
./print_bytes count.binary
byte 0: 1 0x01
byte 1: 2 0x02
byte 2: 3 0x03
byte 3: 251 0xfb
byte 4: 252 0xfc
byte 5: 253 0xfd
byte 6: 254 0xfe
byte 7: 255 0xff
./create_binary_file 4_bytes.binary 222 173 190 239
./print_bytes 4_bytes.binary
byte 0: 222 0xde
byte 1: 173 0xad
byte 2: 190 0xbe
byte 3: 239 0xef
```
Your program should print a suitable error message if given the wrong number of arguments, or if the file cannot be created.
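Each byte value arrives as a string in argv, so it has to be converted and range-checked before being written. A minimal sketch using strtol(3), assuming a hypothetical helper parse_byte; how your program reacts to a rejected argument is up to you, since the specification above does not say.

```c
#include <errno.h>
#include <stdlib.h>

// Hypothetical helper: convert a command-line argument such as "255"
// into a byte. Returns 1 and stores the value in *byte on success;
// returns 0 if the string is not a whole decimal number in 0..255.
int parse_byte(const char *s, unsigned char *byte) {
    char *end;
    errno = 0;
    long value = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE) {
        return 0;   // empty string, trailing junk, or overflowed a long
    }
    if (value < 0 || value > 255) {
        return 0;   // does not fit in a single byte
    }
    *byte = (unsigned char)value;
    return 1;
}
```

Once parsed, fputc(3) writes exactly one byte with that value, whereas fprintf with %d would write its decimal digits as text.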
When you think your program is working, you can use autotest to run some simple automated tests:
1092 autotest create_binary_file
When you are finished working on this exercise, you must submit your work by running give:
give dp1092 lab08_create_binary_file create_binary_file.c
You must run give before Tuesday 22 October 09:00 (2024-10-22 09:00:00) to obtain the marks for this lab exercise.
Note that this is an individual exercise; the work you submit with give must be entirely your own.
Exercise — individual:
Create Borts File
Write a C program, create_borts_file, which takes 3 arguments:
- a filename,
- the beginning of a range of integers, and
- the end of a range of integers;
and which creates a file of this name containing the borts in the range provided. You can assume that the range will always be strictly ascending.
A bort is an unsigned two-byte big-endian integer (bort is a contraction of big-endian short).
The possible bort values are 0..65535, so each bort can be represented as a uint16_t or unsigned short. As borts are big-endian, they need to be written to the output file with their most significant byte first, followed by the least significant byte.
This means that for a number such as 0x1234, the first byte written should be 0x12, as it is the most significant byte, followed by 0x34, the least significant byte.
As a borts file is not a text file, we cannot use cat to inspect its contents. We can use the print_bytes program from the earlier exercise to print files containing borts. For example:
```
dcc -o create_borts_file create_borts_file.c
./create_borts_file fortytwo.bort 40 42
ls -l fortytwo.bort
-rw-r--r-- 1 andrewt andrewt 6 Nov 1 15:21 fortytwo.bort
./print_bytes fortytwo.bort
byte 0: 0 0x00
byte 1: 40 0x28 '('
byte 2: 0 0x00
byte 3: 41 0x29 ')'
byte 4: 0 0x00
byte 5: 42 0x2a '*'
```
The Linux utilities xxd and od are also good ways to print a binary file, as hexadecimal and octal respectively. Note that the zero byte is printed first for each number here, as it is the most significant byte in each bort.
```
xxd fortytwo.bort
00000000: 0028 0029 002a .(.).*
od fortytwo.bort
0000000 024000 024400 025000
0000006
```
Another example, creating a file containing the six biggest borts possible:
```
./create_borts_file biggest.bort 65530 65535
ls -l biggest.bort
-rw-r--r-- 1 andrewt andrewt 12 Nov 1 15:26 biggest.bort
./print_bytes biggest.bort
byte 0: 255 0xff
byte 1: 250 0xfa
byte 2: 255 0xff
byte 3: 251 0xfb
byte 4: 255 0xff
byte 5: 252 0xfc
byte 6: 255 0xff
byte 7: 253 0xfd
byte 8: 255 0xff
byte 9: 254 0xfe
byte 10: 255 0xff
byte 11: 255 0xff
```
We can give od command-line options to decode borts, for example:
```
od --endian=big -t u2 -A d -w2 biggest.bort
0000000 65530
0000002 65531
0000004 65532
0000006 65533
0000008 65534
0000010 65535
0000012
```
Your program should print a suitable error message if given the wrong number of arguments, or if the file cannot be created.
When you think your program is working, you can use autotest to run some simple automated tests:
1092 autotest create_borts_file
When you are finished working on this exercise, you must submit your work by running give:
give dp1092 lab08_create_borts_file create_borts_file.c
You must run give before Tuesday 22 October 09:00 (2024-10-22 09:00:00) to obtain the marks for this lab exercise.
Note that this is an individual exercise; the work you submit with give must be entirely your own.
Exercise — individual:
Read a file of little-endian integers
Write a C program, read_lit_file, which takes one argument: a filename, specifying the path to a file in the LIT file format.
A LIT (Little-endian Integer) file uses a format we have defined for storing a sequence of integers in a file.
A LIT file starts with a header. The header format is as follows:
name | length | type | description |
---|---|---|---|
magic number | 3 B | character sequence | The magic number for LIT files, which is the sequence of bytes 0x4C, 0x49, 0x54 (ASCII LIT). |
After the header, the file contains a sequence of records. Each record uses the following format:
name | length | type | description |
---|---|---|---|
number of bytes | 1 B | ASCII character | One of the characters '1', '2', '3', '4', '5', '6', '7' or '8', indicating the number of bytes forming the integer stored in this record. |
value | num-bytes | little-endian integer | The integer value stored in this record, using the specified number of bytes. |
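In a little-endian value, byte i (counting from 0) contributes bits 8·i to 8·i+7, so a record can be decoded by shifting each byte into place. Values such as 18374686479671623680 in the example output further down exceed the largest signed 64-bit integer, which suggests an unsigned 64-bit type is needed. A minimal sketch, assuming a hypothetical helper decode_little_endian:

```c
#include <stdint.h>

// Hypothetical helper: combine n bytes (1 <= n <= 8), stored least
// significant byte first, into one unsigned 64-bit value.
uint64_t decode_little_endian(const unsigned char *bytes, int n) {
    uint64_t value = 0;
    for (int i = 0; i < n; i++) {
        // cast before shifting: shifting an int by 32 or more is undefined
        value |= (uint64_t)bytes[i] << (8 * i);
    }
    return value;
}
```

For example, the record in one_record.lit holds the three bytes 0x54 0x32 0x32, which decode to 0x323254 = 3289684. The result can be printed with printf("%" PRIu64 "\n", value) after including inttypes.h.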
Your program should read the file specified by the argument, and print out the integers stored in the file, one per line, in the order they appear in the file.
Your program should print a suitable error message to stderr(3) and then exit(3) with status 1 if:
- the user does not provide exactly one argument
- the file does not exist
- the magic number is not correct, or the file is not long enough to contain a complete header
- the file contains a record with an invalid number of bytes
- the file ends before the end of a record
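Several of these error cases come down to checking what fread(3) and fgetc(3) return. A minimal sketch, assuming hypothetical helpers read_lit_magic and read_record_length; it does not cover reading the value bytes, where a short fread count would mean the file ended mid-record.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper: read the 3-byte header and check it is "LIT".
// Returns 1 if the magic number is present, 0 if the file is too short
// or the bytes differ.
int read_lit_magic(FILE *f) {
    unsigned char magic[3];
    if (fread(magic, 1, 3, f) != 3) {
        return 0;   // fewer than 3 bytes available
    }
    return memcmp(magic, "LIT", 3) == 0;
}

// Hypothetical helper: read a record's length character ('1'..'8').
// Returns 1..8, 0 if the file ends cleanly before a new record,
// or -1 for any other character.
int read_record_length(FILE *f) {
    int c = fgetc(f);
    if (c == EOF) {
        return 0;
    }
    if (c < '1' || c > '8') {
        return -1;
    }
    return c - '0';
}
```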
```
unzip examples.zip
... extracting ...
dcc -o read_lit_file read_lit_file.c
xxd examples/one_record.lit
00000000: 4c49 5433 5432 32 LIT3T22
./read_lit_file examples/one_record.lit
3289684
./read_lit_file examples/assortment.lit
123
250
4242
4242
424242424242424242
18374686479671623680
71776119061217280
71776119061217280
49
12345678987654321
4242424242424242424
./read_lit_file examples/completely_empty.lit
Failed to read magic
./read_lit_file examples/too_short_1.lit
42
Failed to read record
./read_lit_file examples/invalid_num_bytes_1.lit
42
Invalid record length
```
When you think your program is working, you can use autotest to run some simple automated tests:
1092 autotest read_lit_file
When you are finished working on this exercise, you must submit your work by running give:
give dp1092 lab08_read_lit_file read_lit_file.c
You must run give before Tuesday 22 October 09:00 (2024-10-22 09:00:00) to obtain the marks for this lab exercise.
Note that this is an individual exercise; the work you submit with give must be entirely your own.
Challenge Exercise — individual:
Extract ASCII from a Binary File
We are distributing programs as binaries, and would like to know if the C compiler is leaving any confidential information in the binaries as ASCII strings.
Only 95 of the 256 byte values correspond to printable ASCII characters, so runs of several consecutive byte values that all correspond to printable characters should be infrequent in non-ASCII data. There is only about a 2% chance that four (independent, uniform) random byte values will all correspond to ASCII printable characters.
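The 2% figure follows from multiplying independent probabilities. In the C locale, isprint(3) is true for the 95 byte values 0x20 (space) through 0x7E (~), so, assuming four independent, uniformly random bytes:

$$
P(\text{all four printable}) = \left(\frac{95}{256}\right)^{4} \approx 0.3711^{4} \approx 0.019 \approx 2\%
$$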
Write a C program, hidden_strings, which takes one argument, a filename; it should read that file and print all sequences of length 4 or longer of consecutive byte values corresponding to printable ASCII characters.
In other words, your program should read through the bytes of the file, and if it finds 4 bytes in a row containing printable characters, it should print those bytes and any following bytes containing ASCII printable characters. Print each sequence on a separate line.
Assume ASCII printable characters are those for which the ctype.h function isprint(3) returns a non-zero value.
Do not read the entire file into an array.
Use the create_binary_file program from the earlier exercise to create simple test data. For example:
```
dcc hidden_strings.c -o hidden_strings
./create_binary_file test_file 72 101 108 108 111 255 255 65 110 100 114 101 119
./hidden_strings test_file
Hello
Andrew
```
When you think your program is working, try extracting strings from a compiled binary. For example:
```
cat secret.c
#define secret_hash_define 1
// secret comment
int secret_global_variable;
int main(void) {
    int secret_local_variable;
    char *s = "secret string";
}
int secret_function_name() {
}
gcc secret.c -o binary1
gcc secret.c -g -o binary2
gcc secret.c -s -o binary3
./hidden_strings binary1
/lib64/ld-linux-x86-64.so.2
libc.so.6
__cxa_finalize
__libc_start_main
GLIBC_2.2.5
...
./hidden_strings binary1|grep secret
secret string
secret.c
secret_function_name
secret_global_variable
./hidden_strings binary2|grep secret
secret string
secret.c
secret.c
secret_global_variable
secret_function_name
secret_local_variable
secret.c
secret_function_name
secret_global_variable
./hidden_strings binary3|grep secret
secret string
```
The above example shows that, by default, gcc(1) leaves function names, global variable names and the filename in the binary.
If you specify the -g command-line option, local variable names are also left in the binary. This is part of the information left for debuggers such as gdb(1) (which dcc uses), and it allows debuggers to print the current value of variables.
If you specify the -s command-line option, all names are stripped from the binary, but the string literal remains.
When you think your program is working, you can use autotest to run some simple automated tests:
1092 autotest hidden_strings
When you are finished working on this exercise, you must submit your work by running give:
give dp1092 lab08_hidden_strings hidden_strings.c
You must run give before Tuesday 22 October 09:00 (2024-10-22 09:00:00) to obtain the marks for this lab exercise.
Note that this is an individual exercise; the work you submit with give must be entirely your own.
Challenge Exercise — individual:
Print the Last Line of Huge Files
Write a C program, last_line, which takes one argument, a filename, and which should print the last line of that file.
For example:
```
dcc last_line.c -o last_line
echo -e 'hello\ngood bye' >hello_goodbye.txt
cat hello_goodbye.txt
hello
good bye
./last_line hello_goodbye.txt
good bye
```
Your program should not assume the last byte of the file is a newline character.
```
echo -n -e 'hello\ngoodbye' >no_last_newline.txt
cat no_last_newline.txt
hello
goodbye
./last_line no_last_newline.txt
goodbye
```
Your program should handle extremely large files. It should not read the entire file. As this is a challenge exercise, marks will not be awarded for programs which read the entire file.
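The key tool here is fseek(3): it moves the file position directly, so bytes near the end of the file can be read without reading anything before them. A minimal sketch of two building blocks, assuming hypothetical helpers file_size and read_byte_at (scanning backwards for the right newline is left to you):

```c
#include <stdio.h>

// Hypothetical helper: return the size of an open file in bytes by
// seeking to the end and asking for the position, or -1 on error.
// No data is read.
long file_size(FILE *f) {
    if (fseek(f, 0, SEEK_END) != 0) {
        return -1;
    }
    return ftell(f);
}

// Hypothetical helper: return the byte at `offset` (0 = first byte),
// or EOF if the seek fails or offset is at or past the end of the file.
int read_byte_at(FILE *f, long offset) {
    if (fseek(f, offset, SEEK_SET) != 0) {
        return EOF;
    }
    return fgetc(f);
}
```

On 64-bit Linux, long is 64 bits, so offsets beyond one terabyte fit. On platforms where long is 32 bits, the POSIX functions fseeko and ftello, which use off_t, are the usual alternative.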
For example, it should be able to print the last line of a one-terabyte file:
```
echo -e 'Hello\nGood Bye'|dd status=none seek=1T bs=1 of=/tmp/gigantic_file$$
ls -l /tmp/gigantic_file$$
-rw-r--r-- 1 z5555555 z5555555 1099511627791 Oct 26 17:27 /tmp/gigantic_file12345
./last_line /tmp/gigantic_file$$
Good Bye
```
The gigantic file created above is a sparse file, consisting almost entirely of zero bytes: it uses little actual disk space, but, to be safe, remove it when you finish the exercise.
The commands above create the sparse file in /tmp to avoid it accidentally being backed up or otherwise copied.
Sparse files can create problems if they are accidentally copied by a program which doesn't handle them specially — and most programs don't.
BTW the $$ in the above commands is replaced by the shell's process ID. This is because /tmp is shared, so we'd like to use a filename that is (more or less) unique.
When you think your program is working, you can use autotest to run some simple automated tests:
1092 autotest last_line
When you are finished working on this exercise, you must submit your work by running give:
give dp1092 lab08_last_line last_line.c
You must run give before Tuesday 22 October 09:00 (2024-10-22 09:00:00) to obtain the marks for this lab exercise.
Note that this is an individual exercise; the work you submit with give must be entirely your own.
Submission
Submit your work by running give.
You can run give multiple times. Only your last submission will be marked.
Don't submit any exercises you haven't attempted.
If you are working at home, you may find it more convenient to upload your work via give's web interface.
Remember you have until Week 9 Tuesday 09:00:00 to submit your work.
You cannot obtain marks by e-mailing your code to tutors or lecturers.
You can check the files you have submitted here.
Automarking will be run by the lecturer several days after the submission deadline, using test cases different to those autotest runs for you. (Hint: do your own testing as well as running autotest.)
After automarking is run by the lecturer you can view your results here. The resulting mark will also be available via give's web interface.
Lab Marks
When all components of a lab are automarked, you should be able to view the marks via give's web interface or by running this command on a CSE machine:
1092 classrun -sturec