Assignment 2: a simple MIPS emulator

version: 1.0 last updated: 2024-10-25 18:00:00

Aims

  • Understanding encoding and semantics of MIPS instructions
  • Practising file operations in C
  • Practising C, including bit operations
  • Understanding UNIX file system syscalls

Assignment Overview

In COMP1521 you have been using mipsy to run your MIPS programs. mipsy is a MIPS emulator written in Rust. In this assignment your task is to write IMPS, a simple MIPS emulator, written in C.

Don't panic! You only need to implement a small subset of MIPS instructions.

Getting Started

Create a new directory for this assignment, change to this directory, and fetch the provided code by running

mkdir -m 700 imps
cd imps
1521 fetch imps

If you're not working at CSE, you can download the provided files as a zip file or a tar file.

This will give you the following files:

imps.c
is where you will put your code to implement IMPS.
imps.mk
contains a Makefile fragment for IMPS.
Makefile
contains a Makefile for compiling your code.
examples
is a directory containing example assembly programs you can use to test IMPS.
test_subset_1.c
contains code that is run during subset 1 autotests. You do not need to understand or modify it.

You can run make(1) to compile the provided code, and you should be able to run the result.

make
dcc   imps.c -o imps
./imps
Usage: imps [-t] <executable>

You may optionally create extra .c or .h files. You can modify the provided Makefile fragment if you choose to do so.

Subsets

The assignment is split into four subsets. In subset 1 you will implement a function that reads in an IMPS program from a file. In subsets 2 and 3 you will implement various instructions and syscalls. In subset 4 you will add a tracing mode to IMPS, as well as implement some advanced syscalls.

Subset 1

IMPS programs are stored in IMPS executable files, which contain the instructions in the program, where the program starts, the initial state of the data segment, and other information.

In subset 1 your task is to implement the function read_imps_file in imps.c, which will read an IMPS executable file into a struct imps_file. This function takes two arguments:

  • char *path is the path to an IMPS executable.
  • struct imps_file *executable is a pointer to the struct imps_file where you will put the information from the IMPS executable.

An IMPS executable file always has 7 sections of varying lengths in this format:

section name      length in bytes       type                    description
magic number      4                     8-bit                   Should always be 0x49, 0x4d, 0x50, 0x53. This is a magic number identifying IMPS executables (ASCII for IMPS).
num_instructions  4                     32-bit, little-endian   The number of instructions in this executable.
entry_point       4                     32-bit, little-endian   Where the program starts.
instructions      4 * num_instructions  32-bit, little-endian   The instructions of the program.
debug_offsets     4 * num_instructions  32-bit, little-endian   One value for each instruction, used in subset 4.
memory_size       2                     16-bit, little-endian   The number of bytes in the data segment.
initial_data      memory_size           8-bit                   The initial contents of the data segment.
Your task is to read bytes (e.g. using fgetc) from the file at path, and then store the appropriate information in the 6 fields of the struct imps_file to which executable points. The magic number is checked but not stored.
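Every multi-byte field in the table is little-endian, so one approach is to assemble each value from individual bytes with shifts, which gives the same result whatever the byte order of the machine running IMPS. This sketch assumes a hypothetical helper, read_little_endian (it is not part of the provided imps.c), which reports a truncated file rather than treating EOF as a data byte:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical helper: read an n_bytes-long (1 to 4) little-endian unsigned
// integer from stream into *value. Returns false if the stream ends early.
static bool read_little_endian(FILE *stream, int n_bytes, uint32_t *value) {
    uint32_t result = 0;
    for (int i = 0; i < n_bytes; i++) {
        int byte = fgetc(stream);
        if (byte == EOF) {
            return false;
        }
        // The i-th byte in the file is the i-th least significant byte.
        result |= (uint32_t) byte << (8 * i);
    }
    *value = result;
    return true;
}
```

Under these assumptions num_instructions and entry_point would be read with n_bytes = 4 and memory_size with n_bytes = 2. The specification only requires the two checks described later in this subset, so how to react when the helper returns false is left to you (or to the reference implementation's behaviour).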

A struct imps_file has this definition:

struct imps_file {
    uint32_t num_instructions;
    uint32_t entry_point;
    uint32_t *instructions;
    uint32_t *debug_offsets;
    uint16_t memory_size;
    uint8_t  *initial_data;
};

You should allocate space on the heap for instructions, debug_offsets and initial_data using malloc.

You will find starting code and some hints in imps.c.

Unless there is an error, read_imps_file doesn't print anything.

The subset 1 autotests call your read_imps_file function directly and use extra code to print the contents of the struct imps_file afterwards.

Use the following command to run all the autotests for subset 1:

1521 autotest imps S1 [optionally: any extra .c or .h files]

There are a number of ways in which a file might not follow the specified format for IMPS executable files, and so be invalid.

Although it would be good practice to check all of them (and you can), you are only required to make two checks that the file format is correct.

If the file at path cannot be opened you should use perror(path) to output an error message and then exit with status 1 (using the exit function).

If the magic number is incorrect you should output the line Invalid IMPS file to stderr and then exit with status 1.

For example:

./imps file-that-does-not-exist
file-that-does-not-exist: No such file or directory
./imps Makefile
Invalid IMPS file
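A minimal sketch of those two checks, assuming hypothetical helpers named has_imps_magic and open_imps_file. Note that fgetc returns EOF (a negative value) on a file shorter than four bytes, which can never match a magic byte, so short files are rejected too:

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define MAGIC_LENGTH 4

// The bytes 'I', 'M', 'P', 'S' that begin every IMPS executable.
static const int imps_magic[MAGIC_LENGTH] = {0x49, 0x4d, 0x50, 0x53};

// Returns true if the next MAGIC_LENGTH bytes of stream match imps_magic.
static bool has_imps_magic(FILE *stream) {
    for (int i = 0; i < MAGIC_LENGTH; i++) {
        if (fgetc(stream) != imps_magic[i]) {
            return false;
        }
    }
    return true;
}

// Open path and check its magic number, exiting with status 1 on either
// of the two required errors. Returns the stream positioned after the magic.
static FILE *open_imps_file(const char *path) {
    FILE *stream = fopen(path, "rb");
    if (stream == NULL) {
        perror(path);
        exit(1);
    }
    if (!has_imps_magic(stream)) {
        fprintf(stderr, "Invalid IMPS file\n");
        exit(1);
    }
    return stream;
}
```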

Subset 2

In subset 2 your task is to implement the function execute_imps in imps.c, which will execute an IMPS program. In subset 2 you can assume the IMPS program contains only the following instructions:

Instruction          Description             Encoding
ADDI Rt, Rs, Imm16   Rt = Rs + Imm16         001000ssssstttttIIIIIIIIIIIIIIII
SYSCALL              perform a system call   000000cccccccccccccccccccc001100
ADD Rd, Rs, Rt       Rd = Rs + Rt            000000ssssstttttddddd00000100000
ORI Rt, Rs, Imm16    Rt = Rs | Imm16         001101ssssstttttIIIIIIIIIIIIIIII
LUI Rt, Imm16        Rt = Imm16 << 16        00111100000tttttIIIIIIIIIIIIIIII

Execution of the program starts from the entry_point (which is an index into the array of instructions).

Every time you execute an instruction you should first fetch it from the instructions array in the executable. You will then need to determine what type of instruction it is, using bitwise operations.

If the instruction has operands (e.g. registers, an immediate) you should extract those operands using bitwise operations.
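As an illustration, the fields in the encoding tables can be pulled out with shifts and masks. The macro and function names here are hypothetical. In standard MIPS, ADDI, ADDIU and the branch offsets sign-extend their 16-bit field while ORI zero-extends it; the tables in this specification do not spell that out, so confirm it against the reference implementation:

```c
#include <stdint.h>

// Field positions taken from the encoding tables, e.g.
// 001000 sssss ttttt IIIIIIIIIIIIIIII for ADDI.
#define OPCODE_SHIFT 26
#define RS_SHIFT     21
#define RT_SHIFT     16
#define RD_SHIFT     11
#define REG_MASK     0x1Fu    // register numbers are 5 bits
#define FUNCT_MASK   0x3Fu    // low 6 bits of R-type instructions
#define IMM16_MASK   0xFFFFu  // low 16 bits of I-type instructions

static uint32_t opcode_of(uint32_t instruction) {
    return instruction >> OPCODE_SHIFT;
}

static uint32_t rs_of(uint32_t instruction) {
    return (instruction >> RS_SHIFT) & REG_MASK;
}

static uint32_t rt_of(uint32_t instruction) {
    return (instruction >> RT_SHIFT) & REG_MASK;
}

static uint32_t rd_of(uint32_t instruction) {
    return (instruction >> RD_SHIFT) & REG_MASK;
}

static uint32_t funct_of(uint32_t instruction) {
    return instruction & FUNCT_MASK;
}

static uint32_t imm16_of(uint32_t instruction) {
    return instruction & IMM16_MASK;
}

// Sign-extend a 16-bit immediate using only well-defined integer arithmetic.
static int32_t sign_extend_16(uint32_t imm16) {
    int32_t value = (int32_t) (imm16 & IMM16_MASK);
    if (value & 0x8000) {
        value -= 0x10000;   // bit 15 set: the immediate is negative
    }
    return value;
}
```

For example, addi $v0, $zero, 11 encodes as 0x2002000B, giving an opcode of 8, Rs = 0, Rt = 2 and Imm16 = 11.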

You will need to keep track of the values of the 32 general-purpose registers. When the program starts each register should be set to 0.

The $0 ($zero) register always has the value 0, and instructions that attempt to change its value have no effect.
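One way to guarantee the $zero rule is to send every register write through a single helper. A sketch with a hypothetical set_register function:

```c
#include <stdint.h>

#define NUM_REGISTERS 32
#define ZERO_REGISTER 0

// Store value in register reg_num; writes to $zero are silently discarded.
static void set_register(uint32_t registers[NUM_REGISTERS], uint32_t reg_num,
                         uint32_t value) {
    if (reg_num != ZERO_REGISTER) {
        registers[reg_num] = value;
    }
}
```

Declaring the array as uint32_t registers[NUM_REGISTERS] = {0}; also gives every register its required initial value of 0. A single write path may also be convenient later, since tracing mode in subset 4 needs to notice register changes.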

For this subset you only need to implement these three syscalls:

  • syscall 1: print the integer in $a0 to stdout. You should use the provided print_int32_in_decimal (or equivalent).

  • syscall 10: exit the program, with status 0. You may use exit(0) for this.

  • syscall 11: print the character in $a0 to stdout. You should use putchar (or equivalent).

Error handling

In subset 2 your IMPS implementation should handle the following errors:

  • If a syscall is run with $v0 not equal to a valid syscall number, you should output the error message IMPS error: bad syscall number.

  • If an invalid instruction is run you should output the error message IMPS error: bad instruction <instruction>, where <instruction> is the instruction, printed out using print_uint32_in_hexadecimal.

  • If execution reaches past the end of the instructions array (which will occur if there's no exit syscall) you should output the error message IMPS error: execution past the end of instructions.

All error messages should be printed to stderr, and after an error occurs execution should halt and IMPS should exit with status 1.
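Because every error ends the same way, a small helper can keep the messages consistent. This sketch assumes a hypothetical imps_error function and that each message ends with a newline (check with the reference implementation); it repeats the struct imps_file definition from subset 1 so that it compiles on its own:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// struct imps_file as defined in subset 1.
struct imps_file {
    uint32_t num_instructions;
    uint32_t entry_point;
    uint32_t *instructions;
    uint32_t *debug_offsets;
    uint16_t memory_size;
    uint8_t  *initial_data;
};

// Hypothetical helper: report a runtime error on stderr and exit with status 1.
static void imps_error(const char *message) {
    fprintf(stderr, "IMPS error: %s\n", message);
    exit(1);
}

// Return the instruction at index pc, or report an error (which exits)
// if execution has reached past the end of the instructions array.
static uint32_t fetch_instruction(const struct imps_file *executable,
                                  uint32_t pc) {
    if (pc >= executable->num_instructions) {
        imps_error("execution past the end of instructions");
    }
    return executable->instructions[pc];
}
```

The bad instruction message also needs the instruction printed with the provided print_uint32_in_hexadecimal, so it would need a slightly different helper.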

Autotests

Use the following command to run all of the autotests for subset 2:

1521 autotest imps S2 [optionally: any extra .c or .h files]

For tests that only involve the ADDI and SYSCALL instructions, run:

1521 autotest imps S2_1 [optionally: any extra .c or .h files]

Subset 3

In this subset you need to also implement the following instructions:

Instruction            Description                        Encoding
CLO Rd, Rs             Rd = count_leading_ones(Rs)        000000ssssstttttddddd00001010001
CLZ Rd, Rs             Rd = count_leading_zeroes(Rs)      000000ssssstttttddddd00001010000
ADDU Rd, Rs, Rt        Rd = Rs + Rt                       000000ssssstttttddddd00000100001
ADDIU Rt, Rs, Imm16    Rt = Rs + Imm16                    001001ssssstttttIIIIIIIIIIIIIIII
MUL Rd, Rs, Rt         Rd = Rs * Rt                       011100ssssstttttddddd00000000010
BEQ Rs, Rt, Offset16   IF Rs == Rt THEN PC += Offset16    000100ssssstttttOOOOOOOOOOOOOOOO
BNE Rs, Rt, Offset16   IF Rs != Rt THEN PC += Offset16    000101ssssstttttOOOOOOOOOOOOOOOO
SLT Rd, Rs, Rt         Rd = Rs < Rt                       000000ssssstttttddddd00000101010
LB Rt, Offset16(Rb)    Rt = RAM[Rb + Offset16]            100000bbbbbtttttOOOOOOOOOOOOOOOO
LH Rt, Offset16(Rb)    Rt = RAM[Rb + Offset16]            100001bbbbbtttttOOOOOOOOOOOOOOOO
LW Rt, Offset16(Rb)    Rt = RAM[Rb + Offset16]            100011bbbbbtttttOOOOOOOOOOOOOOOO
SB Rt, Offset16(Rb)    RAM[Rb + Offset16] = Rt            101000bbbbbtttttOOOOOOOOOOOOOOOO
SH Rt, Offset16(Rb)    RAM[Rb + Offset16] = Rt            101001bbbbbtttttOOOOOOOOOOOOOOOO
SW Rt, Offset16(Rb)    RAM[Rb + Offset16] = Rt            101011bbbbbtttttOOOOOOOOOOOOOOOO

The CLZ (count leading zeros) instruction counts the number of zero bits which are to the left of all one bits. For example if Rs = 0b00000001010001011101010101110011 there are 7 leading zeros, so Rd should be set to 7.

The CLO instruction counts the number of leading one bits, for example if Rs = 0b11111110101110100010101010001100 there are 7 leading ones, so Rd should be set to 7.
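Both counts can be computed with a loop that tests one bit at a time from the most significant end, and CLO can reuse CLZ on the bitwise complement. A sketch with hypothetical function names; compiler builtins such as GCC's __builtin_clz are not standard C and are undefined for an input of 0, so a plain loop is the more portable choice:

```c
#include <stdint.h>

#define WORD_BITS 32

// Count the zero bits to the left of the most significant one bit.
// Returns WORD_BITS when value is 0.
static uint32_t count_leading_zeros(uint32_t value) {
    uint32_t count = 0;
    uint32_t mask = UINT32_C(1) << (WORD_BITS - 1);
    while (mask != 0 && (value & mask) == 0) {
        count++;
        mask >>= 1;
    }
    return count;
}

// The leading ones of value are exactly the leading zeros of ~value.
static uint32_t count_leading_ones(uint32_t value) {
    return count_leading_zeros(~value);
}
```

In hexadecimal the two binary examples above are 0x0145D573 and 0xFEBA2A8C; count_leading_zeros(0x0145D573) and count_leading_ones(0xFEBA2A8C) both return 7.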

The ADDU and ADDIU instructions have similar semantics to the ADD and ADDI instructions, except that they do not perform error checking when overflow occurs (see the error handling part of this subset). They are included in this assignment since some pseudo-instruction expansions use ADDU and ADDIU.

The BEQ and BNE instructions perform conditional branches. If the branch condition is satisfied execution continues at instruction index PC + Offset16, where PC is the index of the current instruction and Offset16 is the offset encoded in the lower 16 bits of the branch instruction.
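Since PC here is an instruction index, a taken branch simply adds the offset to it. This sketch assumes Offset16 is a signed (two's complement) value, as in standard MIPS, so that branches can go backwards; the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

// Sign-extend the low 16 bits of instruction (the Offset16 field).
static int32_t branch_offset(uint32_t instruction) {
    int32_t offset = (int32_t) (instruction & 0xFFFFu);
    if (offset & 0x8000) {
        offset -= 0x10000;
    }
    return offset;
}

// Index of the next instruction to execute after a BEQ or BNE at index pc.
static uint32_t next_pc_after_branch(uint32_t pc, uint32_t instruction,
                                     bool condition_holds) {
    if (!condition_holds) {
        return pc + 1;
    }
    // Unsigned addition wraps modulo 2^32, so adding a negative offset
    // converted to uint32_t moves pc backwards.
    return pc + (uint32_t) branch_offset(instruction);
}
```

For example, a taken branch at index 5 with Offset16 = 0xFFFE (-2) continues at index 3, and an untaken one continues at index 6.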

The SLT instruction sets Rd to 1 if Rs < Rt, otherwise Rd gets set to 0.

The memory access instructions only need to handle access to the data segment. The initial contents of the data segment are specified by the initial_data from the executable. Access to the data segment starts from address 0x10010000, so for example the initial value of the byte at 0x10010004 is equal to initial_data[4].

Memory accesses should be little-endian (the same as mipsy).
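Translating an address into an index into the data segment is a subtraction (0x10010004 becomes index 4), and little-endian loads and stores can be done byte by byte. A sketch under two assumptions worth checking against the reference implementation: that an access is valid only if all of its bytes fall inside the data segment, and that the caller handles sign extension (in standard MIPS, LB and LH sign-extend the loaded value). The names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define DATA_SEGMENT_START 0x10010000u

// True if an access of size bytes (1, 2 or 4) at address is aligned and
// fits inside [DATA_SEGMENT_START, DATA_SEGMENT_START + memory_size).
static bool valid_access(uint32_t address, uint32_t size,
                         uint16_t memory_size) {
    if (address % size != 0 || address < DATA_SEGMENT_START) {
        return false;
    }
    uint32_t offset = address - DATA_SEGMENT_START;
    return offset + size <= memory_size;
}

// Combine size bytes starting at data[offset], least significant first.
static uint32_t load_little_endian(const uint8_t *data, uint32_t offset,
                                   uint32_t size) {
    uint32_t value = 0;
    for (uint32_t i = 0; i < size; i++) {
        value |= (uint32_t) data[offset + i] << (8 * i);
    }
    return value;
}

// Split the low size bytes of value into data[offset...], least
// significant first.
static void store_little_endian(uint8_t *data, uint32_t offset,
                                uint32_t size, uint32_t value) {
    for (uint32_t i = 0; i < size; i++) {
        data[offset + i] = (uint8_t) (value >> (8 * i));
    }
}
```

Building values with shifts rather than copying bytes into a uint32_t with memcpy keeps the result independent of the host's endianness, which the emulated tests described later check.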

For subset 3 you also need to implement two more syscalls:

  • syscall 4: print the nul-terminated string at address $a0 to stdout.

  • syscall 12: read a single character from stdin (e.g. via getchar) and place that character in $v0. If EOF is reached $v0 should be set to -1.
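Both syscalls reduce to small loops over the data segment or the input stream. This sketch takes FILE * parameters so it can be exercised with temporary files; in IMPS they would be stdout and stdin, and start would be $a0 minus 0x10010000. Stopping at the end of the data segment when no nul byte is found is an assumption; the reference implementation may report an error instead. The function names are hypothetical:

```c
#include <stdint.h>
#include <stdio.h>

// Write the nul-terminated string at data-segment offset start to out,
// stopping early at the end of the segment (an assumption, see above).
static void print_data_string(FILE *out, const uint8_t *data,
                              uint16_t memory_size, uint32_t start) {
    for (uint32_t i = start; i < memory_size && data[i] != '\0'; i++) {
        fputc(data[i], out);
    }
}

// Value to place in $v0 for syscall 12: the character read, or -1 at EOF
// stored as a 32-bit pattern (0xFFFFFFFF).
static uint32_t read_char_result(FILE *in) {
    int c = fgetc(in);
    return c == EOF ? UINT32_MAX : (uint32_t) c;
}
```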

Error handling

In subset 3 your IMPS implementation should additionally handle the following errors:

  • If the result of an ADD or ADDI instruction would result in a signed overflow you should output the error IMPS error: addition would overflow. There are various ways to detect overflow, including using a wider type to compute the result and then checking if the result is too large or too small, or by examining the sign bits of the inputs and the result.

  • If a memory access is not correctly aligned, or if it is outside the range of the initial data segment [0x10010000, 0x10010000 + memory_size) you should output the error IMPS error: bad address for <size> access: <address> where <size> is byte, half or word, and <address> is the address being accessed, printed using print_uint32_in_hexadecimal.
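The two overflow-detection approaches mentioned above might look like this (the function names are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

// Approach 1: compute in a wider type, then range-check the result.
static bool add_overflows_wide(int32_t a, int32_t b) {
    int64_t sum = (int64_t) a + (int64_t) b;
    return sum > INT32_MAX || sum < INT32_MIN;
}

// Approach 2: compare sign bits of raw 32-bit register values. Signed
// overflow happens only when both inputs have the same sign and the
// result's sign differs from theirs. Unsigned arithmetic wraps, so the
// addition itself is well-defined C.
static bool add_overflows_sign_bits(uint32_t a, uint32_t b) {
    uint32_t sum = a + b;
    uint32_t same_input_signs = ~(a ^ b);
    uint32_t result_sign_changed = a ^ sum;
    return (same_input_signs & result_sign_changed & UINT32_C(0x80000000)) != 0;
}
```

The sign-bit version works directly on uint32_t register values. The wider-type version needs int32_t inputs, and converting a uint32_t above INT32_MAX to int32_t is implementation-defined in C11, which is worth keeping in mind for the portability criterion.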

As in subsets 1 and 2, error messages should be printed to stderr and errors should result in IMPS exiting with status 1.

Autotests

Use the following command to run all of the autotests for subset 3:

1521 autotest imps S3 [optionally: any extra .c or .h files]

For tests involving the CLO and CLZ instructions, run:

1521 autotest imps S3_1 [optionally: any extra .c or .h files]

For tests involving the branch and SLT instructions, run:

1521 autotest imps S3_3 [optionally: any extra .c or .h files]

For tests involving memory accesses, run:

1521 autotest imps S3_4 [optionally: any extra .c or .h files]

For tests involving error checking in this subset, run:

1521 autotest imps S3_5 [optionally: any extra .c or .h files]

Subset 4

This subset is split into two parts. In the first part you will add a tracing mode to IMPS. In the second part you will implement file system syscalls in IMPS.

Tracing mode

Usually IMPS is run as ./imps <executable.imps>. In this subset you will add a tracing mode, which is enabled by running IMPS with the -t flag before the executable:

./imps -t <executable.imps>

The following changes occur in tracing mode:

  • Before execution starts IMPS should open the assembly source file. You can assume that the assembly source has the same path as the IMPS executable, except with the .imps extension replaced with .s.

  • Just prior to an instruction executing, you should use the debug offset from the IMPS executable for the current instruction to seek to the corresponding position in the assembly source using fseek. You should then print out the line starting from that position to stdout, stopping at a newline byte or EOF. If the debug offset is past the end of the assembly file no line is printed.

  • Just after each instruction is executed, if that instruction modified a register, you should output the following line:

        $<reg>: <old val> -> <new val>
    
    where <reg> is the human-friendly name for the register that changed, and <old val> and <new val> are the old and new values of the register respectively, printed using print_uint32_in_hexadecimal.
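Two small pieces of tracing mode can be sketched independently of the rest of IMPS: deriving the source path, and copying one line from a given offset. The names are hypothetical, and two behaviours are assumptions to check against the reference implementation: that the line's own newline byte is what ends the printed line, and that a final line with no newline is printed without one:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define IMPS_EXTENSION   ".imps"
#define SOURCE_EXTENSION ".s"

// Return a malloc'd copy of imps_path with a trailing ".imps" replaced by
// ".s", or NULL if the path does not end in ".imps" (or malloc fails).
static char *source_path_for(const char *imps_path) {
    size_t path_length = strlen(imps_path);
    size_t extension_length = strlen(IMPS_EXTENSION);
    if (path_length < extension_length ||
        strcmp(imps_path + path_length - extension_length,
               IMPS_EXTENSION) != 0) {
        return NULL;
    }
    size_t stem_length = path_length - extension_length;
    char *source_path = malloc(stem_length + strlen(SOURCE_EXTENSION) + 1);
    if (source_path != NULL) {
        memcpy(source_path, imps_path, stem_length);
        strcpy(source_path + stem_length, SOURCE_EXTENSION);
    }
    return source_path;
}

// Copy the line starting at byte offset in source to out, including its
// newline byte if there is one. Prints nothing if offset is at or past EOF.
static void print_source_line(FILE *source, FILE *out, long offset) {
    if (fseek(source, offset, SEEK_SET) != 0) {
        return;
    }
    int c;
    while ((c = fgetc(source)) != EOF) {
        fputc(c, out);
        if (c == '\n') {
            break;
        }
    }
}
```

For example, source_path_for("examples/hi_addi_1.imps") returns "examples/hi_addi_1.s". Seeking past the end of a file is allowed, and the following fgetc simply returns EOF, which is how the no-line case falls out.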

The following is an example of running IMPS in tracing mode:

./imps -t examples/hi_addi_1.imps
        addi    $v0, $zero, 11
   $v0: 0x00000000 -> 0x0000000b
        addi    $a0, $zero, 'h'
   $a0: 0x00000000 -> 0x00000068
        syscall
h       addi    $v0, $zero, 11
        addi    $a0, $zero, 'i'
   $a0: 0x00000068 -> 0x00000069
        syscall
i       addi    $v0, $zero, 11
        addi    $a0, $zero, '\n'
   $a0: 0x00000069 -> 0x0000000a
        syscall

        addi    $v0, $zero, 10
   $v0: 0x0000000b -> 0x0000000a
        syscall
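The register names in the trace follow the conventional MIPS names for $0 to $31. A sketch of printing one register-change line; it uses printf-style formatting in place of the provided print_uint32_in_hexadecimal, assuming that function prints 0x followed by eight lowercase hex digits as in the example above, and the three-space indent is copied from the example rather than from a stated rule. The names are hypothetical:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_REGISTERS 32

// Conventional MIPS names for registers $0 to $31, without the '$'.
static const char *const register_names[NUM_REGISTERS] = {
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0",   "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0",   "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8",   "t9", "k0", "k1", "gp", "sp", "fp", "ra",
};

// Print one "$<reg>: <old val> -> <new val>" trace line to out.
static void trace_register_change(FILE *out, uint32_t reg,
                                  uint32_t old_value, uint32_t new_value) {
    fprintf(out, "   $%s: 0x%08" PRIx32 " -> 0x%08" PRIx32 "\n",
            register_names[reg], old_value, new_value);
}
```

Calling trace_register_change(stdout, 2, 0x0, 0xb) produces the second line of the example above.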

File syscalls

For the last part of this assignment, you will implement the open, read, write and close syscalls, with an in-memory emulated filesystem. These syscalls should not result in IMPS performing actual file operations.

You must implement the following syscalls:

  • syscall 13: open the file with path $a0 (which should be the address of a nul-terminated string).

    If $a1 is equal to 0 the file is opened for reading, otherwise if $a1 is 1 the file is opened for writing. Real world open syscalls have a more complicated set of flags.

    If the file is opened for writing and the file does not exist it is created.

    If the file does exist it is not truncated.

    The lowest unused file descriptor is allocated for this open file and returned via $v0.

    If an error occurs (such as the file not existing but being opened for reading) $v0 is set to -1.

  • syscall 14: read up to $a2 bytes from file descriptor $a0 into the buffer at address $a1.

    If an error occurs (such as the file not being opened for reading) $v0 is set to -1, otherwise $v0 gets set to the number of bytes read.

  • syscall 15: write up to $a2 bytes from the buffer at address $a1 to file descriptor $a0.

    If an error occurs (such as the file not being opened for writing) $v0 is set to -1, otherwise $v0 gets set to the number of bytes written.

  • syscall 16: close file descriptor $a0.

    If an error occurs (such as the file descriptor being invalid) $v0 is set to -1, otherwise $v0 is set to 0.

You may assume the following limits are in place:

  • The maximum number of files you need to handle is 6.
  • The maximum number of open files at any given time is 8.
  • The maximum file size is 128 bytes.
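With these limits a fixed-size table of descriptor slots is enough, and the lowest unused descriptor is the first free slot. A sketch with hypothetical names. Whether descriptors start at 0 or (as on UNIX) skip 0 to 2 for the standard streams, and where the read/write position starts when a file is opened, are not stated here, so FIRST_FD and the initial position are assumptions to settle with the reference implementation:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_FILES      6    // limits from the list above
#define MAX_OPEN_FILES 8

// Hypothetical: the number given to the first descriptor slot.
#define FIRST_FD 0

// One entry per descriptor slot.
struct open_file {
    bool in_use;
    int file_index;      // index into a (hypothetical) table of MAX_FILES files
    bool for_writing;    // opened with $a1 == 1
    uint32_t position;   // next byte to read or write (assumed to start at 0)
};

// Claim the lowest unused descriptor and return it, or -1 if all are in use.
static int allocate_fd(struct open_file table[MAX_OPEN_FILES],
                       int file_index, bool for_writing) {
    for (int slot = 0; slot < MAX_OPEN_FILES; slot++) {
        if (!table[slot].in_use) {
            table[slot] = (struct open_file) {
                .in_use = true,
                .file_index = file_index,
                .for_writing = for_writing,
                .position = 0,
            };
            return FIRST_FD + slot;
        }
    }
    return -1;
}

// Release fd, returning 0 on success or -1 if fd is not currently open.
static int close_fd(struct open_file table[MAX_OPEN_FILES], int fd) {
    int slot = fd - FIRST_FD;
    if (slot < 0 || slot >= MAX_OPEN_FILES || !table[slot].in_use) {
        return -1;
    }
    table[slot].in_use = false;
    return 0;
}
```

The return values of allocate_fd and close_fd line up with what syscalls 13 and 16 place in $v0; the same file may appear in several slots at once, which is why up to 8 descriptors can exist for at most 6 files.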

You are encouraged to use the reference implementation to check your understanding of the behaviour of these syscalls.

Autotests

Use the following command to run all of the autotests for subset 4:

1521 autotest imps S4 [optionally: any extra .c or .h files]

For tests involving tracing mode, run:

1521 autotest imps S4_1 [optionally: any extra .c or .h files]

For tests involving the file syscalls, run:

1521 autotest imps S4_2 [optionally: any extra .c or .h files]

Reference implementation

A reference implementation is available as 1521 imps. Use it to find the correct output for any input like this:

1521 imps examples/hi_addi_1.imps
hi

Provision of a reference implementation is a common, efficient and effective method to provide or define an operational specification, and it's something you will likely need to work with after you leave UNSW.

Where any aspect of this assignment is undefined in this specification you should match the reference implementation's behaviour.

Discovering and matching the reference implementation's behaviour is deliberately part of the assignment.

If you discover what you believe to be a bug in the reference implementation, report it in the class forum. If it is a bug, we may fix the bug, or indicate that you do not need to match the reference implementation's behaviour in this case.

Examples

Some example MIPS programs are available in the provided examples directory. You will also need to do your own testing.

The 1521 imps-asm command can be used to create IMPS executables from assembly source code. For example you can use the command 1521 imps-asm my_example.s to create an IMPS executable my_example.imps.

If you pass the --also-disassemble flag to 1521 imps-asm a .disasm file will also be produced which contains the disassembly of the executable.

When creating your own test cases make sure to not accidentally include instructions or syscalls which you have not implemented. For example, to end your test case you should use syscall 10 rather than jr $ra.

Assumptions and Clarifications

Like all good programmers, you should make as few assumptions as possible.

If in doubt, match the output of the reference implementation.

  • You do not have to implement MIPS instructions, system calls, or features that are not explicitly mentioned in the specification above.

  • You will not be penalized if you implement extra MIPS instructions beyond the ones specified above.

  • You do not need to handle pseudo-instructions. 1521 imps-asm translates these into the appropriate real instructions.

  • Your submitted code must be C only. You may call functions from the standard C library (e.g., functions from stdio.h, stdlib.h, string.h, etc.) and the mathematics library (math.h). You may use assert.h.

    You may not submit code in other languages. You may not use the system function, or other C functions to run external programs. You may not use functions from other libraries; in other words, you cannot use dcc's -l flag.

  • Your program must not require extra compile options. It must compile with dcc *.c -o imps, and it will be run with dcc when marking. Run-time errors from illegal C will cause your code to fail automarking.

  • If you need clarification on what you can and cannot use or do for this assignment, ask in the class forum.

  • If your program writes out debugging output, it will fail automarking tests: make sure you disable debugging output before submission.

  • You are required to submit intermediate versions of your assignment. See below for details.

Change Log

Version 1.0
(2024-10-25 18:00:00)
  • Initial release

Assessment

Testing

When you think your program is working, you can use autotest to run some simple automated tests:

1521 autotest imps [optionally: any extra .c or .h files]

You can also run autotests for a specific subset. For example, to run all tests from subset 1:

1521 autotest imps S1 [optionally: any extra .c or .h files]

Some tests are more complex than others. If you are failing more than one test, you are encouraged to focus on solving the first of those failing tests. To do so, you can run a specific test by giving its name to the autotest command:

1521 autotest imps S1_1_0 [optionally: any extra .c or .h files]

1521 autotest will not test everything.
Always do your own testing.

Automarking will be run by the lecturer after the submission deadline, using a superset of the tests that autotest runs for you.

Emulated tests

To help test the portability of your code in other environments, you can use the 1521 imps-emulated-tests command. This will run your code in an emulator, so if your code is incorrectly making assumptions about the system such as endianness or word size these tests can help detect this.

There are two modes you can use with imps-emulated-tests. You can either run

1521 imps-emulated-tests user [optionally: any extra .c or .h files]

which will run your code against the full set of autotests using QEMU user mode emulation of a 32-bit (big endian) MIPS system.

Alternatively you can run

1521 imps-emulated-tests vm [optionally: any extra .c or .h files]

which will boot up a 32-bit SPARC virtual machine running NetBSD (also using QEMU), and then run your code against a subset of these tests.

Submission

When you are finished working on the assignment, you must submit your work by running give:

give cs1521 ass2_imps imps.c [optionally: any extra .c or .h files]

You must run give before Week 10 Saturday 12:00:00 to obtain the marks for this assignment. Note that this is an individual exercise; the work you submit with give must be entirely your own.

You can run give multiple times.
Only your last submission will be marked.

If you are working at home, you may find it more convenient to upload your work via give's web interface.

You cannot obtain marks by emailing your code to tutors or lecturers.

You can check your latest submission on CSE servers with:

1521 classrun check ass2_imps

You can check the files you have submitted here.

Manual marking will be done by your tutor, who will mark for style and readability, as described in the Assessment Scheme section below. After your tutor has assessed your work, you can view your results here; the resulting mark will also be available via give's web interface.

Due Date

This assignment is due Week 10 Saturday 12:00:00 (2024-11-16 12:00:00).

The UNSW standard late penalty for assessment is 5% per day for 5 days - this is implemented hourly for this assignment.

Your assignment mark will be reduced by 0.2% for each hour (or part thereof) late past the submission deadline.

For example, if an assignment worth 60% was submitted half an hour late, it would be awarded 59.8%, whereas if it was submitted past 10 hours late, it would be awarded 57.8%.

Beware - submissions 5 or more days late will receive zero marks. This again is the UNSW standard assessment policy.

Assessment Scheme

This assignment will contribute 15 marks to your final COMP1521 mark.

80% of the marks for assignment 2 will come from the performance of your code on a large series of tests.

20% of the marks for assignment 2 will come from hand marking. These marks will be awarded on the basis of clarity, commenting, elegance and style. In other words, you will be assessed on how easy it is for a human to read and understand your program.

An indicative assessment scheme for performance follows. The lecturer may vary the assessment scheme after inspecting the assignment submissions, but it is likely to be broadly similar to the following:

100% for performance    completely working subsets 1, 2, 3 & 4 - everything works!
85% for performance     completely working subsets 1, 2 & 3.
70% for performance     completely working subsets 1 & 2.
55% for performance     completely working subset 1.
40-50% for performance  good progress, but not passing subset 1 autotests.
0%                      knowingly providing your work to anyone and it is subsequently submitted (by anyone).
0 FL for COMP1521       submitting any other person's work; this includes joint work.
academic misconduct     submitting another person's work without their consent;
                        paying another person to do work for you.

An indicative assessment scheme for style follows. The lecturer may vary the assessment scheme after inspecting the assignment submissions, but it is likely to be broadly similar to the following:

100% for style perfect style
90% for style great style, almost all style characteristics perfect.
80% for style good style, one or two style characteristics not well done.
70% for style good style, a few style characteristics not well done.
60% for style ok style, an attempt at most style characteristics.
≤ 50% for style an attempt at style.

An indicative style rubric follows:

  • Formatting (6/20):
    • Whitespace (e.g. 1 + 2 instead of 1+2)
    • Indentation (consistent, tabs or spaces are okay)
    • Line length (below 80 characters unless very exceptional)
    • Line breaks (using vertical whitespace to improve readability)
  • Documentation (8/20):
    • Header comment (with name and zID)
    • Function comments (above each function with a good description)
    • Descriptive variable names (e.g. char *home_directory instead of char *h)
    • Descriptive function names (e.g. get_home_directory instead of get_hd)
    • Sensible commenting throughout the code (don't comment every single line; leave comments when necessary)
  • Elegance (5/20):
    • Does this code avoid redundancy? (e.g. Don't repeat yourself!)
    • Are helper functions used to reduce complexity? (functions should be small and simple where possible)
    • Are constants appropriately created and used? (magic numbers should be avoided)
  • Portability (1/20):
    • Would this code be able to compile and behave as expected on other POSIX-compliant machines? (using standard libraries without platform-specific code)
    • Does this code make any assumptions about the endianness of the machine it is running on?

Note that the following penalties apply to your total mark for plagiarism:

0 for asst2            knowingly providing your work to anyone and it is subsequently submitted (by anyone).
0 FL for COMP1521      submitting any other person's work; this includes joint work.
academic misconduct    submitting another person's work without their consent;
                       paying another person to do work for you.

Intermediate Versions of Work

You are required to submit intermediate versions of your assignment.

Every time you work on the assignment and make some progress you should copy your work to your CSE account and submit it using the give command above. It is fine if intermediate versions do not compile or otherwise fail submission tests. Only the final submitted version of your assignment will be marked.

Assignment Conditions

  • Joint work is not permitted on this assignment.

    This is an individual assignment. The work you submit must be entirely your own work: submission of work even partly written by any other person is not permitted.

    Do not request help from anyone other than the teaching staff of COMP1521 — for example, in the course forum, or in help sessions.

    Do not post your assignment code to the course forum. The teaching staff can view code you have recently submitted with give, or recently autotested.

    Assignment submissions are routinely examined both automatically and manually for work written by others.

    Rationale: this assignment is designed to develop the individual skills needed to produce an entire working program. Using code written by, or taken from, other people will stop you learning these skills. Other CSE courses focus on skills needed for working in a team.

  • The use of generative tools such as GitHub Copilot, ChatGPT, Google Bard is not permitted on this assignment.

    Rationale: this assignment is designed to develop your understanding of basic concepts. Using synthesis tools will stop you learning these fundamental concepts, which will significantly impact your ability to complete future courses.

  • Sharing, publishing, or distributing your assignment work is not permitted.

    Do not provide or show your assignment work to any other person, other than the teaching staff of COMP1521. For example, do not message your work to friends.

    Do not publish your assignment code via the Internet. For example, do not place your assignment in a public GitHub repository.

    Rationale: by publishing or sharing your work, you are facilitating other students using your work. If other students find your assignment work and submit part or all of it as their own work, you may become involved in an academic integrity investigation.

  • Sharing, publishing, or distributing your assignment work after the completion of COMP1521 is not permitted.

    For example, do not place your assignment in a public GitHub repository after this offering of COMP1521 is over.

    Rationale: COMP1521 may reuse assignment themes covering similar concepts and content. If students in future terms find your assignment work and submit part or all of it as their own work, you may become involved in an academic integrity investigation.

Violation of any of the above conditions may result in an academic integrity investigation, with possible penalties up to and including a mark of 0 in COMP1521, and exclusion from future studies at UNSW. For more information, read the UNSW Student Code, or contact the course account.