Assignment 1

Efficient Set ADT

Changelog

All important changes to the assignment specification and files will be listed here.

  • [05/03 12:00] Assignment released
  • [08/03 10:20] Added a question to the FAQ about including additional libraries.
  • [09/03 10:40] Added a clarification about the behaviour of a cursor after its associated set has been freed.

Admin

Marks
contributes 15% towards your final mark (see Assessment section for more details)
Submit
see the Submission section
Deadline
8pm on Monday of Week 7
Late penalty
0.2% per hour or part thereof deducted from the attained mark, submissions later than 5 days not accepted

Prerequisite Knowledge

  • Recursion
  • Analysis of Algorithms
  • Abstract Data Types
  • Binary Search Trees
  • Balanced BSTs (including AVL Trees)

Background

Sets

A set is a collection of distinct elements. In this assignment, we are concerned with sets of integers.

Usually, we write a set by displaying its elements between two curly braces. For example, \(\{1, 4, 6, 9\}\) is a set whose elements are 1, 4, 6, 9. Note that the order in which the elements are written does not make the set different, so for example, \(\{1, 4, 6, 9\}\) and \(\{6, 9, 1, 4\}\) and \(\{1, 6, 9, 4\}\) all represent the same set.

Notation

Let \(A\) be a set and \(x\) be an item.

  • \(x\) is said to be in \(A\), written as \(x\in A\), if \(x\) is an element of \(A\).

  • The empty set (i.e., the set containing no elements) is denoted by \(\varnothing\).

Set ADT

A set is an example of an abstract data type: there is a collection of operations that can be performed on them, but the exact details of the implementation are unimportant, as long as they produce the desired behaviour from the user's perspective.

In lectures, we considered several different implementations of the Set ADT: unordered/ordered arrays and unordered/ordered linked lists. Although these were relatively simple to implement, each of them were inefficient in some way:

  • An ordered array allows for binary search to be used, but requires elements to be shifted to maintain order when inserting
  • A linked list does not require shifting, but requires a linear traversal when inserting and searching for elements

Recently, we introduced binary search trees as a solution to both of these problems (shifting and linear search). A binary search tree (BST) is a linked data structure, and therefore does not require shifting, and the way searching works in a BST is similar to how binary search works. Thus, this assignment will involve implementing a set using a binary search tree, and part of this assignment will involve making it as efficient as possible using a balanced binary search tree.

Setting Up

Change into the directory you created for the assignment and run the following command:

unzip /web/cs2521/24T1/ass/ass1/downloads/files.zip

If you're working at home, download files.zip by clicking on the above link and then run the unzip command on the downloaded file.

You should now have the following files:

Makefile
a set of dependencies used to control compilation
Set.h
interface to the Set ADT - cannot be modified
Set.c
implementation of the Set ADT (incomplete)
SetStructs.h
definition of structs used in the Set ADT (incomplete)
testSet.c
a main program containing some basic tests for the Set ADT
analysis.txt
a template for you to enter your time complexity analysis for selected functions

Usually, the structs used by an ADT implementation are defined in the .c file along with the functions. However, in this assignment, they are defined in a separate file, SetStructs.h, because our tests will need access to these structs in order to make it easier to independently test each function, as well as to check whether your trees are balanced in Part 3. You may add extra fields to these structs and define additional structs if needed, but you must use the given structs/fields in your implementation as follows:

  • You must use struct node for binary search tree nodes
  • The elements of the set must be stored in the item fields of the struct node
  • The left and right pointers should be used to connect a tree node to its left and right subtrees respectively
  • The tree field must be used to point to the binary search tree that stores all the elements of the set

Note that the only files that you are allowed to submit are Set.c, SetStructs.h and analysis.txt. This means all your code for implementing the Set ADT must be placed in Set.c, except the struct definitions which should be in SetStructs.h.

Part 1: Basic Set Operations

In Part 1, you will implement some basic set operations such as insertion and deletion. Note that each of these operations has a time complexity requirement, which, if not met, may result in a loss of marks. In the time complexities below, \(n\) refers to the size of the set, and \(h\) refers to the height of the underlying binary search tree.

Note that a balanced tree is not required at this stage.

Operation/Function Description Required worst-case time complexity
SetNew Creates a new empty set. \(O(1)\)
SetFree Frees all memory allocated to the set. \(O(n)\)
SetInsert Inserts the given item into the set. Any integer may be inserted (including negative integers) except for the value UNDEFINED (which is #included in Set.h). \(O(h)\)
SetDelete Deletes the given item from the set (if it exists). \(O(h)\)
SetSize Returns the number of elements in the set. \(O(1)\)
SetContains Checks if the given item is in the set. Returns true if it is, and false otherwise. \(O(h)\)
SetPrint Prints the elements in the set to the given file in ascending order between curly braces, with items separated by a comma and space. Do not print a newline. Do not use the backspace character ('\b').
For example, if the set is empty, the following should be printed:
{}
For example, if the set contains only the value 2, then the following should be printed:
{2}
For example, if the set contains the values 2, 4, 7 and 8, then the following should be printed:
{2, 4, 7, 8}
Use the fprintf function to print to a file.
fprintf is like printf, except that the user can specify which file to print to via the first argument.
\(O(n)\)

Part 2: Common Set Operations

In Part 2, you will implement some common set operations such as union and intersection. Note that there is no time complexity requirement for these operations, but you will be required to analyse the time complexity of your solutions.

Note: None of these operations should modify the given sets.

Important Constraint

In this part, the method of converting the given tree(s) into an array or linked list, solving the main problem using the array/linked list and then returning the result or converting the result back into a tree is strictly forbidden, as this assignment is intended to give you experience with using binary search trees.

Operation/Function Description
SetUnion Given two sets \(A\) and \(B\), returns a new set representing their union.

Let \(A\) and \(B\) be two sets. The union of \(A\) and \(B\), denoted by \(A \cup B\), is the set of all elements that are contained in \(A\), or \(B\), or both \(A\) and \(B\). Formally,

\[ A \cup B = \{ x: x\in A \mbox{ or } x\in B \} \]

For example:

\[ \{1, 2, 3\} \cup \{2, 4\} = \{1, 2, 3, 4\} \] \[ \{1, 5\} \cup \{2, 6, 8\} = \{1, 2, 5, 6, 8\} \]

SetIntersection Given two sets \(A\) and \(B\), returns a new set representing their intersection.

Let \(A\) and \(B\) be two sets. The intersection of \(A\) and \(B\), denoted by \(A \cap B\), is the set of all elements that are contained in both \(A\) and \(B\). Formally,

\[ A \cap B = \{x : x\in A \mbox{ and } x\in B\} \]

For example:

\[ \{1,2,3\} \cap \{2, 4\} = \{2\} \] \[ \{1,5\} \cap \{2,6,8\} = \emptyset \]

SetEquals Given two sets, returns true if the sets are equal, i.e., they contain exactly the same elements, and false otherwise.
SetSubset Given two sets, returns true if the first set is a subset of the second set, and false otherwise.

Let \(A\) and \(B\) be two sets. \(A\) is said to be a subset of \(B\), written as \(A \subseteq B\), if all elements of \(A\) are also elements of \(B\).

For example:

\[ \{1, 2, 3\} \subseteq \{1, 2, 3, 5\} \] \[ \{1, 5, 8\} \nsubseteq \{1, 5, 3, 9\} \]

For each of the operations above, determine the worst-case time complexity of your implementation, and write your answers in analysis.txt along with an explanation of each answer.

Your answers should be in big-O notation, and the time complexity should be in terms of \(n\) and \(m\), where \(n\) is the size of the first set, and \(m\) is the size of the second set.

For convenience of analysis, you must assume that the underlying binary search trees are balanced, even though your implementation might not use a balanced BST yet (this is the objective of Part 3).

In your explanations, you may use time complexities established in lectures without proof, as long as you indicate that it was established in lectures. For example, you may use the fact that the worst-case time complexity of searching in an AVL tree is \(O(\log n)\) where \(n\) is the size of the tree, as this was established in lectures.

Part 3: Balanced BST

Although binary search trees have the potential to be more efficient than ordered arrays and linked lists, they are only efficient if they are balanced.

Therefore, your task is now to update your implementation so that it uses a height-balanced binary search tree.

After you have updated your implementation, it is expected that SetInsert and SetDelete will have a worst-case time complexity of \(O(\log n)\), and that the underlying binary search tree of any set will always be height-balanced, even for those created by SetUnion and SetIntersection.

Part 4: Index Operations

Note: You will need to modify some of your existing code.

In some cases, a user may be interested in knowing what the \(i\)-th smallest element is, or how a given element ranks among all the other elements.

Thus, in Part 4, you will implement some operations that allow a user to retrieve the element at a given index and obtain the index of a particular element.

The element of an index is the index that it would be stored at if all the elements in the set were stored in a sorted array. For example, if a set contains the values 1, 3, 4 and 7, then the index of 1 is 0, the index of 3 is 1, and so on.

Operation/Function Description Required worst-case time complexity
SetAtIndex Given an index, returns the element at that index, or UNDEFINED if the given index is outside the range \([0, n - 1]\) where \(n\) is the size of the set. \(O(\log n)\)
SetIndexOf Given an element, returns the index of the element in the set if it exists, and -1 otherwise. \(O(\log n)\)

Part 5: Cursor Operations

This is a challenge! You should not expect to receive hints for this part.
Note: You will need to modify some of your existing code.

As you have learned, an ADT does not provide users access to the internal representation of the data type. But what if a user wanted to know what elements are contained in a set? With only the basic operations listed above, the only way the user could achieve this is to check the count of every possible item, like so:

let s be a set
i = smallest possible integer
while i <= largest possible integer:
	check if i exists in the set
	i = i + 1

However, this is not feasible given how many possible integers there are. To solve this problem, many collection data types provide a simple way for users to iterate over their elements, called iterators. Here is how an iterator would be used:

let s be a set
it = create an iterator for s
while thereIsStillANextItem(it):
	i = getNextItem(it)

Your task is to implement a more flexible iterator that allows users to move forwards and backwards through the elements of a set, which we will call a cursor.

To understand what a cursor is, imagine that the elements of a set are arranged in a line in increasing order, and that there are two imaginary elements - one at the start, and one at the end:

A cursor is like a pointer that moves between these elements. When a cursor is created, it will be positioned at the start element, and the user can use operations called next and prev to move the cursor right or left.

There are five cursor operations:

Operation/Function Description
SetCursorNew Creates a new cursor for the given set, initially positioned at the start of the set
SetCursorFree Frees all memory allocated to the given cursor
SetCursorGet Returns the element at the cursor's position, or UNDEFINED if the cursor is positioned at the start or end of the set.
SetCursorNext Moves the cursor to the next greatest element, or to the end of the set if there is no next greatest element. Does not move the cursor if it is already at the end. Returns false if the cursor is at the end after this operation, and true otherwise.
SetCursorPrev Moves the cursor to the next smallest element, or to the start of the set if there is no next smallest element. Does not move the cursor if it is already at the start. Returns false if the cursor is at the start after this operation, and true otherwise.

Here is an example of how a cursor could be used:

int main(void) {
	Set s = SetNew();
	SetInsert(s, 4);
	SetInsert(s, 7);
	SetInsert(s, 1);
	SetInsert(s, 3);

	SetCursor cur = SetCursorNew(s);
	while (SetCursorNext(cur)) {
		int elem = SetCursorGet(cur);
		printf("%d\n", elem);
	}
	SetCursorFree(cur);

	SetFree(s);
}

This would produce the output:

1
3
4
7

The following are some clarifications about the expected behaviour of the cursor:

  • If the set associated with a cursor is empty, SetCursorNext should move the cursor to the end, and SetCursorPrev should move the cursor to the start. In both cases, the functions should return false as specified above.

  • A user should be able to create multiple cursors for a given set and iterate over the set independently with each cursor.

  • SetCursorNext and SetCursorPrev should continue to work as described above if elements are inserted after the cursor has been created.

    • For example, suppose a set contains the elements 2, 5 and 8, and a cursor is currently positioned at 5. If 7 is now inserted into the set, then calling SetCursorNext should move the cursor to 7.

  • SetCursorNext and SetCursorPrev should also continue to work as described above if elements are deleted after the cursor has been created, unless the cursor was positioned at the element which was deleted. If a cursor is positioned at an element and then the element is deleted, the cursor is said to be invalidated and can no longer be used except in SetCursorFree.

  • If the set associated with a particular cursor is freed, no further operations may be performed on the cursor except for SetCursorFree.

In order to obtain full marks for this part, all cursor operations should have a worst-case time complexity of \(O(1)\). A solution where one or more of the cursor operations have a worst-case time complexity of \(O(\log n)\) is worth half the marks. Solutions where one or more operations are slower than \(O(\log n)\) will not receive marks for this part.

You must also explain the design and implementation of your solution and how it meets the time complexity requirement of \(O(1)\) or \(O(\log n)\) in analysis.txt.

Testing

We have provided a main program testSet.c containing basic test cases. You should examine testSet.c to see what the tests do and how the tests call your functions.

To run these tests, first run make, which compiles your code and creates an executable called testSet, and then run ./testSet.

All of the given tests are assertion-based, which means that the program will exit as soon as a test fails. If you want to ignore a test for the time being, then you can comment out the corresponding function which runs that test.

We strongly recommend you to add your own tests, as the given tests are very simple. You can easily add your own tests by creating a new function in testSet.c and then calling it from the main function. You are also free to completely restructure the testing program if you want.

Debugging

Debugging is an important and inevitable aspect of programming. We expect everyone to have an understanding of basic debugging methods, including using print statements, knowing basic GDB commands and running Valgrind. In the forum and in help sessions, tutors will expect you to have made a reasonable attempt to debug your program yourself using these methods before asking for help.

You can learn about GDB and Valgrind in the Debugging with GDB and Valgrind lab exercise.

Frequently Asked Questions

  • Are we allowed to create our own functions? You are always allowed to create your own functions. All additional functions you create should be made static.
  • Are we allowed to #include additional libraries? No. All the libraries required for this assignment are provided already.
  • Are we allowed to create our own #defines and structs? Yes.
  • Are we allowed to modify Set.h in any way? No.
  • Are we allowed to change the signatures of the given functions? No. If you change these, your code won't compile and we won't be able to test it.
  • What errors do we need to handle? You should handle common errors such as NULL returned from malloc by printing an error message to stderr and terminating the program. You are not required to handle other errors.
  • What invalid inputs do we need to handle? You are not required to handle invalid inputs, such as NULL being passed in as a set. It is the user's responsibility to provide valid inputs.
  • Will we be assessed on our tests? No. You will not be submitting any test files, and therefore you will not be assessed on your tests.
  • Are we allowed to share tests? No. Testing is an important part of software development. Students who test their code more will be more likely to have correct code, so to ensure fairness, each student should independently develop their own tests.

Submission

You must submit the files Set.c, SetStructs.h and analysis.txt. You can submit via the command line using the give command:

give cs2521 ass1 Set.c SetStructs.h analysis.txt

You can also submit via give's web interface. You can submit multiple times. Only your last submission will be marked. You can check the files you have submitted here.

Compilation

You must ensure that your final submission compiles on CSE machines. Submitting non-compiling code leads to extra administrative overhead and will result in a 10% penalty.

Every time you make a submission, a dryrun test will be run on your code to check that it compiles. Please ensure that your final submission successfully compiles, even for parts that you have not completed.

Assessment Criteria

This assignment will contribute 15% to your final mark.

Correctness

76% of the marks for this assignment will be based on the correctness of your code, and will be based on autotesting. Marks for correctness will be distributed as follows:

Part Operation Weighting
Part 1 SetFree See memory errors/leaks section below
SetInsert 2%
SetDelete 2%
SetSize 2%
SetContains 2%
SetPrint 4%
Part 2 SetUnion 6%
SetIntersection 6%
SetEquals 6%
SetSubset 6%
Part 3 SetInsert (balanced) 12%
SetDelete (balanced) 8%
Part 4 SetAtIndex 6%
SetIndexOf 6%
Part 5 All cursor operations
Note: you must implement all cursor operations to get marks
8%
Memory errors/leaks

You must ensure that your code does not contain memory errors or leaks, as code that contains memory errors is unsafe and it is bad practice to leave memory leaks in your program.

Our tests will include a memory error/leak test for each operation. If a memory error/leak arises from your code, you will receive a penalty which is 10% of the marks for that operation. For example, the penalty for causing a memory error/leak in the SetEquals operation will be 0.6%. Note that our tests will always call SetFree at the end of the test (and SetCursorFree if appropriate) to free all memory associated with the set.

Complexity analysis

12% of the marks for this assignment will be based on the correctness of your complexity analysis in analysis.txt and the quality of your explanations.

Style

12% of the marks for this assignment will come from hand marking of the readability of the code you have written. These marks will be awarded on the basis of clarity, commenting, elegance and style. The following is an indicative list of characteristics that will be assessed, though your program will be assessed wholistically so other characteristics may be assessed too (see the style guide for more details):

  • Consistent and sensible indentation and spacing
  • Using blank lines and whitespace
  • Using functions to avoid repeating code
  • Decomposing code into functions and not having overly long functions
  • Using comments effectively and not leaving planning or debugging comments

The course staff may vary the assessment scheme after inspecting the assignment submissions but it will remain broadly similar to the description above.

Originality of Work

This is an individual assignment. The work you submit must be your own work and only your work apart from the exceptions below. Joint work is not permitted. At no point should a student read or have a copy of another student's assignment code.

You may use any amount of code from the lectures and labs of the current iteration of this course. You must clearly attribute the source of this code in a comment. Please see the referencing guide.

You may use small amounts (< 10 lines) of general purpose code (not specific to the assignment) obtained from sites such as Stack Overflow or other publicly available resources. You must clearly attribute the source of this code in a comment.

You are not permitted to request help with the assignment apart from in the course forum, help sessions or from course lecturers or tutors.

Do not provide or show your assignment work to any other person (including by posting it publicly on the forum) apart from the teaching staff of COMP2521. When posting on the course forum, teaching staff will be able to view the assignment code you have recently submitted with give.

The work you submit must otherwise be entirely your own work. Submission of work partially or completely derived from any other person or jointly written with any other person is not permitted. The penalties for such an offence may include negative marks, automatic failure of the course and possibly other academic discipline. Assignment submissions will be examined both automatically and manually for such issues.

Relevant scholarship authorities will be informed if students holding scholarships are involved in an incident of plagiarism or other misconduct. If you knowingly provide or show your assignment work to another person for any reason, and work derived from it is submitted, then you may be penalised, even if the work was submitted without your knowledge or consent. This may apply even if your work is submitted by a third party unknown to you.