Week 09 Tutorial Answers

Sorting

Resources

Questions

  1. Explain what these properties mean in relation to sorting algorithms:

    • Stability
    • Adaptability
    • Comparison-based

    Answer:

    • Stability: A sorting algorithm is stable if it maintains the relative order of items with equal sort keys.
    • Adaptiveness: Whether or not the pre-sortedness of the input affects the running time. A sorting algorithm is adaptive if it runs faster when the input is already partially sorted.
    • Comparison-based: A sorting algorithm is comparison-based if it examines the data only through comparison operators (such as < and ≤). Examples of where sorting algorithms are not comparison based are when an item is used with the modulo operator (as in radix sort) or as an index (as in counting sort).
  2. Consider the following simple table of enrolments, sorted by course code:

    COMP1927Jane3970
    COMP1927John3978
    COMP1927Pete3978
    MATH1231John3978
    MATH1231Adam3970
    PSYC1011Adam3970
    PSYC1011Jane3970

    Now we wish to sort it by student name, so that we can easily see what courses each student is studying. Show an example of what the final array would look like if

    1. we used a stable sorting algorithm
    2. we used an unstable sorting algorithm

    Answer:

    After a stable sort, the relative positions of items with equal sort keys is the same. In the context of this problem, this means that the subjects for a given student should be in the same order as in the original table (which happens to be sorted on subject):

    MATH1231Adam3970
    PSYC1011Adam3970
    COMP1927Jane3970
    PSYC1011Jane3970
    COMP1927John3978
    MATH1231John3978
    COMP1927Pete3978

    An unstable sorting algorithm could produce any order that is sorted on the sort key (in this case name), including the order that is produced by the sorting stable algorithm. For example:

    PSYC1011Adam3970
    MATH1231Adam3970
    PSYC1011Jane3970
    COMP1927Jane3970
    MATH1231John3978
    COMP1927John3978
    COMP1927Pete3978

    The result will definitely be sorted by name, but not necessarily be sorted by course-code within each name.

  3. Sorting algorithms generally use comparison on the key to determine ordering. Ordering of items with equal keys has been left to the stability of the sorting algorithm used. Sometimes, it is important that items with the same key be ordered on a secondary key.

    Write a function that takes two items and determines an order based on the primary key k and then on the value of a secondary key j. The function should return -1 if the first item is "less than" the second item, 1 if the first item is "greater than" the second item, and 0 if the items are equal. The function has the following signature:

    typedef struct item {
    	int k;
    	int j;
    	/* ... other fields ...*/
    } Item;
    
    int itemCmp(Item a, Item b) { ... }
    

    Answer:

    The following function would do what we want:

    int itemCmp(Item a, Item b) {
    	if (a.k < b.k) {
    		return -1;
    	} else if (a.k > b.k) {
    		return 1;
    	} else { // (a.k == b.k)
    		if (a.j < b.j) {
    			return -1;
    		} else if (a.j > b.j) {
    			return 1;
    		} else {
    			return 0;
    		}
    	}
    }
    
  4. How many comparisons would be required to sort this array

    int a[] = {4, 3, 6, 8, 2};
    

    for each of the following sorting algorithms:

    1. selection sort
    2. bubble sort
    3. insertion sort

    Answer:

    The comparisons are highlighted in red.

    1. selection sort

      Iteration 1:
      4  3  6  8  2  # assume 4 is the smallest item
      4  3  6  8  2  # found a smaller item, 3
      4  3  6  8  2
      4  3  6  8  2
      4  3  6  8  2  # found a smaller item, 2
      2  3  6  8  4  # swap smallest item with item at index 0
      
      Iteration 2:
      2  3  6  8  4  # assume 3 is the next-smallest item
      2  3  6  8  4
      2  3  6  8  4
      2  3  6  8  4
      2  3  6  8  4  # swap next-smallest item with item at index 1
      
      Iteration 3:
      2  3  6  8  4  # assume 6 is the next-smallest item
      2  3  6  8  4
      2  3  6  8  4  # found a smaller item, 4
      2  3  4  8  6  # swap next-smallest item with item at index 2
      
      Iteration 4:
      2  3  4  8  6  # assume 8 is the next-smallest item
      2  3  4  8  6  # found a smaller item, 6
      2  3  4  6  8  # swap next-smallest item with item at index 3
      
      Done
      

      10 comparisons are made in total.

    2. bubble sort (bubble-up)

      Iteration 1:
      4  3  6  8  2  # 4 and 3 are out of order, swap
      3  4  6  8  2
      3  4  6  8  2
      3  4  6  8  2  # 8 and 2 are out of order, swap
      3  4  6  2  8
      
      Iteration 2:
      3  4  6  2  8
      3  4  6  2  8
      3  4  6  2  8  # 6 and 2 are out of order, swap
      3  4  2  6  8
      
      Iteration 3:
      3  4  2  6  8
      3  4  2  6  8  # 4 and 2 are out of order, swap
      3  2  4  6  8
      
      Iteration 4:
      3  2  4  6  8  # 3 and 2 are out of order, swap
      2  3  4  6  8
      
      Done
      

      10 comparisons are made in total.

      adaptive bubble sort (bubble-down)

      Iteration 1:
      4  3  6  8  2  # 8 and 2 are out of order, swap
      4  3  6  2  8  # 6 and 2 are out of order, swap
      4  3  2  6  8  # 3 and 2 are out of order, swap
      4  2  3  6  8  # 4 and 2 are out of order, swap
      2  4  3  6  8
      
      Iteration 2:
      2  4  3  6  8
      2  4  3  6  8
      2  4  3  6  8  # 4 and 3 are out of order, swap
      2  3  4  6  8
      
      Iteration 3:
      2  3  4  6  8
      2  3  4  6  8
      2  3  4  6  8
      
      Finished an iteration without swaps, done
      

      9 comparisons are made in total.

    3. insertion sort

      Iteration 1:
      4  3  6  8  2  # in this iteration, we "insert" 3
      4  3  6  8  2  # 4 and 3 are out of order, so we shift 3 down
      3  4  6  8  2  # reached bottom of array, so 3 must be in the right place
      
      Iteration 2:
      3  4  6  8  2  # in this iteration, we "insert" 6
      3  4  6  8  2  # 4 and 6 are in order, so we stop
      3  4  6  8  2
      
      Iteration 3:
      3  4  6  8  2  # in this iteration, we "insert" 8
      3  4  6  8  2  # 6 and 8 are in order, so we stop
      3  4  6  8  2
      
      Iteration 4:
      3  4  6  8  2  # in this iteration, we "insert" 2
      3  4  6  8  2  # 8 and 2 are out of order, so we shift 2 down
      3  4  6  2  8  # 6 and 2 are out of order, so we shift 2 down
      3  4  2  6  8  # 4 and 2 are out of order, so we shift 2 down
      3  2  4  6  8  # 3 and 2 are out of order, so we shift 2 down
      2  3  4  6  8  # reached bottom of array, so 2 must be in the right place
      
      Done
      

      7 comparisons are made in total.

  5. Merge sort is typically implemented as follows:

    void mergeSort(int A[], int lo, int hi) {
    	if (lo >= hi) return;
    	
    	int mid = (lo + hi) / 2;
    	mergeSort(A, lo, mid);
    	mergeSort(A, mid + 1, hi);
    	merge(A, lo, mid, hi);
    }
    
    void merge(int A[], int lo, int mid, int hi) {
    	int nitems = hi - lo + 1;
    	int *tmp = malloc(nitems * sizeof(int));
    	
    	int i = lo;
    	int j = mid + 1;
    	int k = 0;
    	
    	// scan both segments into tmp
    	while (i <= mid && j <= hi) {
    		if (A[i] <= A[j]) {
    			tmp[k++] = A[i++];
    		} else {
    			tmp[k++] = A[j++];
    		}
    	}
    	
    	// copy items from unfinished segment
    	while (i <= mid) tmp[k++] = A[i++];
    	while (j <= hi)  tmp[k++] = A[j++];
    	
    	// copy items from tmp back to main array
    	for (i = lo, k = 0; i <= hi; i++, k++) {
    		A[i] = tmp[k];
    	}
    	free(tmp);
    }
    

    Suppose that at a particular point during the execution of the mergeSort function, the array A looks like this:

    { 1, 4, 5, 6, 7, 2, 3, 4, 7, 9 }
    

    Show how the call merge(A, 0, 4, 9) would merge the elements of the two sorted subarrays into a single sorted array.

    Answer:

    At the beginning of the function, a temporary array is created to sort the elements (before they are copied back into the original array at the end).

    Then, three index variables are created: i is 0, which is the index into the beginning of the first sorted subarray, j is 5, which is the index into the beginning of the second sorted subarray, and k is 0, which is just an index into tmp so we know how many elements we have inserted into tmp so far (and where the next element should be inserted).

    A:   { 1, 4, 5, 6, 7, 2, 3, 4, 7, 9 }
           ^              ^
           i              j
    
    tmp: {  ,  ,  ,  ,  ,  ,  ,  ,  ,   }
           ^
           k
    

    Now we enter the first loop, which does the majority of the merging. The loop compares the smallest elements of each subarray that haven't been consumed yet (i.e., A[i] and A[j]), and inserts the smaller of the two elements into tmp. If the two elements are the same, it takes the element from the first subarray - this ensures stability. It continues to do this until one of the subarrays has been exhausted.

    A[i] (1) is less than A[j] (2) - so copy A[i] into tmp and advance i and k
    
    A:   { 1, 4, 5, 6, 7, 2, 3, 4, 7, 9 }
              ^           ^
              i           j
    
    tmp: { 1,  ,  ,  ,  ,  ,  ,  ,  ,   }
              ^
              k
    
    A[j] (2) is less than A[i] (4) - so copy A[j] into tmp and advance j and k
    
    A:   { 1, 4, 5, 6, 7, 2, 3, 4, 7, 9 }
              ^              ^
              i              j
    
    tmp: { 1, 2,  ,  ,  ,  ,  ,  ,  ,   }
                 ^
                 k
    
    A[j] (3) is less than A[i] (4) - so copy A[j] into tmp and advance j and k
    
    A:   { 1, 4, 5, 6, 7, 2, 3, 4, 7, 9 }
              ^                 ^
              i                 j
    
    tmp: { 1, 2, 3,  ,  ,  ,  ,  ,  ,   }
                    ^
                    k
    
    A[i] (4) is equal to  A[j] (4) - so copy A[i] into tmp and advance i and k
    
    A:   { 1, 4, 5, 6, 7, 2, 3, 4, 7, 9 }
                 ^              ^
                 i              j
    
    tmp: { 1, 2, 3, 4,  ,  ,  ,  ,  ,   }
                       ^
                       k
    
    A[j] (4) is less than A[i] (5) - so copy A[j] into tmp and advance j and k
    
    A:   { 1, 4, 5, 6, 7, 2, 3, 4, 7, 9 }
                 ^                 ^
                 i                 j
    
    tmp: { 1, 2, 3, 4, 4,  ,  ,  ,  ,   }
                          ^
                          k
    

    ...

    A[i] (7) is equal to  A[j] (7) - so copy A[i] into tmp and advance i and k
    
    A:   { 1, 4, 5, 6, 7, 2, 3, 4, 7, 9 }
                          ^        ^
                          i        j
    
    tmp: { 1, 2, 3, 4, 4, 5, 6, 7,  ,   }
                                   ^
                                   k
    

    The first subarray has been exhausted, which means the condition (i <= mid) is false, so the loop stops executing. (If the second subarray was exhausted instead, then the condition (j <= hi) would be false, so the loop would also stop executing.) The next two loops simply copy the remaining elements from the non-exhausted subarray into tmp. In this example, the first subarray was exhausted, so the first loop doesn't actually do anything - the second loop copies the remaining elements from the second subarray into tmp.

    A:   { 1, 4, 5, 6, 7, 2, 3, 4, 7, 9 }
                          ^              ^
                          i              j
    
    tmp: { 1, 2, 3, 4, 4, 5, 6, 7, 7, 9 }
                                         ^
                                         k
    

    Notice that tmp contains all of the elements of the entire subarray in sorted order. All that needs to be done now is to copy the elements from tmp back into the original array. tmp is no longer needed, so it is freed.

    A:   { 1, 2, 3, 4, 4, 5, 6, 7, 7, 9 }
    
  6. An important operation in the quicksort algorithm is partition, which takes a element called the pivot, and reorganises the elements in the subarray A[lo..hi] such that all elements to the left of the pivot in the subarray are less than or equal to the pivot, and all elements to the right of the pivot in the subarray are greater than the pivot.

    Here is one possible implementation of partition:

    int partition(int A[], int lo, int hi) {
    	int pivot = A[lo]; // pivot
    	int l = lo + 1;
    	int r = hi;
    	while (l < r) {
    		while (l < r && A[l] <= pivot) l++;
    		while (l < r && A[r] >= pivot) r--;
    		if (l == r) break;
    		swap(A, l, r);
    	}
    	int m = A[l] <= pivot ? l : l - 1;
    	swap(A, lo, m);
    	return m;
    }
    

    Show how the call partition(A, 0, 9) would partition each of these arrays. What index is returned by each of these calls?

    1. { 5, 3, 9, 6, 4, 2, 9, 8, 1, 7 }
    2. { 5, 9, 8, 7, 6, 0, 1, 2, 3, 4 }
    3. { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
    4. { 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }

    Answer:

    1. { 4, 3, 1, 2, 5, 6, 9, 8, 9, 7 } => returns index 4
    2. { 0, 4, 3, 2, 1, 5, 6, 7, 8, 9 } => returns index 5
    3. { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 } => returns index 0
    4. { 0, 8, 7, 6, 5, 4, 3, 2, 1, 9 } => returns index 9
  7. It is easy to check whether an array of integers is sorted. A trickier problem is to determine whether a sorting algorithm produced a stable sort. Assume that the array is composed of Items with two fields, as in:

    typedef struct {
    	int a;
    	int b;
    } Item;
    

    Assume also that the array has been sorted on the Item.a field.

    Given the final sorted array alone, you cannot determine whether the sort was stable. We also assume that the sorting client keeps a copy of the original array, which is available for checking.

    Given the above, write a function that takes the original array, the sorted array, and determines whether the sort was stable. The function has the following interface:

    int isStableSort(Item original[], Item sorted[], int size) { ... }
    

    Answer:

    A simple solution would be to run a sorting algorithm that is known to be stable (such as bubble sort, insertion sort, or merge sort) on the original array, and check whether the output of this algorithm matches the given sorted array. If they match, then the sort was stable (as a stable sorting algorithm has only one possible output). If they don't, then the sort was unstable.

    Here is a more inefficient solution, which uses the definition of stability to check that items with the same sort key remain in the same relative order.

    bool isStableSort(Item original[], Item sorted[], int size) {
        int i, j, k, key;
        
        for (i = 0; i < size; i++) {
            key = sorted[i].a;
            
            j = 0; k = 0;
            
            while (j < size && k < size) {
                for (; j < size; j++)
                    if (sorted[j].a == key) break;
                
                for (; k < size; k++)
                    if (original[k].a == key) break;
                    
                if (j < size && k < size) {
                    if (sorted[j].b != original[k].b)
                        return false;
                        
                    j++; k++;
                }
            }
        }
        return true;
    }
    

    Here is an efficient solution which uses a hash table that maps the type of the key (in this case, an integer) to a Queue.

    bool isStableSort(Item original[], Item sorted[], int size) {
    	HashTable ht = HashTableNew();
    
    	for (int i = 0; i < size; i++) {
    		if (!HashTableContains(ht, original[i].a)) {
    			HashTableSet(ht, original[i].a, QueueNew());
    		}
    		Queue q = HashTableGet(ht, original[i].a);
    		QueueEnqueue(q, original[i].b);
    	}
    	
    	for (int i = 0; i < size; i++) {
    		Queue q = HashTableGet(ht, sorted[i].a);
    		if (QueueDequeue(q) != sorted[i].b) {
    			return false;
    		}
    	}
    	
    	return true;
    }
    
  8. Write a version of selection sort that sorts a linked list. The sort should modify the original list - do not create any new nodes. The function has the following signature:

    typedef struct Node *List;
    struct Node {
    	int value;
    	List next;
    };
    
    List selectionSort(List ls) { ... }
    

    Answer:

    Keeping the spirit of array-based selection sort, each iteration of this algorithm does the following:

    • find the largest element remaining in the original list
    • remove the largest element from the original list
    • insert the element at the front of the new list

    List selectionSort(List ls) {
    	List old = ls;
    	List sorted = NULL; // result
    	
    	while (old != NULL) {
    		// find largest node in old list
    		List max = listMax(old);
    		
    		// unlink largest node in old list
    		old = listUnlinkNode(old, max);
    		
    		// add node to beginning of list
    		max->next = sorted;
    		sorted = max;
    	}
    	
    	return sorted;
    }
    
    // Returns a pointer to the node containing the largest
    // element of the given list.
    static List listMax(List ls) {
    	List max = ls;
    	for (List curr = ls; curr != NULL; curr = curr->next) {
    		if (curr->value > max->value) {
    			max = curr;
    		}
    	}
    	return max;
    }
    
    // Removes a given node from a list without freeing it.
    static List listUnlinkNode(List ls, List node) {
    	if (ls == node) {
    		return ls->next;
    	} else {
    		ls->next = listUnlinkNode(ls->next, node);
    		return ls;
    	}
    }
    
  9. Write a program that reads two sorted files and merges them onto standard output. The names of the files are provided as command-line arguments.

    ./merge File1 File2
    

    If either file is not readable, exit with the error message:

    Invalid input file(s).
    

    Answer:

    int main(int argc, char *argv[]) {
    	if (argc != 3) {
    		fprintf(stderr, "Usage: ./merge File1 File2\n");
    		return 1; // error code
    	}
    	FILE *in1 = fopen(argv[1], "r");
    	FILE *in2 = fopen(argv[2], "r");
    	if (in1 == NULL || in2 == NULL) {
    		fprintf(stderr, "Invalid input file(s).\n");
    		return 1; // error code
    	}
    	
    	// now start merging
    	char line1[MAXLINE]
    	char line2[MAXLINE];
    	bool more1 = fgets(line1, MAXLINE, in1);
    	bool more2 = fgets(line2, MAXLINE, in2);
    	while (more1 && more2) {
    		int diff = strcmp(line1, line2);
    		if (strcmp(line1, line2) <= 0) {
    			fputs(line1, stdout);
    			more1 = fgets(line1, MAXLINE, in1);
    		} else {
    			fputs(line2, stdout);
    			more2 = fgets(line2, MAXLINE, in2);
    		}
    	}
    	while (fgets(line1, MAXLINE, in1) != NULL) fputs(line1, stdout);
    	while (fgets(line2, MAXLINE, in2) != NULL) fputs(line2, stdout);
    	fclose(in1);
    	fclose(in2);
    }
    
  10. Write a program that reads data about students in the form:

    5059413  Daisy  3762  15
    3461045  Dotty  3648  42
    3474881  Daisy  8543  16
    5061020  David  3970   3
    

    from standard input, and then:

    • stores it in an array of Student structs {zid, name, prog, favnum}
    • uses qsort() to sort it, making use of the stuCmp() function (see below)
    • order is initially by name, and then by zID if the names are the same
    • writes out sorted array, one Student per line
    • use the following format for printing students "%7d %-20s %4d %d\n"

    You can assume that all of the input lines contain valid student entries, and that the MAXxxx constants are defined, and large enough to deal with any input data. You do not need to implement the stuCmp() function; just assume it exists.

    Use the following template for the program:

    typedef struct student {
    	int  zid;            // 7-digit number
    	char name[MAXNAME];  // names are stored *within* the struct
    	int  prog;           // 4-digit number
    	int  favnum;         // favourite number
    } Student;
     
    // return -ve if a < b, +ve if a > b, 0 if a == b
    int stuCmp(const void *a, const void *b);
     
    int main(int argc, char *argv[]) {
    	Student students[MAXSTUDENTS];
    
    	// read stdin line-by-line into students[]
    
    	// sort the students[] array
    
    	// print the contents of the students[] array
    
    }
    

    Answer:

    int main(int argc, char *argv[]) {
    	Student students[MAXSTUDENTS];
    
    	// read stdin line-by-line into students[]
    	int i = 0;
    	while (scanf("%d %s %d %d",
    	             &students[i].zid, students[i].name,
    	             &students[i].prog, &students[i].favnum) == 4)
    	{
    		i++;
    	}
    
    	// sort the students[] array
    	int nstudents = i;
    	qsort(students, nstudents, sizeof(Student), stuCmp);
    
    	// print the contents of the students[] array
    	for (i = 0; i < nstudents; i++) {
    		printf("%7d %-20s %4d %d\n",
    		       students[i].zid, students[i].name, students[i].prog,
    		       students[i].favnum);
    	}
    }