Week 10 Tutorial Answers

Heaps and Tries

Heaps
  1. Consider the series of heap operations below. Assuming that higher values have higher priority, show the state of the heap (in both binary tree form and array form) after each operation.

    Heap h;
    Item it;
    h = heapNew();
    
    heapInsert(h, 10);
    heapInsert(h,  5);
    heapInsert(h, 15);
    heapInsert(h , 3);
    heapInsert(h, 16);
    heapInsert(h, 13);
    heapInsert(h,  6);
    it = heapDelete(h);
    heapInsert(h,  2);
    it = heapDelete(h);
    it = heapDelete(h);
    it = heapDelete(h);
    it = heapDelete(h);
    it = heapDelete(h);
    

    Answer:

    Here is the state of the heap (in binary tree form) after each operation:

    Here is the state of the heap (in array form) after each operation:

    Operation            [0] [1] [2] [3] [4] [5] [6] [7] [8]
    h = heapNew();        -   -   -   -   -   -   -   -   -
    heapInsert(h, 10);    -  10   -   -   -   -   -   -   -
    heapInsert(h,  5);    -  10   5   -   -   -   -   -   -
    heapInsert(h, 15);    -  15   5  10   -   -   -   -   -
    heapInsert(h,  3);    -  15   5  10   3   -   -   -   -
    heapInsert(h, 16);    -  16  15  10   3   5   -   -   -
    heapInsert(h, 13);    -  16  15  13   3   5  10   -   -
    heapInsert(h,  6);    -  16  15  13   3   5  10   6   -
    it = heapDelete(h);   -  15   6  13   3   5  10   -   -
    heapInsert(h,  2);    -  15   6  13   3   5  10   2   -
    it = heapDelete(h);   -  13   6  10   3   5   2   -   -
    it = heapDelete(h);   -  10   6   2   3   5   -   -   -
    it = heapDelete(h);   -   6   5   2   3   -   -   -   -
    it = heapDelete(h);   -   5   3   2   -   -   -   -   -
    it = heapDelete(h);   -   3   2   -   -   -   -   -   -
    
  2. Suppose you are given an array \(A\) of size \(n\), and you would like to find the largest \(k\) elements (\(k \le n\)). For example, if given the array \(A = [7, 4, 2, 5, 9]\) and the value \(k = 3\), the largest \(k\) values are 9, 7 and 5.

    1. A simple way to solve this problem is to sort the array in increasing order, and then the largest \(k\) elements would be the last \(k\) elements of the sorted array. What is the worst-case time complexity of this approach?
    2. Come up with a more efficient method that uses a heap. What is the worst-case time complexity of this method?

    Answer:

    1. The worst-case time complexity is dominated by the cost of sorting the array, which is \(O(n \log n)\).
    2. A simple approach is to insert all the elements into a max heap, and then delete from it \(k\) times to obtain the largest \(k\) elements. Inserting all \(n\) elements into a max heap is \(O(n \log n)\), and deleting from it \(k\) times is \(O(k \log n)\). Therefore, the time complexity of this approach is \(O(n \log n)\), which is not more efficient than the sorting approach.

      To make this more efficient, we can use a min heap instead and start deleting from the heap as soon as it contains more than \(k\) elements. This works because if we keep deleting the smallest element, then the heap will always contain the largest elements we have seen so far. Here is the pseudocode for this approach:

      topK:
      	Input: array A of size n
      	       integer k
      
      	h = new min heap
      	for i from 0 up to n - 1:
      		insert A[i] into h
      		if i >= k:
      			delete from h
      

      The worst-case time complexity of this approach is \(O(n \log k)\), because whenever we insert into or delete from the heap, it will contain at most \(k + 1\) elements, which means insertion/deletion is \(O(\log k)\). This is more efficient than the sorting approach.

Tries
  1. Consider the following trie:

    Show which nodes would be visited when searching for each of the following words:

    1. now
    2. the
    3. ant
    4. access
    5. actor
    6. action
    7. actors
    8. tang
    9. a

    Answer:

    1. now → 2, 6, 13
    2. the → 3, 8, 16
    3. ant → 1, 5, 12
    4. access → 1, 4, 10, 17, 22, 24
    5. actor → 1, 4, 11, 18, 23
    6. action → 1, 4, 11, fails (no "i")
    7. actors → 1, 4, 11, 18, 23, fails (leaf)
    8. tang → 3, 7, 15, fails (leaf)
    9. a → 1, fails (not red)
  2. Under what circumstances would inserting a new word into a trie not result in adding any new nodes? What would be the effect on the tree of inserting the word?

    Answer:

    If the new word is a prefix of an existing word in the trie, then it won't require a new node to be added. The node representing the last character in the word will simply be turned into a finishing (red) node.

  3. Show the final trie resulting from inserting the following words into an initially empty trie.

    so   boo   jaws   boon   boot   axe   jaw   boots   sore
    

    Does the order that the words are inserted affect the shape of the tree?

    Answer:

    Final trie:

    The order of insertion does not affect the final shape of the tree.

  4. What are the two cases when deleting a word from a trie? How is each case handled?

    Answer:

    1. The word ends at an internal node

      The finishing (red) node is simply changed into a non-finishing (black) node

    2. The word ends at a leaf node

      The finishing (red) node is unlinked from its parent and removed (freed). If the parent of that node is a non-finishing leaf node, it is also removed from the tree. Repeat until we reach a finishing node or a non-leaf node.

  5. Autocomplete is a feature in which an application predicts the rest of a word a user is typing. For example, if a user has typed in "th", then autocomplete may suggest the words "the", "they", "then", and so on.

    1. Explain how a trie could be used to implement basic autocomplete, which suggests all possible words.
    2. Users are much more likely to type some words than others, so ideally autocomplete should suggest more likely words. For example, if a user has typed "tho", autocomplete should suggest "though" instead of "thou". How could you extend your solution so that autocomplete suggests more likely words?

    Answer:

    1. As a pre-processing step, store all the words of the language into a trie. Then, once the user has typed something, traverse down to the node corresponding to what was typed. Then all the words in this node's sub-trie are possible suggestions.
    2. We need some way to determine the likeliness of each word, and this could be achieved by storing a count of how often each word is typed, and incrementing this count whenever the user finishes typing a word. The higher the count, the more likely the user is to type the word. Then, when the user types something, instead of suggesting all possible words, suggest the top three (or so) words with the highest counts.

      This requires searching through an entire sub-trie each time the user types something, which could be inefficient. One way to optimise this would be to store the top three suggestions as data in each trie node and update these as the counts are updated. For example, in the node corresponding to "t", store the top three words starting with "t", in the node corresponding to "th", store the top three words starting with "th", and so on.

  6. It's the year 2521, and space exploration has uncovered an alien language (which just so happens to use the same letters as English 😉️)! A team of linguists has found that these aliens loved poetry, and that two words are said to rhyme with degree \(k\) if the last \(k\) letters of each word are identical. For example, the words GOODRA and QUOLODRA rhyme with degree 4, as the last 4 letters of these words are identical. These words also rhyme with degree 1, 2 and 3.

    To help investigate this poetry and write some of your own, you would like to devise a method to find the number of words that rhyme with a given word with degree \(k\).

    For example, suppose the words of the language are ARON, AGGRON, SQUARON, RONA, LAVARON and UDON. Then:

    • The number of words that rhyme with SQUARON with degree 2 is 4 (the words are ARON, AGGRON, LAVARON and UDON)
    • The number of words that rhyme with SQUARON with degree 3 is 3 (the words are ARON, AGGRON and LAVARON)
    • The number of words that rhyme with SQUARON with degree 4 is 2 (the words are ARON and LAVARON)
    • The number of words that rhyme with SQUARON with degree 5 is 0.

    Can you come up with a method to efficiently answer queries like the ones above? Suppose you are allowed to preprocess the words of the language before answering any queries.

    Answer:

    One way to achieve this efficiently is to first store the reverse of each word into a trie. For example, the trie for the above words would look like this:

    To find the number of words that rhyme with a word \(w\) with degree \(k\), traverse down to the node corresponding to the last \(k\) letters of \(w\), then count the number of finishing nodes in that sub-trie. For example, to find the number of words that rhyme with SQUARON with degree 3, traverse down to the node corresponding to NOR. The number of finishing nodes in this sub-trie is 4, so the number of rhyming words is 3 (not including SQUARON itself).

    This is still slightly inefficient, as it requires a traversal of an entire sub-trie. It can be improved by augmenting each node with the count of the number of finishing nodes in that node's sub-trie, like so:

    These counts can be calculated in one pass after the trie has been constructed, or they can also be easily maintained while inserting new words (by incrementing the count in all nodes along the insertion path of a word).

    Now, instead of traversing the sub-trie to count the number of finishing nodes, one can simply return the count stored in the node (minus one), so each query can be answered in \(O(k)\) time.