COMP9319 Revision Exercises

Question numbering : As part II of the exercises from Ex1 (Q1-4) to form a complete set of revision exercises.

Question 5

Derive the Burrows Wheeler transform BWT(S) of character sequence S:

database

Derive the Move-to-Front transform of the BWT(S) from part (a), assuming we use the 255 ASCII symbols as the symbol table.

Derive and recover the original character sequence S for the BWT(S):

arbbr$aa

where $ is the pseudo end of sequence symbol.

Question 6

Given the text string chromosome$ where $ is the pseudo end of sequence symbol. To construct the suffix array for the given text string in linear time, we need to identify two types of suffixes (namely, S type and L type). Show the steps in the linear time algorithm required to construct the suffix array for the given text string.
Identify the type of each suffix as part of your steps.
What is the final output suffix array for the given text string?
Derive the BWT of the given text string from the suffix array from part(a).
Suppose BWT(S) = ooolooooolwml$ where $ is the pseudo end of sequence symbol. Find the number of matches for pattern loo using backward search. Show all the steps involved.
Derive and recover the original text string S for the BWT(S): ooolooooolwml$ where $ is the pseudo end of sequence symbol (assume the sorting of these symbols is based on their ASCII representation).

Question 7

Derive a regular expression equivalent to the DFA:
Draw a DFA equivalent to the regular expression a*b(a|b)*

Question 8

Given a text string T, Run-length FM-index is a variant of FM-index or BWT by replacing BWT(T) with a sequence S and two bit vectors B and B'. What are the corresponding S, B and B' of the BWT encoded string ooolooooolwml$ ?

Question 9

Explain the differences between the two types of XML parser, namely SAX parser and DOM parser.
Will these two XPath expressions /bib//book[year > 2000]/author and /bib[*//year > 2000]//book/author always produce the same result on any XML documents? If not, will one of them always produce a subset of the result from the other one. Explain your answer.

Question 10

Using the following example XML fragment and its corresponding XML tree, explain how ISX scheme can achieve efficient and yet queriable XML compression:

A Dataguide is a simple index for semistructured data. Draw the Dataguide for the XML tree above.

Question 11

Using the example XML fragment from Question 10, explain how XBW transform works. In particular, explain how to match a sub-path in an XML tree efficient using XBW transform.

Question 12

Using the regular path expression a.b+.c and the following distributed database as an example, describe how queries on distributed semistructured data can be evaluated efficiently (with the cost of network communications minimised). Please show intermediate steps/results.