Week 01 Laboratory Exercises

Objectives

  • Understanding regular expressions
  • Understanding use of UNIX filters (grep)

Preparation

Before the lab you should re-read the relevant lecture slides and their accompanying examples.

Getting Started

Set up for the lab by creating a new directory called lab01 and changing to this directory.
mkdir lab01
cd lab01

There are some provided files for this lab which you can fetch with this command:

2041 fetch lab01

If you're not working at CSE, you can download the provided files as a zip file or a tar file.

Exercise: grep-ing a Dictionary

You have been given a file named dictionary_answers.txt, which you must use to enter the answers for this exercise.

The autotest scripts depend on the format of dictionary_answers.txt, so just add your answers where indicated and don't otherwise change the file.

    Open a text editor (gedit) in the background (&) and not owned by the current terminal (disown)
    gedit dictionary_answers.txt & disown
    Or use any other text editor of your choosing

On most Unix systems you will find one or more dictionaries containing many thousands of words, typically in the directory /usr/share/dict/:

    ls -1 /usr/share/dict/
    american-english
    american-english-huge
    american-english-insane
    american-english-large
    american-english-small
    british-english
    british-english-huge
    british-english-insane
    british-english-large
    british-english-small
    cracklib-small
    words -> /etc/dictionaries-common/words -> /usr/share/dict/american-english

We've created an example dictionary named dictionary.txt for this lab exercise.
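
As a warm-up before the questions, here is a minimal sketch of how a few grep -E building blocks behave on a small word list. The file words.txt and its contents are made up for illustration and are not part of the provided files, and the patterns are not answers to any of the questions.

```shell
# Build a tiny word list in a temporary directory (hypothetical data, not dictionary.txt).
tmpdir=$(mktemp -d)
printf '%s\n' apple banana cherry grape queue Qatar rhythm > "$tmpdir/words.txt"

# A literal string matches anywhere in the line: words containing "an".
contains_an=$(grep -E 'an' "$tmpdir/words.txt")

# ^ anchors the match to the start of the line: words starting with a lowercase vowel.
starts_with_vowel=$(grep -E '^[aeiou]' "$tmpdir/words.txt")

# $ anchors to the end of the line and [^...] negates a class: words not ending in "e".
not_ending_in_e=$(grep -E '[^e]$' "$tmpdir/words.txt")

# -i ignores case and -c prints a count of matching lines instead of the lines themselves.
q_count=$(grep -E -i -c '^q' "$tmpdir/words.txt")

printf 'contains "an":\n%s\n\n' "$contains_an"
printf 'starts with a vowel:\n%s\n\n' "$starts_with_vowel"
printf 'does not end in "e":\n%s\n\n' "$not_ending_in_e"
printf 'words starting with q or Q: %s\n' "$q_count"

rm -r "$tmpdir"
```

Quoting the pattern in single quotes stops the shell from interpreting characters such as $, [ and * before grep sees them.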

  1. Write a grep -E command that prints the words which contain the characters "lmn" consecutively.

    The COMP2041 class account contains a script named autotest that automatically runs tests on your lab exercises.

    Once you have entered your answer, you can check it like this:

                2041 autotest dictionary Q1
                Test Q1 (dictionary Q1) - passed
                1 tests passed 0 tests failed
            
  2. Write a grep -E command that prints the words which contain any four consecutive vowels.

    Once you have entered your answer, you can check it like this:

                2041 autotest dictionary Q2
                Test Q2 (dictionary Q2) - passed
                1 tests passed 0 tests failed
            
  3. Write a grep -E command that prints the words which contain all 5 vowels "aeiou" in that order.

    The words may contain more than 5 vowels but they must contain "aeiou" in that order.

    Once you have entered your answer, you can check it like this:

                2041 autotest dictionary Q3
                Test Q3 (dictionary Q3) - passed
                1 tests passed 0 tests failed
            
  4. Write a grep -E command that prints the words which contain the vowels "aeiou", in that order, and no other vowels.

    Once you have entered your answer, you can check it like this:

                2041 autotest dictionary Q4
                Test Q4 (dictionary Q4) - passed
                1 tests passed 0 tests failed
            

When you think your answers are correct, you can use autotest to run some simple automated tests:

2041 autotest dictionary 

When you are finished working on this exercise, you must submit your work by running give:

give cs2041 lab01_dictionary dictionary_answers.txt

before Tuesday 27 February 12:00 (midday) (2024-02-27 12:00:00) to obtain the marks for this lab exercise.

Exercise: grep-ing Federal Parliament

You have been given a file named parliament_answers.txt, which you must use to enter the answers for this exercise.

The autotest scripts depend on the format of parliament_answers.txt, so just add your answers where indicated and don't otherwise change the file.

    Open a text editor (gedit) in the background (&) and not owned by the current terminal (disown)
    gedit parliament_answers.txt & disown
    Or use any other text editor of your choosing

In this exercise you will analyze a file named parliament.txt containing a list of the members of the Australian House of Representatives (MPs).
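
When each line holds several fields, a pattern usually needs to be tied to a delimiter or an anchor so that it matches only the field you care about. The sketch below illustrates this on a made-up file, people.txt, whose "Given Surname, Town" layout is an assumption chosen for the example; parliament.txt may be laid out differently, so look at a few of its lines (for example with head) before writing your patterns.

```shell
tmpdir=$(mktemp -d)

# Hypothetical records in the form "Given Surname, Town" (not the format of parliament.txt).
cat > "$tmpdir/people.txt" <<'EOF'
Alice Brown, Bathurst
Bob Alder, Armidale
Carol Baker, Orange
EOF

# Inspect the first lines to see how the fields are laid out.
head -n 2 "$tmpdir/people.txt"

# Unanchored: "B" anywhere on the line, so every line matches (Brown, Bob, Baker).
anywhere=$(grep -E 'B' "$tmpdir/people.txt")

# Tied to the delimiter: the town follows ", ", so this selects towns starting with "B".
town_b=$(grep -E ', B' "$tmpdir/people.txt")

# Tied to the other side of the delimiter: the surname ends just before the comma.
surname_r=$(grep -E 'r,' "$tmpdir/people.txt")

printf 'B anywhere:\n%s\n\n' "$anywhere"
printf 'town starts with B:\n%s\n\n' "$town_b"
printf 'surname ends in r:\n%s\n' "$surname_r"

rm -r "$tmpdir"
```

The unanchored pattern is a common source of false matches: it cannot tell a given name from a surname or a town, whereas the delimiter-aware patterns can.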

  1. Write a grep -E command that will print all the lines in the file where the electorate begins with 'W'.

    Once you have entered your answer, you can check it like this:

                2041 autotest parliament Q1
                Test Q1 (parliament Q1) - passed
                1 tests passed 0 tests failed
            
  2. Write a grep -E command that will print all the lines in the file where the MP's given name (first name) is "Andrew".

    Once you have entered your answer, you can check it like this:

                2041 autotest parliament Q2
                Test Q2 (parliament Q2) - passed
                1 tests passed 0 tests failed
            
  3. Write a grep -E command that will print all the lines in the file where the MP's surname (last name) ends in the letters 'll'.

    Once you have entered your answer, you can check it like this:

                2041 autotest parliament Q3
                Test Q3 (parliament Q3) - passed
                1 tests passed 0 tests failed
            
  4. Write a grep -E command that will print all the lines in the file where the MP's surname (last name) and the electorate name both end in the letter 'y'.

    Once you have entered your answer, you can check it like this:

                2041 autotest parliament Q4
                Test Q4 (parliament Q4) - passed
                1 tests passed 0 tests failed
            
  5. Write a grep -E command that will print all the lines in the file where the MP's surname (last name) or the electorate name ends in the letter 'y'.

    Once you have entered your answer, you can check it like this:

                2041 autotest parliament Q5
                Test Q5 (parliament Q5) - passed
                1 tests passed 0 tests failed
            
  6. Write a grep -E command that will print all the lines in the file where there is any word in the MP's name or the electorate name that ends in "ng".

    Once you have entered your answer, you can check it like this:

                2041 autotest parliament Q6
                Test Q6 (parliament Q6) - passed
                1 tests passed 0 tests failed
            
  7. Write a grep -E command that will print all the lines in the file where the MP's surname (last name) both begins and ends with a vowel.

    Once you have entered your answer, you can check it like this:

                2041 autotest parliament Q7
                Test Q7 (parliament Q7) - passed
                1 tests passed 0 tests failed
            
  8. Write a grep -E command that will print all the lines in the file where the electorate name contains multiple words (separated by spaces or hyphens).

    Once you have entered your answer, you can check it like this:

                2041 autotest parliament Q8
                Test Q8 (parliament Q8) - passed
                1 tests passed 0 tests failed
            

When you think your answers are correct, you can use autotest to run some simple automated tests:

2041 autotest parliament 

When you are finished working on this exercise, you must submit your work by running give:

give cs2041 lab01_parliament parliament_answers.txt

before Tuesday 27 February 12:00 (midday) (2024-02-27 12:00:00) to obtain the marks for this lab exercise.

Challenge Exercise: Exploring Regular Expressions

You have been given a file named ab_answers.txt, which you must use to enter the answers for this exercise.

The autotest scripts depend on the format of ab_answers.txt, so just add your answers where indicated and don't otherwise change the file.

    Open a text editor (gedit) in the background (&) and not owned by the current terminal (disown)
    gedit ab_answers.txt & disown
    Or use any other text editor of your choosing

Use grep -E to test your answers to these questions.

We've provided a set of test cases in input.txt
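
A few standard grep options make it easier to see what a pattern accepts and rejects while you experiment. The sketch below uses a made-up file, cases.txt, and a made-up pattern chosen only for illustration; neither comes from the provided files or answers any of the questions.

```shell
tmpdir=$(mktemp -d)

# Hypothetical test lines (not input.txt).
printf '%s\n' AAB ABAB BBB 'AB AB' > "$tmpdir/cases.txt"

# Hypothetical pattern: one or more repetitions of "AB" somewhere on the line.
pattern='(AB)+'

# -n prefixes each matching line with its line number.
numbered=$(grep -E -n "$pattern" "$tmpdir/cases.txt")

# -v inverts the match, showing the lines the pattern rejects.
rejected=$(grep -E -v "$pattern" "$tmpdir/cases.txt")

# -x requires the pattern to match the whole line, as if it were wrapped in ^( ... )$.
whole_line=$(grep -E -x "$pattern" "$tmpdir/cases.txt")

# -c prints only the number of matching lines.
match_count=$(grep -E -c "$pattern" "$tmpdir/cases.txt")

printf 'matching lines, numbered:\n%s\n\n' "$numbered"
printf 'rejected lines:\n%s\n\n' "$rejected"
printf 'whole-line matches:\n%s\n\n' "$whole_line"
printf 'number of matching lines: %s\n' "$match_count"

rm -r "$tmpdir"
```

Comparing the plain and -x results shows the difference between a pattern that matches somewhere on a line and one that must account for every character, which matters for questions phrased as "containing only".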

  1. Write a grep -E command that prints the lines in a file named input.txt containing at least one A and at least one B.

    Matching                          Not Matching
    AB                                A
    BA                                B
    ABBA                              AA
    BANANA                            Andrew
    Andrew's favourite Band is not    George is Brilliant

    Once you have entered your answer, you can check it like this:

                2041 autotest ab Q1
                Test Q1 (ab Q1) - passed
                1 tests passed 0 tests failed
            
  2. Write a grep -E command that prints the lines in a file named input.txt containing only the characters A and B
    such that all pairs of adjacent A's occur before any pairs of adjacent B's.

    In other words, if there is a pair of adjacent B's on the line, there cannot be a pair of adjacent A's after it.

    Matching             Not Matching
    A                    BBAA
    ABBA                 ABBAA
    ABAABAABAABBBBABB    ABBABABABABAA
    ABAAAAAAAAAABBA      ABBBAAA
    ABABABABA            BBABABABABABABAA

    Once you have entered your answer, you can check it like this:

                2041 autotest ab Q2
                Test Q2 (ab Q2) - passed
                1 tests passed 0 tests failed
            
  3. Write a grep -E command that prints the lines in a file named input.txt containing only the characters A and B such that the number of A's is divisible by 4.

    Matching               Not Matching
    AAAA                   A
    BABABABAB              AAAAA
    AAAABBBBAAAA           ABABBBBBBBBBBBBBBBAAA
    BBBAABBBBBAABBBAAAA    AAAABBABBAAAA
    B                      BBBAABBABBBAABBBAAAA

    Once you have entered your answer, you can check it like this:

                2041 autotest ab Q3
                Test Q3 (ab Q3) - passed
                1 tests passed 0 tests failed
            

When you think your answers are correct, you can use autotest to run some simple automated tests:

2041 autotest ab 

When you are finished working on this exercise, you must submit your work by running give:

give cs2041 lab01_ab ab_answers.txt

before Tuesday 27 February 12:00 (midday) (2024-02-27 12:00:00) to obtain the marks for this lab exercise.

Submission

When you have finished each exercise, make sure you submit your work by running give.

You can run give multiple times. Only your last submission will be marked.

Don't submit any exercises you haven't attempted.

If you are working at home, you may find it more convenient to upload your work via give's web interface.

Remember you have until Week 3 Tuesday 12:00:00 (midday) to submit your work.

You cannot obtain marks by e-mailing your code to tutors or lecturers.

You can check the files you have submitted here.

Automarking will be run by the lecturer several days after the submission deadline, using test cases different to those autotest runs for you. (Hint: do your own testing as well as running autotest.)
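
One way to do your own testing is a small script that feeds lines to a pattern and reports any line whose result differs from what you expected. The following is only a sketch: the check function, the pattern, and the test lines are all hypothetical, and the pattern is unrelated to the lab questions.

```shell
# Hypothetical pattern under test: lines made up only of the digits 0 and 1.
pattern='^[01]+$'
failures=0

# check EXPECTED LINE
# EXPECTED is "match" or "reject"; LINE is the text fed to grep.
check() {
    # -q suppresses output; the exit status says whether the line matched.
    if printf '%s\n' "$2" | grep -E -q "$pattern"; then
        actual=match
    else
        actual=reject
    fi
    if [ "$actual" != "$1" ]; then
        echo "FAIL: '$2' expected $1 but got $actual"
        failures=$((failures + 1))
    fi
}

# Include edge cases such as the empty line and lines with unexpected characters.
check match 0
check match 1011
check reject ''
check reject 102
check reject '10 01'

echo "$failures unexpected results"
```

Writing down the lines you expect to be rejected is as important as the lines you expect to match, since a pattern that is too permissive passes every "match" case.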

After automarking is run by the lecturer you can view your results here. The resulting mark will also be available via give's web interface.

Lab Marks

When all components of a lab are automarked, you should be able to view the marks via give's web interface or by running this command on a CSE machine:

2041 classrun -sturec