Week 09 Tutorial Questions

  1. Below are the current assignment autotests.

    Discuss what these print and why:

    subset 0: quit
        seq 42 44 | 2041 eddy 1q
    
        2041 eddy 10q < dictionary.txt
    
        seq 41 43 | 2041 eddy 4q
    
        seq 90 110 | 2041 eddy /.1/q
    
        2041 eddy '/r.*v/q' < dictionary.txt
    
        yes | 2041 eddy 3q
    
    subset 0: print
        seq 41 43 | 2041 eddy 2p
    
        head dictionary.txt | 2041 eddy 3p
    
        seq 41 43 | 2041 eddy -n 2p
    
        2041 eddy -n 42p < dictionary.txt
    
        head -n 1000 dictionary.txt | 2041 eddy -n '/z.$/p'
    
    subset 0: substitute
        seq 1 5 | 2041 eddy 's/[15]/zzz/'
    
        seq 1 5 | 2041 eddy 's/[15]/zzz/g'
    
        echo "Hello Andrew" | 2041 eddy 's/e//'
    
        echo "Hello Andrew" | 2041 eddy 's/e//g'
    
    subset 1: addresses
        seq 1 5 | 2041 eddy '$d'
    
        seq 42 44 | 2041 eddy 2,3d
    
        seq 10 21 | 2041 eddy 3,/2/d
    
        seq 10 21 | 2041 eddy /2/,7d
    
        seq 10 21 | 2041 eddy /2/,/7/d
    
    subset 1: substitute
        seq 1 5 | 2041 eddy 'sX[15]XzzzX'
    
    subset 1: multiple commands
        seq 1 5 | 2041 eddy '4q;/2/d'
    
    subset 1: -f
        echo "4q" > commands.script
        echo "/2/d" >> commands.script
        seq 1 5 | 2041 eddy -f commands.script
    
    subset 1: input files
        seq 1 2 > two.txt
        seq 1 5 > five.txt
        2041 eddy '4q;/2/d' two.txt five.txt
    
    subset 1: whitespace
        seq 24 42 | 2041 eddy ' 3, 17  d  # comment'
    
    subset 2: -i
        seq 1 5 > five.txt
        2041 eddy -i /[24]/d five.txt
        cat five.txt
    
    subset 2: multiple commands
        echo 'Punctuation characters include . , ; :' | 2041 eddy 's/;/semicolon/g;/;/q'
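
    As a starting point for discussing the substitute tests, the sketch below mimics the s command in Python. It assumes eddy follows sed(1) semantics: without the g modifier only the first match on each line is replaced, and with g every match is replaced. The helper name substitute and its parameters are hypothetical and not part of the assignment.

```python
import re


def substitute(line, pattern, replacement, global_flag=False):
    """Mimic s/pattern/replacement/ on a single line (sed-like semantics assumed)."""
    # count=1 replaces only the first match; count=0 means replace every match
    return re.sub(pattern, replacement, line, count=0 if global_flag else 1)


# seq 1 5 | 2041 eddy 's/[15]/zzz/'
for n in range(1, 6):
    print(substitute(str(n), r"[15]", "zzz"))

# echo "Hello Andrew" | 2041 eddy 's/e//'
print(substitute("Hello Andrew", r"e", ""))

# echo "Hello Andrew" | 2041 eddy 's/e//g'
print(substitute("Hello Andrew", r"e", "", global_flag=True))
```

    Each line produced by seq 1 5 contains at most one match for [15], so the g modifier makes no difference there. It does matter for "Hello Andrew", which contains two occurrences of e.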
    
  2. Write a Python program, tags.py, which, given the URL of a web page, fetches the page by running wget(1) and prints the HTML tags it uses.

    Each tag should be converted to lower case, and the tags should be printed in alphabetical order, each with a count of how often it is used.

    Don't count closing tags.

    Make sure you don't print tags within HTML comments.

        ./tags.py https://www.cse.unsw.edu.au
        a 141
        body 1
        br 14
        div 161
        em 3
        footer 1
        form 1
        h2 2
        h4 3
        h5 3
        head 1
        header 1
        hr 3
        html 1
        img 12
        input 5
        li 99
        link 3
        meta 4
        noscript 1
        p 18
        script 14
        small 3
        span 3
        strong 4
        title 1
        ul 25
    

    Note that the counts in the above example will not be current: the CSE pages change almost daily.
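
    One possible approach is sketched below; it is not a model solution. It assumes wget is installed and uses its -q (quiet) and -O- (write to standard output) options. The function names fetch, count_tags and main are illustrative. The regular expression is only an approximation of HTML parsing: it will also count text that merely looks like an opening tag, for example inside a script.

```python
#!/usr/bin/env python3
import re
import subprocess
import sys
from collections import Counter


def fetch(url):
    """Return the text of a web page, fetched by running wget."""
    result = subprocess.run(
        ["wget", "-q", "-O-", url],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # tolerate pages that are not valid UTF-8
        check=True,        # raise CalledProcessError if wget fails
    )
    return result.stdout


def count_tags(html):
    """Count opening tags in html, ignoring anything inside comments."""
    # remove comments first so tags inside them are not counted
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # an opening tag is '<' followed directly by a letter;
    # closing tags start with '</' and so never match
    tags = re.findall(r"<([a-zA-Z][a-zA-Z0-9]*)", html)
    return Counter(tag.lower() for tag in tags)


def main(args):
    if len(args) != 1:
        print("Usage: tags.py <url>", file=sys.stderr)
        return
    counts = count_tags(fetch(args[0]))
    for tag in sorted(counts):
        print(tag, counts[tag])


if __name__ == "__main__":
    main(sys.argv[1:])
```

    Removing comments before searching for tags keeps the tag pattern simple. Using re.DOTALL lets a single comment span several lines.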

  3. Add an -f option to tags.py which indicates that the tags are to be printed in order of frequency, least frequent first.

        ./tags.py -f https://www.cse.unsw.edu.au
        head 1
        noscript 1
        html 1
        form 1
        title 1
        footer 1
        header 1
        body 1
        h2 2
        hr 3
        h4 3
        span 3
        link 3
        small 3
        h5 3
        em 3
        meta 4
        strong 4
        input 5
        img 12
        br 14
        script 14
        p 18
        ul 25
        li 99
        a 141
        div 161
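
    The ordering step could be sketched as follows, assuming the tag counts are already held in a collections.Counter. The names ordered_tags, parse_args and by_frequency are hypothetical. Breaking ties alphabetically is also an assumption: the sample output above does not appear to order tags with equal counts in any particular way.

```python
import argparse
from collections import Counter


def ordered_tags(counts, by_frequency=False):
    """Return (tag, count) pairs in the order they should be printed."""
    if by_frequency:
        # least frequent first; ties broken alphabetically (an assumption)
        return sorted(counts.items(), key=lambda item: (item[1], item[0]))
    # default: alphabetical order by tag name
    return sorted(counts.items())


def parse_args(argv=None):
    """Parse the command line: an optional -f flag followed by a URL."""
    parser = argparse.ArgumentParser(description="Count the HTML tags used by a web page")
    parser.add_argument("-f", dest="by_frequency", action="store_true",
                        help="print tags in order of frequency")
    parser.add_argument("url")
    return parser.parse_args(argv)


# small demonstration with made-up counts
demo = Counter({"div": 161, "a": 141, "body": 1, "head": 1, "h2": 2})
for tag, count in ordered_tags(demo, by_frequency=True):
    print(tag, count)
```

    Sorting on the pair (count, tag) makes the output deterministic, which is convenient when comparing it against expected output in a test.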
    
  4. Modify tags.py to use the requests and beautifulsoup4 modules.

  5. If you feel like a harder challenge after finishing the challenge activity in the lab this week, have a look at the following websites for some problems to solve using regular expressions: