Week 09 Tutorial Questions
Objectives
-
Below are the current assignment autotests.
Discuss what these print and why:
subset 0: quit
seq 42 44 | 2041 eddy 1q
2041 eddy 10q < dictionary.txt
seq 41 43 | 2041 eddy 4q
seq 90 110 | 2041 eddy /.1/q
2041 eddy '/r.*v/q' < dictionary.txt
yes | 2041 eddy 3q
subset 0: print
seq 41 43 | 2041 eddy 2p
head dictionary.txt | 2041 eddy 3p
seq 41 43 | 2041 eddy -n 2p
2041 eddy -n 42p < dictionary.txt
head -n 1000 dictionary.txt | 2041 eddy -n '/z.$/p'
subset 0: substitute
seq 1 5 | 2041 eddy 's/[15]/zzz/'
seq 1 5 | 2041 eddy 's/[15]/zzz/g'
echo "Hello Andrew" | 2041 eddy 's/e//'
echo "Hello Andrew" | 2041 eddy 's/e//g'
subset 1: addresses
seq 1 5 | 2041 eddy '$d'
seq 42 44 | 2041 eddy 2,3d
seq 10 21 | 2041 eddy 3,/2/d
seq 10 21 | 2041 eddy /2/,7d
seq 10 21 | 2041 eddy /2/,/7/d
subset 1: substitute
seq 1 5 | 2041 eddy 'sX[15]XzzzX'
subset 1: multiple commands
seq 1 5 | 2041 eddy '4q;/2/d'
subset 1: -f
echo "4q" > commands.script
echo "/2/d" >> commands.script
seq 1 5 | 2041 eddy -f commands.script
subset 1: input files
seq 1 2 > two.txt
seq 1 5 > five.txt
2041 eddy '4q;/2/d' two.txt five.txt
subset 1: whitespace
seq 24 42 | 2041 eddy ' 3, 17 d # comment'
subset 2: -i
seq 1 5 > five.txt
2041 eddy -i /[24]/d five.txt
cat five.txt
subset 2: multiple commands
echo 'Punctuation characters include . , ; :' | 2041 eddy 's/;/semicolon/g;/;/q'
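As an aid to discussing the quit tests, the sketch below models a numeric-address q command in Python. It assumes eddy follows sed(1) here: each line is printed as it is read, and the program stops once line n has been printed. The names quit_at and yes_lines are hypothetical and not part of the assignment.

```python
def quit_at(lines, n):
    """Yield lines up to and including line number n (1-based), then stop.

    Assumes sed-like behaviour: the line that triggers the quit is still printed.
    """
    for number, line in enumerate(lines, start=1):
        yield line
        if number == n:
            return  # stop reading input, even if more lines are available


def yes_lines():
    """Endless input, standing in for the output of yes(1)."""
    while True:
        yield "y"


# Input shorter than the address: every line is printed and the quit never fires.
print(list(quit_at(["41", "42", "43"], 4)))

# Endless input still terminates, because reading stops at line 3.
print(list(quit_at(yes_lines(), 3)))
```

Under the same assumption, this suggests why a pipeline fed by yes can terminate: the editor stops reading after the addressed line rather than consuming all of its input.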
-
Write a Python program, tags.py, which, given the URL of a web page, fetches it by running wget(1) and prints the HTML tags it uses. Each tag should be converted to lower case, and the tags should be printed in alphabetical order with a count of how often each is used.
Don't count closing tags.
Make sure you don't print tags within HTML comments.
./tags.py https://www.cse.unsw.edu.au
a 141
body 1
br 14
div 161
em 3
footer 1
form 1
h2 2
h4 3
h5 3
head 1
header 1
hr 3
html 1
img 12
input 5
li 99
link 3
meta 4
noscript 1
p 18
script 14
small 3
span 3
strong 4
title 1
ul 25
Note the counts in the above example will not be current - the CSE pages change almost daily.
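One possible approach is sketched below, under some assumptions: wget is run with -q -O- so the page is written to standard output, HTML comments are removed with a regular expression before counting, and an opening tag is taken to be a < followed immediately by a letter (so closing tags and <!DOCTYPE> are skipped). The function names count_tags and fetch are illustrative only.

```python
#!/usr/bin/env python3
import re
import subprocess
import sys
from collections import Counter


def count_tags(html):
    """Return a Counter of lower-cased opening tag names found in html."""
    # Remove comments first so tags inside them are not counted.
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # '<' followed directly by a letter; '</div>' and '<!DOCTYPE' do not match.
    names = re.findall(r"<([a-zA-Z][a-zA-Z0-9]*)", html)
    return Counter(name.lower() for name in names)


def fetch(url):
    """Fetch url by running wget and return the page as text."""
    result = subprocess.run(
        ["wget", "-q", "-O-", url],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    if len(sys.argv) == 2:
        counts = count_tags(fetch(sys.argv[1]))
        for tag in sorted(counts):
            print(tag, counts[tag])
    else:
        print(f"Usage: {sys.argv[0]} <url>", file=sys.stderr)
```

A regular expression is only an approximation of HTML parsing; for example, a < followed by a letter inside a script body or an attribute value would be miscounted as a tag.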
-
Add an -f option to tags.py which indicates the tags are to be printed in order of frequency.
./tags.py -f https://www.cse.unsw.edu.au
head 1
noscript 1
html 1
form 1
title 1
footer 1
header 1
body 1
h2 2
hr 3
h4 3
span 3
link 3
small 3
h5 3
em 3
meta 4
strong 4
input 5
img 12
br 14
script 14
p 18
ul 25
li 99
a 141
div 161
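The -f ordering can be sketched as a sort on the count, least frequent first, as in the sample output. The sketch below assumes argparse for option handling and breaks ties alphabetically; the sample output shows ties in no particular order, so that tie-break is an assumption. The names parse_args and ordered_tags are hypothetical, and the counts are made-up sample data.

```python
import argparse
from collections import Counter


def parse_args(argv):
    """Parse an optional -f flag followed by a URL."""
    parser = argparse.ArgumentParser(description="Count the HTML tags used by a web page")
    parser.add_argument("-f", dest="by_frequency", action="store_true",
                        help="print tags in order of frequency")
    parser.add_argument("url")
    return parser.parse_args(argv)


def ordered_tags(counts, by_frequency=False):
    """Return (tag, count) pairs, alphabetically or by ascending count."""
    if by_frequency:
        # Sort on count first; the tag name breaks ties (an assumption).
        return sorted(counts.items(), key=lambda item: (item[1], item[0]))
    return sorted(counts.items())


# Sample counts standing in for the result of counting a real page.
counts = Counter({"a": 3, "p": 1, "br": 1})
args = parse_args(["-f", "https://www.cse.unsw.edu.au"])
for tag, count in ordered_tags(counts, args.by_frequency):
    print(tag, count)
```

With these sample counts the loop prints br 1, p 1, then a 3; without -f it would print a, br, p.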
-
Modify tags.py to use the requests and beautifulsoup4 modules.
-
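For the requests and beautifulsoup4 version of tags.py, a minimal sketch might look like the following. It assumes both third-party modules are installed and uses Python's built-in html.parser backend; the 10 second timeout and the function names are illustrative choices.

```python
#!/usr/bin/env python3
import sys
from collections import Counter

import requests
from bs4 import BeautifulSoup


def count_tags(html):
    """Return a Counter of tag names found in html."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all(True) matches every tag; comments are parsed as
    # Comment nodes, so tags written inside them are not included.
    return Counter(tag.name.lower() for tag in soup.find_all(True))


def fetch(url):
    """Fetch url with requests and return the page text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    if len(sys.argv) == 2:
        counts = count_tags(fetch(sys.argv[1]))
        for tag in sorted(counts):
            print(tag, counts[tag])
    else:
        print(f"Usage: {sys.argv[0]} <url>", file=sys.stderr)
```

Because the parser builds a tree of elements, closing tags are never counted separately, and no regular expressions are needed.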
If you feel like a harder challenge after finishing the challenge activity in the lab this week, have a look at the following websites for some problems to solve using regular expressions: