Week 09 Tutorial Questions
Objectives
-
The assignment specification doesn't fully explain the assignment - what can I do?
-
How hard are the subsets?
-
What does git init do?
How does this differ from grip-init?
-
What do git add file and grip-add file do?
-
What is the index in grip (and git), and where does it get stored?
-
What is a commit in grip (and git), and where does it get stored?
-
Apart from the grip-* scripts what else do you need to submit (and give an example)?
-
You work on the assignment for a couple of hours tonight.
What do you need to do when you are finished?
-
Write a Python program, tags.py, which, given the URL of a web page, fetches it by running wget(1) and prints the HTML tags it uses.
Each tag should be converted to lower case, and the tags should be printed in alphabetical order with a count of how often each is used.
Don't count closing tags.
Make sure you don't print tags within HTML comments.
./tags.py https://www.cse.unsw.edu.au
a 141
body 1
br 14
div 161
em 3
footer 1
form 1
h2 2
h4 3
h5 3
head 1
header 1
hr 3
html 1
img 12
input 5
li 99
link 3
meta 4
noscript 1
p 18
script 14
small 3
span 3
strong 4
title 1
ul 25
Note the counts in the above example will not be current - the CSE pages change almost daily.
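One possible approach to tags.py is sketched below; it is not a model answer. The function names count_tags and fetch are illustrative choices. The sketch assumes wget is on the PATH and that a simple regular expression is good enough to spot opening tags. A regex will also count anything that looks like a tag inside a script element, which a real HTML parser would not.

```python
#!/usr/bin/env python3
import re
import subprocess
import sys
from collections import Counter


def count_tags(html):
    """Count opening tags in html, ignoring anything inside <!-- ... -->."""
    # Remove comments first so tags inside them are never seen.
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # An opening tag is '<' followed directly by a letter;
    # closing tags start with '</' and so do not match.
    names = re.findall(r"<([a-zA-Z][a-zA-Z0-9]*)", html)
    return Counter(name.lower() for name in names)


def fetch(url):
    """Return the page source by running wget quietly, writing to stdout."""
    result = subprocess.run(
        ["wget", "-q", "-O-", url],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # assumption: tolerate pages that are not valid UTF-8
        check=True,
    )
    return result.stdout


def main(args):
    counts = count_tags(fetch(args[0]))
    for tag in sorted(counts):
        print(tag, counts[tag])


# Only run as a script when a URL is supplied on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Keeping the counting in its own function means it can be tried on a short HTML string without touching the network.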
-
Add an -f option to tags.py which indicates the tags are to be printed in order of frequency.
./tags.py -f https://www.cse.unsw.edu.au
head 1
noscript 1
html 1
form 1
title 1
footer 1
header 1
body 1
h2 2
hr 3
h4 3
span 3
link 3
small 3
h5 3
em 3
meta 4
strong 4
input 5
img 12
br 14
script 14
p 18
ul 25
li 99
a 141
div 161
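The -f option could be handled along the following lines; this is a sketch only, and the helper names ordered_tags and parse_args are made up for illustration. It assumes the tag counts are already in a Counter. Tags with equal counts appear in no obvious order in the sample output above, so this sketch breaks ties alphabetically to keep its output deterministic.

```python
import argparse
from collections import Counter


def ordered_tags(counts, by_frequency=False):
    """Return (tag, count) pairs alphabetically, or by ascending count."""
    if by_frequency:
        # Sort on count first, then on tag name to break ties.
        return sorted(counts.items(), key=lambda item: (item[1], item[0]))
    return sorted(counts.items())


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Count the HTML tags a page uses")
    parser.add_argument("-f", dest="by_frequency", action="store_true",
                        help="print tags in order of frequency")
    parser.add_argument("url")
    return parser.parse_args(argv)


# A handful of counts taken from the sample output, not fetched live.
args = parse_args(["-f", "https://www.cse.unsw.edu.au"])
sample = Counter({"div": 161, "a": 141, "li": 99, "body": 1, "head": 1})
for tag, count in ordered_tags(sample, args.by_frequency):
    print(tag, count)
```

With -f this prints body and head first (count 1), then li, a and div; without it the same pairs come out in alphabetical order.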
-
Modify tags.py to use the requests and beautifulsoup4 modules.
-
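For the requests and beautifulsoup4 version of tags.py, a minimal sketch might look like the following. It assumes both modules are installed and uses Python's built-in "html.parser" backend. The function names and the 10 second timeout are illustrative choices.

```python
#!/usr/bin/env python3
import sys
from collections import Counter

import requests
from bs4 import BeautifulSoup


def count_tags(html):
    """Count every element in the parsed document by tag name."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all(True) matches every element. Comments are parsed as
    # Comment nodes rather than elements, so tags inside them are skipped,
    # and html.parser reports tag names in lower case.
    return Counter(tag.name for tag in soup.find_all(True))


def fetch(url):
    """Download the page, raising an exception on an HTTP error status."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def main(args):
    counts = count_tags(fetch(args[0]))
    for tag in sorted(counts):
        print(tag, counts[tag])


# Only run as a script when a URL is supplied on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Because the parser builds a tree of elements, closing tags and commented-out markup need no special handling here, unlike a regex-based version.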
If you feel like a harder challenge after finishing the challenge activity in the lab this week, have a look at the following websites for some problems to solve using regexp: