Week 09 Tutorial Questions
Objectives
- Discuss how Python can be generated for the supplied examples for subsets 0-3
- Discuss the assignment specification and possible strategies for the assignment.
- Write a Python program, tags.py, which given the URL of a web page fetches it by running wget(1) and prints the HTML tags it uses. The tags should be converted to lower case and printed in alphabetical order with a count of how often each is used.
Don't count closing tags.
Make sure you don't print tags within HTML comments.
    ./tags.py https://www.cse.unsw.edu.au
    a 141
    body 1
    br 14
    div 161
    em 3
    footer 1
    form 1
    h2 2
    h4 3
    h5 3
    head 1
    header 1
    hr 3
    html 1
    img 12
    input 5
    li 99
    link 3
    meta 4
    noscript 1
    p 18
    script 14
    small 3
    span 3
    strong 4
    title 1
    ul 25

Note the counts in the above example will not be current - the CSE pages change almost daily.
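One possible approach to this question is sketched below, assuming wget is installed and on the PATH. The function names fetch, count_tags and format_counts are illustrative choices, not part of the question. Comments are stripped before matching, and the regex only accepts a letter directly after `<`, so closing tags (`</p>`) and declarations (`<!DOCTYPE`) are never counted.

```python
import re
import subprocess
from collections import Counter


def fetch(url):
    """Return the page at url as text, fetched by running wget(1)."""
    # -q suppresses progress output; -O- writes the page to standard output.
    result = subprocess.run(
        ["wget", "-q", "-O-", url],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    return result.stdout


def count_tags(html):
    """Return a Counter of lower-cased opening tag names, ignoring comments."""
    # Remove comments first so that tags inside them are never seen.
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # "<" followed directly by a letter starts an opening tag.
    names = re.findall(r"<([a-zA-Z][a-zA-Z0-9]*)", html)
    return Counter(name.lower() for name in names)


def format_counts(counts):
    """Return one 'tag count' line per tag, in alphabetical order."""
    return [f"{tag} {counts[tag]}" for tag in sorted(counts)]
```

A complete tags.py could import sys and finish by printing each line of `format_counts(count_tags(fetch(sys.argv[1])))`. This regex approach is a simplification: text such as `a<b` inside a script element would be miscounted as a tag.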
- Add an -f option to tags.py which indicates the tags are to be printed in order of frequency.

    ./tags.py -f https://www.cse.unsw.edu.au
    head 1
    noscript 1
    html 1
    form 1
    title 1
    footer 1
    header 1
    body 1
    h2 2
    hr 3
    h4 3
    span 3
    link 3
    small 3
    h5 3
    em 3
    meta 4
    strong 4
    input 5
    img 12
    br 14
    script 14
    p 18
    ul 25
    li 99
    a 141
    div 161
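The -f option could be handled along these lines, using argparse for the command line and a sort key for the ordering. The names parse_args and ordered_tags are hypothetical. The sample output lists tags from least to most frequent, so the sketch sorts ascending. Breaking ties alphabetically is an assumption, since the sample output does not show a defined order for tags with equal counts.

```python
import argparse


def parse_args(argv):
    """Parse an optional -f flag followed by the URL."""
    parser = argparse.ArgumentParser(
        description="Count the HTML tags used by a web page.")
    parser.add_argument("-f", dest="frequency", action="store_true",
                        help="print tags in order of frequency")
    parser.add_argument("url")
    return parser.parse_args(argv)


def ordered_tags(counts, by_frequency=False):
    """Return (tag, count) pairs in alphabetical or frequency order."""
    if by_frequency:
        # Least frequent first; equal counts fall back to alphabetical order.
        return sorted(counts.items(), key=lambda item: (item[1], item[0]))
    return sorted(counts.items())
```

In a full program, `args = parse_args(sys.argv[1:])` would supply `args.url` to the fetching code and `args.frequency` to `ordered_tags`.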
- Modify tags.py to use the requests and beautifulsoup4 modules.
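A rough sketch of the requests and beautifulsoup4 version follows, assuming both packages are installed (for example with pip). The function names are again illustrative. The third-party imports sit inside the functions so that each dependency is only needed when that function is called. An HTML parser removes the need for regexes: comments become separate Comment nodes rather than tags, closing tags are never reported as elements, and the built-in "html.parser" backend lower-cases tag names.

```python
from collections import Counter


def fetch(url):
    """Return the page at url as text, fetched with requests."""
    import requests  # third-party package: requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors such as 404
    return response.text


def count_tags(html):
    """Return a Counter of the tag names used in html."""
    from bs4 import BeautifulSoup  # third-party package: beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    # find_all(True) matches every element in the document.
    return Counter(tag.name for tag in soup.find_all(True))
```

The printing and -f handling from the earlier versions of tags.py can stay unchanged, since count_tags still returns a Counter.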
- If you feel like a harder challenge after finishing the challenge activity in the lab this week, have a look at the following websites for some problems to solve using regexp: