Week 10 Laboratory Exercises
Objectives
- Writing and using your own Python modules
- Exploring modules in Python
Preparation
Before the lab you should re-read the relevant lecture slides and their accompanying examples.
Getting Started
Set up for the lab by creating a new directory called lab10 and changing to this directory.
mkdir lab10
cd lab10
There are some provided files for this lab which you can fetch with this command:
2041 fetch lab10
If you're not working at CSE, you can download the provided files as a zip file or a tar file.
Exercise: DNA analysis in Python
Your task is to add code to the file dna.py to do DNA analysis.
Don't worry, you don't need to know anything about DNA, RNA or base pairs.
You have been given the file test_dna.py, which imports dna.py and uses its functions to analyse a file. Do not change test_dna.py. Only change dna.py.
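A script can import a module such as dna.py because Python searches the directories listed in sys.path, and the directory containing the script being run is placed first in that list. The following is a minimal sketch of that mechanism, not part of the provided files; the module name greeting and the function shout are invented for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module source, standing in for a file such as dna.py.
MODULE_SOURCE = '''
def shout(text):
    """Return text in upper case."""
    return text.upper()
'''

with tempfile.TemporaryDirectory() as directory:
    # Write the module to disk as greeting.py (an invented name).
    Path(directory, "greeting.py").write_text(MODULE_SOURCE)

    # Python searches the directories in sys.path when importing.
    sys.path.insert(0, directory)
    try:
        importlib.invalidate_caches()
        greeting = importlib.import_module("greeting")
        result = greeting.shout("hello")
    finally:
        sys.path.remove(directory)

print(result)  # HELLO
```

In the lab itself none of the sys.path handling is needed: test_dna.py and dna.py sit in the same directory, so a plain import statement is enough.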
You have been given 6 test files, data1 .. data6, containing base pairs (again, don't worry, you don't need to know what a base pair is).
The format of the base pair files is simple:
sed -n 1,3p data1
G <-> C
T <-> A
A <-> T
But note that one or both elements of a base pair may be missing.
grep -E '^ <->|<-> $' data3|head
<-> A <-> T <-> A <-> G <-> A <-> G <-> <-> G <-> A <->
Here is how test_dna.py will work when you've completed the functions in dna.py (test_dna.py imports dna.py).
./test_dna.py data1
the file data1 is DNA
there are 100 pairs in the file
first 10 pairs:
G <-> C
T <-> A
A <-> T
G <-> C
A <-> T
G <-> C
T <-> A
C <-> G
T <-> A
A <-> T
last 10 pairs:
C <-> G
T <-> A
G <-> C
T <-> A
G <-> C
G <-> C
T <-> A
C <-> G
A <-> T
T <-> A
the most common base is Guanine
The docstrings of the functions in dna.py give you more information about how to complete each function.
def read_dna(dna_file):
    """
    Read a DNA string from a file.
    The file contains data in the following format:
        A <-> T
        G <-> C
        G <-> C
        C <-> G
        G <-> C
        T <-> A
    Output a list of tuples:
    [
        ('A', 'T'),
        ('G', 'C'),
        ('G', 'C'),
        ('C', 'G'),
        ('G', 'C'),
        ('T', 'A'),
    ]
    Where either (or both) elements in the string might be missing:
         <-> T
        G <->
        G <-> C
         <->
         <-> C
        T <-> A
    Output:
    [
        ('', 'T'),
        ('G', ''),
        ('G', 'C'),
        ('', ''),
        ('', 'C'),
        ('T', 'A'),
    ]
    """
    pass
def is_rna(dna):
    """
    Given DNA in the aforementioned format,
    return the string "DNA" if the data is DNA,
    return the string "RNA" if the data is RNA,
    return the string "Invalid" if the data is neither DNA nor RNA.
    DNA consists of the following bases:
        Adenine  ('A'),
        Thymine  ('T'),
        Guanine  ('G'),
        Cytosine ('C'),
    RNA consists of the following bases:
        Adenine  ('A'),
        Uracil   ('U'),
        Guanine  ('G'),
        Cytosine ('C'),
    The data is DNA if at least 90% of the bases are one of the DNA bases.
    The data is RNA if at least 90% of the bases are one of the RNA bases.
    The data is invalid if more than 10% of the bases are not one of the DNA or RNA bases.
    Empty bases should be ignored.
    """
    pass
def clean_dna(dna):
    """
    Given DNA in the aforementioned format,
    if a pair is incomplete, e.g. ('A', '') or ('', 'G'),
    fill in the missing base with the matching base.
    In DNA 'A' matches with 'T', 'G' matches with 'C'
    In RNA 'A' matches with 'U', 'G' matches with 'C'
    If a pair contains an invalid base the pair should be removed.
    Pairs of empty bases should be ignored.
    """
    pass
def mast_common_base(dna):
    """
    Given DNA in the aforementioned format,
    return the most common first base:
    e.g. given:
        A <-> T
        G <-> C
        G <-> C
        C <-> G
        G <-> C
        T <-> A
    The most common first base is 'G'.
    Empty bases should be ignored.
    """
    pass
def base_to_name(base):
    """
    Given a base, return the name of the base.
    The base names are:
        Adenine  ('A'),
        Thymine  ('T'),
        Guanine  ('G'),
        Cytosine ('C'),
        Uracil   ('U'),
    return the string "Unknown" if the base isn't one of the above.
    """
    pass
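The read_dna docstring describes turning each line of the file into a tuple, with an empty string standing in for a missing base. One possible way to handle a single line is sketched below. The helper name parse_pair is an invention for this illustration and is not part of dna.py.

```python
def parse_pair(line):
    """Split one 'X <-> Y' line into a (first, second) tuple.

    parse_pair is a hypothetical helper, not a function from dna.py.
    """
    # partition splits on the first "<->" and keeps whatever is on each side.
    first, _, second = line.partition("<->")
    # strip removes surrounding spaces and the trailing newline, so a
    # missing base becomes the empty string.
    return first.strip(), second.strip()


sample_lines = ["A <-> T\n", " <-> T\n", "G <-> \n", " <-> \n"]
pairs = [parse_pair(line) for line in sample_lines]
print(pairs)  # [('A', 'T'), ('', 'T'), ('G', ''), ('', '')]
```

Using partition rather than split means a line always yields exactly two parts, even when one or both bases are missing.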
Download dna.py, or copy it to your CSE account using the following command:
cp -n /import/ravel/A/cs2041/public_html/24T1/activities/DNA/files.cp/dna.py dna.py
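The 90% rule in the is_rna docstring comes down to computing what fraction of the non-empty bases belong to a given set of letters. The sketch below shows only that arithmetic. The helper name fraction_in is hypothetical, and it assumes that both elements of every pair are counted and that empty input gives 0.0; neither assumption is confirmed by the docstring.

```python
DNA_BASES = set("ATGC")


def fraction_in(pairs, alphabet):
    """Return the fraction of non-empty bases that belong to alphabet.

    Hypothetical helper: counts both elements of each pair (an assumption).
    """
    # Flatten the pairs and drop empty bases, which the docstring says to ignore.
    bases = [base for pair in pairs for base in pair if base]
    if not bases:
        return 0.0  # assumed result for input with no bases at all
    return sum(base in alphabet for base in bases) / len(bases)


pairs = [("A", "T"), ("G", ""), ("", ""), ("X", "C"), ("G", "C")]
share = fraction_in(pairs, DNA_BASES)
print(f"{share:.2f}")  # 0.86, because 6 of the 7 non-empty bases are DNA bases
```

Here the share is below 0.9, so this sample would not meet the "at least 90%" condition for DNA.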
When you think your program is working, you can use autotest to run some simple automated tests:
2041 autotest DNA
When you are finished working on this exercise, you must submit your work by running give:
give cs2041 lab10_DNA dna.py
before Monday 22 April 12:00 (midday) (2024-04-22 12:00:00) to obtain the marks for this lab exercise.
Challenge Exercise: Bashful Python
We have some Shell (Bash) scripts that do arithmetic calculations that we need to translate to Python.
Write a Python program bashpy.py which takes such a Bash script on stdin and outputs an equivalent Python program.
The scripts use the arithmetic syntax supported by Bash (and several other shells). Fortunately, the scripts only use a very limited set of shell features.
You can assume all the features you need to translate are present in the following 4 examples.
- sum.sh sums a series of integers:
cat sum.sh
#!/bin/bash
# sum the integers $start .. $finish
start=1
finish=100
sum=0
i=1
while ((i <= finish))
do
    sum=$((sum + i))
    i=$((i + 1))
done
echo $sum
./sum.sh
5050
./bashpy.py < sum.sh
#!/usr/bin/env python3
# sum the integers $start .. $finish
start = 1
finish = 100
sum = 0
i = 1
while i <= finish:
    sum = sum + i
    i = i + 1
print(sum)
./bashpy.py < sum.sh | python3
5050
- double.sh prints powers of two:
cat double.sh
#!/bin/bash
# calculate powers of 2 by repeated addition
i=1
j=1
while ((i < 31))
do
    j=$((j + j))
    i=$((i + 1))
    echo $i $j
done
./double.sh
2 2
3 4
4 8
5 16
6 32
7 64
8 128
9 256
10 512
11 1024
12 2048
13 4096
14 8192
15 16384
16 32768
17 65536
18 131072
19 262144
20 524288
21 1048576
22 2097152
23 4194304
24 8388608
25 16777216
26 33554432
27 67108864
28 134217728
29 268435456
30 536870912
31 1073741824
./bashpy.py < double.sh
#!/usr/bin/env python3
# calculate powers of 2 by repeated addition
i = 1
j = 1
while i < 31:
    j = j + j
    i = i + 1
    print(i, j)
./bashpy.py < double.sh > double.py
chmod +x double.py
./double.py
2 2
3 4
4 8
5 16
6 32
7 64
8 128
9 256
10 512
11 1024
12 2048
13 4096
14 8192
15 16384
16 32768
17 65536
18 131072
19 262144
20 524288
21 1048576
22 2097152
23 4194304
24 8388608
25 16777216
26 33554432
27 67108864
28 134217728
29 268435456
30 536870912
31 1073741824
- pythagorean_triple.sh searches for Pythagorean triples:
cat pythagorean_triple.sh
#!/bin/bash
max=42
a=1
while ((a < max))
do
    b=$a
    while ((b < max))
    do
        c=$b
        while ((c < max))
        do
            if ((a * a + b * b == c * c))
            then
                echo $a $b $c
            fi
            c=$((c + 1))
        done
        b=$((b + 1))
    done
    a=$((a + 1))
done
./bashpy.py < pythagorean_triple.sh
#!/usr/bin/env python3
max = 42
a = 1
while a < max:
    b = a
    while b < max:
        c = b
        while c < max:
            if a * a + b * b == c * c:
                print(a, b, c)
            c = c + 1
        b = b + 1
    a = a + 1
./bashpy.py < pythagorean_triple.sh | python3
3 4 5
5 12 13
6 8 10
7 24 25
8 15 17
9 12 15
9 40 41
10 24 26
12 16 20
12 35 37
15 20 25
15 36 39
16 30 34
18 24 30
20 21 29
21 28 35
24 32 40
- collatz.sh prints an interesting series:
cat collatz.sh
#!/bin/bash
# https://en.wikipedia.org/wiki/Collatz_conjecture
# https://xkcd.com/710/
n=65535
while ((n != 1))
do
    if ((n % 2 == 0))
    then
        n=$((n / 2))
    else
        n=$((3 * n + 1))
    fi
    echo $n
done
./bashpy.py <collatz.sh
#!/usr/bin/env python3
# https://en.wikipedia.org/wiki/Collatz_conjecture
# https://xkcd.com/710/
n = 65535
while n != 1:
    if n % 2 == 0:
        n = n // 2
    else:
        n = 3 * n + 1
    print(n)
./bashpy.py <collatz.sh | python3
196606
98303
294910
147455
442366
221183
663550
331775
995326
497663
1492990
746495
2239486
1119743
3359230
1679615
5038846
2519423
7558270
3779135
11337406
5668703
17006110
8503055
25509166
12754583
38263750
19131875
57395626
28697813
86093440
43046720
21523360
10761680
5380840
2690420
1345210
672605
2017816
1008908
504454
252227
756682
378341
1135024
567512
283756
141878
70939
212818
106409
319228
159614
79807
239422
119711
359134
179567
538702
269351
808054
404027
1212082
606041
1818124
909062
454531
1363594
681797
2045392
1022696
511348
255674
127837
383512
191756
95878
47939
143818
71909
215728
107864
53932
26966
13483
40450
20225
60676
30338
15169
45508
22754
11377
34132
17066
8533
25600
12800
6400
3200
1600
800
400
200
100
50
25
76
38
19
58
29
88
44
22
11
34
17
52
26
13
40
20
10
5
16
8
4
2
1
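The examples above suggest that many lines can be translated one at a time with regular expressions. The following is a rough sketch covering only assignments and echo; the function name translate_line is hypothetical, and control flow (while, if, do, done, then, fi) is deliberately left out. The replacement of / with // mirrors the collatz.sh example, where n / 2 becomes n // 2.

```python
import re


def translate_line(line):
    """Translate one simple Bash statement into Python, keeping its indentation.

    translate_line is a hypothetical helper covering assignments and echo only.
    """
    indent = line[: len(line) - len(line.lstrip())]
    body = line.strip()

    # name=$((expression))  ->  name = expression
    match = re.fullmatch(r"(\w+)=\$\(\((.*)\)\)", body)
    if match:
        # Bash arithmetic division is integer division, hence //.
        expression = match.group(2).strip().replace("/", "//")
        return f"{indent}{match.group(1)} = {expression}"

    # name=value or name=$other  ->  name = value or name = other
    match = re.fullmatch(r"(\w+)=\$?(\w+)", body)
    if match:
        return f"{indent}{match.group(1)} = {match.group(2)}"

    # echo $a $b  ->  print(a, b)
    match = re.fullmatch(r"echo\s+(.*)", body)
    if match:
        names = [word.lstrip("$") for word in match.group(1).split()]
        return f"{indent}print({', '.join(names)})"

    # Anything else (comments, while, if, do, done, ...) is returned unchanged.
    return line.rstrip("\n")


for line in ["i=1", "b=$a", "    n=$((n / 2))", "    echo $i $j"]:
    print(translate_line(line))
```

This sketch simply copies the Bash indentation. A complete translator would also need to derive indentation from the nesting of while and if, since Python relies on it for block structure.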
When you think your program is working, you can use autotest to run some simple automated tests:
2041 autotest bashpy
When you are finished working on this exercise, you must submit your work by running give:
give cs2041 lab10_bashpy bashpy.py
before Monday 22 April 12:00 (midday) (2024-04-22 12:00:00) to obtain the marks for this lab exercise.
Challenge Exercise: When Regular Expressions Aren't Regular
Write a regular expression that validates a JSON file.
In other words, write a regex that matches a string iff that string is valid JSON.
Here is a test program to assist you in doing this:
#! /usr/bin/env python3
from sys import argv, stderr
import regex
regex.DEFAULT_VERSION = regex.V1
assert len(argv) == 3, f"Usage: {argv[0]} <json file> <regex file>"
json_file, regex_file = argv[1], argv[2]
try:
    with open(json_file) as json_data, open(regex_file) as regex_data:
        if regex.search(regex_data.read(), json_data.read(), timeout=5):
            # In the test suite, all files that start with "y_" should be valid.
            print(f"Valid JSON file: {json_file}")
        else:
            # In the test suite, all files that start with "n_" should be invalid.
            print(f"Invalid JSON file: {json_file}")
except TimeoutError as e:
    # Allow a timeout error to signal that the JSON file is not valid
    print(f"Invalid JSON file: {json_file}")
    # This is printed to stderr so that it is not captured by the test
    print(f"5 second time limit reached while reading {json_file}", file=stderr)
Download test_regex_json.py, or copy it to your CSE account using the following command:
cp -n /import/ravel/A/cs2041/public_html/24T1/activities/regex_json/test_regex_json.py test_regex_json.py
You have been given a directory JSONTestSuite containing a number of JSON files.
There are two types of files in this directory:
Files starting with y_ are valid JSON files.
Files starting with n_ are invalid JSON files.
Put your solution in regex_json.txt.
For example, to test the regex ^.+$:
chmod 755 test_regex_json.py
unzip JSONTestSuite.zip
cat regex_json.txt
^.+$
./test_regex_json.py JSONTestSuite/y_array_heterogeneous.json regex_json.txt
Valid JSON file: JSONTestSuite/y_array_heterogeneous.json
./test_regex_json.py JSONTestSuite/n_array_star_inside.json regex_json.txt
Valid JSON file: JSONTestSuite/n_array_star_inside.json
This should be Invalid so the regex is incorrect
If your solution is correct, all files in the JSONTestSuite starting with y_ should be labelled valid, and all files starting with n_ should be labelled invalid.
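While developing the regex it can help to compare its verdicts against an existing parser. The sketch below uses Python's standard json module as a rough reference. It is only a cross-check and may not agree with the test suite on every file. The function names and the two sample file names are invented for illustration.

```python
import json


def reject_constant(name):
    # json.loads accepts NaN, Infinity and -Infinity by default,
    # which strict JSON does not allow, so treat them as errors.
    raise ValueError(f"non-standard constant: {name}")


def reference_is_valid(text):
    """Return True if Python's json module accepts text as JSON."""
    try:
        json.loads(text, parse_constant=reject_constant)
    except (ValueError, RecursionError):
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    return True


def expected_label(filename):
    """Return the label implied by the y_ or n_ prefix of a file name."""
    return "Valid" if filename.startswith("y_") else "Invalid"


# Hypothetical file names and contents, not taken from JSONTestSuite.
samples = {
    "y_example.json": '[1, "two", {"three": null}]',
    "n_example.json": "[*]",
}
for name, text in samples.items():
    label = "Valid" if reference_is_valid(text) else "Invalid"
    print(name, label, label == expected_label(name))
```

A file where the reference check and your regex disagree is a good candidate for closer inspection, whichever of the two turns out to be wrong.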
When you think your program is working, you can use autotest to run some simple automated tests:
2041 autotest regex_json
When you are finished working on this exercise, you must submit your work by running give:
give cs2041 lab10_regex_json regex_json.txt
before Monday 22 April 12:00 (midday) (2024-04-22 12:00:00) to obtain the marks for this lab exercise.
Submission
When you are finished each exercise, make sure you submit your work by running give.
You can run give multiple times.
Only your last submission will be marked.
Don't submit any exercises you haven't attempted.
If you are working at home, you may find it more convenient to upload your work via give's web interface.
Remember you have until Week 11 Monday 12:00:00 (midday) to submit your work.
You cannot obtain marks by e-mailing your code to tutors or lecturers.
You can check the files you have submitted here.
Automarking will be run by the lecturer several days after the submission deadline, using test cases different to those autotest runs for you.
(Hint: do your own testing as well as running autotest.)
After automarking is run by the lecturer you can view your results here. The resulting mark will also be available via give's web interface.
Lab Marks
When all components of a lab are automarked you should be able to view the marks via give's web interface or by running this command on a CSE machine:
2041 classrun -sturec