COMP3311 25T2

Assignment 2
Python, PostgreSQL, psycopg2

Last updated: Tuesday 29th July 5:10pm


Aims

This assignment aims to give you practice in

implementing Python scripts to extract and display data from a database
[optionally (but recommended)] implementing a collection of Python functions to support the scripts
[optionally] implementing SQL views and PLpgSQL functions to support the scripts

You could complete this assignment with minimal use of SQL.
But it is highly recommended that you use each tool for its intended purpose:
Use SQL queries, views, and functions to filter and manipulate the data
Use Python to format and display the data

The goal is to build some useful data access operations on the mymyunsw database.

Summary

Marks:	This assignment contributes 15 marks toward your total mark for this course.
Submission:	via WebCMS3 or `give`, submit the files `q1.py`, `q2.py`, `q3.py`, `q4.py`, `q5.py`, `helpers.py`, `helpers.sql`
Deadline:	Monday 4 August 2025 @ 21:59:59
Late Penalty:	0.2 percent off the raw mark for each hour late, for 5 days. Any submission after 5 days scores 0 marks. This is the UNSW standard late penalty.

How to do this assignment:

read this specification carefully and completely
familiarise yourself with the database schema
create a directory for this assignment
copy the supplied files into this directory
log in to vxdb2 and run your PostgreSQL server
create a database mymyunsw on vxdb2
load the provided SQL dump file into the database
explore the database
complete the tasks below by editing q1.py, q2.py, q3.py, q4.py, q5.py
test your work on vxdb2
submit your Python scripts via WebCMS or give (you can submit multiple times; only your last submission will be marked)

And, of course, if you have PostgreSQL installed on your home machine, you can do all of your development there.
But don't forget to test it on vxdb2 before submitting.

Introduction

All universities require a significant information infrastructure in order to manage their affairs. This typically involves a large commercial DBMS installation. UNSW's student information system sits behind the MyUNSW web site. MyUNSW provides an interface to a PeopleSoft enterprise management system with an underlying Oracle database. This back-end system (PeopleSoft/Oracle) is sometimes called NSS. The specific version of PeopleSoft that we use is called Campus Solutions.

Despite its successes, however, MyUNSW/NSS still has a number of deficiencies, including:

no easy way to swap classes once enrolled
no representation for degree program structures
poor integration with the UNSW Online Handbook

The first point is inconvenient, since it means that the only way for a student to change tute classes is to drop the course and re-enrol into the course, selecting the new tute. If the course is already full, students would be unwilling to drop the course in case someone else grabs their place before they can re-enrol.

The second point prevents MyUNSW/NSS from being used for three important operations that would be extremely helpful to students in managing their enrolment:

finding out how far they have progressed through their degree program, and what remains to be completed
checking what their enrolment options are for next semester (e.g. get a list of suggested courses)
determining when they have completed all of the requirements of their degree program and are eligible to graduate

Note: The people's data in this database is not real, and does not correspond to any real students or staff at UNSW. It is synthetic data, created to simulate the kinds of information that might be found in a real database.

Doing this Assignment

The following sections describe how to carry out this assignment. Some of the instructions must be followed exactly; others require you to exercise some discretion. The instructions are targeted at people doing the assignment on d.cse. If you plan to work on this assignment at home on your own computer, you'll need to adapt the instructions to local conditions.

If you're doing your assignment on the CSE machines, some commands must be carried out on vxdb02, while others can (and probably should) be done on a CSE machine other than vxdb02. In the examples below, we'll use vxdb02$ to indicate that the command must be done on vxdb02 and cse$ to indicate that it can be done elsewhere.

Setting Up

In addition to the database dump file, you are also provided with template Python files, and Python and SQL helper files.

The "template files" aim to save you some time in writing Python code. E.g. they handle the command-line arguments and let you focus on the database interaction.

The helpers.py and helpers.sql files are provided in case you want to define Python functions or PLpgSQL functions that might be useful in several of your scripts.
You are not required to use them (i.e. you can leave them unchanged).

The template files are available in a single ZIP or TAR

or copy them to your CSE account with the following command:

cse$ mkdir -p ~/COMP3311/ass2 # or any other directory of your choice
cse$ cd ~/COMP3311/ass2
cse$ cp /web/cs3311/current/assignments/ass2/files/* .

The database dump file can be downloaded HERE

or linked to your CSE account with the following command:

cse$ cd ~/COMP3311/ass2
cse$ ln -s /web/cs3311/current/assignments/ass2/database/mymyunsw.dump .

Now you can set up and use your database, e.g.:

vxdb02$ source /localstorage/$USER/env
vxdb02$ p1
vxdb02$ createdb mymyunsw
vxdb02$ psql mymyunsw -f ~/COMP3311/ass2/mymyunsw.dump
vxdb02$ psql mymyunsw

mymyunsw> SELECT * FROM ...
# etc.
mymyunsw> \q

# after writing code in q1.py (if using vscode not on vxdb02)

vxdb02$ python3 ~/COMP3311/ass2/q1.py
# hopefully some output
vxdb02$ p0
vxdb02$ logout

Your Tasks

Q0. Style Mark (2 marks)

Style mark.

Ugly, inconsistent layout of SQL queries and PLpgSQL functions will be penalised. It's hard to lay out Python3 code wrong, given that indentation replaces brackets, but if you manage to make your Python code ugly, that will also be penalised. You should ensure that your Python variable names are understandable and consistent.

Q1. OrgUnit Type Summary (4 marks)

Write a Python script q1.py that for every faculty in orgunits, reports:

the number of schools that have it as a parent
the number of distinct staff affiliated with those units (a staff member is "working" for the faculty if a course they convened is in a school of the faculty, or is run by the faculty itself)

Additional requirements:

The script takes no command-line arguments.
Faculties with zero schools or zero staff must still appear (count = 0).
Order the rows by faculty name (ascending).

Output format:

Faculty                                 #Schools #Staff

Print the header line exactly as above.
For each faculty, print one line using the format string:
```
f"{type_name:<40}{num_schools:>8}{num_staff:>7}"
```
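As a rough illustration of how the format string lines up with the header, here is a minimal sketch. The faculty names and counts are made-up placeholders (not data from the database), and `format_faculty_row` is a hypothetical helper name. The sketch assumes the header uses the same column widths as the rows; since the header must be printed exactly as given, copying it literally from the spec is the safer choice.

```python
# Header built with the same widths as the row format (an assumption;
# the spec's literal header line is authoritative).
HEADER = f"{'Faculty':<40}{'#Schools':>8}{'#Staff':>7}"


def format_faculty_row(type_name, num_schools, num_staff):
    """Format one report line using the format string from the spec."""
    return f"{type_name:<40}{num_schools:>8}{num_staff:>7}"


# Placeholder rows, standing in for query results ordered by faculty name.
rows = [
    ("Faculty of Example Studies", 3, 12),
    ("Faculty of Placeholder Arts", 0, 0),  # zero counts still get a line
]

report_lines = [HEADER] + [format_faculty_row(*row) for row in rows]
print("\n".join(report_lines))
```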

You can assume that at least one faculty will exist.

You can assume schools will be below faculties directly (no recursion needed).

Q2. Longest Increasing Average-Mark Run (5 marks)

Write a Python script q2.py that, for a given subject code, prints the longest strictly increasing sequence of term average marks for that subject.

The script takes exactly one command-line argument: <SubjectCode>.
If that code does not exist, print
```
Subject <SubjectCode> not found.
```
and exit.
For every term in which the subject was offered, compute the average of all non-NULL marks, rounded to 2 decimal places (using SQL). If a course was offered but all of its marks are NULL, exclude that offering.
Considering terms in starting date order, find the longest contiguous run in which each average mark is strictly greater than the previous one. If several runs tie for the longest, return the run with the latest term (determined by starting date). If no run of length ≥ 2 exists, print

No increasing run found for <SubjectCode>.

Otherwise print two lines:

<SubjectCode> (SubjectTitle):
<Term1>(<Avg1>) -> ... -> <TermK>(<AvgK>)
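The run-finding step can be sketched in Python as a single pass over the per-term averages. This is only one possible approach, under these assumptions: the averages have already been computed and rounded in SQL, the list is sorted by term starting date, and "the run with the latest term" means the run that occurs later in that order. The function names and the term codes in the usage lines are hypothetical.

```python
from decimal import Decimal


def longest_increasing_run(term_averages):
    """Return the longest strictly increasing contiguous run.

    term_averages: list of (term, average) pairs, sorted by term start
    date, with all-NULL offerings already excluded.
    Returns [] when no run of length >= 2 exists.
    """
    best = []
    start = 0
    for i in range(1, len(term_averages) + 1):
        run_ended = (
            i == len(term_averages)
            or term_averages[i][1] <= term_averages[i - 1][1]
        )
        if run_ended:
            run = term_averages[start:i]
            # ">=" lets a later run of equal length replace an earlier one.
            if len(run) >= 2 and len(run) >= len(best):
                best = run
            start = i
    return best


def format_run(run):
    """Render a run as Term1(Avg1) -> ... -> TermK(AvgK)."""
    return " -> ".join(f"{term}({avg})" for term, avg in run)


# Placeholder data: two runs of length 2, so the later one wins the tie.
sample = [
    ("19T1", Decimal("60.00")),
    ("19T2", Decimal("65.50")),
    ("19T3", Decimal("65.50")),
    ("20T1", Decimal("70.25")),
    ("20T2", Decimal("50.00")),
]
print(format_run(longest_increasing_run(sample)))
```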

Q3. Person Information (6 marks)

In q3.py, write a script that given a zID, provides information about the person that zID belongs to.

The script takes one command-line argument: <zID>.
Find out if they are a student or a staff member (you can assume total and disjoint from the ER diagram). If the zID is not recorded in the database, print "No one has the zID <zID>." and exit.
If they are a student, display their zID, name, whether they are domestic (not INTL) or international (INTL), what country they're from, and the latest program they were enrolled in (use latest term starting date, then largest enrolment id), with the streams relevant to that enrolment (streams ordered by stream code):

zID FamilyName, GivenNames (Domestic/International student from Country)
ProgramCode1 ProgramName1 (StreamCode1 and StreamCode2 and ... and StreamCodeN)

If they are a staff member, print out
```
zID FamilyName, GivenNames is a staff member, and not a student.
```
and exit. Assume this is true from the ER diagram, even without any actual implementation of the disjoint constraint.

f"{CourseCode} {Term} {SubjectTitle:<40s}{Mark:>3} {Grade:>2s}  {UOC:2d}uoc"

Total achieved UOC = total_achieved_uoc, WAM = wam

If the course title is over 40 characters, truncate the title to its first 40 characters.

If either of the mark or grade is null, print a "-", right-aligned, where mark or grade would normally go.
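The truncation and null-handling rules above might be applied as in the following sketch. `format_course_line` is a hypothetical helper name, and the sample values are placeholders. The value to show for UOC depends on the Grades + Rules page, so here it is simply passed in as an integer.

```python
def format_course_line(course_code, term, title, mark, grade, uoc):
    """Format one transcript line with the f-string from the spec."""
    shown_title = title[:40]                     # keep at most 40 characters
    shown_mark = "-" if mark is None else mark   # NULL mark shows as "-"
    shown_grade = "-" if grade is None else grade
    return (
        f"{course_code} {term} {shown_title:<40s}"
        f"{shown_mark:>3} {shown_grade:>2s}  {uoc:2d}uoc"
    )


# Placeholder values, not taken from the database.
print(format_course_line("COMP1511", "23T1", "Programming Fundamentals", 85, "HD", 6))
print(format_course_line("COMP1511", "23T1", "T" * 50, None, None, 6))
```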

What to print for uoc and how to use the grades and marks to determine the WAM is given in the Grades + Rules page. The precise format of the output will be available in the Examples page.

Note that there are two UOC totals in this question:

total_achieved_uoc = sum(uoc_i) where course_i is a "pass"
- this includes obvious ones like HD, DN, ... but also ones like XE
- basically any course with "yes" in the UOC column in the Grades table
total_attempted_uoc = sum(uoc_i) for any course_i attempted
- even if the course was failed, we count its UOC; it was attempted
- any course which has "yes" in the WAM column in the Grades table
weighted_mark_sum = sum(uoc_i * mark_i) for any course_i attempted
- any course which has "yes" in the WAM column in the Grades table

WAM = weighted_mark_sum / total_attempted_uoc

If the WAM is not computable due to a denominator of 0, print the phrase "Can't compute WAM" instead of WAM = wam.

* Do the rounding with SQL instead of Python, due to the inconsistency of floats.
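To make the arithmetic concrete, here is a small sketch of the three sums. It only illustrates the formulas: the spec asks for rounding to be done in SQL, so the sketch does no rounding. It assumes the two "yes" flags have already been looked up from the Grades table and that every course counted toward the WAM has a numeric mark. The function name and the tuple layout are hypothetical.

```python
def summarise_results(results):
    """Compute (total_achieved_uoc, wam) from per-course results.

    results: list of (uoc, mark, counts_for_uoc, counts_for_wam) tuples,
    where the two booleans stand for the "yes" entries in the UOC and
    WAM columns of the Grades table.
    wam is None when total_attempted_uoc is 0 (WAM not computable).
    """
    total_achieved_uoc = 0
    total_attempted_uoc = 0
    weighted_mark_sum = 0
    for uoc, mark, counts_for_uoc, counts_for_wam in results:
        if counts_for_uoc:
            total_achieved_uoc += uoc
        if counts_for_wam:
            # Assumption: mark is not None for WAM-counted courses.
            total_attempted_uoc += uoc
            weighted_mark_sum += uoc * mark
    if total_attempted_uoc == 0:
        return total_achieved_uoc, None
    return total_achieved_uoc, weighted_mark_sum / total_attempted_uoc


# Placeholder results: a pass, a fail, and a pass that is outside the WAM.
sample = [(6, 80, True, True), (6, 40, False, True), (6, None, True, False)]
print(summarise_results(sample))
```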

Q4. Subject Filter and Listing (6 marks)

Write a Python script q4.py that takes one argument (a semicolon-separated list of filter conditions), finds the subjects that match the conditions, and prints them.

Each condition is written field:(<expression>). If there are multiple colons, the first one determines the field, and the others are part of the expression.
Fields can be separated by spaces and newlines after the semicolons or after the field.
The expressions may contain parentheses, single quotation marks ' to mark strings, and && (AND), || (OR), and ! (NOT) to form logical expressions.
You can assume strings can only contain letters, numbers, and whitespace characters.
For expressions like A && !B || C, NOT takes priority over both AND and OR, and AND takes priority over OR, so it should be interpreted as (A && (!B)) || C
Operations inside the expression depend on the field:
- uoc atoms are comparisons like >=6, <12, =9.
- career, name, code atoms are literal strings (substring match, case-insensitive).

"uoc:>=6 && <12;
career:!('PG');
title:'math' && !(' 1A' || 'Comp');
code:'10' && ('2' || '3' || '4')"

UOC is ≥ 6 and < 12.
Career does not contain "pg".
Full course title contains "math" but does not contain " 1a" nor "comp".
Code contains "10" and also contains either "2" or "3" or "4".
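One way to honour the precedence rules (NOT before AND before OR) is a small recursive-descent evaluator. The sketch below is an illustration only, not a prescribed design: it covers string fields (case-insensitive substring match) and does not handle uoc comparisons. It raises `ValueError` for expressions that cannot be evaluated. The names `tokenize` and `matches` are hypothetical.

```python
import re

# One token: an operator, a parenthesis, or a single-quoted string.
TOKEN = re.compile(r"\s*(&&|\|\||!|\(|\)|'[^']*')")


def tokenize(expr):
    """Split an expression into tokens; raise ValueError on stray text."""
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected text at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def matches(expr, value):
    """Evaluate a string-field expression against value."""
    tokens = tokenize(expr)
    text = value.lower()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():                      # lowest precedence
        result = parse_and()
        while peek() == "||":
            advance()
            rhs = parse_and()            # always parse, then combine
            result = result or rhs
        return result

    def parse_and():
        result = parse_not()
        while peek() == "&&":
            advance()
            rhs = parse_not()
            result = result and rhs
        return result

    def parse_not():                     # highest precedence
        if peek() == "!":
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok == "(":
            advance()
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return result
        if tok is not None and tok.startswith("'"):
            advance()
            return tok[1:-1].lower() in text
        raise ValueError(f"unexpected token {tok!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


print(matches("'math' && !(' 1A' || 'Comp')", "Mathematics 1B"))
```

Each grammar level calls the next tighter-binding one, which is what gives `A && !B || C` the reading `(A && (!B)) || C`.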

Error: No filter conditions provided

Error messages (go through each field and check in this order):

If a clause is missing :, print Error: Bad clause: missing a ":" in "<clause>" and exit
If the field name is not one of uoc, career, name title, code, print Error: Unknown field "<field>" and exit. These field names are case sensitive, so CODE will not work.
Otherwise, if the expression cannot be evaluated for any other reason, e.g. A && || B, print Error: The "<field>" expression is not evaluable and exit.
Hint: you don't actually need to specify why the expression is wrong or where it is wrong, just whether it can be evaluated or not.

Output format

Code      Title                                                    UoC    Career

If there are subjects that match the conditions, print the header exactly as shown.
Use the format string f"{code:<10}{title:<55}{uoc:>5}{career:>10}" to print out each subject that matches the conditions.
If the subject title is longer than 55 characters, take the first 52 characters and add ... instead.
The output should be ordered by subject code.
Valid fields can be repeated; treat all of their expressions as part of the filter (e.g. the two expressions in code:1;code:2; must both be true)
If there are no subjects that match the condition, print There are no subjects that match the conditions. and do NOT print the header
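The title-shortening rule and the row format can be combined as in this minimal sketch. `format_subject_row` is a hypothetical helper name and the sample values are placeholders.

```python
def format_subject_row(code, title, uoc, career):
    """Format one output row, shortening titles longer than 55 characters."""
    if len(title) > 55:
        title = title[:52] + "..."   # 52 characters + "..." = 55
    return f"{code:<10}{title:<55}{uoc:>5}{career:>10}"


# Placeholder values, not taken from the database.
print(format_subject_row("MATH1131", "M" * 60, 6, "UG"))
```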

Q5. Progression Check (7 marks)

Note: Please read the pinned forum post about this question for some previously missing details

Write a Python script q5.py to show a student's progression through their program/stream, and what they still need to do to complete their degree. The script takes up to three command-line parameters:

python3  q5.py  StudentID  [ ProgramCode  StreamCode ]

If no program and stream are given, use the program for the student's most recent enrolment (determined by term starting date, and then largest enrolment id) and use the first stream that appears in the program's stream requirements.
If a program is provided, but no stream, get the first stream that appears in the program's stream requirements (first stream of the first id-wise program requirement of type "stream").

The script already checks the validity of the command-line arguments.

The progression check should start with a two-line heading.

zID FamilyName, GivenNames
ProgramCode StreamCode ProgramName

Then check if the stream is part of the program's requirement if both stream and program are provided. If not, print

StreamCode is not a stream in ProgramCode

Otherwise, the output should look like this

CourseCode  Term  CourseTitle  Mark  Grade  UOC  NameOfRequirement

Use this f-string to get the formatting right:

f"{CourseCode} {Term} {SubjectTitle:<40s}{Mark:>3} {Grade:>2s}  {UOC:2d}uoc  {NameOfRequirement}"

The order should be initially by term, then by course code within the term. If either of the mark or grade is null, print a "-", right-aligned, where grade or mark would normally go.

You should keep track of which courses and how many UOC in which requirements have been completed. After the line for each of the courses taken, you should display a sequence of lines indicating which core courses have not been completed, and how many UOC from each group of electives remains to be done.

If you consider each requirement as a bucket, then the process of determining which requirement a course satisfies is a process of determining which bucket a particular course belongs in. If the bucket for the most appropriate requirement is full, the course cannot be allocated to that requirement, and a new requirement must be sought. In the "worst" case, the course will end up in the free electives bucket. If the free electives bucket is full, and if all of the other buckets that the course could potentially be allocated to are also full, then the course cannot be allocated to any requirement and does not count toward the degree. Such courses should have 0 UOC against them and have a note "Cannot be allocated".
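The bucket idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the required algorithm. It assumes the candidate requirements for a course are already known and listed from most to least appropriate, with free electives last. It also assumes a bucket counts as full when its remaining UOC is smaller than the course's UOC; the Examples page and the pinned forum post are the authority on the exact rule. The function name and the bucket names are hypothetical.

```python
def allocate(course_uoc, candidate_names, remaining):
    """Place a course in the first candidate bucket that still has room.

    candidate_names: requirement names, most appropriate first.
    remaining: dict mapping requirement name -> UOC still needed;
               updated in place when the course is allocated.
    Returns (counted_uoc, note).
    """
    for name in candidate_names:
        # Assumption: "full" means not enough remaining UOC for the course.
        if remaining.get(name, 0) >= course_uoc:
            remaining[name] -= course_uoc
            return course_uoc, name
    return 0, "Cannot be allocated"


# Placeholder buckets with 6 UOC of room each.
remaining = {"Stream Electives": 6, "Free Electives": 6}
candidates = ["Stream Electives", "Free Electives"]
for _ in range(3):
    print(allocate(6, candidates, remaining))
```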

The strategy for ordering the "to be completed" info

do all Core requirements first, stream Cores before program Cores
then do all Elective requirements, stream Electives before program Electives
then do GenEd requirements, then Free (elective) requirements

In other words, most specific to least specific.

Within groups (e.g. stream Cores), order by Requirements.id. For Core requirements, print remaining UOC and the course codes and names of any not yet completed courses, in the order they appear in the group definition. For all other rule types, print remaining UOC and the name of the group. If a student has completed all UOCs for a rule, then no information on this rule needs to be printed.
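The ordering strategy can be expressed as a sort key, as in the sketch below. The dictionary keys (`rtype`, `for_stream`, `id`) and the type labels are hypothetical stand-ins for however the requirements are represented after querying. The sketch applies the stream-before-program rule to every type, which the text only states for Core and Elective requirements.

```python
# Most specific to least specific, following the strategy above.
TYPE_RANK = {"core": 0, "elective": 1, "gened": 2, "free": 3}


def todo_sort_key(req):
    """Sort by rule type, then stream before program, then requirement id."""
    return (TYPE_RANK[req["rtype"]], 0 if req["for_stream"] else 1, req["id"])


# Placeholder requirements, not taken from the database.
requirements = [
    {"id": 7, "rtype": "free", "for_stream": False},
    {"id": 5, "rtype": "elective", "for_stream": False},
    {"id": 2, "rtype": "core", "for_stream": False},
    {"id": 9, "rtype": "core", "for_stream": True},
    {"id": 4, "rtype": "core", "for_stream": True},
    {"id": 6, "rtype": "gened", "for_stream": False},
]

ordered_ids = [req["id"] for req in sorted(requirements, key=todo_sort_key)]
print(ordered_ids)
```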

If a student has satisfied all rules and enough UOC for the program, you should print

Eligible to graduate

instead of the "to be completed" text.

More details on the precise output format for rules will be available in the Examples page.

This question will not be tested against any programs that have no stream requirements. See the bottom of Grades+Rules for some programs and streams that have a proper set of requirements.

Submission and Testing

You can find some simple test cases in the Examples page.

Note that there is a time-limit of 2 seconds for each script.