COMP3311 23T3 Assignment 2
Python, PostgreSQL, MyMyUNSW
Database Systems
Last updated: Saturday 11th November 1:09am
Most recent changes are shown in red ... older changes are shown in brown.

[Specification]  [Database]  [SQL Schema]  [Grades+Rules]  [Examples]  [Testing]  [Submitting]  [Fixes+Updates]

Aims

This assignment aims to give you practice in

The goal is to build some useful data access operations on the MyMyUNSW database.

A minor theme of this assignment is "dirty data". We built the database, using a collection of reports from UNSW's information systems. There may be a very occasional glitch in the data, but the data is not generally dirty. Just work with what’s in the database; If it produces strange results, then write some queries to check whether those results fit with what's actually in the database, which is something you should be doing anyway.

You may bump into some residual data issues as you try to solve the problems below. Let us know via the Forum if you think you have found such data anomalies.

Summary

Marks: This assignment contributes 16 marks toward your total mark for this course.
Submission: Login to Course Web Site > Assignments > Assignment 1 > [Submit] > upload q1.py, q2.py, q3.py, q4.py, q5.py, helpers.py, helpers.sql
or, on a CSE server,
give cs3311 ass2 q1.py q2.py q3.py q4.py q5.py helpers.py helpers.sql
Deadline: Wednesday 15 November @ 23:59
Late Penalty: Standard UNSW late penalty: 5% off the maximum achievable mark for each day late;
marks deducted by the hour; after 5days late, mark is zero

How to do this assignment:

And, of course, if you have PostgreSQL installed on your home machine, you can do all of the vxdb2 work there. BUT don't forget to test it on vxdb2 before submitting.

The "template files" aim to save you some time in writing Python code. E.g. they handle the command-line arguments and let you focus on the database interaction. They are available in a file ass2.zip, which contains the following

There are even a couple of functions in helpers.py. Freebies!

Introduction

All Universities require a significant information infrastructure in order to manage their affairs. This typically involves a large commercial DBMS installation. UNSW's student information system sits behind the MyUNSW web site. MyUNSW provides an interface to a PeopleSoft enterprise management system with an underlying Oracle database. This back-end system (Peoplesoft/Oracle) is sometimes called NSS. The specific version of PeopleSoft that we use is called Campus Solutions. There is also a system called SiMS, which can be used to access the data.

UNSW has spent a considerable amount of money ($100M+) on the MyUNSW/NSS system, and it handles much of the educational administration plausibly well. Most people gripe about the quality of the MyUNSW interface, but the system does allow you to carry out most basic enrolment tasks online.

Despite its successes, however, MyUNSW/NSS still has a number of deficiencies, including:

The first point prevents MyUNSW/NSS from being used for three important operations that would be extremely helpful to students in managing their enrolment:

The second point allows for inconsistencies between the Handbook and the system that manages enrolment.

NSS contains data about students, courses, classes, pre-requisites, quotas, etc. but does not contain any representation of UNSW's degree program structures. Without such information in the NSS database, it is not possible to do any of the above three. So, in 2007 the COMP3311 class devised a data model that could represent program requirements and rules for UNSW degrees. This was built on top of an existing schema that represented all of the core NSS data (students, staff, courses, classes, etc.). The enhanced data model was named the MyMyUNSW schema.

The MyMyUNSW database includes information that encompasses the functionality of NSS, the UNSW Online Handbook, and the CATS (room allocation) database. In 2023, we trimmed the schema, removing all tables related to classes, rooms, textbooks, staff affiliations, etc. and revised how program requirements where represented. Note that there is also no information on subject pre-requisites in the new schema, so proper forward degree planning is not possible. The remaining data does allow us to check progression and graduation status.

The new MyMyUNSW data model, schema and database are described in a separate document. The data in the schema is drawn from 2019-2023 enrolment data, and the data has been anonymised; we are using only a tiny fraction of the available data. This results in courses having tiny (or no) enrolments, but still has enrolment data for 8000 students.

Setting Up

To install the MyMyUNSW database under your PostgreSQL server on vxdb2, simply run the following two commands (after ensuring that your server is running):

$ createdb ass2
$ psql ass2 -f /home/cs3311/web/23T3/assignments/ass2/files/ass2.dump

If everything proceeds correctly, the load output should look something like:

SET
SET
...
SET
CREATE TYPE
... a few of these
CREATE DOMAIN
... a few of these
CREATE TABLE
... a whole bunch of these
COPY n
... a whole bunch of these
ALTER TABLE
... a whole bunch of these
ALTER TABLE

You should get no ERROR messages. The database loading should take less than 5 seconds on vxdb2, assuming that vxdb2 is not under heavy load. (If you leave your assignment until the last minute, loading the database on vxdb2 may be considerably slower, thus delaying your work even more. The solution: at least load the database Right Now, even if you don't start using it for a while.) (Note that the ass2.dump file is 3MB in size; copying it under your home directory on the CSE machines is not a good idea.)

Note that the database is called ass2 (when you want to access it via psql or Python script). Throughout this document we refer to the database as "MyMyUNSW". Note also that this is not real data, although it was generated with much swizzling, from real data.

A useful thing to do initially is to get a feeling for what data is actually there. This may help you understand the schema better, and will make the descriptions of the exercises easier to understand. Look at the schema. Ask some queries. Do it now.

Examples ...

$ psql ass2
... PostgreSQL welcome stuff ...
ass2=# \d
... look at the schema ...
ass2=# select * from Students;
... look at the Students table; a list of zid's ...
ass2=# select p.id,p.fullname from People p join Students s on p.id=s.id;
... look at the names and UNSW ids of all students ...
ass2=# select p.id,p.fullname,s.phone from People p join Staff s on p.id=s.id;
... only one result because there's only one staff ...
ass2=# select count(*) from Course_enrolments;
... how many course enrolment records ...
ass2=# select * from dbpop();
... how many records in all tables ...
ass2=# ... etc. etc. etc.
ass2=# \q

You will find that many tables (e.g. Books, Buildings, etc.) are currently unpopulated; their contents are not needed for this assignment.

Summary on Getting Started

To set up your database for this assignment, run the following commands:

$ ssh vxdb2
... and then on vxdb2 ...
$ source /localhost/$USER/env
$ p1
... you shut down the server after your last session, didn't you?
$ createdb  ass2
$ psql  ass2  -f  /home/cs3311/web/23T3/assignments/ass2/files/ass2.dump
$ psql  ass2
... run some checks to make sure the database is ok
$ mkdir  Assignment2Directory
... make a working directory for Assignment 2
$ cd Assignment2Directory
$ unzip /home/cs3311/web/23T3/assignments/ass2/files/ass2.zip
... puts the template files in your working directory

The only messages produced by these commands should be those noted above. If you omit any of the steps, then things will not work as planned. If you subsequently ask questions on the Forums, where it's clear that you have not done and checked the above steps, the questions will not be answered.

Exercises

Q0 (2 marks)

Style mark.

Ugly, inconsistent layout of SQL queries and PLpgSQL functions will be penalised. It's hard to layout Python3 code wrong, given that indentation replaces brackets, but if you manage to make your Python code ugly, that will also be penalised. You should ensure that your Python variable names are understandable and consistent.

Q1 (3 marks)

Write a Python script q1.py to track the proportion of international students over terms from 19T1 to 23T3.

Use the Students table to determine who's local/international. Treat any student whose status is INTL as international; treat everyone else as local (yes, even Kiwis). Use the Program_enrolments table to determine which students are enrolled in a given term.

Display each term code on a separate line, using the following formatting:

f"{TermCode} {#Locals:6d} {#Internationals:6d} {Proportion:6.1f}"

The Proportion is simply #Locals/#Internationals.

Add a heading at the start of the output in the same format as on the Examples page).

Warning: some students appear to be enrolled in two programs in one term.

Q2 (3 marks)

Write a Python script q2.py to track the satisfaction in a specified subject over terms from 19T1 to 23T3.

The script takes one command line parameter (a subject code)

python3  q2.py  SubjectCode

Display results for each term on a separate line, using the following formatting:

f"{TermCode} {Satifaction:6d} {#Responses:6d} {#Students:6d}  {Convenor}"

If any of the attributes have a NULL value, use a "?" instead of the number or convenor name (and in this case you will need to use a different format string than the above). If a subject wasn't offered in a particular term, don't print any line for that term.

Add a heading at the start of the output in the same format as on the Examples page).

Q3 (5 marks)

Write a Python script q3.py that takes a program code or a stream code and produces a readable list of rules for that program or stream.

The script takes on command-line parameter:

python3  q3.py  (Stream code or Program code)

A number of different requirement types are given in the Grades and Rules page. More details on the precise output format for rules will be available in the Examples page. All of the rules stored in the database are given in the Grades and Rules page.

The requirements should be displayed in a specific order:

If there are multiple requirements of the same type, display them in order of Requirements.id.

Display the name of the requirement after the min/max output described below e.g.

all courses from Level 1 Core
at least 6 UOC courses from Database Electives
between 12 and 18 UOC courses from Comp Sci electives
1 stream from Engineering majors

The brown text is the requirement name.

Total UOC rules are an exception to this. The "Total UOC" is displyed before the min/max output.

When displaying rules, use the following for min and max

If the requirement is based on courses, add "UOC". If the requirement is based on streams, add the word "stream".

In the database, all of the rules are stored in the Requirements table. More details on the fields in this table are given in the Database page.

Q4 (5 marks)

Write a Python script q4.py that takes a command-line argument giving a student ID, and prints a transcript for that student. The template script already checks the command line argument.

The transcript should start with a two-line heading.

zID FamilyName, GivenNames
ProgramCode StreamCode ProgramName

The program and stream should be the ones most recently enrolled by the student.

Each line of the transcript should contain

CourseCode  Term  SubjectTitle  Mark  Grade  UOC

Use the following f-string to get the formatting right

f"{CourseCode} {Term} {SubjectTitle:<32s}{Mark:>3} {Grade:>2s}  {UOC:2d}uoc"

Entries should be ordered by term, and within the same term, by course code. You should also calculate a WAM and display this at the end of the course lines. How to use the grades and marks to determine the WAM is given in the Grades + Rules page. The precise format of the output will be available in the Examples page.

Some subject names are longer than 32 characters. To keep the table neat simply truncate any longer names to exactly 31 characters.

Note that there are two UOC totals in this question:

WAM = weighted_mark_sum / total_attempted_uoc

If either of the mark or grade is null, print a "-", right-aligned, where grade or mark would normally go.

To simplify this task, it would be useful to write a transcript(integer) function to extract transcript data in the correct order as a sequence of tuples containing the above fields. The argument to transcript() is the student's zID (i.e. the command-line argument).

Q5 (6 marks)

Write a Python script to show a student's progression through their program/stream, and what they still need to do to complete their degree. The script takes three command line parameters:

python3  q5.py  StudentID  [ ProgramCode  StreamCode ]

If no program/stream is given, use the program/stream for the student's most recent enrolment term. The script already checks the validity of the command-line arguments.

The output should look like a transcript, but with additional information to indicate which rule each course satisfies

CourseCode  Term  CourseTitle  Mark  Grade  UOC  NameOfRequirement

The order should be the same as for the transcript script (i.e. order by term, then by course code within the term).

You should keep track of which courses and how many UOC in which requirements have been completed. After the line for each of the courses taken, you should display a sequence of lines indicating which core courses have not been completed, and how many UOC from each group of electives remains to be done.

If you consider each requirement as a bucket, then the process of determining which requirement a course satisfies, is a process of determining which bucket a particular course belongs in. If the bucket for the most appropriate requirement is full, the course cannot be allocated to that requirement, and a new requirement must be sought. In the "worst" case, the course will end up in the free electives bucket. If the free electives bucket is full, and if all of the other buckets that the course potentially be allocated to are also full, then the course cannot be allocated to any requirement and does not count toward the degree. Such courses should have 0UC against them and have a note "Cannot be allocated".

The strategy for ordering the "to be completed" info

In other words, most specific to least specific.

Within groups (e.g stream Core's) order by Requirements.id. For Core requirements, print remaining UOC and the course codes of any not yet completed courses, in the order they appear in the group definition. For all other rule types, print remaining UOC and the name of the group. If a student has completed all UOCs for a rule, then no information on this rule needs to be printed.

If a student has satisfied all rules and enough UOC for the program, you should print

Eligible to graduate

instead of the "to be completed" text.

More details on the precise output format for rules will be available in the Examples page.

Submission

Submit this assignment by doing the following:

Login to Course Web Site > Assignments > Assignment 2 > Submit Your Work > Make Submission >
upload helpers.sql, helpers.py, trans, rules, prog > [Submit]

The helpers.sql file should contain all the views and functions that you've written to make your Python code simpler. It should be completely self-contained and able to load in a single pass, so that it can be auto-tested as follows:

Before you submit your solution, you should check that it will load correctly for testing by using something like the following operations:

$ dropdb ass2                  ... remove any existing DB
$ createdb ass2                ... create an empty database
$ psql ass2 -f ..../ass2.dump    ... load the MyMyUNSW schema and data
$ psql ass2 -f helpers.sql        ... load your SQL code

Note: if your database contains any views or functions that are not available in the helpers.sql file, you should add them to that file before you drop the database.

If your code does not load without errors, fix it and repeat the above until it does.

You must ensure that your helpers.sql file will load correctly (i.e., it has no syntax errors and it contains all of your view definitions in the correct order). If we need to manually fix problems with your helpers.sql file in order to test it (e.g., change the order of some definitions), you will be fined via a 1 mark penalty on your ceiling mark (i.e., the maximum you can score is 19 out of 20 marks).