COMP3311 24T2

Assignment 1
Queries and Functions on RugDB

Last updated: Monday 17th June 6:37am
Most recent changes are shown in red ... older changes are shown in brown.

[Assignment Spec] [Database Design] [Examples] [Testing]

Aims

This assignment aims to give you practice in

reading and understanding a small relational schema (RugDB)
implementing SQL queries and views to satisfy requests for information
implementing PLpgSQL functions to aid in satisfying requests for information

The goal is to build some useful data access operations on the RugDB database, which contains a wealth of information about Persian rugs.
One aim of this assignment is to use SQL queries (packaged as views) to extract such information. Another is to build PlpgSQL functions that can support higher-level activities, such as might be needed in a Web interface.

Summary

Submission: Login to Course Web Site > Assignments > Assignment 1 > [Submit] > upload ass1.sql
or,
on a CSE server, give cs3311 ass1 ass1.sql

Required Files: ass1.sql (contains both SQL views and PLpgSQL functions)

Deadline: 21:59:59 Friday 28th June 2024

Marks: 13 marks toward your total mark for this course

Late Penalty: 0.2 marks off the achieved mark for each hour late, for 5 days
any submission after 5 days scores 0 marks ... UNSW late penalty policy

How to do this assignment:

read this specification carefully and completely
create a directory for this assignment
copy the supplied files into this directory (see Setting Up)
login to vxdb2 and run your PostgreSQL server**
(or run a PostgreSQL server installed on your home machine)
load the database and start exploring
complete the tasks below by editing ass1.sql
test your work on vxdb2, which is where it will be marked
submit ass1.sql via WebCMS or give

Details of the above steps are given below. Note that you can put the files wherever you like; they do not have to be under your /localstorage directory. You should edit your SQL files on hosts other than vxdb2. The only time that you need to use vxdb2 is to manipulate your database. Since you can work at home machine, you don't have to use vxdb2 at all while developing your solution, but you should definitely test it there before submitting.

Background

In the Western world, Persia was historically the common name for Iran. Thus, the term "Persian rug" refers to rugs made in the region now known as Iran. However, Persia represents more than just a country; it is a rich and diverse culture. The patterns and styles of Persian rugs reflect different aspects of Persian history, serving not only as beautiful works of art but also as visual narratives of the region's heritage.

Persian rugs are meticulously crafted by skilled artisans over several months or even years, depending on the size of the rug. Each rug is composed of thousands of intricate knots, with each region of Iran boasting its own distinct styles and patterns. The colours, designs, and knotting techniques employed in these rugs represent the unique histories, artistic traditions, and cultural identities of the people from various areas of the country.

Watch these amazing videos to learn more about the fascinating process of creating Persian rugs:

Photo from:

pexels-matreding-10409984

Introduction

In order to work with a database, it is useful to have some background in the domain of data being stored. Here is a very quick tour of Persian Rugs. If you want to know more, see the Wikipedia page on Persian Rugs.

To build a database on rugs, we need to consider:

Persian Rug styles (e.g. Shahsavan, Ghashghai, Tabriz, etc., etc.)
Materials (e.g. varieties of piles and warp, and weft)
Factories, the facilities where rugs are crafted
Rugs, following a style, and made in a particular factory

Specific properties that we want to consider:

Knot Length = The length of each knot in a rug
Knot Per Feet = Number of total not in each square feet. More is usually better.
Each rug style has a range of “knot per length” for rugs in that style
for each rug, we would like to store:
- its name
- its style, actual knot length, actual knot per square meter (optional), year it was crafted
- Size of the rug, described in square feed
- the material used for its "rug stop”
- its materials (usually a partial list because factories don't want to reveal too much)
for each factory, we would like to store:
- its name,
- its location,
- the year it was founded,
- its website

The schema is described in more detail both as an ER model and an SQL schema in the schema page.

Doing this Assignment

The following sections describe how to carry out this assignment. Some of the instructions must be followed exactly; others require you to exercise some discretion. The instructions are targetted at people doing the assignment on vxdb2. If you plan to work on this assignment at home on your own computer, you'll need to adapt the instructions to local conditions.

If you're doing your assignment on the CSE machines, some commands must be carried out on vxdb2, while others can (and probably should) be done on a CSE machine other than vxdb2. In the examples below, we'll use vxdb2 to indicate that the comand must be done on vxdb2 and cse$ to indicate that it can be done elsewhere.

Setting Up

The first step in setting up this assignment is to set up a directory to hold your files for this assignment.

cse$ mkdir /my/dir/for/ass1
cse$ cd /my/dir/for/ass1
cse$ cp /home/cs3311/web/24T2/assignments/ass1/ass1.sql ass1.sql
cse$ ln -s /home/cs3311/web/24T2/assignments/ass1/ass1.dump ass1.dump

This gives you a template for the SQL views and SQL and PLpgSQL functions that you need to submit. You edit this file, (re)load the definitions into the database you created for the assignment, and test it there.

As supplied, the ass1.sql template file consists entirely of comments; remove the comments from whatever question you're working on. This allows you to load the ass1.sql file without getting syntax errors from the not-yet-completed parts of the template. Of course, don't forget to remove the comments from all completed questions before testing and submitting.

The next step is to set up your database:

... login to vxdb2, source env, run your server as usual ...
... if you already had such a database
vxdb$ dropdb ass1
... create a new empty atabase
vxdb$ createdb ass1
... load the database, saving the output in a file called log
vxdb$ psql ass1 -f ass1.dump 
... examine the database contents
vxdb$ psql ass1

The database loading should take less than 2 seconds on vxdb2, and will produce messages like:

CREATE DOMAIN
CREATE TYPE
SET
CREATE TABLE
COPY number
ALTER TABLE

The ass1.dump file contains the schema and data in a single file, along with a simple PLpgSQL function (dbpop()), which counts the number of tuples in each table.

If you're running PostgreSQL at home you can download the dump file here: ass1.dump.

Think of some questions you could ask on the database (e.g. like the ones in the lectures) and work out SQL queries to answer them.

One useful query is

ass1=# select * from dbpop();

This will give you a list of tables and the number of tuples in each. The dbpop() function is written in PLpgSQL, and makes use of the PostgreSQL catalog. We'll look at this later in the term.

Your Tasks

Answer each of the following questions by typing SQL or PLpgSQL into the ass1.sql file. You may find it convenient to work on each question in a temporary file, so that you don't have to keep loading all of the other views and functions each time you change the one you're working on. Note that you can add as many auxuliary views and functions to ass1.sql as you want. However, make sure that everything that's required to make all of your views and functions work is in the ass1.sql file before you submit.

Note #1: many of the queries are phrased in the singular e.g. "Find the rug that ...". Despite the use of "rug" (singular), it is possible that multiple rugs satisfy the query. If this is the case you should return all rugs that satisfy the query. Because of this you should not use LIMIT 1.

Note #2: the database is not a complete picture of rugs in the Real World. Treat each question as being prefaced by "According to the RugDB database ...".

Note #3: you can assume that the names for styles and factories are unique; you cannot assume this for rug names.

Note #4: do not worry about the order of tuples in your result; we will apply order by when testing the queries (see the ordering in the Examples page).

There are examples of the results of each view and function in the Examples page.

Q0 (2 marks)

Style mark

Given that you've already taken multiple programming courses, we should be able to assume that you'll express your code with good style conventions. But, just in case ...

You must ensure that your SQL queries follow a consistent style. The one I've been using in the lectures is fine. An alternative, where the word JOIN comes at the start of the line, is also OK. The main thing is to choose one style and use it consistently.

Similarly, PLpgSQL should be laid out like you would lay out any procedural programming language. E.g. bodies of loops should be indented from the FOR or WHILE statement that introduces them. E.g. the body of a function should be indented from the BEGIN...END.

Ugly, inconsistent layout of SQL queries and PLpgSQL functions will be penalised.

Q1 (1 mark)

Write a view Q1(province, nfactories) that returns a list of all provinces and the number of factories in each province.

The columns in the result are:

province = the name of a province
nfactories = a count of the number of factories in that province

Q2 (1 mark)

Write a view Q2(style, knot_length_diff) that determines which style(s) have the largest difference between their minimum and maximum knot lengths.

The columns in the result are:

style = the name of the style
knot_length_diff = the difference between the maximum and minimum knot lengths for that style

Q3 (2 marks)

Write a view Q3(style, lo_knot_length, hi_knot_length, min_knot_length, max_knot_length) that gives a list of rug styles satisfying the properties:

the minimum and maximum knot lengths for the style are different
some rugs crafted in the style has a knot length outside the min/max range

The columns in the result are:

style = the name of the style
lo_knot_length = lowest knot length of any rug made in that style
hi_knot_length = highest knot length of any rug made in that style
min_knot_length = the minimum knot length for the style
max_knot_length = the maximum knot length for the style

Q4 (2 marks)

Write a view Q4(factory, rating) that returns the factory (or factories) with the maximum average rating for all their rugs.

The columns in the result are:

factory = the name of the factory
rating = average rating for rated rug by that factory

To avoid factories with a single high-rated rug coming out on top, we only consider factories that have at least five rated rug. If rugs are crafted collaboratively, give the rating to both the factories involved. Make sure that you compute the average rating using floating point numbers; if you use integers, PostgreSQL truncates to an integer like C does. Don't include rugs with no rating (i.e. rugs.rating is NULL).

The rating value should be returned as numeric(3,1).

Q5 (2 marks)

One of the most important characteristic of a rug is the density of its knots. One way to represent this is through knot per feet.
To measure it, experts use a simple technique. They take a 1-foot long ruler and place it on the back of the rug. Then, they count the total number of knots within that 1-foot length.
They only count the knots in a 1-foot length and not the total knot in a square feet.

Remember, the size of the rug that is given to you is in square feet, (ft^2) while the total knots measure is in feet (ft).
If you want to calculate the total number of knots in a rug, you have to:

Total number of knots =  knot_per_foot x knot_per_foot x size

You will notice that many rugs do not have knot per feet value in the database and its value is null. Unfortunately this can happen in real world.
If you find a rug which knot per feet is not given, you need to use the average industry value which is 50.

Write an SQL function Q5(pattern text) whose argument is a pattern (regex) representing part of a rug name.
The function returns a set of 0 or more tuples for any rugs whose name matches the pattern (regex), and with the following fields:

rug = full name of a rug
size_and_stoper = the size of the rug in sq ft followed by the material of its rug_stop (e.g. 440sf leather)
total_knots = the total knots in a rug

You must define the function as follows:

Q5(pattern text) returns table(rug text, size_and_stoper text, total_knots numeric)

The total knots value is of type numeric(8,0).

You will need to determine how to construct the size_and_stoper string.

Q6 (3 marks)

Write an SQL function Q6(pattern text) whose argument is a string representing part of a province name.
The function returns a set of 0 or more tuples for any provinces whose name contains the pattern as a substring (case-insensitive) (not a regex), and with the following fields:

province = full name of a province
first = the earliest year that a rug was crafted in that province, according to our DB
nrugs = the number of rugs crafted in that province
rating = the average rating of rugs crafted in that province

You must define the function as follows

Q6(pattern text) returns table(province text, first integer, nrugs integer, rating numeric)

The rating value should be cast to type numeric(3,1) within the function.

Q7 (3 marks)

Write a PLpgSQL function Q7(_rugID integer) whose argument is an integer representing a rug ID (Rugs.id) value. The function returns a single text string, formatted as follows:

if the ID does not exist in the rugs table, simply return the string:
```
No such rug (<_rugID>)
```
if the ID does exist, the first line contains the name of the rug, enclosed in quotes:
```
"<Name of rug>"
```
if the rug has no materials recorded, simply add a second line to the return string and return the two-line string, e.g.
```
"<Name of rug>"
  no materials recorded
```
Note that there are exactly two spaces before the no materials.
otherwise append a string containing "contains:" (preceded by two spaces) and then append a string containing a list of materials, one per line, in order of the material names:
- each line starts with exactly four spaces
- then the name of the material
- then the type of the material, enclosed in parentheses
```
"<Name of rug>"
  contains:
    <material 1 name> (<material 1 type>)
    <material 2 name> (<material 2 type>)
```

Note that the return value of this function is a single text string. It will likely contain embedded newline characters, but there should be no trailing newline character.
There are examples of the return values in the Examples page, in case the above is not clear enough.

Note also that, in its output, psql uses a plus + symbol to indicate that there is an embedded newline. The plus symbol is simply an artifact of the way psql displays strings.
Do not embed plus symbols into your return string.

Q8 (4 marks)

Write a PLpgSQL function Q8(pattern text) whose argument is a string representing part of a rug name. The function returns a set of 0 or more tuples for any rugs whose name contains the pattern (case-insensitive matching), and with the following fields:

rug = full name of a rug
factory = which factory (or factories (plus-separated)) the rug was made in
piles = a comma-separated list of the names of piles used in the rug

If the rug is a collaboration rug (crafted_by more than one factory), the names of all factories involved must be included, separated by '+' characters, and appear in alphabetical order of the factory names.

The names of the piles in the list of piles contained in the rug must be separated by ',' characters, and must appear in alphabetical order.

If a rug has no pile ingredients recorded, then the piles string should be set to:

no piles recorded

The function needs to be defined differently to the functions above that returned table(...). For this function, we need to define a new tuple type and have the function return a setof tuples of that type:

drop type if exists RugPiles cascade;
create type RugPiles as (rug text, factory text, piles text);

The drop type statement is included so that you can reload the ass1.sql file multiple times without generating errors. It does, however, generate a NOTICE, which looks like an error, but is actually harmless.

Q8(pattern text) returns setof RugPiles

Q9 (4 marks)

Write a PLpgSQL function Q9(factoryID integer) whose argument is an integer, possibly containing a Factory.id value. The function returns a set of Collab tuples (see below) for the factory, and with the following fields:

factory = full name of a factory, or NULL
collaborator = full name of another factory, or the string 'none'

The Collab tuples give the names, in alphabetical order, of all the factories that the indicated factory (factoryID) has made collaboration rug with.

There are a number of different cases for what appears in the Collab tuples:

factoryID is not a valid Factories.id value
- factory has the value 'No such factory (factoryID)'
- collaborator has the value 'none'
the factory has never made a collaboration rug
- factory contains the name of the factory whose id is factoryID
- collaborator has the value 'none'
the factory has collaborated with exactly one other factory
- factory contains the name of the factory whose id is factoryID
- collaborator contains the name of the collaborating factory
the factory has collaborated with more than one other factory
- there are multiple tuples for this factory
- in the first tuple, factory contains the name of the factory whose id is factoryID
- in all subsequent tuples, factory has the value NULL
- in all tuples, collaborator contains the name of a collaborating factory

drop type if exists Collab cascade;
create type Collab as (factory text, collaborator text);

Q9(factoryID integer) returns setof Collab

Note that the function always returns at least one tuple, even if factoryID is not valid.

Note also that this is different to question Q6, where the list of list of materials was formed by string concatenation; in this case, each collaborating factory is in a separate tuple.
In case the above description is not clear, there are many examples in the Example page.

Submission and Testing

We will test your submission as follows:

create a testing subdirectory
create a new database TestingDB and initialise it with ass1.dump
load your work via the command: psql TestingDB -f ass1.sql (using your ass1.sql)
run auto-marking on your views and functions
drop the TestingDB database
create a new database TestingDB and initialise it with an unseen dump file (same schema, different data)
load your work via the command: psql TestingDB -f ass1.sql (using your ass1.sql)
run auto-marking on your views and functions

Note that there is a time-limit on the execution of queries. If any query takes longer than 3 seconds to run (you can check this using the \timing flag) your mark for that query will be reduced.

Your submitted code must be complete so that when we do the above, your ass1.sql will load without errors. If your code does not work when installed for testing as described above and the reason for the failure is that your ass1.sql did not contain all of the required definitions, you will be penalised by a 2 mark penalty. If you code does not load because your definitions are in the wrong order, there will be a 1 mark penalty.

Before you submit, it would be useful to test whether the files you submit will work on vxdb2, by doing the following:

vxdb2$ dropdb ass1
vxdb2$ createdb ass1
vxdb2$ psql ass1 -f ass1.dump
vxdb2$ psql ass1 -f ass1.sql

These commands may produce information messages, but they should not produce any errors.

A Clear Understanding of Responsibilities

Our Commitment to You

Dear students, we value transparency and strive to ensure a clear understanding of our responsibilities and expectations. In this regard, we would like to outline some of our responsibilities to you:

Assignment Release and Marking

We will release assignments on time and ensure they are as error-free as possible.
Your assignments will be marked within two week
If any updates or minor corrections are required, we will update the specification and clearly communicate these changes.

Addressing Errors and Providing Support

In the event of a significant error that impacts the assignment, we will extend the deadline to accommodate the time constraints our students may face. If this happens, we will communicate it clearly.
We are committed to answering your questions, whether in person or online. For online inquiries, we will make sure to respond within two business days.
Regular consultation hours and extra weekly help sessions will be conducted to address any questions and provide assistance.

Your Responsibilities

To ensure a fair and productive learning experience, we kindly request that you:

Read and fully understand the assignment requirements.
Complete the assignment without engaging in any form of cheating or plagiarism. For detailed information on academic integrity, please refer to the following link: https://www.student.unsw.edu.au/plagiarism/integrity
If you face any challenges, seek assistance through the provided channels, such as consultations, help sessions or the forum.
Allow up to two business days for your online questions to be answered.
Regularly check the specification page and notice board for any updates or changes.
Perform comprehensive testing and checking of your solutions.
Test your solutions on the CSE server before submitting.

Have fun, Armin

Submission:	Login to Course Web Site > Assignments > Assignment 1 > [Submit] > upload `ass1.sql` or, on a CSE server, `give cs3311 ass1 ass1.sql`
Required Files:	`ass1.sql` (contains both SQL views and PLpgSQL functions)
Deadline:	21:59:59 Friday 28th June 2024
Marks:	13 marks toward your total mark for this course
Late Penalty:	0.2 marks off the achieved mark for each hour late, for 5 days any submission after 5 days scores 0 marks ... UNSW late penalty policy