Week 02 Workshop

Preparation

Before the workshop you should have watched the lectures. Some workshops have pre-workshop tasks which are optional, but useful. Completing labs before workshops is also optional (but may be useful)!

Revision

Your tutor will quickly go over the starter code, and explain how to set every second line to be black.

If you have any questions, now is a great time to ask them!

Exercise:
Workshop 2 - Rust Train-ing

In this workshop, we will be using Rust to play with some data about NSW Trains!

The purpose of this workshop is to:

  • Further practice basic Rust features and error handling.
  • Experiment with Rust's many collection types.
  • Have fun!

This tutorial is *not* designed to require any data analysis experience, and it is not our goal to teach it to you. However, it will assume simple mathematical concepts and some creativity!

Pre-Workshop Work (Optional)

Following feedback on last week's tutorial, we want to provide some pre-workshop work. This is entirely optional, but it will help to get the most out of the workshop.

For this week's pre-workshop work, you should:

  • Fetch the workshop code from this tar file.
  • Read the data from trains.csv into an appropriate data structure.
  • For each year of data, find the most and least used stations.
  • For each 10th of a degree of latitude and longitude, find the most and least used stations.
  • Which station has had its usage increase the most over recent years?
  • (advanced) Use serde.rs and rust-csv to do your parsing.
  • (advanced) Find the distance between every station and every other station; and find the one station that is on average closest to every other one.
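
As a starting point for the first three tasks above, here is a minimal sketch using only the standard library. The column layout of trains.csv is not described in this handout, so the `Record` struct and its three fields (`station`, `year`, `trips`) are hypothetical; adjust them to match the real header. The function names are also just illustrative.

```rust
use std::collections::BTreeMap;

/// One row of a hypothetical, simplified file with columns `station,year,trips`.
/// The real trains.csv layout may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub station: String,
    pub year: u32,
    pub trips: u64,
}

/// Parse one comma-separated line, reporting problems as a `String` error.
/// This naive split does not handle quoted fields that contain commas.
pub fn parse_line(line: &str) -> Result<Record, String> {
    let fields: Vec<&str> = line.split(',').map(str::trim).collect();
    if fields.len() != 3 {
        return Err(format!("expected 3 fields, found {}: {:?}", fields.len(), line));
    }
    let year = fields[1]
        .parse::<u32>()
        .map_err(|e| format!("bad year {:?}: {}", fields[1], e))?;
    let trips = fields[2]
        .parse::<u64>()
        .map_err(|e| format!("bad trips {:?}: {}", fields[2], e))?;
    Ok(Record { station: fields[0].to_string(), year, trips })
}

/// Parse a whole file's text: skip the header line and blank lines,
/// and stop at the first row that fails to parse.
pub fn parse_all(text: &str) -> Result<Vec<Record>, String> {
    text.lines()
        .skip(1)
        .filter(|l| !l.trim().is_empty())
        .map(parse_line)
        .collect()
}

/// For each year, return the (least used, most used) station names.
pub fn extremes_by_year(records: &[Record]) -> BTreeMap<u32, (String, String)> {
    // Group rows by year; a BTreeMap keeps the years in sorted order.
    let mut by_year: BTreeMap<u32, Vec<&Record>> = BTreeMap::new();
    for r in records {
        by_year.entry(r.year).or_default().push(r);
    }
    by_year
        .into_iter()
        .map(|(year, rows)| {
            // Every group holds at least one row, so these unwraps cannot fail.
            let least = rows.iter().min_by_key(|r| r.trips).unwrap();
            let most = rows.iter().max_by_key(|r| r.trips).unwrap();
            (year, (least.station.clone(), most.station.clone()))
        })
        .collect()
}
```

Collecting an iterator of `Result` values into a `Result<Vec<_>, _>` is a handy way to propagate the first parse error. For the advanced task, the hand-written `parse_line` would be replaced by serde and rust-csv.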

In the workshop

In the workshop, you will be asked to get into teams of 3 people. Teams do not all have to submit together, but we encourage them to share code and work on different problems. If someone has already done the pre-workshop work, start with that code. If nobody has, get the starter code from this tar file. If two or more people have done the work, commence a duel to the death (or, discuss which code is better for which tasks, and use the most appropriate code or a combination).

Teams can pick tasks to work on from the list below. Once a task is done, they should call over a tutor to have a chat about the task. Tutors will also be happy to provide help on Rust issues (both getting code working and design). Teams can "choose their own adventure", but we encourage teams with lots of experience to pick harder tasks as a group. Teams should also use both functional and imperative approaches, so we can compare the two.
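
To make the functional versus imperative comparison concrete, here is a small sketch that computes the same per-station totals in both styles. The `(station, trips)` tuple input and the function names are assumptions for illustration only, not part of the starter code.

```rust
use std::collections::HashMap;

/// Imperative style: mutate a map inside an explicit loop.
pub fn totals_imperative(rows: &[(&str, u64)]) -> HashMap<String, u64> {
    let mut totals = HashMap::new();
    for (station, trips) in rows {
        *totals.entry(station.to_string()).or_insert(0) += *trips;
    }
    totals
}

/// Functional style: the same aggregation expressed as a fold over an iterator.
pub fn totals_functional(rows: &[(&str, u64)]) -> HashMap<String, u64> {
    rows.iter().fold(HashMap::new(), |mut totals, (station, trips)| {
        *totals.entry(station.to_string()).or_insert(0) += *trips;
        totals
    })
}

/// Functional style: name of the busiest station, or `None` for an empty map.
pub fn busiest(totals: &HashMap<String, u64>) -> Option<&str> {
    totals
        .iter()
        .max_by_key(|(_, trips)| **trips)
        .map(|(name, _)| name.as_str())
}
```

Both versions do the same work. Which one reads better is a matter of taste, and is a good thing to discuss with your tutor.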

The Tasks

  • Find the most and least used stations on the NSW network at each time of day.
  • Allow a user to search for a station, and show its busiest times of day, and busiest year.
  • Which station has had its yearly utilisation increase the most in the last 5 years?
  • Which station had the biggest percentage change in use over 2020?
  • What is the north-most, south-most, east-most and west-most station?
  • Find the two closest stations and the two furthest away from each other.
  • Sort stations by their distance from Central, and by their usage. Is the order similar, or different? Are there outliers?
  • (hard) To help the NSW government with their plan for Sydney, group stations together into "regions". One approach to this might be to start with the least used station, and "group" it with its closest neighbour. Repeat this until you've got a certain number of groups left.
  • (hard) A meteor is headed for Sydney! It is headed for a train station, but we don't know which one. It will destroy all stations within 2 kilometers of it. For each 4-hour period, make a list of which stations would be best or worst to hit.
  • (hard) What area of Sydney has the densest coverage of train stations? (this is very open-ended).
  • (hard, and maths theory involved) Find a list of the outermost stations of Sydney (i.e. if you drew lines between them, no station would be outside them).
  • (very hard) Make a graph of an interesting result found above, and post it to /r/dataisbeautiful.
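
Several of the tasks above need the distance between two stations. One common choice, assumed here rather than prescribed by the workshop, is the haversine formula for great-circle distance. The sketch below treats each station as a `(latitude, longitude)` pair in degrees; the function names and the mean Earth radius of 6371 km are our own choices.

```rust
/// Mean Earth radius in kilometres (a common approximation).
const EARTH_RADIUS_KM: f64 = 6371.0;

/// Great-circle distance in kilometres between two (latitude, longitude)
/// points given in degrees, using the haversine formula.
pub fn haversine_km(a: (f64, f64), b: (f64, f64)) -> f64 {
    let (lat1, lon1) = (a.0.to_radians(), a.1.to_radians());
    let (lat2, lon2) = (b.0.to_radians(), b.1.to_radians());
    let dlat = lat2 - lat1;
    let dlon = lon2 - lon1;
    let h = (dlat / 2.0).sin().powi(2)
        + lat1.cos() * lat2.cos() * (dlon / 2.0).sin().powi(2);
    // Clamp to guard against rounding pushing the value slightly above 1.
    2.0 * EARTH_RADIUS_KM * h.sqrt().min(1.0).asin()
}

/// Indices and distance of the closest pair, by brute force over all pairs.
/// Returns `None` when there are fewer than two points.
pub fn closest_pair(points: &[(f64, f64)]) -> Option<(usize, usize, f64)> {
    let mut best: Option<(usize, usize, f64)> = None;
    for i in 0..points.len() {
        for j in (i + 1)..points.len() {
            let d = haversine_km(points[i], points[j]);
            if best.map_or(true, |(_, _, best_d)| d < best_d) {
                best = Some((i, j, d));
            }
        }
    }
    best
}

/// Indices of all points within `radius_km` of `centre`
/// (the meteor task would use a radius of 2.0).
pub fn within_radius(points: &[(f64, f64)], centre: (f64, f64), radius_km: f64) -> Vec<usize> {
    points
        .iter()
        .enumerate()
        .filter(|(_, p)| haversine_km(**p, centre) <= radius_km)
        .map(|(i, _)| i)
        .collect()
}
```

The brute-force pair search is quadratic in the number of stations, which should be fine for a dataset of this kind. Flipping the comparison in `closest_pair` gives the furthest pair instead.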

At the end, we'll come back together to review interesting bugs/errors, and interesting ways of completing the tasks. We'll also discuss functional and imperative approaches.