Week 02 Workshop

Exercise:
Workshop 2 - Rust Train-ing

In this workshop, we will be using Rust to play with some data about NSW Trains!

The purpose of this workshop is to:

  • Further practice basic Rust features and error handling.
  • Experiment with Rust's many collection types.
  • Have fun!

This tutorial is *not* designed to require any data analysis experience, and it is not our goal to teach you data analysis. It does, however, assume some simple mathematical concepts and a little creativity!

In the workshop, you will be asked to get into pair-programming groups of 2 people. Only one person should be coding at a time; swap roles when you move on to a different problem. If someone has already done the pre-workshop work, start with that code. If nobody has, get the starter code from this tar file. If more than one person has done the work, commence a duel to the death (or discuss which code is better for which tasks, and use the most appropriate code or a combination of the two).

Teams can pick tasks to work on from the list below. Once a task is done, the team should call over a tutor to have a chat about it. Tutors will also be happy to help with Rust issues (both getting code working and design questions). Teams can "choose their own adventure", but we encourage teams with lots of experience to pick harder tasks as a group. Teams should also use both functional and imperative approaches, so we can compare the two. There are more tasks than can reasonably be completed in a day!
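To make the functional/imperative comparison concrete, here is a minimal sketch that finds the busiest station in both styles. The function names, the (name, count) tuple layout and the sample numbers are assumptions made up for illustration; they are not taken from the starter code.

```rust
/// Imperative style: an explicit loop with a mutable "best so far".
/// On a tie, the later station wins, which matches `max_by_key` below.
fn busiest_imperative<'a>(stations: &[(&'a str, u32)]) -> Option<&'a str> {
    let mut best: Option<(&'a str, u32)> = None;
    for &(name, count) in stations {
        match best {
            // Keep the current best only if it is strictly busier.
            Some((_, best_count)) if count < best_count => {}
            _ => best = Some((name, count)),
        }
    }
    best.map(|(name, _)| name)
}

/// Functional style: the same query as an iterator chain.
fn busiest_functional<'a>(stations: &[(&'a str, u32)]) -> Option<&'a str> {
    stations
        .iter()
        .max_by_key(|(_, count)| *count)
        .map(|(name, _)| *name)
}

fn main() {
    // Made-up station names and counts, purely for illustration.
    let stations = [("Alpha", 120), ("Beta", 340), ("Gamma", 95)];
    println!("{:?}", busiest_imperative(&stations));
    println!("{:?}", busiest_functional(&stations));
}
```

Both versions return None for an empty slice, so the caller is forced to handle the "no stations" case either way.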

We have decided to provide some basic unit tests for this exercise. This should allow you to spend less time writing similar tests yourself and more time working on the implementation. You will find a struct Solution in the main.rs file, which contains stubs for most, but not all, of the queries you should implement. Once you have filled some of these in, you can run "cargo test" locally to execute the unit tests (we will cover unit tests more in future weeks!). Note that these tests are not marked in any way, so there is no need to submit anything.
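If you have not seen a Rust unit test before, the sketch below shows the general shape of what "cargo test" picks up. The helper percentage_change and the module name are hypothetical and are not part of the provided Solution struct or its tests; the provided tests may be organised differently.

```rust
/// Hypothetical helper (not from the starter code): percentage change
/// from `before` to `after`, or None when the baseline is zero.
fn percentage_change(before: f64, after: f64) -> Option<f64> {
    if before == 0.0 {
        None
    } else {
        Some((after - before) / before * 100.0)
    }
}

// This module is only compiled when running `cargo test`.
#[cfg(test)]
mod percentage_tests {
    use super::*;

    #[test]
    fn halving_is_minus_fifty_percent() {
        assert_eq!(percentage_change(200.0, 100.0), Some(-50.0));
    }

    #[test]
    fn zero_baseline_has_no_percentage() {
        assert_eq!(percentage_change(0.0, 10.0), None);
    }
}

fn main() {
    println!("{:?}", percentage_change(200.0, 100.0));
}
```

Each function marked #[test] is run independently, and a test fails if any assertion inside it panics.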

The Tasks

  • Use the provided starter code (and the useful code in a separate file) to represent the CSV data in a useful way. The extra code you have been provided should be copied into your main function (you may need to fix bugs that arise when you copy it across).
  • NOTE: We define a station's use, or busyness, as its total entries + exits, aggregated over all years. If a station has no data for a certain column (e.g. the entries/exits field is Option::None), we ignore that value rather than substituting a default such as 0.
  • Which stations are the northernmost, southernmost, easternmost and westernmost?
  • Find the most and least used (total entries + exits) stations on the NSW network at each time of day. The reference solution ignores stations with no data (neither entries nor exits), rather than counting them as having zero uses.
  • Allow a user to search for a station, and show its busiest times of day and its busiest year.
  • Which station had its yearly utilisation (total entries + exits) increase the most from 2016 (inclusive) to 2020 (inclusive)?
  • Which station had the biggest percentage change in utilisation (total entries + exits) from 2019 to 2020?
  • Find the two stations closest to each other, and the two furthest away from each other.
  • Sort stations by their distance from Central, and by their usage. Is the order similar, or different? Are there outliers?
  • (hard) A meteor is headed for Sydney! It is headed for a train station, but we don't know which one. It will destroy all stations within 2 kilometres of the one it hits. For each 4-hour period, make a list of which stations would be best or worst to hit.
  • (very hard) Make a graph of an interesting result found above, and post it to /r/dataisbeautiful.
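Two building blocks come up repeatedly in the tasks above: the busyness rule from the NOTE (ignore missing values instead of treating them as 0) and the distance between two stations. The following is a minimal sketch of both, not the reference solution. The Row struct and its field names are hypothetical, and the distance function assumes that station coordinates are latitude/longitude in degrees and uses the haversine formula with a mean Earth radius of 6371 km; check what the starter code actually provides before relying on it.

```rust
/// Hypothetical record for one CSV row; the field names are illustrative.
struct Row {
    entries: Option<u32>,
    exits: Option<u32>,
}

/// Total entries + exits over all rows, skipping missing values.
/// Returns None when there was no data at all, so the caller can ignore
/// the station rather than treat it as having zero uses.
fn busyness(rows: &[Row]) -> Option<u64> {
    let values: Vec<u64> = rows
        .iter()
        .flat_map(|r| [r.entries, r.exits]) // two Option<u32> per row
        .flatten() // drop the None values
        .map(u64::from)
        .collect();
    if values.is_empty() {
        None
    } else {
        Some(values.iter().sum())
    }
}

/// Mean Earth radius in kilometres (an approximation).
const EARTH_RADIUS_KM: f64 = 6371.0;

/// Great-circle distance in kilometres between two points given as
/// latitude/longitude in degrees, using the haversine formula.
fn haversine_km(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lon2 - lon1).to_radians();
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_KM * a.sqrt().asin()
}

fn main() {
    // Made-up rows, purely for illustration.
    let rows = [
        Row { entries: Some(5), exits: None },
        Row { entries: None, exits: Some(7) },
    ];
    println!("busyness: {:?}", busyness(&rows));
    println!("distance: {:.2} km", haversine_km(0.0, 0.0, 0.0, 1.0));
}
```

With a distance function like this, the meteor task reduces to collecting, for each candidate station, every station for which haversine_km(...) <= 2.0, and then combining their busyness values.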

At the end, we'll come back together to review interesting bugs/errors, and interesting ways of completing the tasks. We'll also discuss functional and imperative approaches.