COMP3331/9331 Computer Networks and Applications

HTTP Implementation - Assignment for Term 2, 2025

Due: 16:59 Thursday, 31 July 2025 (Week 9)

Date Description
29/05/2025 Initial release



Introduction


Background

The Hypertext Transfer Protocol (HTTP) is the backbone of the World Wide Web, enabling everything from browsing websites and streaming videos to online banking, shopping, and social media. Every time you load a webpage, send a message, or access cloud-based applications, HTTP facilitates communication between your device and remote servers, making modern internet usage seamless and efficient.

HTTP allows the use of intermediaries to handle requests through a chain of connections. One common type of intermediary is a proxy, a message-forwarding agent that sits between a client (such as a Web browser) and an origin server. Instead of connecting directly to the origin server, the client sends its request to the proxy, which then forwards it on behalf of the client. The origin server’s response is then relayed back through the proxy to the client.

Because a proxy handles both incoming and outgoing requests, it effectively acts as both a client and a server—a server to the clients making requests and a client to the servers fulfilling them.

Figure 1: The proxy as a middleman

Proxies serve a variety of purposes, including:


Task

In this assignment, you will implement your own HTTP/1.1 proxy in C, Java or Python. Your proxy should:

Your proxy must compile and run within the CSE environment. Ensure it is thoroughly tested in that environment.

You are only permitted to use the basic libraries for socket programming. You must not use any ready-made server or HTTP libraries to implement any aspects of this assignment. Doing so will likely result in a mark of zero. If in doubt, please check with course staff on the forum.

This is an individual assignment and is worth 20 marks.


Scope

This assignment specification is the authoritative reference for your implementation. While it is based on HTTP standards, to simplify your task certain aspects may deviate from or contradict official specifications. In such cases, the requirements outlined here take precedence. If you encounter any ambiguity, seek clarification via the course forum rather than relying on official specifications.


Learning Objectives

By completing this assignment, you will gain a deeper understanding of:


Resources


Deliverables


Interface

Your proxy must accept 4 command-line arguments:

It should be initiated as follows:

$ ./proxy <port> <timeout> <max_object_size> <max_cache_size> # for C
$ java Proxy <port> <timeout> <max_object_size> <max_cache_size> # for Java
$ python3 proxy.py <port> <timeout> <max_object_size> <max_cache_size> # for Python

For example, running one of:

$ ./proxy 60893 10 1024 1048576 # for C
$ java Proxy 60893 10 1024 1048576 # for Java
$ python3 proxy.py 60893 10 1024 1048576 # for Python

This would mean the proxy should:

You must ensure that none of these values are hard-coded. They must be configurable via the command-line arguments.
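As a minimal Python sketch of reading these values from the command line rather than hard-coding them (the `ProxyConfig` and `parse_args` names are hypothetical, and treating `timeout` as seconds and both sizes as bytes is an assumption based on the example values):

```python
from dataclasses import dataclass


@dataclass
class ProxyConfig:
    port: int             # TCP port the proxy listens on
    timeout: int          # assumed to be in seconds
    max_object_size: int  # assumed to be in bytes
    max_cache_size: int   # assumed to be in bytes


def parse_args(argv):
    """Build a ProxyConfig from argv (program name followed by four integers)."""
    if len(argv) != 5:
        raise SystemExit(
            f"usage: {argv[0]} <port> <timeout> <max_object_size> <max_cache_size>")
    try:
        port, timeout, max_object_size, max_cache_size = (int(a) for a in argv[1:])
    except ValueError:
        raise SystemExit("all arguments must be integers") from None
    if not 0 < port < 65536:
        raise SystemExit("port must be between 1 and 65535")
    return ProxyConfig(port, timeout, max_object_size, max_cache_size)
```

At start-up this might be called as `config = parse_args(sys.argv)`, with `config.port` then used when binding the listening socket.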


HTTP Concepts


Requests vs. Responses

HTTP is a stateless request/response protocol used for exchanging messages over a network connection. These messages fall into two categories: requests, which clients send to initiate an action on the server, and responses, which servers return to fulfill those requests.

Both types of messages share a common structure. They begin with a start-line, followed by zero or more header field lines (collectively called the headers or header section), an empty line marking the end of the headers, and an optional message body. The line terminator for the start-line and header fields is the sequence CRLF (\r\n, representing carriage return and line feed).

Structurally, the only difference between request and response messages is the start-line:

Figure 2: Anatomy of an HTTP message (Source: MDN Web Docs)


HTTP Methods

The request method defines the purpose of the request and what the client expects as a successful result.


Request Target Forms

The request target specifies the resource on which the method should be applied. The form of the request target varies depending on the HTTP method and whether the request is sent through a proxy.


Response Status Codes

HTTP response status codes indicate whether a specific HTTP request has been successfully completed. Responses are grouped into five classes:

  1. Informational responses (100 – 199) - Note: your proxy will not encounter these during marking
  2. Successful responses (200 – 299)
  3. Redirection responses (300 – 399)
  4. Client error responses (400 – 499)
  5. Server error responses (500 – 599)

Header Fields

Header fields provide metadata about an HTTP message. They follow this format, where the field name and field value are separated by a colon:

field-name: field-value

Field names are case-insensitive, and there should be no leading or trailing whitespace.

Field values themselves have no leading or trailing whitespace; however, a field value may be surrounded by optional whitespace, which is not part of the value.

Some header fields may be multi-valued, as an ordered, comma-separated list:

field-name: field-value-1, field-value-2, ..., field-value-n

As part of this assignment, before forwarding a message, your proxy will need to identify, and potentially remove, modify, or insert, particular header fields. These fields are briefly described here. Any other fields that may be present should simply be forwarded unchanged.

  1. Connection / Proxy-Connection

    The Connection header allows the sender to list desired control options for the current connection. Most notably, it controls whether the network connection stays open (i.e. persists) after the current transaction finishes.

    If a sender does not intend to keep the connection open beyond the current request-response exchange, then they must specify a close directive:

    Connection: close

    Otherwise, they may specify a list of desired control options, for example:

    Connection: keep-alive, upgrade

    In HTTP/1.1, connections are persistent by default, meaning unless there's an explicit close directive in the message, the connection should remain open, regardless of whether there is a keep-alive directive.

    Some legacy user agents may send a non-standard Proxy-Connection header instead of or in addition to Connection, attempting to control connection persistence with proxies. This is not part of the official HTTP specification, but proxies should handle both headers appropriately. Effectively, they should be regarded as equivalent, except where both are present, in which case Connection takes precedence.

    Note that the values of this header, like its name, are case-insensitive.

  2. Transfer-Encoding

    The Transfer-Encoding header lists any transformations that have been applied to the content in order to form the message body.

    For example, the following header indicates that the content has been compressed in gzip format and that it will be sent as a series of chunks:

    Transfer-Encoding: gzip, chunked

    Chunked transfer encoding allows data to be sent in multiple parts, each prefixed with its size, without knowing the total size in advance.

    If a message is received with both a Transfer-Encoding and a Content-Length header, the Transfer-Encoding overrides the Content-Length.

    Note that the values of this header, like its name, are case-insensitive.

  3. Content-Length

    If a message doesn’t have a Transfer-Encoding header, the Content-Length header can specify the expected size of the content in bytes.

    When a message includes content, Content-Length can help determine where the data and message end.

    For messages without content, it indicates the size of the selected representation. For example, a response to a HEAD request does not include content, but may include a Content-Length.

  4. Via

    The Via header is used for tracking message forwards. The syntax is a comma-separated list of <protocol-version> <pseudonym> pairs.

    For example, the following header indicates that the message passed through an intermediary that supports HTTP/1.0 with the pseudonym foo, then an intermediary that supports HTTP/1.1 with the pseudonym bar.

    Via: 1.0 foo, 1.1 bar
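The Connection and Via rules above can be sketched in Python as follows, assuming the headers have already been parsed into a dict keyed by lower-cased field name. `client_wants_close` and `append_via` are hypothetical helper names, and the pseudonym passed to `append_via` is a placeholder, not the value the assignment expects.

```python
def client_wants_close(headers):
    """Return True if the sender asked to close the connection.

    Connection takes precedence over the legacy Proxy-Connection header
    when both are present.
    """
    value = headers.get("connection")
    if value is None:
        value = headers.get("proxy-connection", "")
    # Options are comma-separated and case-insensitive.
    options = [opt.strip().lower() for opt in value.split(",")]
    # HTTP/1.1 connections persist unless "close" is present.
    return "close" in options


def append_via(headers, pseudonym):
    """Append this proxy's "<protocol-version> <pseudonym>" entry to Via."""
    entry = f"1.1 {pseudonym}"  # pseudonym is a hypothetical placeholder
    existing = headers.get("via")
    headers["via"] = f"{existing}, {entry}" if existing else entry
    return headers
```

Because only the `close` option affects persistence, other options such as `keep-alive` or `upgrade` are ignored here, which matches the HTTP/1.1 default of persistent connections.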

Message Body

The message body (if any) is used to carry content for the request or response. The message body is identical to the content unless a transfer coding has been applied.

The rules for determining when a message body is present in a message differ for requests and responses:

The length of a message body is determined by one of the following (in order of precedence):

  1. Any response to a HEAD request and any response with a 204 (No Content) or 304 (Not Modified) status code is always terminated by the first empty line after the header fields, regardless of the header fields present in the message, and thus cannot contain a message body.
  2. If a Transfer-Encoding header field is present in a response, the message body length is determined by reading the connection until it is closed by the server.
  3. If a Content-Length header field is present without Transfer-Encoding, its decimal value defines the expected message body length in bytes. If the sender closes the connection or the recipient times out before the indicated number of bytes are received, the recipient must consider the message to be incomplete and close the connection.
  4. If this is a request message and none of the above are true, then the message body length is zero (no message body is present).
  5. Otherwise, this is a response message without a declared message body length, so the message body length is determined by the number of bytes received prior to the server closing the connection.
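The precedence rules above can be expressed as a small decision function. This is a sketch assuming headers are held in a dict keyed by lower-cased field name; `body_framing` is a hypothetical name, and a non-numeric Content-Length will raise `ValueError`, which the caller would need to treat as an error.

```python
def body_framing(is_request, headers, request_method=None, status=None):
    """Return ("none", 0), ("length", n) or ("until-close", None).

    For responses, pass the method of the request being answered and the
    integer status code.
    """
    # Rule 1: responses to HEAD, and 204/304 responses, never carry a body.
    if not is_request and (request_method == "HEAD" or status in (204, 304)):
        return ("none", 0)
    # Rule 2: a response with Transfer-Encoding is read until the server closes.
    if not is_request and "transfer-encoding" in headers:
        return ("until-close", None)
    # Rule 3: Content-Length without Transfer-Encoding gives the exact length.
    if "content-length" in headers and "transfer-encoding" not in headers:
        return ("length", int(headers["content-length"]))
    # Rule 4: any other request has no body.
    if is_request:
        return ("none", 0)
    # Rule 5: any other response is read until the server closes.
    return ("until-close", None)
```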

Message Parsing

As a general guide, when parsing an HTTP message, the typical approach is:

  1. Read the start-line into a structured representation.
  2. Read each header field line into a hash table (keyed by field name) or similar, until encountering an empty line.
  3. Use the parsed data to determine whether a message body is expected.
  4. If a message body is indicated, read it as a stream, either for the specified length or until the connection is closed.
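The first two parsing steps might look like this in Python, using only a plain socket. `read_head` and `parse_head` are hypothetical helpers; merging repeated fields into one comma-separated value follows the multi-valued syntax described earlier, but is a simplification (`Set-Cookie`, for example, is a known exception in real HTTP).

```python
def read_head(sock):
    """Receive from sock until the empty line that ends the header section.

    Returns (head, leftover): leftover holds any body bytes that arrived in
    the same recv() calls and must be treated as the start of the body.
    """
    buf = b""
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before end of headers")
        buf += chunk
    head, _, leftover = buf.partition(b"\r\n\r\n")
    return head + b"\r\n\r\n", leftover


def parse_head(head):
    """Split a raw head into start-line parts and a dict of lower-cased names."""
    lines = head.decode("iso-8859-1").split("\r\n")
    # [method, target, version] or [version, status, reason]
    start_line = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if line == "":                   # empty line ends the header section
            break
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        name = name.lower()              # field names are case-insensitive
        value = value.strip(" \t")       # drop optional whitespace around the value
        # Simplification: repeated fields are merged into one comma-separated list.
        headers[name] = f"{headers[name]}, {value}" if name in headers else value
    return start_line, headers
```

Any `leftover` bytes returned by `read_head` belong to the message body and must be forwarded before reading more from the socket.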

Requirements & Features


Basic Non-Persistent Proxy

In the simplest case, your proxy will be non-persistent and handle requests sequentially, supporting GET, HEAD, and POST. While it must accommodate multiple client connections over time, it only needs to handle one connection at a time, with each connection limited to a single request-response exchange.

This is illustrated in the following example of a typical non-persistent transaction, starting with Figure 3.

Important reminder: All header field names, and all header field values that we are dealing with directly, should be treated as case-insensitive.

  1. Client-Proxy Connection
  2. Client Request
  3. Proxy-Server Connection
  4. Client Request Forwarding

Figure 3: A client sending a GET request via a proxy

  5. Server Response
  6. Proxy-Server Connection Termination
  7. Server Response Forwarding
  8. Client-Proxy Connection Termination

Figure 4: A server sending a response to a GET request via a non-persistent proxy

HEAD and POST requests follow much the same flow in the usual case, differing only in how the message body is handled.

A basic test to perform a simple sanity check of your proxy is provided.


Basic Persistent Proxy

In this case, your proxy builds on the basic non-persistent proxy by allowing a client connection to be reused for multiple requests, which once again may include any combination of GET, HEAD, or POST requests.

This is illustrated in the Figure 5 example of a typical persistent session. It mirrors the basic non-persistent proxy example, with the minor differences noted below.

Important reminder: We do not expect your proxy to maintain persistent server connections.

Figure 5: A client sending multiple GET requests via a persistent proxy


HTTP Tunneling with CONNECT

At this point your proxy is quite functional, but it's limited to HTTP traffic, which is increasingly rare as most websites and services have transitioned to HTTPS for security and privacy.

However, an HTTP proxy implementing the CONNECT method can handle HTTPS traffic by tunneling encrypted connections, allowing it to support modern web applications securely.

Consider the typical example in Figure 6:

  1. Client-Proxy Connection
  2. Client Request
  3. Proxy-Server Connection
  4. Proxy Response
  5. Raw Data Relay
  6. Connection Termination

Figure 6: A client sending a CONNECT request to a proxy

When implementing support for the CONNECT method, the proxy must be able to blindly forward traffic in both directions without any awareness of message framing. This means that once the tunnel is established, the proxy simply relays raw data between the client and the server, without interpreting or modifying it. A naïve single-threaded approach that reads from one socket and then writes to the other won’t work reliably, as it may block indefinitely if one side isn't actively sending data. This can lead to deadlocks where neither endpoint is receiving data because the proxy is stuck waiting on a read operation. To avoid this, implementations typically use multi-threading, non-blocking I/O (e.g., select(), poll()), or asynchronous event loops to ensure data is forwarded as soon as it's available on either side.
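One way to apply the `select()` approach is sketched below. It uses only the standard `select` and `socket` modules; the `relay` name is hypothetical, and using the optional `idle_timeout` to stand in for the assignment's `timeout` argument is an assumption.

```python
import select
import socket


def relay(client, server, idle_timeout=None, bufsize=65536):
    """Copy raw bytes in both directions until either side closes or goes idle.

    select() waits on both sockets at once, so the proxy never blocks reading
    one side while data is waiting on the other. The caller closes both sockets.
    """
    peer = {client: server, server: client}
    try:
        while True:
            readable, _, _ = select.select([client, server], [], [], idle_timeout)
            if not readable:      # nothing from either side within idle_timeout
                return
            for src in readable:
                data = src.recv(bufsize)
                if not data:      # this side closed its connection
                    return
                peer[src].sendall(data)
    except OSError:               # reset or broken pipe ends the tunnel
        return
```

For simplicity this tunnel ends as soon as either side closes, and `sendall` can still block if the receiving side stops reading; a more robust version would also wait for writability.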


Error Handling

Connections may close unexpectedly at any time, whether due to network conditions or intentional termination. Your proxy must handle such events gracefully to ensure robustness. Failure to do so may cause it to break under even basic test scenarios.

Additionally, your proxy should handle the explicit error conditions outlined below. In most cases, it should generate and send an HTTP response with the appropriate status code and reason phrase. The response must include a representation (i.e., a response body) giving some minimal information about the error, including any required phrases mentioned below. You may use any text-based format, such as text/html or text/plain, but ensure the response adheres to that format and includes correctly set Content-Type and Content-Length headers.
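A sketch of building such a response with a `text/plain` body. The `error_response` helper is hypothetical, and 502 Bad Gateway in the usage example is only illustrative; the actual conditions, status codes and required phrases are those listed for this assignment.

```python
def error_response(status, reason, detail):
    """Build a complete HTTP/1.1 error response with a small text/plain body."""
    body = f"{status} {reason}\n{detail}\n".encode("utf-8")
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"
        "Content-Type: text/plain; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"  # length in bytes of the encoded body
        "\r\n"
    )
    return head.encode("ascii") + body
```

For example, `conn.sendall(error_response(502, "Bad Gateway", "could not connect to origin server"))`. Computing Content-Length from the encoded bytes, rather than the string, keeps it correct for non-ASCII detail text.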

Explicit Error Conditions


Caching

Your proxy should implement an in-memory cache that stores cacheable responses, reducing response time and network bandwidth consumption for future equivalent requests.

Parameters for the cache are provided as command-line arguments to the proxy, specifically:

When considering the object or cache size, your proxy must only count bytes used to store the actual objects. Metadata, such as response headers and timestamps, will also need to be stored, but for the purposes of these calculations should be ignored.

We take a very simplified view of HTTP caching, as outlined in this section, ignoring all cache directives that may be included in messages.

A response is only cacheable if it is:

If caching such a response would cause the cache to exceed max_cache_size, then one or more responses must first be evicted using a least recently used (LRU) replacement policy. No more responses should be evicted than necessary.
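An `OrderedDict` gives a compact LRU structure: lookups move an entry to the end, and eviction pops from the front. In this sketch the `LRUCache` class is hypothetical, only body bytes count toward the limits (as required above), and the assumed cacheability test is simply that the body fits within `max_object_size`; the full cacheability criteria for the assignment still apply. It is not thread-safe on its own.

```python
from collections import OrderedDict


class LRUCache:
    """Byte-bounded LRU cache; only stored body bytes count toward the limits."""

    def __init__(self, max_object_size, max_cache_size):
        self.max_object_size = max_object_size
        self.max_cache_size = max_cache_size
        self.used = 0                  # bytes of bodies currently stored
        self.entries = OrderedDict()   # key -> (metadata, body); oldest first

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, metadata, body):
        size = len(body)
        if size > self.max_object_size or size > self.max_cache_size:
            return False               # too large to cache (assumed criterion)
        if key in self.entries:        # replacing: release the old entry first
            self.used -= len(self.entries.pop(key)[1])
        # Evict least recently used entries only until the new object fits.
        while self.used + size > self.max_cache_size:
            _, (_, old_body) = self.entries.popitem(last=False)
            self.used -= len(old_body)
        self.entries[key] = (metadata, body)
        self.used += size
        return True
```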

Upon receiving a GET request, your proxy should first check its cache to see if the request can be satisfied by a stored response. If it can, a "cache hit" occurs, and the proxy should respond directly using the stored response. Otherwise, a "cache miss" occurs, and the proxy should forward the request to the origin server.

The normalised target URL of a GET request is the "cache key", which is used to identify a stored response. Normalisation of the target URL considers the following rules:

For example, all of the following URLs are equivalent:

http://example.com
http://example.com/
HTTP://EXAMPLE.COM/
HTTP://EXAMPLE.COM:80/
HTTP://EXAMPLE.COM:0080/

While none of the following URLs are equivalent:

http://example.com/
http://example.com/index.html
http://example.com/INDEX.HTML
http://example.com:8080/INDEX.HTML
http://example.com:8080/INDEX.HTML?FOO=BAR
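The URL examples suggest the following normalisation, sketched in Python: scheme and host compared case-insensitively, the port compared numerically with the default port 80 dropped, an empty path treated as `/`, and path and query kept case-sensitive. These rules are inferred from the examples rather than taken from the full rule list, `cache_key` is a hypothetical name, and IPv6 literal hosts are not handled.

```python
def cache_key(url):
    """Normalise an absolute http URL into a cache key (rules inferred from examples)."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"not an absolute URL: {url!r}")
    # The authority ends at the first "/" or "?"; everything after is path + query.
    ends = [i for i in (rest.find("/"), rest.find("?")) if i != -1]
    cut = min(ends) if ends else len(rest)
    authority, path = rest[:cut], rest[cut:]
    host, colon, port = authority.partition(":")
    port_num = int(port) if colon and port else 80  # "0080" -> 80
    if not path.startswith("/"):
        path = "/" + path                            # "" -> "/", "?q" -> "/?q"
    host = host.lower()
    netloc = host if port_num == 80 else f"{host}:{port_num}"
    return f"{scheme.lower()}://{netloc}{path}"
```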

Request methods other than GET should simply bypass the cache.

If your proxy supports concurrency, then you should be particularly careful about how it interacts with any shared data structures.

Note, marking will rely entirely on your logging to assess the operation of your cache. If your proxy does not produce a log, then it will not be possible to award any marks for caching.


Logging

Your proxy should write to standard output a log of each HTTP transaction as it's completed. The log should follow a variant of the Common Log Format, with the following syntax:

host port cache date request status bytes

Where:

Here is an example:

127.0.0.1 56150 - [22/May/2025:10:17:24 +1000] "HEAD http://www.example.org/ HTTP/1.1" 200 0
127.0.0.1 56149 M [22/May/2025:10:17:24 +1000] "GET http://www.example.org/ HTTP/1.1" 200 648
127.0.0.1 56154 - [22/May/2025:10:17:24 +1000] "POST http://httpbin.org/post HTTP/1.1" 200 480
127.0.0.1 56150 H [22/May/2025:10:17:25 +1000] "GET http://www.example.org/ HTTP/1.1" 200 648
127.0.0.1 56156 - [22/May/2025:10:17:25 +1000] "CONNECT api.github.com:443 HTTP/1.1" 200 0
127.0.0.1 56149 M [22/May/2025:10:17:24 +1000] "GET http://httpstat.us/404?sleep=5000 HTTP/1.1" 404 13

A basic test to perform a simple sanity check of your log format is provided.
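A sketch of formatting one log entry in Python. The `log_line` helper is hypothetical; reading the cache field as `H` for a hit, `M` for a miss and `-` when the cache does not apply is inferred from the example, and what `bytes` counts is left to the caller. Note that `%b` produces English month abbreviations only under the default C locale.

```python
from datetime import datetime


def log_line(host, port, cache, request_line, status, nbytes, when=None):
    """Format one transaction as: host port cache [date] "request" status bytes."""
    if when is None:
        when = datetime.now().astimezone()  # local time with its UTC offset
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    return f'{host} {port} {cache} [{stamp}] "{request_line}" {status} {nbytes}'
```

Printing with `print(log_line(...), flush=True)` avoids log lines sitting in a buffer when standard output is redirected to a file.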


Concurrency

Extend your proxy to handle multiple client connections concurrently. This means your proxy should be able to process requests from multiple clients at the same time, rather than handling them sequentially. You may use threading, multiprocessing, or asynchronous I/O to achieve this. Ensure proper synchronisation when accessing any shared resources, such as the cache, to prevent race conditions.
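A thread-per-connection sketch using only the standard `socket` and `threading` modules. The `make_listener`, `accept_loop` and `handle_client` names are hypothetical, and a single global lock is the simplest (not the most scalable) way to serialise access to shared data.

```python
import socket
import threading

# Shared state (such as the cache) must only be touched while holding this lock.
cache_lock = threading.Lock()


def make_listener(port):
    """Create a TCP socket listening on all interfaces at the given port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("", port))
    listener.listen()
    return listener


def accept_loop(listener, handle_client):
    """Give every accepted connection its own daemon thread.

    handle_client(conn, addr) is a hypothetical per-connection handler that
    serves all requests on conn, then closes it.
    """
    while True:
        conn, addr = listener.accept()
        threading.Thread(target=handle_client, args=(conn, addr), daemon=True).start()
```

Inside a handler, each cache access would then be wrapped as `with cache_lock: entry = cache.get(key)`, holding the lock only for the lookup or update, never while waiting on a socket.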

Reminder: You must not use any ready-made server or HTTP libraries to implement any aspects of this assignment.


Report

Submit a short report (no more than 3 pages) named report.pdf. The report should cover the following sections:

  1. Programming Language and Code Organisation

  2. High-Level Design

  3. Data Structures

  4. Limitations

  5. Acknowledgments


Tips on Getting Started


This is a complex assignment, and the best way to tackle a complex task is to start early and to do it in stages.

Before attempting this assignment, we advise that you read this specification and the FAQ in full, more than once, and finish Lab 2 and Lab 3.

Lab 3, in particular, may serve as an excellent starting point for this assignment.

  1. Understand Socket Basics

  2. Understand HTTP/1.1 Basics

  3. Data Structure Design

  4. Plan and Document

  5. Break Down the Problem

  6. Start with Logging

  7. Handle Errors Early and Gracefully

  8. Handle Basic Proxy Functionality

  9. Implement Client-Proxy Persistence

  10. Introduce Caching

  11. Test and Debug Incrementally

  12. Focus on Concurrency Later

  13. Read the Requirements Carefully


Testing and Debugging


HTTP is a ubiquitous protocol, so fortunately there are many tools and services that you can use to debug and test your proxy. Simply bypassing your proxy also gives you a convenient mechanism to determine the expected response for a given request.

Some tools and services are outlined in Useful Tools for Testing and Debugging. Marking will utilise similar tools. During development you should endeavour to use multiple user agents and communicate with as many origin servers as possible.

It is imperative that you rigorously test your code to ensure that all possible (and logical) interactions can be correctly executed. Test, test, and test.


Submission


Please ensure that you use the mandated file names of report.pdf and, for the entry point of your application, one of:

If you are using C or Java, then you must additionally submit a Makefile. This is because we need to know how to resolve any dependencies. See Sample Client-Server Programs and Networking Programming Resources for a guide on writing a Makefile.

After running make, we should have one of the following executable files:

Submission is via give using the following command syntax:

$ give cs3331 assign <file1> [<file2> ... <fileN>]

Note, this is the same command for both COMP3331 and COMP9331 students.

If your codebase does not rely on a directory structure then you may submit the files directly. For example, assuming your implementation is in C, and you additionally have helper.c and helper.h files that your Makefile expects to find in the same directory as proxy.c:

$ give cs3331 assign report.pdf Makefile proxy.c helper.c helper.h

If your codebase relies on some directory structure, for example you've created helper functions or classes in a sub-directory to your main program, you must first tar the parent directory as assign.tar. For instance, assuming a directory assign contains all the relevant files and sub-directories (including your report), open a terminal and navigate to the parent directory, then execute:

$ tar -cvf assign.tar assign
$ give cs3331 assign assign.tar

Please do not submit any build artefacts, test files/programs, or other particulars that are not required to compile and run your application.

Upon running give, ensure that your submission is accepted. You may submit often. Only your last submission will be marked.

Emailing your code to course staff will not be considered as a submission.

Submitting the wrong files, failing to submit certain files, failing to complete the submission process, or simply failing to submit, will not be considered as grounds for re-assessment.

If you wish to validate your submission, you may execute:

$ 3331 classrun -check assign # show submission status
$ 3331 classrun -fetch assign # fetch most recent submission

Important: It is your responsibility to ensure that your submission is accepted, and that your submission is what you intend to have assessed. No exceptions.


Late Submission Policy

Late submissions will incur a 5% per day penalty, for up to 5 days, calculated on the achieved mark. Each day starts from the deadline and accrues every 24 hours.

For example, an assignment otherwise assessed as 12/20, submitted 49 hours late, will incur a 3 day x 5% = 15% penalty, applied to 12, and be awarded 12 x 0.85 = 10.2/20.
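The same calculation as a small Python function, under one reading of the policy that is consistent with the example above: each started 24-hour period counts as a full day. `late_mark` is a hypothetical name.

```python
import math


def late_mark(raw_mark, hours_late):
    """Apply a 5% penalty per started day late, for at most 5 days."""
    if hours_late <= 0:
        return raw_mark
    days = math.ceil(hours_late / 24)  # 49 hours -> 3 days
    if days > 5:
        raise ValueError("more than 5 days late: not accepted without an extension")
    return raw_mark * (1 - 0.05 * days)
```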

Submissions after 5 days from the deadline will not be accepted unless an extension has been granted, as detailed in Special Consideration and Equitable Learning Services.


Special Consideration and Equitable Learning Services

Applications for Special Consideration must be submitted to the university via the Special Consideration portal. Course staff do not accept or approve special consideration requests.

Students who are registered with Equitable Learning Services must email cs3331@cse.unsw.edu.au to request any adjustments based on their Equitable Learning Plan.

Any requested and approved extensions will defer late penalties and submission closure. For example, a student who has been approved for a 3-day extension will not incur any late penalties until 3 days after the standard deadline, and will be able to submit up to 8 days after the standard deadline.


Plagiarism


Group submissions will not be allowed. Your programs must be entirely your own work. Plagiarism detection software will be used to compare all submissions pairwise (including submissions for similar assessments in previous years, if applicable) and serious penalties will be applied, including an entry on UNSW's plagiarism register.

You are not permitted to use code generated with the help of automatic tools such as GitHub Copilot, ChatGPT, Google Bard.

Please refer to the online sources to help you understand what plagiarism is and how it is dealt with at UNSW:


Marking Rubric


Important Reminder: Your proxy must compile and run within the CSE environment. Ensure it is thoroughly tested in that environment.

Functionality                     Marks
Basic Non-Persistent Proxy:
  - GET                           2
  - HEAD                          1
  - POST                          1
Basic Persistent Proxy:
  - GET only                      1
  - GET + HEAD + POST             2
Via Header                        1
CONNECT                           2
Explicit Error Conditions         2
Logging                           1
Caching                           2
Non-Persistent Concurrency        1
Persistent Concurrency            1
Stress Test                       1
Report                            1
Code Quality                      1
Total                             20

No particular coding style is mandated, just ensure your code style is consistent, your code is clean, and your code is adequately documented.

