Due: 16:59 Thursday, 31 July 2025 (Week 9)
Date | Description |
---|---|
29/05/2025 | Initial release |
The Hypertext Transfer Protocol (HTTP) is the backbone of the World Wide Web, enabling everything from browsing websites and streaming videos to online banking, shopping, and social media. Every time you load a webpage, send a message, or access cloud-based applications, HTTP facilitates communication between your device and remote servers, making modern internet usage seamless and efficient.
HTTP allows the use of intermediaries to handle requests through a chain of connections. One common type of intermediary is a proxy, a message-forwarding agent that sits between a client (such as a Web browser) and an origin server. Instead of connecting directly to the origin server, the client sends its request to the proxy, which then forwards it on behalf of the client. The origin server’s response is then relayed back through the proxy to the client.
Because a proxy handles both incoming and outgoing requests, it effectively acts as both a client and a server—a server to the clients making requests and a client to the servers fulfilling them.
Figure 1: The proxy as a middleman
Proxies serve a variety of purposes, including:
In this assignment, you will implement your own HTTP/1.1 proxy in C, Java or Python. Your proxy should:
- [...] CONNECT, GET, HEAD, and POST methods.
Your proxy must compile and run within the CSE environment. Ensure it is thoroughly tested in that environment.
You are only permitted to use the basic libraries for socket programming. You must not use any ready-made server or HTTP libraries to implement any aspects of this assignment. Doing so will likely result in a mark of zero. If in doubt, please check with course staff on the forum.
This is an individual assignment and is worth 20 marks.
This assignment specification is the authoritative reference for your implementation. While it is based on HTTP standards, to simplify your task certain aspects may deviate from or contradict official specifications. In such cases, the requirements outlined here take precedence. If you encounter any ambiguity, seek clarification via the course forum rather than relying on official specifications.
By completing this assignment, you will gain a deeper understanding of:
- proxy.c (if implementing in C)
- Proxy.java (if implementing in Java)
- proxy.py (if implementing in Python)
- Makefile (not required for Python)
- make should produce an executable named:
  - proxy (if implementing in C)
  - Proxy.class (if implementing in Java)
- report.pdf
Your proxy must accept 4 command-line arguments:
- port: the TCP port the proxy will listen on for client connections. We recommend using a random port number between 49152 and 65535 (the dynamic port number range).
- timeout: a strictly positive integer, in seconds. It's the duration a client or server may be idle or unresponsive.
- max_object_size: a strictly positive integer, in bytes. It's the maximum object size that can be cached.
- max_cache_size: a strictly positive integer, in bytes, and at least equal to max_object_size. It's the total amount of object data that may be cached.
It should be initiated as follows:
$ ./proxy <port> <timeout> <max_object_size> <max_cache_size> # for C
$ java Proxy <port> <timeout> <max_object_size> <max_cache_size> # for Java
$ python3 proxy.py <port> <timeout> <max_object_size> <max_cache_size> # for Python
For example, running one of:
$ ./proxy 60893 10 1024 1048576 # for C
$ java Proxy 60893 10 1024 1048576 # for Java
$ python3 proxy.py 60893 10 1024 1048576 # for Python
Would mean the proxy should:
- listen on port 60893 (avoid this port on CSE, other students will likely try to use it simply because it's in the spec!).
- use a timeout of 10 seconds.
- use a max_object_size of 1024 bytes.
- use a max_cache_size of 1048576 bytes.
You must ensure that none of these values are hard-coded. They must be configurable via the command-line arguments.
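As a minimal sketch of the argument rules above, the following Python function validates the four values. The function name `parse_args` and the accepted port range of 1 to 65535 are assumptions for illustration, not requirements of this specification.

```python
def parse_args(argv):
    """Validate [port, timeout, max_object_size, max_cache_size] and return ints."""
    if len(argv) != 4:
        raise ValueError("expected: <port> <timeout> <max_object_size> <max_cache_size>")
    try:
        port, timeout, max_object_size, max_cache_size = (int(arg) for arg in argv)
    except ValueError:
        raise ValueError("all four arguments must be integers") from None
    # Assumption: any valid TCP port number is accepted.
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    # timeout and max_object_size are strictly positive integers.
    if timeout <= 0 or max_object_size <= 0:
        raise ValueError("timeout and max_object_size must be strictly positive")
    # max_cache_size must be at least equal to max_object_size.
    if max_cache_size < max_object_size:
        raise ValueError("max_cache_size must be at least max_object_size")
    return port, timeout, max_object_size, max_cache_size
```

In a Python implementation this could be called as `parse_args(sys.argv[1:])` at start-up, printing the error and exiting on `ValueError`.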
HTTP is a stateless request/response protocol used for exchanging messages over a network connection. These messages fall into two categories: requests, which clients send to initiate an action on the server, and responses, which servers return to fulfill those requests.
Both types of messages share a common structure. They begin with a start-line, followed by zero or more header field lines (collectively called the headers or header section), an empty line marking the end of the headers, and an optional message body. The line terminator for the start-line and header fields is the sequence CRLF (\r\n, representing carriage return and line feed).
Structurally, the only difference between request and response messages is the start-line:
Figure 2: Anatomy of an HTTP message (Source: MDN Web Docs)
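The message structure described above can be sketched in Python as follows. The helper names `split_head` and `parse_request_line` are hypothetical, and the sketch assumes the bytes received so far are passed in as a single buffer.

```python
def split_head(raw: bytes):
    """Split a buffer into (start_line, header_lines, rest_of_buffer).

    Returns None while the empty line that ends the headers has not arrived.
    """
    head, separator, rest = raw.partition(b"\r\n\r\n")
    if not separator:
        return None  # headers incomplete: the caller should read more data
    # ISO-8859-1 maps every byte to a character, so decoding cannot fail.
    lines = head.decode("iso-8859-1").split("\r\n")
    return lines[0], lines[1:], rest


def parse_request_line(line: str):
    """Split a request start-line into (method, target, version)."""
    parts = line.split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, target, version = parts
    return method, target, version
```

Anything returned in `rest` is the beginning of the message body (or of a later message) and should not be discarded.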
The request method defines the purpose of the request and what the client expects as a successful result.
- GET: Requests a resource from the server.
- HEAD: Requests the headers (only) that would be returned if it were a GET request.
- POST: Sends data to the server, often to create or update a resource.
- CONNECT: Establishes a tunnel to a server, typically used for HTTPS over a proxy. The client sends CONNECT, and if successful, raw data flows between the client and the server.
The request target specifies the resource on which the method should be applied. The form of the request target varies depending on the HTTP method and whether the request is sent through a proxy.
Origin-Form
- [...] / as the path within the origin-form of a request-target.
Example:
GET /index.html?query=123 HTTP/1.1
Absolute-Form
- [...] CONNECT requests).
- [...] http), hostname, optional port, path, and query string.
- [...] 80 for http.
Example:
HEAD http://example.com:8080/index.html?query=123 HTTP/1.1
Authority-Form
- [...] CONNECT method when requesting a proxy to establish a tunnel to a remote server.
Example:
CONNECT example.com:443 HTTP/1.1
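As an illustration of the absolute-form and authority-form examples above, the following Python sketch extracts the host, port, and path from a request target received by a proxy. The function name `parse_target` is hypothetical, IPv6 literals are not handled, and the sketch does not decide which status code a failure should map to.

```python
def parse_target(method: str, target: str):
    """Return (host, port, path_and_query) for a request target sent to a proxy."""
    if method == "CONNECT":
        # Authority-form: host:port
        host, colon, port = target.rpartition(":")
        if not colon or not host or not port.isdigit():
            raise ValueError("invalid authority-form target")
        return host, int(port), ""

    # Absolute-form: http://host[:port][/path][?query]
    prefix = "http://"
    if target[:len(prefix)].lower() != prefix:
        raise ValueError("expected an absolute-form target with the http scheme")
    remainder = target[len(prefix):]
    # The authority ends at the first "/" or "?", whichever comes first.
    cut = len(remainder)
    for marker in "/?":
        index = remainder.find(marker)
        if index != -1:
            cut = min(cut, index)
    authority, path = remainder[:cut], remainder[cut:]
    if not path.startswith("/"):
        path = "/" + path  # an empty path is sent as "/"
    host, colon, port = authority.partition(":")
    if not host:
        raise ValueError("no host in request target")
    if colon and not port.isdigit():
        raise ValueError("invalid port in request target")
    return host, int(port) if colon else 80, path
```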
HTTP response status codes indicate whether a specific HTTP request has been successfully completed. Responses are grouped into five classes:
Header fields provide metadata about an HTTP message. They follow this format, where the field name and field value are colon separated:
field-name: field-value
Field names are case-insensitive, and there should be no leading or trailing whitespace.
Field values themselves do not have any leading or trailing whitespace, however a field value may be surrounded by optional whitespace.
Some header fields may be multi-valued, as an ordered, comma-separated list:
field-name: field-value-1, field-value-2, ..., field-value-n
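A minimal Python sketch of the header field format described above follows. The helper names are hypothetical, and headers are assumed to be kept as a list of (name, value) pairs so that order and original spelling are preserved when forwarding.

```python
def parse_header_line(line: str):
    """Split 'field-name: field-value' into (name, value)."""
    name, colon, value = line.partition(":")
    # The field name must be non-empty with no leading or trailing whitespace.
    if not colon or not name or name != name.strip():
        raise ValueError(f"malformed header line: {line!r}")
    # Optional whitespace around the field value is not part of the value.
    return name, value.strip(" \t")


def list_values(value: str):
    """Split a comma-separated field value into its ordered members."""
    members = (member.strip(" \t") for member in value.split(","))
    return [member for member in members if member]


def get_header(headers, name):
    """Return the first value stored for name (case-insensitive), or None."""
    wanted = name.lower()
    for field_name, field_value in headers:
        if field_name.lower() == wanted:
            return field_value
    return None
```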
As part of this assignment, before forwarding a message, your proxy will need to identify, and potentially remove, modify, or insert, particular header fields. These fields are briefly described here. Any other fields that may be present should simply be forwarded unchanged.
Connection / Proxy-Connection
The Connection header allows the sender to list desired control options for the current connection. Most notably, it controls whether the network connection stays open (i.e. persists) after the current transaction finishes.
If a sender does not intend to keep the connection open beyond the current request-response exchange, then they must specify a close directive:
Connection: close
Otherwise, they may specify a list of desired control options, for example:
Connection: keep-alive, upgrade
In HTTP/1.1, connections are persistent by default, meaning unless there's an explicit close directive in the message, the connection should remain open, regardless of whether there is a keep-alive directive.
Some legacy user agents may send a non-standard Proxy-Connection header instead of or in addition to Connection, attempting to control connection persistence with proxies. This is not part of the official HTTP specification, but proxies should handle both headers appropriately. Effectively, they should be regarded as equivalent, except where both are present, in which case Connection takes precedence.
Note, the header values, like the header name, are not case sensitive.
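The persistence rules above can be sketched in Python as a single decision function. The name `client_wants_close` and the representation of headers as (name, value) pairs are assumptions for illustration.

```python
def _directives(headers, name):
    """Collect the lower-cased directives of every header called name, or None."""
    found = None
    for field_name, value in headers:
        if field_name.lower() == name:
            found = found or []
            found.extend(token.strip().lower() for token in value.split(","))
    return found


def client_wants_close(headers):
    """Return True if the connection should close after this exchange."""
    chosen = _directives(headers, "connection")
    if chosen is None:
        # Proxy-Connection only counts when Connection is absent.
        chosen = _directives(headers, "proxy-connection")
    # HTTP/1.1 is persistent by default: only an explicit close ends it.
    return chosen is not None and "close" in chosen
```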
Transfer-Encoding
The Transfer-Encoding header lists any transformations that have been applied to the content in order to form the message body.
For example, the following header indicates that the content has been compressed in gzip format and that it will be sent as a series of chunks:
Transfer-Encoding: gzip, chunked
Chunked transfer encoding allows data to be sent in multiple parts, each prefixed with its size, without knowing the total size in advance.
If a message is received with both a Transfer-Encoding and a Content-Length header, the Transfer-Encoding overrides the Content-Length.
Note, the header values, like the header name, are not case sensitive.
Content-Length
If a message doesn’t have a Transfer-Encoding header, the Content-Length header can specify the expected size of the content in bytes.
When a message includes content, Content-Length can help determine where the data and message end.
For messages without content, it indicates the size of the selected representation.
For example, a response to a HEAD request does not include content, but may include a Content-Length.
Via
The Via header is used for tracking message forwards. The syntax is a comma-separated list of <protocol-version> <pseudonym> pairs.
For example, the following header indicates that the message passed through an intermediary that supports HTTP/1.0 with the pseudonym foo, then an intermediary that supports HTTP/1.1 with the pseudonym bar.
Via: 1.0 foo, 1.1 bar
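One way to record a forward in the Via header is sketched below in Python. The function name `add_via` is hypothetical, and appending to the last existing Via header (rather than adding a second header line) is an assumption of this sketch.

```python
def add_via(headers, pseudonym, version="1.1"):
    """Return a copy of headers with '<version> <pseudonym>' recorded in Via."""
    entry = f"{version} {pseudonym}"
    result = list(headers)
    via_positions = [i for i, (name, _) in enumerate(result) if name.lower() == "via"]
    if via_positions:
        # Assumption: extend the last Via line so the hop order is preserved.
        index = via_positions[-1]
        name, value = result[index]
        result[index] = (name, f"{value}, {entry}" if value else entry)
    else:
        result.append(("Via", entry))
    return result
```

For example, `add_via([("Via", "1.0 foo")], "bar")` yields a single header equivalent to the `Via: 1.0 foo, 1.1 bar` example above.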
The message body (if any) is used to carry content for the request or response. The message body is identical to the content unless a transfer coding has been applied.
The rules for determining when a message body is present in a message differ for requests and responses:
- [...] Content-Length header field.
The length of a message body is determined by one of the following (in order of precedence):
- [...] HEAD request and any response with a 204 (No Content) or 304 (Not Modified) status code is always terminated by the first empty line after the header fields, regardless of the header fields present in the message, and thus cannot contain a message body.
- [...] Transfer-Encoding header field is present in a response, the message body length is determined by reading the connection until it is closed by the server.
- [...] Content-Length header field is present without Transfer-Encoding, its decimal value defines the expected message body length in bytes. If the sender closes the connection or the recipient times out before the indicated number of bytes are received, the recipient must consider the message to be incomplete and close the connection.
As a general guide, when parsing an HTTP message, the typical approach is:
In the simplest case, your proxy will be non-persistent and handle requests sequentially, supporting GET, HEAD, and POST. While it must accommodate multiple client connections over time, it only needs to handle one connection at a time, with each connection limited to a single request-response exchange.
This is illustrated in the following example of a typical non-persistent transaction, starting with Figure 3.
Important reminder: All header field names, and all header field values that we are dealing with directly, should be treated as case-insensitive.
- [...] 80.
- [...] GET, so no body is expected and the message is considered complete.
- [...] Connection header with Connection: close.
- [...] Proxy-Connection header.
- [...] Via header with 1.1 [your zID].
Figure 3: A client sending a GET request via a proxy
- [...] GET request and has a status code of 200, therefore a body is expected.
- [...] Transfer-Encoding has been specified, but a Content-Length of 1256 bytes is indicated, so continue to read data if and as necessary until the full body is received.
- [...] Connection header with Connection: close.
- [...] Via header with 1.1 [your zID].
Figure 4: A server sending a response to a GET request via a non-persistent proxy
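The body-length decision walked through in this example can be sketched in Python as follows. It covers only the three response rules stated in the message body section (HEAD, 204 and 304; Transfer-Encoding; Content-Length) and returns None for any case those rules do not address. The function name and return convention are assumptions for illustration.

```python
def response_body_length(request_method, status, headers):
    """Decide how much response body to read.

    Returns 0 (no body), an int byte count, "until-close", or None when
    none of the three rules applies.
    """
    fields = {name.lower(): value for name, value in headers}
    # Rule 1: responses to HEAD, and 204/304 responses, never carry a body.
    if request_method == "HEAD" or status in (204, 304):
        return 0
    # Rule 2: Transfer-Encoding in a response means read until the server closes.
    if "transfer-encoding" in fields:
        return "until-close"
    # Rule 3: otherwise Content-Length gives the exact body length in bytes.
    if "content-length" in fields:
        return int(fields["content-length"].strip())
    return None
```

With the response in Figure 4, `response_body_length("GET", 200, [("Content-Length", "1256")])` returns 1256, so the proxy keeps reading until 1256 body bytes have arrived.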
HEAD and POST requests follow quite similarly in the usual case, just accounting for potential differences in the message body.
A basic test to perform a simple sanity check of your proxy is provided.
In this case, your proxy builds on the basic non-persistent proxy by allowing a client connection to be reused for multiple requests, which once again may include any combination of GET, HEAD, or POST requests.
This is illustrated in the Figure 5 example of a typical persistent session. It mirrors the basic non-persistent proxy example, with the minor differences noted below.
Important reminder: We do not expect your proxy to maintain persistent server connections.
- [...] Connection: close header.
- [...] Connection header needs to be set according to the client preference.
- [...] close the Connection, the client has timed out the proxy, or the proxy has timed out the client.
Figure 5: A client sending multiple GET requests via a persistent proxy
CONNECT
At this point your proxy is quite functional, but it's limited to HTTP traffic, which is increasingly rare as most websites and services have transitioned to HTTPS for security and privacy.
However, an HTTP proxy implementing the CONNECT method can handle HTTPS traffic by tunneling encrypted connections, allowing it to support modern web applications securely.
Consider the typical example in Figure 6:
- [...] CONNECT, the request target is expected in authority-form.
- [...] CONNECT, no body is expected.
- [...] 443.
- [...] HTTP/1.1 200 Connection Established response to the client, with no additional headers and no body.
Figure 6: A client sending a CONNECT request to a proxy
When implementing support for the CONNECT method, the proxy must be able to blindly forward traffic in both directions without any awareness of message framing. This means that once the tunnel is established, the proxy simply relays raw data between the client and the server, without interpreting or modifying it. A naïve single-threaded approach that reads from one socket and then writes to the other won’t work reliably, as it may block indefinitely if one side isn't actively sending data. This can lead to deadlocks where neither endpoint is receiving data because the proxy is stuck waiting on a read operation. To avoid this, implementations typically use multi-threading, non-blocking I/O (e.g., select(), poll()), or asynchronous event loops to ensure data is forwarded as soon as it's available on either side.
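As one possible illustration of the select() approach mentioned above, the following Python sketch relays bytes between two connected sockets. The function name `relay`, the 4096-byte read size, and the choice to stop when both sides have been idle for `idle_timeout` seconds are assumptions of this sketch, not requirements.

```python
import select
import socket


def relay(a: socket.socket, b: socket.socket, idle_timeout: float) -> None:
    """Copy bytes in both directions until one side closes or both go idle."""
    peer = {a: b, b: a}
    while True:
        # Wait until at least one socket has data, without blocking on either.
        readable, _, _ = select.select([a, b], [], [], idle_timeout)
        if not readable:
            return  # neither side sent anything within idle_timeout seconds
        for sock in readable:
            try:
                data = sock.recv(4096)
            except OSError:
                return  # connection reset or otherwise broken
            if not data:
                return  # this side closed its end of the tunnel
            try:
                peer[sock].sendall(data)
            except OSError:
                return
```

Note that `sendall()` can still block if the receiving side stops reading; a fuller implementation might also select for writability or use one thread per direction.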
Connections may close unexpectedly at any time, whether due to network conditions or intentional termination. Your proxy must handle such events gracefully to ensure robustness. Failure to do so may cause it to break under even basic test scenarios.
Additionally, your proxy should handle explicit error conditions as outlined below. In most cases, it should generate and send an HTTP response with the appropriate status code and reason phrase. The response must include a representation (i.e., a response body) providing some minimal information about the error, including the mentioned required phrases. You may use any text-based format, such as text/html or text/plain, but ensure the response adheres to that format and includes properly set Content-Type and Content-Length headers.
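A minimal sketch of generating such an error response in Python is shown here. The function name `build_error_response`, the text/plain format, and the exact body wording are illustrative choices; any other headers your proxy needs to add are omitted.

```python
def build_error_response(status: int, reason: str, detail: str) -> bytes:
    """Build a complete text/plain error response containing detail in its body."""
    body = f"{status} {reason}: {detail}\n".encode("utf-8")
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"
        "Content-Type: text/plain; charset=utf-8\r\n"
        # Content-Length counts the encoded body bytes, not characters.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body
```

For instance, `build_error_response(502, "Bad Gateway", "connection refused")` produces a response whose body mentions the required phrase.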
Handling Client Timeouts
- [...] timeout seconds, the proxy should gracefully close the socket and move on.
Malformed CONNECT request
- [...] 443 is specified:
  - 400 Bad Request with "invalid port" mentioned in the body.
Invalid Request Target
- 400 Bad Request with "no host" mentioned in the body.
- 421 Misdirected Request with "proxy address" mentioned in the body.
Connection Issues
- 502 Bad Gateway with "connection refused" mentioned in the body.
- 502 Bad Gateway with "could not resolve" mentioned in the body.
- 502 Bad Gateway with "closed unexpectedly" mentioned in the body.
- [...] timeout seconds:
  - 504 Gateway Timeout with "timed out" mentioned in the body.
Your proxy should implement an in-memory cache that stores cacheable responses. This is to reduce the response time and network bandwidth consumption on future equivalent requests.
Parameters for the cache are provided as command-line arguments to the proxy, specifically:
- max_object_size in bytes, a strictly positive integer.
- max_cache_size in bytes, an integer at least equal to max_object_size.
When considering the object or cache size, your proxy must only count bytes used to store the actual objects. Metadata, such as response headers and timestamps, will also need to be stored, but for the purposes of these calculations should be ignored.
We take a very simplified view of HTTP caching, as outlined in this section, ignoring all cache directives that may be included in messages.
A response is only cacheable if it is:
- [...] GET request;
- [...] 200; and
- [...] max_object_size.
If caching such a response would cause the cache to exceed max_cache_size, then one or more responses must first be evicted using a least recently used (LRU) replacement policy. No more responses should be evicted than necessary.
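The size accounting and LRU eviction described above might be sketched in Python as follows. The class name `LRUCache`, its methods, and the reading that an object of exactly max_object_size bytes is still cacheable are assumptions of this sketch. Only body bytes are counted, and the class is not thread-safe on its own.

```python
from collections import OrderedDict


class LRUCache:
    """Byte-bounded cache; the least recently used entry is evicted first."""

    def __init__(self, max_object_size, max_cache_size):
        self.max_object_size = max_object_size
        self.max_cache_size = max_cache_size
        self.used = 0  # total body bytes currently stored
        self._items = OrderedDict()  # key -> (metadata, body), oldest first

    def get(self, key):
        entry = self._items.get(key)
        if entry is not None:
            self._items.move_to_end(key)  # a hit makes this entry most recent
        return entry

    def put(self, key, metadata, body):
        # Assumption: objects up to and including max_object_size are cacheable.
        if len(body) > self.max_object_size:
            return False
        old = self._items.pop(key, None)
        if old is not None:
            self.used -= len(old[1])
        # Evict only as many entries as needed to make room.
        while self.used + len(body) > self.max_cache_size:
            _, (_, evicted_body) = self._items.popitem(last=False)
            self.used -= len(evicted_body)
        self._items[key] = (metadata, body)
        self.used += len(body)
        return True
```

Metadata such as response headers is stored alongside the body but deliberately excluded from `used`.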
Upon receiving a GET request, your proxy should first check its cache to see if the request can be satisfied by a stored response. If it can, a "cache hit" occurs, and the proxy should respond directly using the stored response. Otherwise, a "cache miss" occurs, and the proxy should forward the request to the origin server.
The normalised target URL of a GET request is the "cache key", which is used to identify a stored response. Normalisation of the target URL considers the following rules:
For example, all of the following URLs are equivalent:
http://example.com
http://example.com/
HTTP://EXAMPLE.COM/
HTTP://EXAMPLE.COM:80/
HTTP://EXAMPLE.COM:0080/
While none of the following URLs are equivalent:
http://example.com/
http://example.com/index.html
http://example.com/INDEX.HTML
http://example.com:8080/INDEX.HTML
http://example.com:8080/INDEX.HTML?FOO=BAR
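A Python sketch of a normalisation consistent with the two sets of example URLs above follows. The rules it applies (case-insensitive scheme and host, default port 80, numeric port comparison, empty path treated as /, case-sensitive path and query) are inferred from those examples, and the function name `cache_key` is hypothetical.

```python
def cache_key(target: str) -> str:
    """Normalise an absolute-form http target into a cache key."""
    scheme, separator, remainder = target.partition("://")
    if not separator:
        raise ValueError("expected an absolute-form target")
    # The authority ends at the first "/" or "?", whichever comes first.
    cut = len(remainder)
    for marker in "/?":
        index = remainder.find(marker)
        if index != -1:
            cut = min(cut, index)
    authority, rest = remainder[:cut], remainder[cut:]
    host, colon, port = authority.partition(":")
    # int() drops leading zeros, so 0080 and 80 compare equal.
    port_number = int(port) if colon and port else 80
    if not rest.startswith("/"):
        rest = "/" + rest  # an empty path is equivalent to "/"
    # Scheme and host are lower-cased; path and query keep their case.
    return f"{scheme.lower()}://{host.lower()}:{port_number}{rest}"
```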
Request methods other than GET should simply bypass the cache.
If your proxy supports concurrency, then you should be particularly careful about how it interacts with any shared data structures.
Note, marking will rely entirely on your logging to assess the operation of your cache. If your proxy does not produce a log, then it will not be possible to award any marks for caching.
Your proxy should write to standard output a log of each HTTP transaction as it's completed. The log should follow a variant of the Common Log Format, with the following syntax:
host port cache date request status bytes
Where:
- host: is the IP address of the client which made the request.
- port: is the port of the client which made the request.
- cache: is - for all requests other than GET; for GET requests it is either H for a cache hit or M for a cache miss.
- date: is the date, time, and time zone that the request was received, in strftime format %d/%b/%Y:%H:%M:%S %z, enclosed in square brackets. Please note that if your proxy supports concurrency, then transactions may appear to be logged out of order; this is to be expected.
- request: is the request line from the client, as received by the proxy, enclosed in double quotation marks.
- status: is the HTTP status code returned to the client.
- bytes: is the size of the object returned to the client, not including the response headers.
Here is an example:
127.0.0.1 56150 - [22/May/2025:10:17:24 +1000] "HEAD http://www.example.org/ HTTP/1.1" 200 0
127.0.0.1 56149 M [22/May/2025:10:17:24 +1000] "GET http://www.example.org/ HTTP/1.1" 200 648
127.0.0.1 56154 - [22/May/2025:10:17:24 +1000] "POST http://httpbin.org/post HTTP/1.1" 200 480
127.0.0.1 56150 H [22/May/2025:10:17:25 +1000] "GET http://www.example.org/ HTTP/1.1" 200 648
127.0.0.1 56156 - [22/May/2025:10:17:25 +1000] "CONNECT api.github.com:443 HTTP/1.1" 200 0
127.0.0.1 56149 M [22/May/2025:10:17:24 +1000] "GET http://httpstat.us/404?sleep=5000 HTTP/1.1" 404 13
A basic test to perform a simple sanity check of your log format is provided.
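The log syntax above could be produced in Python along these lines. The function name `format_log` and its parameter names are hypothetical, and `received` is assumed to be a timezone-aware datetime (for example from `datetime.now().astimezone()`) so that %z is not empty.

```python
from datetime import datetime, timedelta, timezone


def format_log(host, port, cache, received, request_line, status, size):
    """Format one transaction as: host port cache [date] "request" status bytes."""
    stamp = received.strftime("%d/%b/%Y:%H:%M:%S %z")
    return f'{host} {port} {cache} [{stamp}] "{request_line}" {status} {size}'


# Reproduce the cache-miss line from the example log.
when = datetime(2025, 5, 22, 10, 17, 24, tzinfo=timezone(timedelta(hours=10)))
line = format_log("127.0.0.1", 56149, "M", when,
                  "GET http://www.example.org/ HTTP/1.1", 200, 648)
```

When printing each line, passing `flush=True` to `print()` helps the log appear promptly if standard output is redirected.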
Extend your proxy to handle multiple client connections concurrently. This means your proxy should be able to process requests from multiple clients at the same time, rather than handling them sequentially. You may use threading, multiprocessing, or asynchronous I/O to achieve this. Ensure proper synchronisation when accessing any shared resources, such as the cache, to prevent race conditions.
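One of the permitted approaches, a thread per client connection with a lock around shared state, is sketched below in Python. The names `serve_forever`, `cache_store`, and `shared_cache` are hypothetical, and the plain dictionary merely stands in for whatever cache structure you design.

```python
import threading

cache_lock = threading.Lock()
shared_cache = {}  # stand-in for the real cache shared by all threads


def cache_store(key, value):
    """Update the shared structure while holding the lock."""
    with cache_lock:
        shared_cache[key] = value


def serve_forever(listener, handle_client):
    """Accept connections and hand each one to its own daemon thread."""
    while True:
        conn, addr = listener.accept()
        worker = threading.Thread(target=handle_client, args=(conn, addr), daemon=True)
        worker.start()
```

Here `listener` is assumed to be a bound, listening TCP socket, and `handle_client(conn, addr)` is your own per-connection logic. Every read or write of shared data inside it should take the same lock.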
Reminder: You must not use any ready-made server or HTTP libraries to implement any aspects of this assignment.
Submit a short report (no more than 3 pages) named report.pdf. The report should cover the following sections:
- Programming Language and Code Organisation
  - [...] Makefile).
- High-Level Design
- Data Structures
- Limitations
- Acknowledgments
This is a complex assignment, and the best way to tackle a complex task is to start early and to do it in stages.
Before attempting this assignment, we advise that you read this specification and the FAQ in full, more than once, and finish Lab 2 and Lab 3.
Lab 3, in particular, may serve as an excellent starting point for this assignment.
Understand Socket Basics
- socket() – Creating a socket with appropriate domain (AF_INET for IPv4) and type (SOCK_STREAM for TCP).
- bind() – Binding a socket to an IP address and port (for servers).
- listen() – Marking a socket as passive to accept incoming connections.
- accept() – Accepting a new client connection.
- connect() – Initiating a connection from the client to the server.
- [...] recv() calls.
Understand HTTP/1.1 Basics
- [...] Connection.
Data Structure Design
Plan and Document
Break Down the Problem
- [...] GET, before moving to HEAD and POST.
- [...] CONNECT.
Start with Logging
Handle Errors Early and Gracefully
Handle Basic Proxy Functionality
Implement Client-Proxy Persistence
- [...] Connection header.
Introduce Caching
Test and Debug Incrementally
Focus on Concurrency Later
Read the Requirements Carefully
HTTP is a ubiquitous protocol, so fortunately there are many tools and services that you can use to debug and test your proxy. Simply bypassing your proxy also gives you a convenient mechanism to determine the expected response for a given request.
Some tools and services are outlined in Useful Tools for Testing and Debugging. Marking will utilise similar tools. During development you should endeavour to use multiple user agents and communicate with as many origin servers as possible.
It is imperative that you rigorously test your code to ensure that all possible (and logical) interactions can be correctly executed. Test, test, and test.
Please ensure that you use the mandated file names of report.pdf and, for the entry point of your application, one of:
- proxy.c
- proxy.py
- Proxy.java
If you are using C or Java, then you must additionally submit a Makefile. This is because we need to know how to resolve any dependencies. See Sample Client-Server Programs and Networking Programming Resources for a guide on writing a Makefile.
After running make, we should have one of the following executable files:
- proxy (for C)
- Proxy.class (for Java)
Submission is via give using the following command syntax:
$ give cs3331 assign <file1> [<file2> ... <fileN>]
Note, this is the same command for both COMP3331 and COMP9331 students.
If your codebase does not rely on a directory structure then you may submit the files directly. For example, assuming your implementation is in C, and you additionally have helper.c and helper.h files that your Makefile expects to find in the same directory as proxy.c:
$ give cs3331 assign report.pdf Makefile proxy.c helper.c helper.h
If your codebase relies on some directory structure, for example you've created helper functions or classes in a sub-directory to your main program, you must first tar the parent directory as assign.tar. For instance, assuming a directory assign contains all the relevant files and sub-directories (including your report), open a terminal and navigate to the parent directory, then execute:
$ tar -cvf assign.tar assign
$ give cs3331 assign assign.tar
Please do not submit any build artefacts, test files/programs, or other particulars that are not required to compile and run your application.
Upon running give, ensure that your submission is accepted. You may submit often. Only your last submission will be marked.
Emailing your code to course staff will not be considered as a submission.
Submitting the wrong files, failing to submit certain files, failing to complete the submission process, or simply failing to submit, will not be considered as grounds for re-assessment.
If you wish to validate your submission, you may execute:
$ 3331 classrun -check assign # show submission status
$ 3331 classrun -fetch assign # fetch most recent submission
Important: It is your responsibility to ensure that your submission is accepted, and that your submission is what you intend to have assessed. No exceptions.
Late submissions will incur a 5% per day penalty, for up to 5 days, calculated on the achieved mark. Each day starts from the deadline and accrues every 24 hours.
For example, an assignment otherwise assessed as 12/20, submitted 49 hours late, will incur a 3 day x 5% = 15% penalty, applied to 12, and be awarded 12 x 0.85 = 10.2/20.
Submissions after 5 days from the deadline will not be accepted unless an extension has been granted, as detailed in Special Consideration and Equitable Learning Services.
Applications for Special Consideration must be submitted to the university via the Special Consideration portal. Course staff do not accept or approve special consideration requests.
Students who are registered with Equitable Learning Services must email cs3331@cse.unsw.edu.au to request any adjustments based on their Equitable Learning Plan.
Any requested and approved extensions will defer late penalties and submission closure. For example, a student who has been approved for a 3 day extension, will not incur any late penalties until 3 days after the standard deadline, and will be able to submit up to 8 days after the standard deadline.
Group submissions will not be allowed. Your programs must be entirely your own work. Plagiarism detection software will be used to compare all submissions pairwise (including submissions for similar assessments in previous years, if applicable) and serious penalties will be applied, including an entry on UNSW's plagiarism register.
You are not permitted to use code generated with the help of automatic tools such as GitHub Copilot, ChatGPT, or Google Bard.
Please refer to the online sources to help you understand what plagiarism is and how it is dealt with at UNSW:
Important Reminder: Your proxy must compile and run within the CSE environment. Ensure it is thoroughly tested in that environment.
Functionality | Marks |
---|---|
Basic Non-Persistent Proxy: | |
- GET | 2 |
- HEAD | 1 |
- POST | 1 |
Basic Persistent Proxy: | |
- GET only | 1 |
- GET + HEAD + POST | 2 |
Via Header | 1 |
CONNECT | 2 |
Explicit Error Conditions | 2 |
Logging | 1 |
Caching | 2 |
Non-Persistent Concurrency | 1 |
Persistent Concurrency | 1 |
Stress Test | 1 |
Report | 1 |
Code Quality | 1 |
Total | 20 |
No particular coding style is mandated, just ensure your code style is consistent, your code is clean, and your code is adequately documented.