COMP(2041|9044) Shell Style Guide
Shell Language & Shell Scripts
Language Features
Shell scripts must be written in POSIX-compliant shell syntax.
The POSIX definition of the "Shell Command Language" can be found in IEEE Std 1003.1-2024.
For simplicity, however, we consider shell features that are available in the dash(1) shell to be POSIX-compliant. In other words, if your shell script executes correctly with dash, you can assume it is POSIX-compliant.
Your shell scripts cannot contain shell features from bash(1), zsh(1), or other shells.
Language Features
The goal of this course is to teach you how to write shell scripts that are portable. We want your shell scripts to run on as many platforms as possible with little or no modification. To achieve this, your shell scripts should be written in the subset of shell features that is common to all POSIX-compatible shells; this is what POSIX describes.
Other shell languages like bash and zsh implement extensions on top of POSIX. Writing shell scripts that do not rely on these extensions allows scripts to be used in both bash and zsh, as well as other shells.
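As a small illustration of staying within POSIX (a sketch, not taken from the course materials): bash's [[ ]] test is not available in dash, but a case statement performs the same glob match in any POSIX shell. The function name starts_with_z is hypothetical.

```shell
# Hypothetical helper: succeed if the first argument starts with "z".
# bash would allow  [[ $1 == z* ]]  but [[ ]] is not POSIX,
# so a case statement does the glob match instead.
starts_with_z() {
    case $1 in
    z*)
        return 0
        ;;
    *)
        return 1
        ;;
    esac
}
```

For example, starts_with_z z5115658 succeeds, while starts_with_z 5115658 fails.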
File Extensions
Shell scripts must have a .sh file extension, unless an activity specifies otherwise. An activity may specify that no file extension should be used.
File Extensions
Tools and editors may rely on the file extension to detect a source file's language.
If a script is made available on a user's PATH, the extension may be omitted so that the script can be executed like any other command.
File Extensions
test.sh <- correct file extension
autotest <- correct, assuming the activity specified no extension
File Extensions
test.shell <- incorrect file extension
build <- incorrect, assuming the activity specified to include an extension
Hashbang
Shell scripts must start with a hashbang line.
The hashbang line must specify dash(1) as the shell interpreter.
Hashbang
The hashbang line (also called a shebang line) specifies the shell interpreter to use. A hashbang line starts with #! and contains the path to a shell interpreter.
As we are using the dash shell, we need to specify the path to the dash interpreter. The hashbang can directly specify the path to the dash interpreter, or it can specify the path to the env(1) utility, which will then search the user's PATH for dash and run it. Using the env utility is recommended.
Specifying sh(1) as the interpreter might run dash(1) or bash(1), depending on the platform, and thus is not allowed.
The hashbang characters #! must be the first two characters in the file. The hashbang is not valid if there is whitespace before the #! characters.
The hashbang is valid if there is whitespace after the #! characters and before the path. Including a single space after the hashbang is recommended, as it makes the path easier to read.
Hashbang
#! /usr/bin/env dash <- recommended
#!/bin/dash <- also acceptable
Hashbang
#! /usr/bin/env bash <- not allowed, as it specifies the bash interpreter
#!/bin/sh <- not allowed, as it is ambiguous which shell will be used
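To make these rules concrete, here is a minimal sketch of a check you could run on your own scripts. The function name has_dash_hashbang is hypothetical, and the check is deliberately simple: it only accepts a first line that starts with #! and ends with dash, so it accepts the two correct examples above and rejects the two incorrect ones, as well as any line with whitespace before the #!.

```shell
# Hypothetical helper: succeed if the first line of file $1 is a hashbang
# naming dash, e.g. "#! /usr/bin/env dash" or "#!/bin/dash".
has_dash_hashbang() {
    script_file=$1
    first_line=''
    # IFS= keeps leading whitespace, so " #!/bin/dash" is rejected;
    # read fails on an empty file, leaving first_line empty
    IFS= read -r first_line < "$script_file"
    case $first_line in
    '#!'*dash)
        return 0
        ;;
    *)
        return 1
        ;;
    esac
}
```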
Permissions
Shell scripts must have read and execute permissions enabled for all users.
Permissions
In order to execute a shell script with ./file.sh, the file must have read and execute permissions enabled.
To enable these permissions, use the chmod command:
chmod a+rx file.sh
or
chmod 755 file.sh
Without the execute permission, the file can still be run with dash file.sh; the read permission is still required though.
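A minimal sketch for checking permissions from within a script (check_runnable is a hypothetical name). Note that the [ -r ] and [ -x ] tests only check permissions for the current user, so ls -l is still the way to confirm that all users have read and execute permission.

```shell
# Hypothetical helper: report whether file $1 can be run as ./file.
# [ -r ] and [ -x ] test the current user only, not every user.
check_runnable() {
    script_file=$1
    if [ -r "$script_file" ] && [ -x "$script_file" ]
    then
        echo "$script_file is runnable"
    else
        echo "$script_file is not runnable; try: chmod a+rx $script_file"
    fi
}
```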
Permissions
$ ls -l isPrime.sh
-rwxr-xr-x 1 cs2041 cs2041 2.2K May 30 02:37 isPrime.sh <- correct permissions
Permissions
$ ls -l isEven.sh isOdd.sh
-rw-r--r-- 1 cs2041 cs2041 2.2K May 30 02:37 isEven.sh <- missing execute permission
-rwxr-x--- 1 cs2041 cs2041 2.2K May 30 02:37 isOdd.sh <- other has no permissions
Recommended Code Layout
Header Comment
All programs must start with a header comment.
Header Comment
The header comment must be at the top of the file, immediately after the hashbang line, with a single empty line between the hashbang line and the header comment.
There must be a single empty line between the header comment and the first line of your program.
The header comment should contain at least:
- the name of the activity
- the name of the file
- the name, zID, and email address of the author
- the date it was written
- a description of what the program does.
The preferred style for a header comment is:
# ACTIVITY-NAME-HERE
# FILE-NAME-HERE
#
# This program was written by YOUR-NAME-HERE (YOUR-ZID-HERE) <YOUR-EMAIL-HERE>
# on DATE-HERE
#
# DESCRIPTION-HERE
If the program is being maintained over a long period of time, the header comment should also contain a changelog.
# ACTIVITY-NAME-HERE
# FILE-NAME-HERE
#
# DESCRIPTION-HERE
#
# DATE-HERE
# YOUR-NAME-HERE (YOUR-ZID-HERE) <YOUR-EMAIL-HERE>
# - CHANGE-HERE
#
# DATE-HERE
# YOUR-NAME-HERE (YOUR-ZID-HERE) <YOUR-EMAIL-HERE>
# - CHANGE-HERE
Header Comment
#! /usr/bin/env dash

# COMP2041/9044 Lab06 - A Shell Script that Prints Itself
# quine.sh
#
# This program was written by Dylan Brotherston (z5115658) <d.brotherston@unsw.edu.au>
# on May 30, 2022
#
# This program is a quine:
# a program that prints its own source code (minus comments).

b=\' c=\\ a='echo b=$c$b c=$c$c a=$b$a$b; echo $a'
echo b=$c$b c=$c$c a=$b$a$b; echo $a

#! /usr/bin/env dash

# COMP2041/9044 Assignment 1 - My Big Shell Project
# meaning_of_life.sh
#
# This program calculates and prints the meaning of life.
#
# 2024-02-05
# Dylan Brotherston (z5115658) <d.brotherston@unsw.edu.au>
# - Update the meaning of life to 42
#
# 2023-09-17
# Dylan Brotherston (z5115658) <d.brotherston@unsw.edu.au>
# - Initial version

echo 42
Header Comment
#!/bin/dash
is_prime() {
    local n i
    n=$1
    # 0 and 1 are not prime
    test $n -lt 2 &&
        return 1
    i=2
    while test $i -lt $n
    do
        test $((n % i)) -eq 0 &&
            return 1
        i=$((i + 1))
    done
    return 0
}
i=0
while test $i -lt 1000
do
    is_prime $i &&
        echo $i
    i=$((i + 1))
done
Implementation Comments
Comments must be included when code may be unclear to the reader.
Comments should be on the line before the code they are describing, rather than on the same line.
Comments should be indented to the same level as the code they are describing.
Comments must be written in English.
Implementation Comments
Where your code may be unclear to a reader, or require further explanation, you should leave brief comments describing the purpose of the code you have written. Make sure that comments describe why your code does what it does, rather than just stating what the code does.
Comments that are not in English cannot be assessed; they're equivalent to no comments at all.
Implementation Comments
#! /usr/bin/env dash

# COMP2041/9044 Lab06 - Hello World Again
# hello.sh
#
# This program was written by Dylan Brotherston (z5115658)
# on May 30, 2022
#
# This program simply says hello.

# for each name on the command line
for name in "$@"; do
    # Use grep to match a name that looks like a zID
    if printf '%s\n' "$name" | grep -Eq '^z[0-9]{7}$'; then
        # If the name is a zID, then use `acc` to find the corresponding name
        # This allows us to say hello with the person's full name even if they only give us a zID
        # This means that even on a CSE server `./hello.sh $(whoami)` will work
        echo "Hello, $(acc format='$NAME')"
    else
        # If the name is not a zID, then just say hello to the name
        echo "Hello, $name!"
    fi
done
Control Flow – if, while, until, for
Control flow structures must be formatted in a consistent style.
Control Flow – if, while, until, for
One, and only one, style should be used to format control flow statements.
The style demonstrated in lectures, where the opening block-delimiting keyword is on the line following the control flow keyword, is the recommended style. The then and do must be indented to the same level as their respective if, while, until, or for.
if [ "$x" -gt 0 ]
then
    echo "true"
fi
while [ "$x" -lt "$y" ]
do
    x=$((x + 1))
done
for arg in "$@"
do
    echo "$arg"
done
The style presented in Google's guide, where the opening block-delimiting keyword is on the same line as the control flow keyword, is also acceptable. The keywords then and do should be on the same line as their respective if, while, until, or for, and they should be separated from the condition by a single semicolon and a single space.
if [ "$x" -gt 0 ]; then
    echo "x is positive"
fi
while [ "$x" -lt "$y" ]; do
    x=$((x + 1))
done
for arg in "$@"; do
    echo "$arg"
done
Across all styles, the fi, elif, else, and done control flow keywords must be indented to the same level as their corresponding if, while, until, or for keyword.
Control Flow – if, while, until, for
if ! [ "$a" -ge "$b" ]
then
    echo "a is less than b"
elif ! [ "$a" -le "$b" ]
then
    echo "a is greater than b"
else
    echo "a is equal to b"
fi
if [ "$a" -gt "$b" ]; then
    echo "a is greater than b"
elif [ "$a" -lt "$b" ]; then
    echo "a is less than b"
else
    echo "a is equal to b"
fi
while [ "$a" -lt "$b" ]; do
    is_prime "$a"
    a=$((a + 1))
done
until [ "$a" -ge "$b" ]; do
    is_prime "$a"
    a=$((a + 1))
done
for file in $files; do
    rm "$file"
done
Control Flow – if, while, until, for
# Too many spaces after the semicolon
while [ "$a" -lt "$b" ];    do
    is_prime "$a"
    a=$((a + 1))
done
# `do` and `done` should be indented in line with `until`
until [ "$a" -ge "$b" ]
  do
    is_prime "$a"
    a=$((a + 1))
  done
# Too many empty lines before the `do`
for file in $files


do
    rm "$file"
done
Control Flow – case
The body and alternatives of case statements must be formatted in a consistent style.
Control Flow – case
A case statement compares a word against a list of patterns and runs the commands of the first alternative whose pattern matches.
The style presented in the lectures places alternatives at the same level as the case keyword and then indents the bodies of each alternative.
case word in
pattern1)
    commands1
    ;;
pattern2)
    commands2
    ;;
esac
If both the alternatives and the bodies are short, a single line format can also be used.
case $# in
0) echo "You forgot to supply the argument" ;;
1) filename=$1 ;;
*) echo "You supplied too many arguments" ;;
esac
In Google's guide, the alternatives are indented by one level, and then the bodies of the alternatives are indented one level further. A similar style is used for single-line alternatives and bodies.
case "${expression}" in
  a)
    variable="…"
    some_command "${variable}" "${other_expr}"
    ;;
  absolute)
    actions="relative"
    another_command "${actions}" "${other_expr}"
    ;;
  *)
    error "Unexpected expression '${expression}'"
    ;;
esac
case "${flag}" in
  a) aflag='true' ;;
  b) bflag='true' ;;
  f) files="${OPTARG}" ;;
  v) verbose='true' ;;
  *) error "Unexpected option ${flag}" ;;
esac
One style should be used consistently.
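Two details of case are worth seeing in code: a single alternative can list several patterns separated by |, and only the first matching alternative runs. A minimal sketch in the lecture style, using a hypothetical describe_argument function:

```shell
# Hypothetical helper: classify a command-line argument.
# case runs only the first alternative whose pattern matches,
# so "-h" is reported as help even though it also matches "-*".
describe_argument() {
    case $1 in
    -h|--help)
        echo "help"
        ;;
    -*)
        echo "option"
        ;;
    *)
        echo "operand"
        ;;
    esac
}
```

Because the more specific patterns come first, ordering the alternatives from most to least specific is the usual design choice.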
Control Flow – case
case $command in
login)
    token="$(login_user "$user" "$password")"
    if [ -z "$token" ]; then
        echo "Login failed"
        exit 1
    else
        echo "Login successful"
    fi
    ;;
logout)
    logout_user "$token"
    ;;
stats)
    cat /proc/$$/stat
    cat /proc/$$/status
    cat /proc/$$/sched
    cat /proc/$$/schedstat
    cat /proc/$$/limits
    cat /proc/$$/io
    ;;
cmd)
    cat /proc/$$/cmdline
    cat /proc/$$/environ
    ;;
*)
    echo "Unknown command: $command" >&2
    exit 1
    ;;
esac
case $level in
0) echo "Tutorial level" ;;
[1-9]) echo "Level $level" ;;
10) echo "Last level" ;;
*)
    echo "Unknown level: $level" >&2
    exit 1
    ;;
esac
Control Flow – Indentation
The body of control flow statements must be indented.
Control Flow – Indentation
The body of while loops, until loops, for loops, and if statements must be indented. Shell doesn't use braces for control flow like C. Instead, shell uses matching keywords to indicate the start and end of control flow. Everything between the start and end keyword must be indented.
Between any pair of keywords, the indentation level should increase by one level. Acceptable indentations for one level are 4 spaces, 2 spaces, or 1 tab. The chosen indentation should be used consistently throughout the code. Always indent by one level, and always use the same number of spaces or tabs to represent one level. Never use a mixture of spaces and tabs.
Control Flow – Indentation
while [ "$i" -lt 10 ]
do
    echo "$i"
    i=$((i + 1))
    n=$((n + i))
done
for arg in "$@"
do
    echo "$arg"
done
if [ ! -e "$file_name" ]
then
    echo "File does not exist"
fi
Control Flow – Indentation
# inconsistent indentation
while [ "$i" -lt 10 ]
do
    echo "$i"
  i=$((i + 1))
        n=$((n + i))
done
# no indentation
for arg in "$@"
do
echo "$arg"
done
# way too much indentation
if [ ! -e "$file_name" ]
then
                    echo "File does not exist"
fi
# mixing tabs and spaces
if [ "$1" = "$2" ]
then
	echo "arg[1] and arg[2] are equal"
    echo "that's a little redundant, don't you think?"
fi
Nesting Depth
Avoid overly deep nesting.
Nesting Depth
While strict nesting depth requirements aren't specified in this course, you should consider whether you can simplify your approach once you reach 5 levels of nesting.
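One common way to reduce nesting is to handle the error cases first and return early (guard clauses). The sketch below uses a hypothetical count_lines function: each check returns as soon as it fails, so the main work stays at the top level of the function instead of sitting inside nested if statements.

```shell
# Hypothetical helper: print the number of lines in file $1.
# Each guard returns early, so no if statement needs an else branch.
count_lines() {
    file=$1
    if [ ! -f "$file" ]
    then
        echo "$file: not a regular file" >&2
        return 1
    fi
    if [ ! -r "$file" ]
    then
        echo "$file: not readable" >&2
        return 1
    fi
    # some wc implementations pad their output with spaces; tr removes them
    wc -l < "$file" | tr -d ' '
}
```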
Spaces
Use a single space after keywords such as if, while, for, and return.
Use a space on each side of binary arithmetic operators such as * / % + - < > <= >= == !=
Do not include a space after prefix unary arithmetic operators such as + - ~ !
Use a space on each side of I/O redirection constructs such as < > >> 2> 2>>
Spaces
if [ $((a + b > c)) -eq 1 ]
then
    i=$((i + 1))
    echo "$i" >> "$log_file"
fi
Spaces
if    [ $((a+b>c)) -eq 1 ] <- too many spaces after language keyword
then
    i=$((i+1)) <- no spacing around binary operator
    echo "$i">>"$log_file" <- no spacing around redirection construct
fi
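A related pitfall with arithmetic comparisons: a comparison such as $((a + b > c)) expands to 1 when true and 0 when false, and a single-argument [ ] succeeds for any non-empty string, including 0. A minimal sketch, assuming integer arguments (sum_exceeds is a hypothetical helper), that compares the result explicitly:

```shell
# Hypothetical helper: succeed if first + second > limit.
sum_exceeds() {
    first=$1
    second=$2
    limit=$3
    # $((...)) expands to 1 (true) or 0 (false); [ 0 ] on its own would
    # still succeed, because [ ] only checks for a non-empty string
    [ $((first + second > limit)) -eq 1 ]
}
```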
Vertical Whitespace
Use vertical whitespace (blank lines between code) occasionally to indicate where the related parts of your code are.
Using no blank lines between lines of code indicates the lines should be understood together, or are closely related.
Using exactly one blank line indicates two sections of code are distinct – they split your code into "paragraphs".
Using exactly two blank lines indicates two sections of code are entirely unrelated.
Using more than two blank lines is (almost) never appropriate.
Vertical Whitespace
Vertical whitespace is like chocolate – some of it significantly improves your life, but too much can also be its own problem; and when used in the wrong place, it can be very confusing.
Vertical Whitespace
# The two variables here are closely related, and should not be separated.
x_position=0
y_position=0

# The two if statements here are separate ideas and could be separated.
if [ "$x_position" -gt 0 ]
then
    echo "X position is positive."
fi

if [ "$instruction" -eq "$INCREASE_X_POSITION" ]
then
    x_position=$((x_position + 1))
fi


# We have used two empty lines here to indicate that the topic has changed.
echo "Please choose an item to use..."
Statements
Only one executable statement should be used per line of code.
Statements
When a single line contains multiple statements, it can be difficult to read.
Statements
if test $# = 3
then
url=$1 <- each statement is on a separate line
regexp=$2
email_address=$3
else
echo "Usage: $0 <url> <regex> <email-address>" 1>&2
exit 1
fi
Statements
if test $# = 3
then
url=$1; regexp=$2; email_address=$3 <- multiple statements on a single line
else
echo "Usage: $0 <url> <regex> <email-address>" 1>&2
exit 1
fi
Line Width
Keep lines under 80 characters.
Line Width
Break long lines up to keep them under 80 characters, unless keeping it as a longer line is significantly more readable.
At 120 characters, serious effort should be made to split or shorten the line.
Variables
Variable Names
Descriptive variable names should always be used where possible.
Short variable names such as x, i, j, or k are acceptable if there is no appropriate long descriptive name. This is often the case for variables used as loop counters.
Variable names, other than constants, must begin with a lowercase letter.
Multi-word variable names should be in snake_case.
Constants should be in SHOUTING_SNAKE_CASE.
Declaring Variables
Declare variables close to where they are first used.
Do not use the same variable name for multiple purposes.
Declaring Variables
Having a variable close to the place it is first used makes it easy to find and understand.
If you need a new variable, use a different name – reusing variable names can lead to subtle and hard to find bugs.
Functions
Function Purpose
Functions should have one clearly defined purpose.
Function Names
Function names should be descriptive, typically containing multiple words, and formatted in snake_case.
Function Comments
Every function that is not short and obvious must have a comment describing its purpose and any side effects the function has.
The comment should be placed above the function's implementation.
Function Comments
# Updates next available job identifier; mutates variable `job_id`
increment_job_id() {
    job_id=$((job_id + 1))
}
Function Arguments
At the beginning of a function, assign the positional parameters in use to named variables.
Function Arguments
A function's behaviour can be unclear when positional parameters are used directly.
Function Arguments
print_log_entry() {
    timestamp_in_s=$1
    event_id=$2
    message=$3
    printf '%d: %d: %s\n' "$timestamp_in_s" "$event_id" "$message"
}
Function Arguments
# BAD - it's unclear what each field of the log entry is
print_log_entry() {
    printf '%d: %d: %s\n' "$1" "$2" "$3"
}
Commands
Builtin vs. External Commands
Given a choice between invoking a shell builtin and invoking a separate process, choose the builtin.
Builtin vs. External Commands
Shell builtins don't spawn a new process, so any overhead due to process creation is avoided.
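For example, jobs often handed to external programs such as basename and expr can usually be done with POSIX parameter expansion and arithmetic expansion, which the shell evaluates itself. A minimal sketch (the variable names are illustrative only):

```shell
script_name='autotest.sh'
count=0

# External commands: these run the separate programs basename and expr
name_via_basename=$(basename "$script_name" .sh)
count_via_expr=$(expr "$count" + 1)

# Builtin equivalents: the shell does the work without running a program
# ${var%pattern} removes the shortest matching suffix
name_via_expansion=${script_name%.sh}
count_via_arithmetic=$((count + 1))
```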
Command Options
When executing a command that supports options, use long options whenever possible.
Command Options
Short options are great when using the command line interactively but not when writing shell scripts. Short options are not descriptive, and when uncommon options are used, they can be confusing to readers.
Long options are self-descriptive and make it easier to understand what the program is doing.
Not all commands support long options though. If you are using a command that does not support long options, you should use short options instead. When using short options, you should use the combining syntax whenever possible, and extra comments should be included to explain what the options do.
Command Options
jq --compact-output --slurp
# non-standard command so use long options instead of the equivalent `-c -s`
grep -E
# common command, and common option, so using the short options is fine
cut -d":" -f"1,2" --only-delimited
# `--only-delimited` can be written as `-s` but is uncommon so use long option
# Bypass the system configuration file, open mailboxes read-only, and list all mailboxes
mutt -nRy
# `mutt` does not support long options, so use short options instead
# use the combining syntax to avoid writing `-n -R -y`
Command Options
curl -s -o /dev/null -X 'GET' "https://cgi.cse.unsw.edu.au/~cs2041/current/index.html"
# `curl` supports long options; they should be used instead
nc -w "${TIMEOUT}" -N "${HOST_IP}" "${TCP_PORT}"
# `nc` does not support long options, and a comment describing the short options is absent
Pipelines
If a pipeline is shorter than 80 characters, it should remain on a single line.
If a pipeline is longer than 80 characters, it must be broken across multiple lines. Only a single command should be on each line.
Pipelines
Long pipelines can be difficult to understand; breaking a pipeline up helps improve readability.
When splitting a pipeline across multiple lines, a consistent style must be used.
Pipelines
# This pipeline is less than 80 characters, so it should remain on one line
grep -E '^COMP(2041|9044)' enrollments.txt | grep -E 'F$' | wc -l
# The pipelines below are greater than 80 characters, so they should be
# broken across multiple lines, using a consistent style
grep -E 'COMP(2041|9044)' enrollments.txt |
    cut -d'|' -f4 |
    cut -d/ -f1 |
    sort |
    uniq -c |
    sort -nr
# This is also an acceptable style
grep -E 'COMP(2041|9044)' enrollments.txt \
    | cut -d'|' -f4 \
    | cut -d/ -f1 \
    | sort \
    | uniq -c \
    | sort -nr
Pipelines
grep -E 'COMP(2041|9044)' enrollments.txt | cut -d'|' -f4 | cut -d/ -f1 | sort | uniq -c | sort -nr
# Pipeline is too long and should be broken across multiple lines
grep -E 'COMP(2041|9044)' enrollments.txt | cut -d'|' -f4 | cut -d/ -f1 |
    sort \
    | uniq -c \
    | sort -nr
# Not all commands of this long pipeline are on separate lines, and the style used is inconsistent
Checking Return Values
The return values of commands should be examined after they are executed.
Checking Return Values
Commands can fail, so it is important to handle error cases; otherwise, unexpected behaviour may occur.
Use $? or check the command's exit status directly in an if statement.
Checking Return Values
# Check directly with an if statement
if ! mv "${file}" "${dest_dir}/"
then
    echo "Unable to move ${file} to ${dest_dir}" >&2
    exit 1
fi
# Alternatively, check the exit status in $?
mv "${file}" "${dest_dir}/"
if [ $? != 0 ]; then
    echo "Unable to move ${file} to ${dest_dir}" >&2
    exit 1
fi
Checking Return Values
# BAD - If cp fails the file may be lost
cp "${file}" "${dest_dir}/"
rm "${file}"
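One way to fix the example above is to remove the original only after cp has succeeded. A minimal sketch, using a hypothetical move_file_safely function:

```shell
# Hypothetical helper: copy file $1 into directory $2, then remove the
# original, but only if the copy succeeded.
move_file_safely() {
    file=$1
    dest_dir=$2
    if ! cp "$file" "$dest_dir/"
    then
        echo "Unable to copy $file to $dest_dir; keeping the original" >&2
        return 1
    fi
    rm "$file"
}
```

Where a plain move is all that is needed, the checked mv examples above are simpler.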
Command Substitution
Use $(command), rather than `command`, when performing command substitution.
Command Substitution
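The `command` syntax is problematic if nesting is required: each inner pair of backquotes must be escaped with a backslash, which quickly becomes hard to read.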
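To illustrate, here is the same nested substitution written both ways (the path is a made-up example). Both assignments produce project.

```shell
source_dir='/home/user/project/src'

# $( ) nests directly, and the inner substitution can be quoted normally
parent_name=$(basename "$(dirname "$source_dir")")

# With backquotes, the inner pair must be escaped with backslashes,
# and its result cannot easily be quoted
parent_name_old=`basename \`dirname "$source_dir"\``
```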
Static Verification & Correctness
ShellCheck
ShellCheck should be used to statically analyse shell scripts.
ShellCheck
Syntax errors reported by shell interpreters are sometimes cryptic and uninformative. Static analysis can help identify a wide range of common issues before a script is executed, which can help ensure correctness.
ShellCheck identifies issues, reports their locations, and provides recommendations for appropriate fixes. Each issue is identified by a specific code, which can be referred to for further information on the ShellCheck wiki.
Occasionally, ShellCheck can produce warnings that are false positives. For example, consider SC2086, which recommends quoting variables to avoid globbing and word splitting. There are occasions where word splitting is desired; passing several command arguments via one variable is one such case.
A warning can be ignored by adding a disable directive comment. To disable SC2086, you would add
# shellcheck disable=SC2086
before the line that triggers the warning.
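A minimal sketch of the directive in context, assuming the options are deliberately stored in one variable (count_course_lines and its options are hypothetical). The directive applies to the command on the line immediately after it.

```shell
# Hypothetical helper: count the lines of file $1 that contain "comp",
# ignoring case. The grep options are kept in one variable on purpose.
count_course_lines() {
    course_file=$1
    grep_options='-i -c'
    # $grep_options must split into two arguments (-i and -c), so the
    # SC2086 warning about unquoted variables is a false positive here
    # shellcheck disable=SC2086
    grep $grep_options 'comp' "$course_file"
}
```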
Avoid These Shell Features
eval
Do not use eval.
eval
When (mis)used with untrusted data, eval can provide a vector for code injection attacks.
Assessment activities in this course can be completed without eval.
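A common temptation is to use eval to build a variable name from input, e.g. eval "echo \$colour_$1", which would also run any commands hidden in $1. The sketch below (with hypothetical colour variables) uses a case statement instead, so input such as 'red; echo injected' is rejected rather than executed.

```shell
colour_red='#ff0000'
colour_blue='#0000ff'

# Hypothetical helper: print the value of the colour named by $1.
# Only the listed names are accepted; anything else is an error.
lookup_colour() {
    colour_name=$1
    case $colour_name in
    red)
        echo "$colour_red"
        ;;
    blue)
        echo "$colour_blue"
        ;;
    *)
        echo "Unknown colour: $colour_name" >&2
        return 1
        ;;
    esac
}
```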