COMP(2041|9044) Shell Style Guide

Contents

  • Shell Language & Shell Scripts
  • Recommended Code Layout
  • Variables
  • Functions
  • Commands
  • Static Verification & Correctness
  • Avoid These Shell Features

Shell Language & Shell Scripts

Language Features

Shell scripts must be written in POSIX-compliant shell syntax.

The POSIX definition of the "Shell Command Language" can be found in IEEE Std 1003.1-2024.

For simplicity, however, we consider shell features that are available in the dash(1) shell to be POSIX-compliant. In other words, if your shell script executes correctly with dash, you can assume it is POSIX-compliant.

Your shell scripts cannot contain shell features from bash(1), zsh(1), or other shells.

Language Features

The goal of this course is to teach you how to write portable shell scripts. Your shell scripts should run on as many platforms as possible with little to no modification. To achieve this, your shell scripts should be written in the subset of shell that is common to POSIX-compatible shells; this is the subset that POSIX describes.

Other shell languages like bash and zsh implement extensions on top of POSIX. Writing shell scripts that do not rely on these extensions allows scripts to be used in both bash and zsh, as well as other shells.
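
As a rough illustration of the difference, the sketch below shows POSIX forms of three things that are often written with bash extensions. The variable names and values are hypothetical and chosen only for this example; the bash forms appear in comments only.

```shell
# Hypothetical values, used only for illustration.
file_name="report.txt"
count=3

# bash extension: [[ $file_name == *.txt ]]
# POSIX: match patterns with a case statement.
case "$file_name" in
*.txt) file_kind="text" ;;
*)     file_kind="other" ;;
esac

# bash extension: ((count++))
# POSIX: arithmetic expansion with an explicit assignment.
count=$((count + 1))

# bash extension: arrays, e.g. names=(alpha beta gamma)
# POSIX: the positional parameters can serve as a list.
set -- alpha beta gamma
last_name=""
for name in "$@"
do
    last_name=$name
done

echo "$file_kind $count $# $last_name"
```

Each POSIX form also works in bash and zsh, which is why avoiding the extensions costs little.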

File Extensions

Shell scripts must have a .sh file extension, unless an activity specifies otherwise.

An activity may specify that no file extension is to be used.

File Extensions

Tools and editors may rely on the file extension to detect a source file's language.

If a script is made available on a user's PATH, the extension may be omitted so that the script can be executed like any other command.

File Extensions

    test.sh     <- correct file extension
    autotest    <- correct, assuming the activity specified no extension
    
File Extensions

    test.shell  <- incorrect file extension
    build       <- incorrect, assuming the activity specified to include an extension
    
Hashbang

Shell scripts must start with a hashbang line.

The hashbang line must specify dash(1) as the shell interpreter.

Hashbang

The hashbang line (also called a shebang line) is used to specify the shell interpreter to use.
A hashbang line starts with #! and contains the path to a shell interpreter.

As we are using the dash shell, we need to specify the path to the dash interpreter. The hashbang can directly specify the path to the dash interpreter, or it can specify the path to the env(1) utility, which will then run the dash interpreter. Using the env utility is recommended.

Specifying sh(1) as the interpreter might run dash(1) or bash(1), depending on the platform, and thus is not allowed.

The hashbang characters #! must be the first two characters in the file.

The hashbang is not valid if there is whitespace before the #! characters.

The hashbang is valid if there is whitespace after the #! characters and before the path.
Including a single space after the hashbang is recommended as it is easier to read the path.

Hashbang

#! /usr/bin/env dash	<- recommended

#!/bin/dash         	<- also acceptable

Hashbang

#! /usr/bin/env bash	<- not allowed, as it specifies the bash interpreter

#!/bin/sh           	<- not allowed, as it is ambiguous which shell will be used

Permissions

Shell scripts must have read and execute permissions enabled for all users.

Permissions

In order to execute a shell script with ./file.sh, the file must have read and execute permissions enabled.

To enable these permissions, use the chmod command:

        chmod a+rx file.sh
    

or

        chmod 755 file.sh
    

Without the execute permission, the file can still be run with

        dash file.sh
    

The read permission is still required though.

Permissions

    $ ls -l isPrime.sh
    -rwxr-xr-x 1 cs2041 cs2041 2.2K May 30 02:37 isPrime.sh	<- correct permissions
    
Permissions

    $ ls -l isEven.sh isOdd.sh
    -rw-r--r-- 1 cs2041 cs2041 2.2K May 30 02:37 isEven.sh 	<- missing execute permission
    -rwxr-x--- 1 cs2041 cs2041 2.2K May 30 02:37 isOdd.sh	<- other has no permissions
    

Recommended Code Layout

Header Comment

All programs must start with a header comment.

Header Comment

The header comment must be at the top of the file, immediately after the hashbang line, with a single empty line between the hashbang line and the header comment.

There must be a single empty line between the header comment and the first line of your program.

The header comment should contain at least:

  • the name of the activity
  • the name of the file
  • the name, zID, and email address of the author
  • the date it was written
  • a description of what the program does.

The preferred style for a header comment is:

        # ACTIVITY-NAME-HERE
        # FILE-NAME-HERE
        #
        # This program was written by YOUR-NAME-HERE (YOUR-ZID-HERE) <YOUR-EMAIL-HERE>
        # on DATE-HERE
        #
        # DESCRIPTION-HERE
    

If the program is being maintained over a long period of time, the header comment should also contain a changelog.

        # ACTIVITY-NAME-HERE
        # FILE-NAME-HERE
        #
        # DESCRIPTION-HERE
        #
        # DATE-HERE
        # YOUR-NAME-HERE (YOUR-ZID-HERE) <YOUR-EMAIL-HERE>
        # - CHANGE-HERE
        #
        # DATE-HERE
        # YOUR-NAME-HERE (YOUR-ZID-HERE) <YOUR-EMAIL-HERE>
        # - CHANGE-HERE
    
Header Comment

        #! /usr/bin/env dash

        # COMP2041/9044 Lab06 - A Shell Script that Prints Itself
        # quine.sh
        #
        # This program was written by Dylan Brotherston (z5115658) <d.brotherston@unsw.edu.au>
        # on May 30, 2022
        #
        # This program is a quine.
        # A quine is a program that prints its own source code (minus comments).

        b=\' c=\\ a='echo b=$c$b c=$c$c a=$b$a$b; echo $a'
        echo b=$c$b c=$c$c a=$b$a$b; echo $a
    

        #! /usr/bin/env dash

        # COMP2041/9044 Assignment 1 - My Big Shell Project
        # meaning_of_life.sh
        #
        # This program calculates and prints the meaning of life.
        #
        # 2024-02-05
        # Dylan Brotherston (z5115658) <d.brotherston@unsw.edu.au>
        # - Update the meaning of life to 42
        #
        # 2023-09-17
        # Dylan Brotherston (z5115658) <d.brotherston@unsw.edu.au>
        # - Initial version

        echo 42
    
Header Comment (incorrect: the header comment is missing)

    #!/bin/dash
    is_prime() {
        local n i
        n=$1
        i=2
        while test $i -lt $n
        do
            test $((n % i)) -eq 0 &&
                return 1
            i=$((i + 1))
        done
        return 0
    }

    i=0
    while test $i -lt 1000
    do
        is_prime $i &&
            echo $i
        i=$((i + 1))
    done
    
Implementation Comments

Comments must be included when code may be unclear to the reader.

Comments should be on the line before the code they are describing, rather than on the same line.

Comments should be indented to the same level as the code they are describing.

Comments must be written in English.

Implementation Comments

Where your code may be unclear to a reader, or requires further explanation, you should leave brief comments describing the purpose of the code you have written. Make sure that comments explain why your code does what it does, rather than merely restating what the code does.

Comments that are not in English cannot be assessed; they're equivalent to no comments at all.

Implementation Comments

    #! /usr/bin/env dash

    # COMP2041/9044 Lab06 - Hello World Again
    # hello.sh
    #
    # This program was written by Dylan Brotherston (z5115658) <d.brotherston@unsw.edu.au>
    # on May 30, 2022
    #
    # This program simply says hello.

    # for each name in the command line
    for name in "$@"; do
        # Use grep to match a name that looks like a zID
        if echo "$name" | grep -Eq 'z[0-9]{7}'; then
            # If the name is a zID, then use `acc` to find the corresponding name
            # This allows us to say hello with the person's full name even if they only give us a zID
            # This means that even on a CSE server `./hello.sh $(whoami)` will work
            echo "Hello, $(acc format='$NAME')"
        else
            # If the name is not a zID, then just say hello to the name
            echo "Hello, $name!"
        fi
    done
    
Control Flow – if, while, until, for

Control flow structures must be formatted in a consistent style.

Control Flow – if, while, until, for

One, and only one, style should be used to format control flow statements.

The style demonstrated in lectures, where the opening block-delimiting keyword is on the line following the control flow keyword, is the recommended style. The then and do must be indented to the same level as their respective if, while, until, or for.

        if [ "$x" -gt 0 ]
        then
            echo "true"
        fi

        while [ "$x" -lt "$y" ]
        do
            x=$((x + 1))
        done

        for arg in "$@"
        do
            echo "$arg"
        done
    

The style presented in Google's guide, where the opening block-delimiting keyword is on the same line as the control flow keyword, is also acceptable. The keywords then and do should be on the same line as their respective if, while, until, or for, and they should be separated from the condition by a single semicolon and a single space.

        if [ "$x" -gt 0 ]; then
            echo "x is positive"
        fi

        while [ "$x" -lt "$y" ]; do
            x=$((x + 1))
        done

        for arg in "$@"; do
            echo "$arg"
        done
    

Across all styles, the fi, elif, done, and else control flow keywords must be indented to the same level as their corresponding if, while, until, or for keyword.

Control Flow – if, while, until, for

    if ! [ $a -gt $b ]
    then
        echo "a is not greater than b"
    elif ! [ $a -lt $b ]
    then
        echo "a is not less than b"
    else
        echo "a is equal to b"
    fi

    if [ $a -gt $b ]; then
        echo "a is greater than b"
    elif [ $a -lt $b ]; then
        echo "a is less than b"
    else
        echo "a is equal to b"
    fi

    while [ $a -lt $b ]; do
        isPrime $a
        a=$((a + 1))
    done

    until [ $a -ge $b ]; do
        isPrime $a
        a=$((a + 1))
    done

    for file in $files; do
        rm "$file"
    done
    
Control Flow – if, while, until, for

    # Too many spaces after the semicolon
    while [ $a -lt $b ];       do
        isPrime $a
        a=$((a + 1))
    done

    # `do` and `done` should be indented in line with `until`
    until [ $a -ge $b ]
        do
        isPrime $a
        a=$((a + 1))
        done

    # Too many empty lines before the `do`
    for file in $files

    do
        rm "$file"
    done
    
Control Flow – case

The body and alternatives of case statements must be formatted in a consistent style.

Control Flow – case

A case statement is used to select a single case from a list of cases.

The style presented in the lectures places alternatives at the same level as the case keywords and then indents the bodies for each alternative.

    case word in
    pattern1)
        commands1
        ;;
    pattern2)
        commands2
        ;;
    esac
    

If both the alternatives and the bodies are short, a single line format can also be used.

    case $# in
    0) echo "You forgot to supply the argument" ;;
    1) filename=$1 ;;
    *) echo "You supplied too many arguments" ;;
    esac
    

In Google's guide, the alternatives are indented by one level, and then the bodies of the alternatives are indented one level further. A similar style is used for single-line alternatives and bodies.

    case "${expression}" in
        a)
            variable="…"
            some_command "${variable}" "${other_expr}"
            ;;
        absolute)
            actions="relative"
            another_command "${actions}" "${other_expr}"
            ;;
        *)
            error "Unexpected expression '${expression}'"
            ;;
    esac

    case "${flag}" in
        a) aflag='true' ;;
        b) bflag='true' ;;
        f) files="${OPTARG}" ;;
        v) verbose='true' ;;
        *) error "Unexpected option ${flag}" ;;
    esac
    

One style should be used consistently.

Control Flow – case

    case $command in
    login)
        token="$(login_user "$user" "$password")"
        if [ -z "$token" ]; then
            echo "Login failed"
            exit 1
        else
            echo "Login successful"
        fi
        ;;
    logout)
        logout_user "$token"
        ;;
    stats)
        cat /proc/self/stat
        cat /proc/self/status
        cat /proc/self/sched
        cat /proc/self/schedstat
        cat /proc/self/limits
        cat /proc/self/io
        ;;
    cmd)
        cat /proc/self/cmdline
        cat /proc/self/environ
        ;;
    *)
        echo "Unknown command: $command" >&2
        exit 1
        ;;
    esac

    case $level in
    0)     echo "Tutorial level" ;;
    [1-9]) echo "Level $level"   ;;
    10)    echo "Last level"     ;;
    *)
        echo "Unknown level: $level" >&2
        exit 1
        ;;
    esac
    
Control Flow – Indentation

The body of control flow statements must be indented.

Control Flow – Indentation

The body of while loops, until loops, for loops, and if statements must be indented. Shell doesn't use braces for control flow like C. Instead, shell uses matching keywords to indicate the start and end of control flow. Everything between the start and end keyword must be indented.

Between any pair of keywords, the indentation level should increase by one level. Acceptable indentations for one level are 4 spaces, 2 spaces or 1 tab. The chosen indentation should be used consistently throughout the code. Always indent by one level, and always use the same number of spaces or tabs to represent one level. Never use a mixture of spaces and tabs.

Control Flow – Indentation

    while [ "$i" -lt 10 ]
    do
        echo "$i"
        i=$((i+1))
        n=$((n+i))
    done

    for arg in "$@"
    do
        echo "$arg"
    done

    if [ ! -e "$file_name" ]
    then
        echo "File does not exist"
    fi
    
Control Flow – Indentation

    # inconsistent indentation
    while [ "$i" -lt 10 ]
    do
        echo "$i"
      i=$((i+1))
            n=$((n+i))
    done

    # no indentation
    for arg in $@
    do
    echo "$arg"
    done

    # way too much indentation
    if [ ! -e "$file_name" ]
    then
                   echo "File does not exist"
    fi

    # mixing tabs and spaces
    if [ "$1" = "$2" ]
    then
        echo "arg[1] and arg[2] are equal"
    	echo "that's a little redundant don't you think?"
    fi
    
Nesting Depth

Avoid overly deep nesting.

Nesting Depth

While strict nesting depth requirements aren't specified in this course, you should consider whether you can simplify your approach once you reach 5 levels of nesting.
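
One common way to reduce nesting is to reject unsuitable cases early with `continue`, `return`, or `exit`, so that the main logic stays at a shallow level. The sketch below is one possible approach; the function `list_usable_files` is hypothetical and exists only for this example.

```shell
# Hypothetical helper: prints each argument that names a readable, non-empty
# regular file, and silently skips everything else.
list_usable_files() {
    for path in "$@"
    do
        # Guard clauses: skip unsuitable paths early instead of nesting
        # one `if` inside another for every condition.
        [ -f "$path" ] || continue
        [ -r "$path" ] || continue
        [ -s "$path" ] || continue

        echo "$path"
    done
}
```

Written with nested `if` statements, the same checks would place the `echo` four levels deep instead of one.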

Spaces

Use a space after keywords such as

        if, while, for, return
    

Use a space on each side of binary arithmetic operators such as

        *  /  %  +  -  <  >  <=  >=  ==  !=
    

Do not include a space after prefix unary arithmetic operators such as

        +  -  ~  !
    

Use a space on each side of I/O redirection constructs such as

        <  >  >>  2>  2>>
    
Spaces

    if [ $((a + b > c)) -ne 0 ]
    then
        i=$((i + 1))
        echo "$i" >> "$log_file"
    fi
    
Spaces

    if   [ $((a+b>c)) -ne 0 ]       <- too many spaces after language keyword
    then
        i=$((i+1))                  <- no spacing around binary operator
        echo "$i">>"$log_file"      <- no spacing around redirection construct
    fi
    
Vertical Whitespace

Use vertical whitespace (blank lines between code) occasionally to indicate where the related parts of your code are.

Using no blank lines between lines of code indicates the lines should be understood together, or are closely related.

Using exactly one blank line indicates two sections of code are distinct – they split your code into "paragraphs".

Using exactly two blank lines indicates two sections of code are entirely unrelated.

Using more than two blank lines is (almost) never appropriate.

Vertical Whitespace

Vertical whitespace is like chocolate – some of it significantly improves your life, but too much can also be its own problem; and when used in the wrong place, it can be very confusing.

Vertical Whitespace


    # The two variables here are closely related, and should not be separated.
    x_position=0
    y_position=0

    # The two if statements here are separate ideas and could be separated.
    if [ "$x_position" -gt 0 ]
    then
        echo "X position is positive."
    fi

    if [ "$instruction" -eq "$INCREASE_X_POSITION" ]
    then
        x_position=$((x_position + 1))
    fi


    # We have used two empty lines here to indicate that the topic has changed.
    echo "Please choose an item to use..."

    
Statements

Only one executable statement should be used per line of code.

Statements

When a single line contains multiple statements, it can be difficult to read.

Statements

    if test $# = 3
    then
        url=$1      <- each statement is on a separate line
        regexp=$2
        email_address=$3
    else
        echo "Usage: $0 <url> <regex> <email-address>" 1>&2
        exit 1
    fi
    
Statements

    if test $# = 3
    then
        url=$1; regexp=$2; email_address=$3     <- multiple statements on a single line
    else
        echo "Usage: $0 <url> <regex> <email-address>" 1>&2
        exit 1
    fi
    
Line Width

Keep lines under 80 characters.

Line Width

Break long lines up to keep them under 80 characters, unless keeping it as a longer line is significantly more readable.

At 120 characters, serious effort should be made to split or shorten the line.
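
A long command can usually be split with a backslash at the end of each line. The sketch below shows one way to do this; the variable names and values are hypothetical.

```shell
# Hypothetical values, used only for illustration.
user_name="cs2041"
course_code="COMP2041"
week_number=6

# A backslash at the end of a line continues the command on the next line.
# Here each argument is placed on its own line, indented by one level.
summary=$(printf '%s is enrolled in %s and is working on week %d\n' \
    "$user_name" \
    "$course_code" \
    "$week_number")

echo "$summary"
```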

Variables

Variable Names

Descriptive variable names should always be used where possible.

Short variable names such as x, i, j or k are acceptable if there is no appropriate long descriptive name.
This is often the case for variables used as loop counters.

Variable names, other than those of constants, must begin with a lowercase letter.

Multi-word variable names should be in snake_case.

Constants should be in SHOUTING_SNAKE_CASE.
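
The naming rules above might look like this in practice. The names are hypothetical and serve only to illustrate each case.

```shell
# Constant: SHOUTING_SNAKE_CASE.
MAX_LOGIN_ATTEMPTS=3

# Multi-word variable: snake_case, starting with a lowercase letter.
failed_attempts=0

# Short loop counter: acceptable, as a longer name would not be clearer.
i=0
while [ "$i" -lt "$MAX_LOGIN_ATTEMPTS" ]
do
    failed_attempts=$((failed_attempts + 1))
    i=$((i + 1))
done

echo "$failed_attempts failed attempts"
```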

Declaring Variables

Declare variables close to where they are first used.

Do not use the same variable name for multiple purposes.

Declaring Variables

Having a variable close to the place it is first used makes it easy to find and understand.

If you need a new variable, use a different name – reusing variable names can lead to subtle and hard to find bugs.
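
A small sketch of both points, using hypothetical names and input: each variable is introduced immediately before its first use, and a new name is chosen instead of reusing an existing one for a second purpose.

```shell
# Hypothetical input, used only for illustration.
set -- 3 4 5

total=0
for number in "$@"
do
    total=$((total + number))
done

# `argument_count` and `average` are declared just before they are needed,
# and `total` is not reused to hold the average.
argument_count=$#
average=$((total / argument_count))

echo "total=$total average=$average"
```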

Functions

Function Purpose

Functions should have one clearly defined purpose.

Function Names

Function names should be descriptive, typically containing multiple words, and formatted in snake_case.
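
As an illustration, the sketch below splits a task into two functions, each with a single purpose and a descriptive snake_case name. Both functions are hypothetical and written only for this example.

```shell
# Prints the number of lines in the file given as the first argument.
count_lines_in_file() {
    file_path=$1
    # `tr` removes the padding that some versions of `wc` add to the count.
    wc -l < "$file_path" | tr -d ' '
}

# Prints "long" if the given line count exceeds the given limit,
# and "short" otherwise.
classify_line_count() {
    line_count=$1
    line_limit=$2
    if [ "$line_count" -gt "$line_limit" ]
    then
        echo "long"
    else
        echo "short"
    fi
}
```

Keeping the counting and the classification separate means either function can be reused or tested without the other.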

Function Comments

Every function that is not short and obvious must have a comment describing its purpose and any side-effects the function has.

The comment should be placed above the function's implementation.

Function Comments

    # Updates next available job identifier; mutates variable `job_id`
    increment_job_id() {
        job_id=$((job_id + 1))
    }
    
Function Arguments

At the beginning of a function, assign the positional parameters in use to named variables.

Function Arguments

A function's behaviour can be unclear when positional parameters are used directly.

Function Arguments

    print_log_entry() {
        timestamp_in_s=$1
        event_id=$2
        message=$3
        printf '%d: %d: %s\n' "$timestamp_in_s" "$event_id" "$message"
    }
    
Function Arguments

    # BAD - it's unclear what each field of the log entry is

    print_log_entry() {
        printf '%d: %d: %s\n' "$1" "$2" "$3"
    }
    

Commands

Builtin vs. External Commands

Given a choice between invoking a shell builtin and invoking a separate process, choose the builtin.

Builtin vs. External Commands

Shell builtins don't spawn a new process, so any overhead due to process creation is avoided.
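
The sketch below shows a few cases where a feature of the shell itself (here, parameter expansion and arithmetic expansion) can stand in for an external command. The path and variable names are hypothetical, and the replacements are not exact equivalents in every edge case (for example, `basename` treats a trailing `/` differently).

```shell
# Hypothetical path, used only for illustration.
script_path="/home/cs2041/bin/autotest.sh"

# External process: $(basename "$script_path")
# Shell alternative: remove the longest prefix that ends in "/".
script_name=${script_path##*/}

# External process: $(echo "$script_name" | sed 's/\.sh$//')
# Shell alternative: remove the ".sh" suffix.
name_without_extension=${script_name%.sh}

# External process: $(expr "$attempt" + 1)
# Shell alternative: arithmetic expansion.
attempt=1
next_attempt=$((attempt + 1))

echo "$script_name $name_without_extension $next_attempt"
```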

Command Options

When executing a command that supports options, use long options whenever possible.

Command Options

Short options are great when using the command line interactively but not when writing shell scripts. Short options are not descriptive, and when uncommon options are used, they can be confusing to readers.

Long options are self-descriptive and make it easier to understand what the program is doing.

Not all commands support long options though. If you are using a command that does not support long options, you should use short options instead. When using short options, you should use the combining syntax whenever possible, and extra comments should be included to explain what the options do.

Command Options

    jq --compact-output --slurp
    # non-standard command so use long options instead of the equivalent `-c -s`

    grep -E
    # common command, and common option, so using the short options is fine

    cut -d":" -f"1,2" --only-delimited
    # `--only-delimited` can be written as `-s` but is uncommon so use long option

    # Don't read muttrc, operate in read-only mode, and list all mailboxes
    mutt -nRy
    # `mutt` does not support long options, so use short options instead
    # use the combining syntax to avoid writing `-n -R -y`
    
Command Options

    curl -s -o /dev/null -X 'GET' "https://cgi.cse.unsw.edu.au/~cs2041/current/index.html"
    # `curl` supports long options; they should be used instead

    nc -w "${TIMEOUT}" -N "${HOST_IP}" "${TCP_PORT}"
    # `nc` does not support long options, and a comment describing the short options is absent
    
Pipelines

If a pipeline is shorter than 80 characters, it should remain on a single line.

If a pipeline is longer than 80 characters, it must be broken across multiple lines.
Only a single command should be on each line.

Pipelines

Long pipelines can be difficult to understand; breaking a pipeline up helps improve readability.

When splitting a pipeline across multiple lines, a consistent style must be used.

Pipelines

    # This pipeline is less than 80 characters, so it should remain on one line
    grep -E '^COMP(2041|9044)' enrollments.txt | grep -E 'F$' | wc -l

    # The pipelines below are greater than 80 characters, so they should be
    # broken across multiple lines, using a consistent style
    grep -E 'COMP(2041|9044)' enrollments.txt |
    cut -d'|' -f4 |
    cut -d/ -f1 |
    sort |
    uniq -c |
    sort -nr

    # This is also an acceptable style
    grep -E 'COMP(2041|9044)' enrollments.txt \
      | cut -d'|' -f4 \
      | cut -d/ -f1 \
      | sort \
      | uniq -c \
      | sort -nr
    
Pipelines

    grep -E 'COMP(2041|9044)' enrollments.txt | cut -d'|' -f4 | cut -d/ -f1 | sort | uniq -c | sort -nr
    # Pipeline is too long and should be broken across multiple lines

    grep -E 'COMP(2041|9044)' enrollments.txt | cut -d'|' -f4 | cut -d/ -f1 |
    sort \
      | uniq -c \
      | sort -nr
    # Not all commands of this long pipeline are on separate lines, and the style used is inconsistent
    
Checking Return Values

The return values of commands should be examined after they are executed.

Checking Return Values

Commands can fail, so it is important to handle error cases; otherwise, unexpected behaviour may occur.

Use $? or check directly via an if statement.

Checking Return Values

    # Check directly with an if statement
    if ! mv "${file}" "${dest_dir}/"
    then
        echo "Unable to move ${file} to ${dest_dir}" >&2
        exit 1
    fi

    # Alternatively, check the exit status in $?
    mv "${file}" "${dest_dir}/"
    if [ "$?" -ne 0 ]
    then
        echo "Unable to move ${file} to ${dest_dir}" >&2
        exit 1
    fi
    
Checking Return Values

    # BAD - If cp fails the file may be lost
    cp "${file}" "${dest_dir}/"
    rm "${file}"
    
Command Substitution

Use $(command), rather than `command` when performing command substitution.

Command Substitution

The `command` syntax is problematic if nesting is required.
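
To illustrate the nesting problem, the sketch below finds the name of a file's parent directory in both styles. The path and variable names are hypothetical.

```shell
# Hypothetical path, used only for illustration.
script_path="/home/cs2041/labs/lab06/quine.sh"

# With $(...), nested substitutions need no special treatment.
parent_dir_name=$(basename "$(dirname "$script_path")")

# With backquotes, the inner backquotes must be escaped with backslashes,
# which is harder to read and easy to get wrong.
parent_dir_name_backquotes=`basename \`dirname "$script_path"\``

echo "$parent_dir_name $parent_dir_name_backquotes"
```

Both forms produce the same result here, but each additional level of nesting in the backquote form requires further escaping.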

Static Verification & Correctness

ShellCheck

ShellCheck should be used to statically analyse shell scripts.

ShellCheck

Syntax errors reported by shell interpreters are sometimes cryptic and uninformative. Static analysis can help identify a wide range of common issues before a script is executed, which can help ensure correctness.

ShellCheck identifies issues, reports their locations, and provides recommendations for appropriate fixes. Each issue is identified by a specific code, which can be referred to for further information on the ShellCheck wiki.

Occasionally, ShellCheck can produce warnings that are false positives. For example, consider SC2086, which recommends quoting variables to avoid globbing and word splitting. There are occasions where word splitting is desired; passing several command arguments via one variable is one such case.

A warning can be ignored by adding a disable comment. To disable SC2086, you would add

        # shellcheck disable=SC2086
        

before the line that triggers the warning.
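
As a sketch of a case where word splitting is intended, the example below stores two options in one variable and relies on splitting to pass them as separate arguments. The variable names and input values are hypothetical.

```shell
# Several options are stored in one variable (hypothetical, for illustration).
sort_options="-n -r"

# Word splitting is intended here so that each option becomes a separate
# argument, so the SC2086 warning is disabled for this command only.
# shellcheck disable=SC2086
sorted_numbers=$(printf '%s\n' 3 10 2 | sort $sort_options)

echo "$sorted_numbers"
```

A short comment explaining why the warning is disabled, as above, helps readers confirm that the unquoted variable is deliberate.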

Avoid These Shell Features

eval

Do not use eval.

eval

When (mis)used with untrusted data, eval can provide a vector for code injection attacks.

Assessment activities in this course can be completed without eval.
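
One common alternative to eval, when a script needs to choose what to run based on input, is to select from a fixed set of actions with a case statement. The sketch below is only one possible approach; the function `run_action` and its action names are hypothetical.

```shell
# Hypothetical helper: runs one of a fixed set of actions chosen by name.
# Returns 1 if the name is not one of the known actions.
run_action() {
    action_name=$1
    case "$action_name" in
    greet)
        echo "hello"
        ;;
    farewell)
        echo "goodbye"
        ;;
    *)
        echo "Unknown action: $action_name" >&2
        return 1
        ;;
    esac
}

# Unsafe alternative that this replaces: eval "$user_input"
# With the case statement, input such as "greet; rm -rf ~" matches no
# alternative, so it is rejected rather than executed.
run_action "greet"
```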