Advice for Readable Source Code

Programs that you write will generally consist of a collection of text files. These files have two distinct purposes. The first is to be read by computers, in order to instruct the computer to perform the desired task. The second is to be read by humans. At a minimum, they are read by the author. Throughout your degree your programs will be read by human markers. In industry and research they are read by colleagues, employers or even customers. To write good code you must be aware of both audiences.

Note also that the time originally spent implementing a program is estimated at about one-third of the total time spent on it. The exceptions are single-use programs (e.g. a script to transform some data to a new format) and junk programs written so badly by slack programmers that they cannot be read or effectively used. A good program is likely to be adapted and modified as time goes on and the needs of the company (or individual) change: extra features may be added; bugs may be found and fixed. The person modifying the program might be you, one of your colleagues, or some programmer who works for you in the future. Any of these people will think less of you if they can't figure out what the heck your code is doing because you didn't comment it properly or didn't write it clearly. If you're modifying your own code, you may be surprised to find how hard it is to follow (or to remember how to use) if you didn't comment it properly when you first wrote it.

Your compiler or interpreter will soon enough inform you when a file is not readable by the computer (e.g., syntax errors). You have four main techniques available to you to make your work human readable: formatting, comments, structure and choice of identifiers.

Formatting refers to how the text looks on the screen (or page). It includes such aspects as indentation, alignment, the use of capital and lower case, underscores, spaces and blank lines. When reading a program, the visual cues provided by formatting can emphasise the conceptual structure of the work.

Comments are purely for the human readers. They are notes within the program which are ignored by the computer, and most computer languages allow them. Comments should concentrate on clarifying and identifying your work. They should not be verbose, but succinct and clear. They should aid understanding.

Structure refers to how the computing task is divided into sub-tasks. Given a particular task there are many ways it might be carved up, and many of the different decompositions will be similar in their computational performance. Unless there are overwhelming efficiency reasons, the structure of your program should be chosen to make it easy for another human to understand. Remember that most compilers include powerful optimisers, so there is no need to write terse code purely in the belief that it will run faster.

Identifiers are the labels and names which the author chooses in the program. This includes things such as variable and procedure names. Most programming languages give the author much scope for choice here. That choice should be used wisely. Choose identifiers that are easily readable and recognisable. The names of identifiers should instantly suggest some meaning, and the suggested meaning should be appropriate to the task for which the named construct is used.

The following gives specific advice and examples for:

Head Comments
Unit Comments
Explanatory Comments
Indentation
White Space
Capitalisation and Underscores
Identifiers
Structure
Spelling

Head Comments

Also called "header comments". Each file should have a head comment which appears prominently at the start of the file. The head comment provides important identification information and gives an overview of the entire file. Typically it consists of 5 to 25 lines. The head comment should include at least the following:

The name of the program or module which the file is for
Name of the originating author
Original date of creation
A short description explaining what the file is designed to achieve
Instructions for compilation and use.
The instructions for compilation might just refer to a Makefile or similar, or might be explicit instructions for one or more particular platforms.
The instructions for use could be brief instructions for running the program from a command line, or might refer to the user instruction manual, giving an on-line location for it.

Note that examples of code shown in class might abbreviate or omit header comments in order to fit them onto the screen in the lecture theatre. However, they can be vital for somebody (marker/maintainer/modifier/manager) coming to your code "cold", particularly if you are not available to explain (or if months/years have passed, and you don't remember :-( ).

Additional things which may be found in the head comment are:

Where the source code can be found if this is not obvious (in case somebody is working from a printout of your code).
A short description explaining the overall technique implemented in the file (i.e. how it works)
A history of changes made to the work
Known bugs/deficiencies/limitations
Notices of copyright or other assertions of ownership

An example (in C):

/***********************************************************************
* PPM
*
* A module for loading PPM files into canvases and saving canvases
* as PPM files.
*
* Author: Karl Koder
* E-mail: karlk@...
* Date: 29-Jan-2007
*
* Instructions for use: ...
*
* Modification History: ...
*
* Known bugs and deficiencies: ...
*
***********************************************************************/

Unit Comments

Most computer languages allow the programmer to divide the computational task up into sub-tasks, such as procedures, predicates, methods, functions, etc. Each unit should have its own comment which is clearly identifiable at the start of the unit. Each unit comment is typically 1 to 5 lines and should include:

The name of the unit (procedure, predicate, method, function, etc.)
A short description explaining what the unit is designed to achieve
A short description of how it works
Any assumptions made by the unit which are expected to hold for it to work

You should also consider a one-word comment at the end of the unit, giving its name. This is particularly useful when a unit is longer than a screen, because a reader scrolling up from below can see which unit is ending.

An example (in C):

/* -----------------------------------------------------------------
** ToLower
**
** Convert all characters in the string to lower case.
** Assumes 'string' is terminated with the null character ('\0').
*/
void ToLower (char* string)
{
    while (*string) {
        /* tolower() needs a value representable as unsigned char */
        *string = (char) tolower ((unsigned char) *string);
        string++;
    }
} /* ToLower */

Explanatory Comments

Explanatory comments are those comments which are inserted into the code to explain difficult or potentially ambiguous parts of the program. These generally are at the end of a line (a short note explaining one line of code) or precede a section of code.

When you have multiple end-of-line comments in a row, it is a good idea to line them up so that the comments are clearly visible.

Remember that the humans reading your program are very likely to be programmers who are experienced in the language you are using. Therefore your comments should concentrate on explaining your work. They should not be verbose, but succinct and clear. They should not merely repeat the code, but should offer explanation and clarification.

An example (in Prolog):

% euler(X) reports the Euler number for the image
%
euler(X) :-
    count_regions(1, NumBlack),    % count black regions
    count_regions(0, NumWhite),    % count white regions
    Holes is NumWhite - 1,         % subtract 1 for the image border
    X is NumBlack - Holes.         % compute the genus

An example (in C):

    
    /*
    ** Skip leading white space
    */
    c = fgetc (inputFile);
    while (isspace(c)) {
        c = fgetc (inputFile);
    }

    /*
    ** Count the frame and write any pending files
    */
    gFrameCount++;
    Writer_DoWrites ();

Try not to get too fancy with your art-work with the comment characters. Remember that the comments are there to help, not to distract. Arguably,

/* Skip leading white space */

would be as good as or better than the three line comment above.

Also, you should normally limit your comments to what the likely readers of your code will find helpful. [Programming language instructors might thus put in comments for their special audience that would be unnecessary and annoying for experienced programmers.]

An example of bad commenting, with every line commented (in Prolog):

factorial(0, 1).                     % Factorial of 0 is 1.
factorial(N, FactN) :-
    N > 0,                           % N is positive
    Nminus1 is N - 1,                % Calculate N minus 1
    factorial(Nminus1, FactNminus1), % recursion
    FactN is N * FactNminus1.        % N! = N * (N - 1)!

Of these comments, only the first and last are even faintly justifiable, and even these are probably too obvious; for the last comment in particular, the variable names tell you everything you need to know. It looks pretty, with all the %-signs neatly lined up, but it forces the reader to check through all the unnecessary comments in case there is anything important there. This code needs a unit comment, and probably nothing else, though a beginning Prolog programmer might be justified in pointing out that the line N > 0, is there to make sure that the rule is only used in the case where N is positive. (If you're not sure why this is so important, try leaving out N > 0, and test the predicate on, say, N = 2, asking for more than one solution.) So the comment would be "% only use rule if N > 0".

This style of commenting is probably a legacy of assembly language programming, where many lines do need a comment, and it is traditional and in some cases mandatory to line up the comments vertically, as in the example above. Even in this case, however, it is usually only necessary to have a comment every few lines, unless the coder is doing something really tricky (and which is probably going to come back and bite them on the bum somewhere down the track).

How to write even worse comments!

It's easy - just write comments that are actually wrong. :-(
Review your comments in a cool moment, and make sure that what they say is true.
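
As a hypothetical illustration (the function names and the percentage range are invented for this sketch, not taken from any real program), the first unit comment below has drifted out of step with its code, while the second says what the code actually does:

```c
#include <stdbool.h>

/*
** IsValidPercentStale
**
** Return true if 'percent' is in the range 0 to 100 inclusive.
** (WRONG: the code below rejects 100, so this comment misleads.)
*/
bool IsValidPercentStale (int percent)
{
    return percent >= 0 && percent < 100;
} /* IsValidPercentStale */

/*
** IsValidPercent
**
** Return true if 'percent' is in the range 0 to 99 inclusive.
*/
bool IsValidPercent (int percent)
{
    return percent >= 0 && percent < 100;
} /* IsValidPercent */
```

A reader who trusts the first comment will expect 100 to be accepted and may spend a long time looking for the bug somewhere else.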

Indentation

You should use indentation to emphasise the conceptual structure of your code. It is preferable to use spaces rather than tabs as different text file readers may have tab stops set at different columns. A good indentation size is four spaces. Eight spaces is far too much. An indentation of two spaces will not provide a good visual cue.

It is important that indentation is consistent throughout the file and that indents are aligned.

An example (in C):

/*
** Canvas_AddNoise
**
** Add Gaussian noise to the image in the canvas
*/
void Canvas_AddNoise (Canvas* canvas, double noise)
{
    int    i;
    double val;
    
    for (i = 0; i < canvas->numPixels; i++) {
        
        val = canvas->pixels [i] + NormalRandomDouble (0.0, noise);
        
        /* check the bounds of the pixel value  */
        if (val <= 0.0) {
            canvas->pixels [i] = 0;
        } else if (val >= 255.0) {
            canvas->pixels [i] = 255;
        } else {
            canvas->pixels [i] = val;
        }
    }
} /* Canvas_AddNoise */

White Space

White space is the extra spaces and blank lines introduced into the file to improve readability. Blank lines should exist between units (procedures, predicates, etc.) in the file. Blank lines can be included within a unit to emphasise blocks of processing. Spaces should be included around punctuation marks as per normal written English.

An example (in C):

    /* accumulate sums */

    cd->a = 0.0;
    cd->b = 0.0;
    cd->c = 0.0;
    cd->d = 0.0;
    cd->e = 0.0;
    cd->f = 0.0;

    for (y = 0; y < cd->height; y++) {
        dy = y * 2;
        for (x = 0; x < cd->width; x++) {
            dx = x * 2;

            cd->a += geometry->x * dx;
            cd->b += geometry->x * dy;
            cd->c += geometry->x;
            cd->d += geometry->y * dx;
            cd->e += geometry->y * dy;
            cd->f += geometry->y;

            geometry++;
        }
    }

    /* compute parameters */

    cd->a /= cd->s_xx;
    cd->d /= cd->s_xx;

    cd->b = (cd->b - cd->a * cd->s_xy) / cd->m3;
    cd->e = (cd->e - cd->d * cd->s_xy) / cd->m3;

    cd->c = ((cd->c - cd->a * cd->s_x) - cd->b * cd->m4) / cd->m6;
    cd->f = ((cd->f - cd->d * cd->s_x) - cd->e * cd->m4) / cd->m6;

    cd->b -= cd->c * cd->m5;
    cd->e -= cd->f * cd->m5;

    cd->a = (cd->a - cd->c * cd->m2) - cd->b * cd->m1;
    cd->d = (cd->d - cd->f * cd->m2) - cd->e * cd->m1;

Note that the identifiers in this code aren't very meaningful - see below.

Capitalisation and Underscores

Most programming languages allow you to use a mixture of upper and lower case. This flexibility can be used to improve the readability of the program. Avoid using all capitals (e.g., COUNT). Most of the text we read in our everyday lives is in lower case, and this is generally read more quickly and easily.

You can use capital letters at the start of words to distinguish the individual words in multi-word identifiers (e.g., FindTransform). Word demarcation may also be done using underscores (e.g., find_transform). It is common to reserve identifiers that start with a capital letter for procedure or type names, and reserve identifiers that start with lower case for variables and parameters.

Whatever capitalisation and underscore style you use - be consistent.
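
A minimal sketch of one such convention in C, assuming (purely for illustration) a hypothetical Point type and FindMidpoint procedure: type and procedure names start with a capital letter, variables and parameters start with lower case, and capitals mark the word boundaries.

```c
/* Type names start with a capital letter. */
typedef struct {
    double x;
    double y;
} Point;

/*
** FindMidpoint
**
** Return the point halfway between 'firstPoint' and 'secondPoint'.
*/
Point FindMidpoint (Point firstPoint, Point secondPoint)
{
    Point midPoint;

    midPoint.x = (firstPoint.x + secondPoint.x) / 2.0;
    midPoint.y = (firstPoint.y + secondPoint.y) / 2.0;

    return midPoint;
} /* FindMidpoint */
```

The same code written with underscores (find_midpoint, first_point) would be just as readable; what matters is that one style is used throughout.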

Identifiers

Most programming languages give the author much scope for choice of identifiers for procedures, variables, parameters, etc. When choosing identifiers consider the following points:

identifiers should be mnemonic, i.e. easily and quickly suggesting their intended purpose
identifiers should not be too long; a name longer than about 20 characters might warrant rethinking for a better description
do not contract a long phrase into an unrecognisable identifier (e.g., "var_hldng_cur_cnt" for a variable holding the current count)

It is not uncommon to include within the identifier some naming convention. For example, all global variables starting with "g", or all constants starting with "k", or all private variables starting with "p". Only do this if it enhances readability.

Whatever style you use for identifiers - be consistent.
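
The two procedures sketched below do exactly the same thing; only the identifiers differ. All of the names are hypothetical and chosen for this illustration, and the "k" prefix for the constant is just one of the conventions mentioned above.

```c
#include <stddef.h>

/* Convention assumed in this sketch: constants start with 'k'. */
static const size_t kMaxNameLength = 20;

/* Hard to read: contracted names give no hint of their purpose. */
int ChkNmLn (const char* s)
{
    size_t n = 0;

    while (s [n] != '\0') {
        n++;
    }
    return n <= kMaxNameLength;
} /* ChkNmLn */

/* Easier to read: the names say what is measured and why. */
int IsAcceptableNameLength (const char* name)
{
    size_t nameLength = 0;

    while (name [nameLength] != '\0') {
        nameLength++;
    }
    return nameLength <= kMaxNameLength;
} /* IsAcceptableNameLength */
```

The second version needs no explanatory comment inside the loop, because the identifiers already carry the explanation.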

Structure

Most computer languages allow the programmer to divide the computational task up into sub-tasks, such as procedures, predicates, methods, functions, etc. The structure of a program refers to how tasks are divided up into sub-tasks.

Most compilers include powerful optimisers so there is no need to write terse code in the belief that it will run faster. Unless there are overwhelming efficiency reasons, the structure of your program should be chosen to make it easily understood by another person.
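
For instance, the two hypothetical string-copy procedures sketched here do the same job. The first packs everything into the loop condition; the second spells out each step. With optimisation enabled, a modern compiler can be expected to produce comparable machine code for both, although that is an expectation rather than a guarantee, so measure before giving up clarity for speed.

```c
/* Terse: copy, test and advance are all packed into the condition. */
void CopyStringTerse (char* destination, const char* source)
{
    while ((*destination++ = *source++))
        ;
} /* CopyStringTerse */

/* Clear: one step per line. */
void CopyStringClear (char* destination, const char* source)
{
    while (*source != '\0') {
        *destination = *source;
        destination++;
        source++;
    }
    *destination = '\0';    /* terminate the copy */
} /* CopyStringClear */
```

Both procedures assume that 'destination' points to a buffer large enough to hold the copy, including its terminating null character.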

When designing a structure for your program consider the following points:

units should be loosely coupled to other units, i.e., there should be only a small amount of information that passes between units (e.g., in the form of parameters or global variables)
the parts within a unit should be highly cohesive, i.e., all the functionality within a unit should be closely related. Don't put two separate pieces of functionality into the same procedure.
put regularly used functionality into its own unit
if you find yourself copying and pasting text, then perhaps this is a common piece of functionality that should be a separate procedure (or other unit)
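
As a sketch of the last point, suppose the bounds check from Canvas_AddNoise (in the Indentation section) were needed in several procedures. Rather than pasting the if/else chain into each one, it could be moved into a unit of its own. ClampToPixelRange, Pixel_Brighten and Pixel_Scale are hypothetical names invented for this illustration; the 0 to 255 range is taken from the earlier example.

```c
/*
** ClampToPixelRange
**
** Return 'val' limited to the valid pixel range 0 to 255.
*/
double ClampToPixelRange (double val)
{
    if (val <= 0.0) {
        return 0.0;
    } else if (val >= 255.0) {
        return 255.0;
    }
    return val;
} /* ClampToPixelRange */

/*
** Pixel_Brighten
**
** Return 'pixel' increased by 'amount', kept within the pixel range.
*/
double Pixel_Brighten (double pixel, double amount)
{
    return ClampToPixelRange (pixel + amount);
} /* Pixel_Brighten */

/*
** Pixel_Scale
**
** Return 'pixel' multiplied by 'factor', kept within the pixel range.
*/
double Pixel_Scale (double pixel, double factor)
{
    return ClampToPixelRange (pixel * factor);
} /* Pixel_Scale */
```

If the valid range ever changes, only ClampToPixelRange has to be edited, and every caller picks up the fix.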

Spelling

It is courteous to the reader to ensure that the spelling and grammar of your comments are correct. Use a spelling checker, even if you are a good speller, to catch the typographical errors that you didn't notice. The identifiers that aren't proper words may be a nuisance in the spelling checking process, but that's life.

Remember, too, that a spelling checker cannot be guaranteed to find all errors. A typographical error may convert a word into another valid word (my favorite example is where → whore), or you may be using the wrong word (e.g. complement versus compliment, principle versus principal). In the first version of this paragraph, I typed maybe instead of may be, for example. So use a spelling checker, but don't feel you can rely on it.

A spelling checker will not fix your grammar. :-( Getting someone else to proof-read your material may help.

© Barry Drake (2001)
© modifications by Bill Wilson, 2006, 2008