Assignment 1: Paragraph Formatting

Due: Friday Week 8 at 8pm

Version 1.1

A few inconsistencies have been found between pifm, the spec, and the program's behaviour. These have been corrected, and 6721 pifm now validates the new behaviour.

>get_data had the wrong return signature. It is now correct.
fits should return a bool, not an int.
"lines contains all words...for all I" should have a lower-case I.
make_line blanks should just be an int.
Some clarifications around the right-aligned text were added (added in blue).
Some extra information about what can replace TODO (added in blue).

Version 1.1

Part 4's ensures function did not take into account the last line. This has now been fixed.

Version 1.2

The wording of three invariants was updated slightly to ensure consistency and correctness.

"word_lines[i] could not be any longer, for all i" is now "each element in word_lines could not be any longer",
"lines[i] contains all words from word_lines[i] for all i" is now "lines[i] contains all words from word_lines[i] for all 0 <= i < len(lines)",
"every element of lines starts with the right amount of spaces" is now "each element of lines starts with the right amount of spaces",

Version 1.3

The invariant lines[i] contains all words from word_lines[i] for all 0 <e; i <e; len(lines) has had a note added to explain this could be better worded as ... word_lines[j] for all 0 <e; j <e; len(lines) ~~(in other words, the variable lines[i] has a different i to the rest of the invariant.)~~

Clarified the question in this forum post.

Defined cost for a single-worded line.

Version 1.3.1

The previous edit to the lines[i] line still didn't make sense -- I've corrected it (for the final time). The intent is that words[j] contains everything in word_lines[j] -- in other words, that we haven't missed any words when coping from word_lines to lines.

Version 1.4

A short "Markers Notes" section was added to address some questions about marking.

Version 1.4.1

Notes added about costs: List[int]; and from sys import stderr added to fix LSPs complaining.

Introduction

LATEX typesets paragraphs so that they fill each line exactly, at both the left- and right-hand sides. It does that by putting just enough blank space between words to get the desired width overall. That’s the program we will be writing in this assignment, more or less — but our program will only approximate LATEX’s approach, because we will be using a fixed-width font.

As you will see, however — the assignment is not about how to typeset. It’s about how to write programs using assertions and invariants, both to help find the program and to increase the likelihood that it is correct. Typesetting just happens to be an intricate-enough problem to make it interesting.

The example below shows on the left how some text in a fixed-width font can be made into a paragraph exactly 37 columns wide (except for the last line). The “greedy” typesetting (at left) fills each line as much as possible before moving on to the next. Then enough extra space is added to each line except the last, between the words, so that every line finishes exactly at the right-hand margin.

In general, greedy algorithms take decisions based on the “current” situation, and do not look ahead to see whether a different decision now might lead to a better outcome later on. We can do better than greedy (as Knuth showed).

Below on the right, the same text is typeset but now using a “clever” algorithm –based on dynamic programming– that optimises (in this case, minimises) the average amount of extra space that must be added between words in each line.

This paper  discusses a  new approach
to the  problem of dividing  the text
of  a   paragraph   into   lines   of
approximately equal  length.  Instead
of simply  making decisions  one line
at a  time, the method  considers the
paragraph as  a  whole, so  that  the
final  appearance  of  a  given  line
might be  influenced by  the text  on
succeeding lines.  A system  based on
three   simple   primitive   concepts
called  boxes,  glue,  and  penalties
provides   the    ability   to   deal
satisfactorily with a wide variety of
typesetting  problems  in  a  unified
framework, using  a single  algorithm
that determines  optimum breakpoints.
The algorithm  avoids backtracking by
a judicious  use of the techniques of
dynamic    programming.     Extensive
computational   experience   confirms
that the  approach is  both efficient
and    effective     in     producing (14/3)
high-quality   output.    The   paper
concludes with  a  brief  history  of
line-breaking   methods,    and    an
appendix   presents    a   simplified
algorithm that requires comparatively
few resources.

The Greedy Approach

This paper  discusses a  new approach
to  the   problem  of   dividing  the
text of  a  paragraph into  lines  of
approximately equal  length.  Instead
of  simply   making   decisions   one
line at  a time, the method considers
the  paragraph   as   a   whole,   so (14/5)
that  the   final  appearance   of  a
given  line   might   be   influenced
by  the  text  on  succeeding  lines.
A  system   based  on   three  simple
primitive  concepts   called   boxes,
glue,  and   penalties  provides  the
ability to deal satisfactorily with a
wide variety  of typesetting problems
in  a   unified  framework,  using  a
single  algorithm   that   determines
optimum  breakpoints.  The  algorithm
avoids backtracking  by  a  judicious
use  of  the  techniques  of  dynamic
programming. Extensive  computational
experience confirms that the approach
is  both   efficient  and   effective
in  producing   high-quality  output.
The  paper  concludes  with  a  brief
history of line-breaking methods, and
an  appendix  presents  a  simplified
algorithm that requires comparatively
few resources.

The Dynamic-Programming Approach

The paragraph above is set in a column of 37 spaces over 29 lines in both cases: but a greedy strategy is used on the left and a clever (dynamic programming) one on the right.

The greedy paragraph took text into the second line “because it could”, but paid for it later with ugly extra white space in the last ten lines.

We define the cost of a typesetting to be the maximum over all lines (except the last) of the average white space between adjacent words on each line separately.

In this assignment you will program a clever paragraph-maker in Python3.

Assertions and Invariants

Although this assignment asks you to write the clever, dynamic-programming algorithm, and test it, it is not about that algorithm itself, nor is it about dynamic programming. (You will learn more about those things in other courses.)

This assignment is about how to use assertions, invariants and static reasoning to write complicated programs and, by doing so, increase the chance that they will be correct. The dynamic programming we do here is just one example of where it really matters. While we will use automated tools to help us mark your assignments, in reality, we could (and sometimes might) actually read your code, and mark it like an essay.

How this assignment works

While this assignment focuses on how to use assertions, invariants and static reasoning, you will nevertheless be submitting (Python) code. You will be given a starter file, called assignment.py. You will also be given a tool-kit of utilities we markers will use to aid us in understanding and marking your code. You will submit assignment.py, after filling in the lines we have left with TODO() functions.

For each of the parts, you should type in the code yourself — really. As you type things out, you will be forced to think about the code, and will be more effective when answering the questions. Typing in the code is one of the tricks of the iFM trade. Trust us — you will get more out of the assignment if you type the code yourself.

You can replace TODO functions with one, or muliple lines. You can also create additional variables if you want to. Be careful, though -- make sure your extra variables aren't replacing the variables in your assertions!

Pifm

pifm is a libary designed to help you write code in the iFM style, and to help us mark you. To run your code, you will need pifm installed.

The simplest way to use it is to run 6721 pifm on a CSE computer. This will run a python interpreter with pifm already installed for you.

If you would like to install it yourself, feel free to do so, but ensure it still works on a CSE machine.

We won't provide specific installation instructions; but the files in /web/cs6721/pifm_public/ should be sufficient to run pifm on your own machine.

Submission and Due Date Information

This assignment is due Friday of Week 8 at 8pm AEST (22nd of July). The standard UNSW late penalty applies -- 5% subtracted from your assignment's final score for every day or part day late; capped at 5 days (120 hours), after which no submission is possible.

You will submit using give. The submission command is

give cs6721 assignment assignment.py

Or, you can submit using the give online system.

The General Strategy

Your program will be run with two command line arguments:

A character to indicate what printing strategy to use (G for greedy, M for minimum cost, S for the shortest optimal paragraph), and
a non-negative integer line-width, called context.LINE_WIDTH.

It will also be given a series of words, on standard input, to format.

Those words will be collected by our starter code into a sequence called context.words of words (character strings).

The output of the program will be a paragraph, built from the words in context.words, in which all lines (except possibly the last) are exactly context.LINE_WIDTH characters wide, achieved by adding blanks between words where necessary. The program prints out that paragraph.

In the first section, the printed paragraph will be constructed greedily.

After the first section, we will change our program so that the printed paragraph will be of optimal (minimum) cost, where:

the cost of a single line is the average number of blanks that are needed between each adjacent pair of words, and
the overall cost (i.e. of the whole paragraph) is the largest cost of any single line (except the last). If the line has a single word, the cost is defined to be the number of spaces on that line. For example, if the words on a line have lengths 4,5,9,1,3,8 (as in the first line of Fig. 1) and the line-width context.LINE_WIDTH is 37, then 37−(4+5+9+1+3+8), that is 7 blanks must be distributed over the 5 inter-word spaces. Since we can use only whole blanks, we might mix 1- and 2-blank spaces in the pattern 1,2,1,2,1. Or perhaps 2,2,1,1,1; or maybe 1,1,1,2,2. But gaps-lengths must not differ by more than one within a single line. Thus the pattern 1,1,1,1,3 is not allowed.

Now, there are six parts to the assignment:

Part 0 will be to setup the assignment, and understand what is required of you.
Part 1 will be to implement a simple, greedy program for printing out words.
Part 2 will be to implement a program that goes from the end of the list of words towards the beginning, thus calculating the minimum possible cost of making a paragraph for successively longer suffixes of the word list. (That’s the dynamic-programming part.)
In Part 3, you will use the the results of Part 2, but now starting from the beginning of the list of words, to split the word-list into lines such that each line fits within the width and the cost of that line is no worse than the minimum achievable for the whole paragraph (as was calculated by Part 2).
In Part 4 you will use the results of Part 3 to print out the actual paragraph, complete with any extra blanks necessary so that each line (except the last) is exactly context.LINE_WIDTH characters wide.
Finally, in Part 5 you will implement a modification to Part 4 which causes it to print out the least number of lines, while maintaining minimum cost.

Part 0: Getting Started

The Python3 code below contains all the preliminary code you will need for this exercise.

You should look at each part individually.

Pifm Initialise: Pifm is a COMP6721 utility to help us mark your work. This initialisation will check whether you are using any functionality we do not allow, and will make sure that the functions we are marking are present.
Propositions to use: You will be using invariants that we have already provided, but some of them are difficult to write out simply using Python. Those propositions are empty (for now), but describe the in English the propositions our invariants will use. During marking, we will fill those in with real checks.
Scanning in: For this assignment, gathering the input is uninteresting, and so we have implemented that for you. You should ensure you understand what format the input will be in; but you can call this function in your program without needing to make any changes.
Helpers: These are simply functions we have written to make our lives easier. You do need to change, remove or add them.
the Context tuple: Some functions we will write would have very long function signatures, or require the use of global variables, when actually they just require a few values. This tuple is used to store those values. Think of it as a struct.

You should copy this code to the top of your file -- it contains functions you will need in the assignment:

### Paragraph.py --- Program Synopsis
#
# % python3 Paragraph.py MODE WIDTH < WORDS
#
# The column WIDTH (non-negative integer) is given as the only argument;
# WORDS comes from standard input, over as many lines as it takes, separated
# by white space. EOF from the terminal is ^D (as usual).
#
# The first section of the program prints words right-aligned, greedily.
#
# The remaining sections print WORDS in lines of exactly WIDTH characters,
# except possibly the last line, which might be shorter. They minimise the
# largest average white space between words in order to achieve the right-margin
# justification --- i.e. that the right margin is a straight vertical line.
#
# Although the average white space needed in a line might not be a
# whole number of blanks, the program simulates that in a fixed-width
# font by using a mixture of "just more than average" and "just less
# than average" inter-word gaps.

from typing import Tuple, List

# Setup the `pifm` code.

from pifm import *
pifm_initialise('22T2_ass1')

########## PROPOSITIONS TO USE ###########
# Do not modify this section.

def word_lines_contains_all_words(context, l, word_lines):
    # NOTE -- Do not fill this in.
    # However, you should check for yourself that your invariants are true.
    return True

def word_lines_cannot_be_any_longer(context, word_lines):
    # NOTE -- Do not fill this in.
    # However, you should check for yourself that your invariants are true.
    return True

def lines_contain_all_words_from_word_lines(lines, word_lines):
    # NOTE -- Do not fill this in.
    # However, you should check for yourself that your invariants are true.
    return True

def lines_start_with_right_amount_of_spaces(lines):
    # NOTE -- Do not fill this in.
    # However, you should check for yourself that your invariants are true.
    return True

def costs_are_correct(costs, n):
    # NOTE -- Do not fill this in.
    # However, you should check for yourself that your invariants are true.
    return True

def costs_to_i_are_minimum(ci, n, i):
    # NOTE -- Do not fill this in.
    # However, you should check for yourself that your invariants are true.
    return True

def cost_of_line_is_lte_first_cost(costs, word_lengths_per_line):
    # NOTE -- Do not fill this in.
    # However, you should check for yourself that your invariants are true.
    return True

def best_i_is_correct(context, best_i):
    # NOTE -- Do not fill this in.
    # However, you should check for yourself that your invariants are true.
    return True

########## SCANNING IN DATA ###########
# This will not be used in marking.
# We suggest you do not modify it.

def get_data() -> Tuple[List[str], str, int]:
    from sys import argv, stdin, stderr, exit

    if len(argv) != 3:
        print(f"Usage: python3 {argv[0]} MODE WIDTH", file=stderr)
        exit(1)

    mode = argv[1]
    if mode not in ["G", "M", "S"]:
        print(
            "Mode must be 'G' (greedy), 'M' (minimum cost), or 'S' (shortest). ",
            file=stderr
        )
        exit(1)

    line_width = int(argv[2])

    if line_width < 0:
        print("Line width must be a non-negative integer.", file=stderr)
        exit(1)

    # Get the words from standard input into words, and lengths into word_lengths.

    words = []
    for line in stdin.readlines():
        words = words + line.split()

    # Check that the input satisfies the program’s precondition.
    for word in words:
        if len(word) > line_width:
            print("1053: No word may be longer than line_width.")
            exit(1)

    return (words, mode, line_width)


########## HELPERS ###########

def TODO():
    """
    Calls to this function indicate code you need to complete.
    """
    return None

def flatten_list(list_of_lists):
    """
    concatenate a list of lists.
    """
    return sum(list_of_lists, [])

def get_word_lengths(words):
    return list(map(len,words))

# The `Context` tuple is a series of values
# that are used many times throughout the program.
#
# They are effectively global constants, but we pass them around
# so that functions can be tested individually.
#
# NUM_WORDS    : the number of words scanned in from stdin.
# LINE_WIDTH   : the number of characters allowed on each line.
# words        : the list of words scanned in from stdin.
# word_lengths : word_lengths[i] is len(words[i])

from collections import namedtuple
Context = namedtuple('Context', ['NUM_WORDS', 'LINE_WIDTH', 'words', 'word_lengths'])

You should copy this code to the bottom of your file -- it contains the "main" code you will be using. You do not need to type this yourself.

########## Main Program ###########

def main():
    from sys import stderr

    words, mode, line_width = get_data()

    NUM_WORDS = len(words)
    word_lengths = get_word_lengths(words)
    context = Context(NUM_WORDS, line_width, words, word_lengths)

    if mode == 'G':
        # Greedy
        lines = make_paragraph_lines_greedily(context)
        for line in lines:
            print(line)
    elif mode == 'M':
        # Minimum Cost
        costs = get_minimum_costs(context)
        word_lengths_per_line = get_word_lengths_per_line(context, costs, word_lengths)
        print_paragraph_lines(context, word_lengths_per_line)
    elif mode == 'S':
        # Shortest
        costs = get_minimum_costs(context)
        word_lengths_per_line = get_shortest_word_lengths_per_line(context, costs, word_lengths)
        print_paragraph_lines(context, word_lengths_per_line)
    else:
        print("Mode must be 'G', 'M', or 'S'", file=stderr)
        return 1


if __name__ == "__main__":
    import sys

    sys.exit(main() or 0)

Part 1: The Greedy Approach

As a warm up, let us first write a greedy program. Complete the code shown below for Part 1. You will have to complete two functions: fits, which uses the context (particularly context.LINE_WIDTH) to determine if the words between l and h fit in the line length; and make_paragraph_lines_greedily.

For added fun, you should print out right aligned text. This means that you should have one space between each word, except the start of each line should have however many additional spaces ensure that the lines fit to both margins. Previously this said (except the last) which was not correct. The assignment has been updated with examples for you to test against

########## PART 1 ###########

def fits(context: Context, l: int, h: int) -> bool:
    requires("fits_bounds", 0 <= l < h <= context.NUM_WORDS)
    return TODO()

def make_paragraph_lines_greedily(context: Context):
    word_lines = TODO()
    l = TODO()

    while TODO():
        holds_invariant(
            "word_lines contains all words[:l]",
            word_lines_contains_all_words(context, l, word_lines)
        )
        holds_invariant(
            "each element in word_lines could not be any longer",
            word_lines_cannot_be_any_longer(context, word_lines)
        )
        h = TODO()
        while TODO():
            holds_invariant(
                "l to h fits inside a line",
                fits(context, l, h)
            )
            TODO()

        word_lines = TODO()
        l = TODO()

    lines = TODO()

    i = TODO()
    while i != len(word_lines):
        # NOTE: the below invariant could be better worded as
        # lines[j] contains all words from word_lines[j] for all 0 <= j <= len(lines)
        holds_invariant(
            "lines[i] contains all words from word_lines[i] for all 0 <= i < len(lines)",
            lines_contain_all_words_from_word_lines(lines, word_lines)
        )
        holds_invariant(
            "each element of lines starts with the right amount of spaces",
            lines_start_with_right_amount_of_spaces(lines)
        )
        TODO()

    return TODO()

Examples

The first paragraph of this part, greedily formatted for width 30.

    As a warm up, let us first
       write a greedy program.
 Complete the code shown below
  for Part 1. You will have to
       complete two functions:
        `fits`, which uses the
 `context` to determine if the
 words between `l` and `h` fit
       in the line length; and
       `print_lines_greedily`.

The same text, of width 40.

      As a warm up, let us first write a
 greedy program. Complete the code shown
      below for Part 1. You will have to
   complete two functions: `fits`, which
  uses the `context` to determine if the
    words between `l` and `h` fit in the
line length; and `print_lines_greedily`.

Part 2: Calculaing Costs using Dynamic Programming

The dynamic-programming aspect of this algorithm is that, in order to minimise (optimise) the cost of the whole paragraph, we actually find the optimum costs for all its suffixes along the way. Thus we introduce a sequence of real numbers costs[0:N], and the postcondition of this first part will be that "For all n in [:N] we have costs[n] == "the optimal achievable cost of typsettting the suffix words[n:] in line-width LINE_WIDTH."

That means in particular that when the postcondition is established, the element costs[0] will be the optimal achievable cost for the whole paragraph. (But we do not yet know how to achieve that optimal cost.)

A code skeleton for all that is shown below: again, you are to type this in and fill the the TODO functions.

########## PART 2 ###########

def cost(context: Context, l: int, h: int) -> float:
    requires("l to h fits inside a line", fits(context, l, h))
    blanks_needed = TODO()
    words = TODO()
    TODO()

# ERRATA: costs should be a `List[float]`, not a `List[int]`.
def overall_cost(context: Context, costs: List[int], l: int, h: int) -> float:
    requires("l and h are in the right order", 0 <= l < h)
    requires("l and h fit inside a line", fits(context, l, h))

    return TODO() if h < context.NUM_WORDS else 0

def get_minimum_costs(context: Context):
    n = context.NUM_WORDS
    costs = [0] * n

    while n != 0:
        holds_invariant("costs contains the correct value", costs_are_correct(costs, n))
        n = TODO()
        i = TODO()
        costs_to_i = TODO()

        while TODO():
            holds_invariant("costs_to_i is the smallest possible", costs_to_i_are_minimum(costs_to_i, n, i))
            i = i + 1
            costs_to_i = TODO()

        costs[n] = costs_to_i

    return costs

Thus for Part 2, you should fill-in the sections indicated by the "TODO" functions. Type the whole thing in; and test it together with the code you used for Part 0. TODO: how do you test it all together.

Part 3: Calculate the paragraph lines.

In Part 3, the results of Part 2, especially the costs sequence, are used to split the original list context.words of words into the separate lines that will achieve the optimal-cost layout for the paragraph overall.

This time we start from the front of context.word_lengths, making the lines of the eventual paragraph one by one. For each of those lines we take as many words as are necessary to reach a “rest of paragraph” point where the overall cost is sufficiently low. (This Part 3 is a greedy algorithm, not a dynamic-programming algorithm as Part 2 was.)

Fill in the code where marked as TODO(). Test it in combination with your earlier code by printing the relevant variables and checking that against your test input.

########## PART 3 ###########

# ERRATA: costs should be a `List[float]`, not a `List[int]`.
def get_word_lengths_per_line(context: Context, costs: List[int], word_lengths: List[int]):
    word_lengths_per_line = TODO()
    n = TODO()

    while n != context.NUM_WORDS:
        holds_invariant(
            "word_lengths contains all words up to n",
            word_lengths == flatten_list(word_lengths_per_line) + word_lengths[n:]
        )
        holds_invariant(
            "costs[n] less than 0 if we're not on the last line.",
            costs[n] <= costs[0] if n < context.NUM_WORDS else True
        )
        holds_invariant(
            "cost of the line <= first cost",
            cost_of_line_is_lte_first_cost(costs, word_lengths_per_line)
        )

        i = TODO()

        # take the first possible choice for this line that minimizes the cost.
        while TODO():
            holds_invariant("words[n] to words[i] fits", fits(context, n, i))
            i = TODO()

        word_lengths_per_line = TODO()
        n = TODO()

    return TODO()

Part 4: Print the actual paragraph.

This last part of the "dynamic" approach prints out the formatted paragraph, filling every line (except the last) so that it takes exactly context.NUM_WORDS spaces. Given that the lengths of the paragraph’s lines are now available in word_lengths_per_line, that seems an easy job:

something like

word_list = context.words
for words in word_lengths_per_line:
    length = len(ws)
    line = word_list[:length]
    word_list = word_list[length:]
    print(line) # Blanks needed here.

would do... sort of.

That is, except for the “blanks needed here” issue. The main job in this part is the print_paragraph_lines function, which inserts those blanks. In that function there are, as usual, TODO()s to fill in.

An important note: you are not required insert the needed blanks in any particular order: you can put all longer ones at the front, or at the back: for now, consider it to be a matter of personal taste. But you must not have gaps whose lengths differ by more than one within a single line. Thus for example to get 7 blanks into 3 gaps you can have 2,2,3 or 2,3,2 or 3,2,2 — but they are the only possibilities. You cannot have 1,2,4 or similar.

########## PART 4 ###########

from math import floor,ceil

def make_line(context: Context, words: List[str], blanks: int):
    requires("the number of blanks is greater than 0", blanks >= 0)
    requires("there is at least one word to include", words != [])

    # If there’s only one word, just put all blanks at the end.
    if TODO():
        return TODO()

    # Otherwise, the usual case...
    # blanks is the number of blanks to distribute,
    # gaps is the number of inter-word gaps into which they must fit,
    # and each gap must receive between floor(blanks/gaps) and ceil(blanks/gaps) blanks.

    gaps = TODO()
    blanks_left = TODO()
    gaps_left = TODO()
    line = TODO()

    for todo in TODO():
        holds_invariant(
            "we haven't distributed too many or too few blanks",
            floor(blanks / gaps) <= (blanks_left / gaps_left) <= ceil(blanks / gaps)
        )
        if TODO():
            blanks_here = TODO()
        else:
            blanks_here = TODO()

        # HINT: blanks_here must be between floor(blanks/gaps) and ceil(blanks / gaps)

        line += TODO()
        TODO()

    ensures("blanks_left is 0, because of the invariant", blanks_left == 0)
    ensures(
        "line is exactly LINE_WIDTH long, except the last line.",
        len(line) == context.LINE_WIDTH or words == context.words[-len(words):]
    )
    return line

You may copy the code below; you do not have to re-write it yourself:

########## PART 4 - Helper Code ###########

def print_paragraph_lines(context: Context, word_lengths_per_line: List[List[int]]):
    max_blanks = 0 # For reporting only.
    line_count = 0 # For reporting only.
    words_left = context.words
    real_lines = []
    for line in word_lengths_per_line:
        line_words = words_left[:len(line)]
        words_left = words_left[len(line):]

        gaps_here = len(line) - 1
        if words_left == []:
            # Printing the last line
            text = make_line(context, line_words, gaps_here)
            print(text)
        else:
            blanks_needed_here = context.LINE_WIDTH-sum(line)
            text= make_line(context, line_words, blanks_needed_here)
            max_blanks= max(
                max_blanks,
                blanks_needed_here/gaps_here if len(line)>1 else blanks_needed_here
            )
            real_lines.append(text)
            print("%s %d/%d = %.2f." % ( \
                text, blanks_needed_here, \
                gaps_here,blanks_needed_here/gaps_here \
                if len(line)>1 else blanks_needed_here)
            )
        line_count= line_count+1

    print(" "*context.LINE_WIDTH+"^")
    numbers = "1234567890"*floor(context.LINE_WIDTH/10)
    print(("==> 567890"+numbers)[:context.LINE_WIDTH]+"|")
    print("==> The maximum (fractional) white space for column width" + \
        " %3d was %.2f over %d lines." % \
        (context.LINE_WIDTH, max_blanks,line_count))

    return real_lines

Part 5: A Smarter Approach

In Part 3, we determined which words went on which lines by finding the first word on a line where we could break before it and keep the whitespace in the paragraph minimized. We were guaranteed by the algorithm in Part 2 that there must be at least one way to break up the paragraph following this procedure but there might be more than one choice for how to do this. A consequence of picking the first word that minimized whitespace is that we might end up using more lines overall.

In this section, we will implement a minor change to our stragegy. We will copy Part 3, but we will always pick the longest line which minimizes the rest-of-paragraph whitespace cost.

########## PART 5 ###########

# ERRATA: costs should be a `List[float]`, not a `List[int]`.
def get_shortest_word_lengths_per_line(context: Context, costs: List[int], word_lengths: List[int]):
    word_lengths_per_line = TODO()
    n = TODO()

    while n != context.NUM_WORDS:
        holds_invariant(
            "word_lengths_ok",
            word_lengths == flatten_list(word_lengths_per_line) + word_lengths[n:]
        )
        holds_invariant(
            "costs_ok",
            costs[n] <= costs[0] if n < context.NUM_WORDS else True
        )
        holds_invariant(
            "cost_lte_first_cost",
            cost_of_line_is_lte_first_cost(costs, word_lengths_per_line)
        )

        i = TODO()
        best_i = None
        # take the last possible choice for this line that minimizes the cost.
        while fits(context, n, i) and i != context.NUM_WORDS:
            holds_invariant(
                "best_i is the largest index before the end" + \
                "that minimizes the cost, if it exists",
                best_i_is_correct(context, best_i)
            )
            TODO()

        if best_i is None:
            best_i = TODO()


        word_lengths_per_line = TODO()
        n = TODO()

    return word_lengths_per_line

We can see the difference in the two strategies here:

IN  the  days   when   everybody   started
fair,  Best  Beloved,  the  Leopard  lived
in  a  place  called   the   High   Veldt.
’Member   it   wasn’t   the   Low   Veldt,
or   the   Bush   Veldt,   or   the   Sour
Veldt,  but  the  ’sclusively  bare,  hot,
shiny High Veldt,  where  there  was  sand
and sandy-coloured  rock  and  ’sclusively
tufts  of   sandy-yellowish   grass.   The
Giraffe  and  the  Zebra  and  the   Eland
and  the   Koodoo   and   the   Hartebeest
lived there;  and  they  were  ’sclusively
sandy-yellow-brownish   all   over;    but
the  Leopard,  he  was   the   ’sclusivest
sandiest-yellowish-brownest of  them  alla
greyish-yellowish  catty-shaped  kind   of
beast,  and  he  matched  the  ’sclusively
yellowish-greyish-brownish colour  of  the
High Veldt to  one  hair.  This  was  very
bad for the  Giraffe  and  the  Zebra  and
the rest of them; for he would lie down by
a  ’sclusively  yellowish-greyish-brownish
stone  or  clump  of   grass,   and   when
the Giraffe or  the  Zebra  or  the  Eland
or  the  Koodoo  or   the   Bush-Buck   or
the Bonte-Buck came by he  would  surprise
them  out  of  their  jumpsome  lives.  He
would  indeed!  And,   also,   there   was
an  Ethiopian  with  bows  and  arrows  (a
’sclusively greyish-brownish-yellowish man
he was then), who lived on the High  Veldt
with the Leopard; and the two used to hunt
togetherthe Ethiopian with  his  bows  and
arrows, and the Leopard  ’sclusively  with
his teeth and clawstill  the  Giraffe  and
the Eland and the Koodoo  and  the  Quagga
and all the rest of them didn’t know which
way to jump,  Best  Beloved.  They  didn’t
indeed!

Simple Dynamic Approach
Line-width 42, cost 3.33, lines used 39.

IN the days when everybody  started  fair,
Best Beloved, the Leopard lived in a place
called the High Veldt. ’Member  it  wasn’t
the Low Veldt, or the Bush Veldt,  or  the
Sour Veldt, but the ’sclusively bare, hot,
shiny High Veldt, where there was sand and
sandy-coloured   rock   and    ’sclusively
tufts  of   sandy-yellowish   grass.   The
Giraffe  and  the  Zebra  and  the   Eland
and  the   Koodoo   and   the   Hartebeest
lived there;  and  they  were  ’sclusively
sandy-yellow-brownish   all   over;    but
the  Leopard,  he  was   the   ’sclusivest
sandiest-yellowish-brownest of them  all-a
greyish-yellowish  catty-shaped  kind   of
beast,  and  he  matched  the  ’sclusively
yellowish-greyish-brownish colour  of  the
High Veldt to one hair. This was very  bad
for the Giraffe and the Zebra and the rest
of  them;  for  he  would  lie   down   by
a  ’sclusively  yellowish-greyish-brownish
stone or clump  of  grass,  and  when  the
Giraffe or  the  Zebra  or  the  Eland  or
the  Koodoo  or  the  Bush-Buck   or   the
Bonte-Buck  came  by  he  would   surprise
them  out  of  their  jumpsome  lives.  He
would  indeed!  And,   also,   there   was
an  Ethiopian  with  bows  and  arrows  (a
’sclusively greyish-brownish-yellowish man
he was then), who lived on the High  Veldt
with the Leopard; and the two used to hunt
together-the Ethiopian with his  bows  and
arrows, and the Leopard  ’sclusively  with
his teeth and claws-till the  Giraffe  and
the Eland and the Koodoo  and  the  Quagga
and all the rest of them didn’t know which
way to jump,  Best  Beloved.  They  didn’t
indeed!

Shorter Line Dynamic Approach
Line-width 42, cost 3.33, lines used 38.

Administrivia

Marking Notes

The intent of this assignment is to assess your understanding of the techniques and tools taught in this course. As such, marks will be awarded for writing code based on the invariants. Full marks will not be awarded for code that "works" but does not satisfy the invariants -- even if your code can create the correct answer; if you've not respected the invariants, you have missed the point of the assignment.

This assignment will be marked using a combination of automated and manual techniques.

We are not going to assign exact marks to specific sections, functions or "TODO"s. There will be partial marks for completing some functions or TODOs, but they will depend on exactly what work is or is not completed.

Each section will have a similar (but not necessarily equal) weighting. We expect that the first section (greedy formatting) will be worth slightly less than other sections.

Assignment Conditions

Joint work is not permitted on this assignment.

This is an individual assignment.

The work you submit must be entirely your own work. Submission of any work even partly written by any other person is not permitted.

The only exception being if you use small amounts (< 10 lines) of general purpose code (not specific to the assignment) obtained from a site such as Stack Overflow or other publicly available resources. You should attribute the source of this code clearly in an accompanying comment.

Assignment submissions will be examined, both automatically and manually for work written by others.

Do not request help from anyone other than the teaching staff of COMP6721.

Do not post your assignment code to the course forum - the teaching staff can view assignment code you have recently autotested or submitted with give.

Rationale: this assignment is an individual piece of work. It is designed to develop the skills needed to produce an entire working program. Using code written by or taken from other people will stop you learning these skills.
The use of code-synthesis tools, such as GitHub Copilot, is not permitted on this assignment.

Rationale: this assignment is intended to develop your understanding of basic concepts. Using synthesis tools will stop you learning these fundamental concepts.
Sharing, publishing, distributing your assignment work is not permitted.

Do not provide or show your assignment work to any other person, other than the teaching staff of COMP6721. For example, do not share your work with friends.

Do not publish your assignment code via the internet. For example, do not place your assignment in a public GitHub repository.

Rationale: by publishing or sharing your work you are facilitating other students to use your work, which is not permitted. If they submit your work, you may become involved in an academic integrity investigation.
Sharing, publishing, distributing your assignment work after the completion of COMP6721 is not permitted.

For example, do not place your assignment in a public GitHub repository after COMP6721 is over.

Rationale:COMP6721 sometimes reuses assignment themes, using similar concepts and content. If students in future terms can find your code and use it, which is not permitted, you may become involved in an academic integrity investigation.

Violation of the above conditions may result in an academic integrity investigation with possible penalties, up to and including a mark of 0 in COMP6721 and exclusion from UNSW.

Relevant scholarship authorities will be informed if students holding scholarships are involved in an incident of plagiarism or other misconduct. If you knowingly provide or show your assignment work to another person for any reason, and work derived from it is submitted - you may be penalised, even if the work was submitted without your knowledge or consent. This may apply even if your work is submitted by a third party unknown to you.

If you have not shared your assignment, you will not be penalised if your work is taken without your consent or knowledge.

For more information, read the UNSW Student Code , or contact the course account.