15 Jan 2023

Why don't we inline our tests with our code?

One of my favourite things about TDD is that our tests act as documentation for our code. If I'm reading an unfamiliar codebase, and I know that it was properly TDDed, the first place I look is the tests. Well written tests tell me what the intent of some code is. Then if I'm interested in getting into the weeds of how it works, I can always read the code itself later.

So why do we always hide our tests away from our code? Lots of best-practices put them in a completely different directory tree, but even golang encourages some_file.go to be tested in some_file_test.go.

Of course, if I'm writing a high-level test that tests my whole program, it probably belongs in its own top-level directory. But I want to think about the low-level unit-tests that we write so many of. What if we wrote tests as if they were docstrings?

I'd like to take a leaf out of Knuth's literate programming book. Let's think of our code as a document. Let's introduce each function or procedure in an order that makes narrative sense, and let's describe it (with a test!) as we introduce it.

Let's try.

Example: Advent of Code Day 5

During my 2022/2023 winter holidays, I had a little play with the lovely Advent of Code. Below are extensive spoilers for Day 5, Part 1.

Narrative Choices

Before I post my code, it's worth taking a moment to notice that if we're thinking of a computer program as a document, then we have to make narrative choices.

Who is the target audience for this document?
What can I assume they already know?
What is their goal when reading this document?

When I wrote the code below, I was playing a silly game. I was solving a puzzle, and writing mostly to my future self. The result is a program that reads like a live-tweet of a puzzle-solving session.

I think this is absolutely fine – and this happens a lot at work. If you've ever spiked on a problem, this is the kind of thing you might have come up with. It's not thoroughly TDDed (there are just enough tests to help me move on to the next bit of the puzzle), and it's got plenty of room for refactors. For ideas about where these refactors might go, see What's Next? below.

An example spike, with inline tests.

So here's my python spike of Advent of Code 2022 Day 5, Part 1. With inline tests.

#!/usr/bin/env python3

import sys

"""
Day 5 of the 2022 Advent of code involves manipulating stacks of data. We're
presented with an initial state that looks something like this:
"""

test_initial_state = """\
    [D]    
[N] [C]    
[Z] [M] [P]
 1   2   3 
"""

"""
Notice that every line of our `initial_state` contains the same number of
characters. So there are a bunch of spaces at the end of many of the lines.

We're also going to get a list of instructions that look something like this:
"""

test_instructions = """\
move 1 from 2 to 1
move 3 from 1 to 3
move 2 from 2 to 1
move 1 from 1 to 2
"""

"""
Each instruction tells us to move some number of items from one stack to
another stack. Our goal is to execute our instructions on our initial state,
and interrogate the output.

But before we get to that, let's be a bit clearer about our input. Our input
will be a string (from a file, or a pipe or similar) which contains first our
initial state, then a blank line, then our instruction list.

So, our `test_initial_state` and our `test_instructions` above might be
presented as follows:
"""

test_data = """\
    [D]    
[N] [C]    
[Z] [M] [P]
 1   2   3 

move 1 from 2 to 1
move 3 from 1 to 3
move 2 from 2 to 1
move 1 from 1 to 2
"""

def test_input_separation():
    """
    The first thing we're going to need to do is separate our input into the
    initial state, and the instructions we want to run over it.
    """
    (initial_state, instructions) = separate_inputs(test_data)
    assert initial_state == test_initial_state
    assert instructions == test_instructions

def separate_inputs(input_text):
    initial_state = ""
    instructions = ""
    still_reading_the_initial_state = True

    for line in input_text.splitlines():
        if line == "":
            still_reading_the_initial_state = False
            continue

        if still_reading_the_initial_state:
            initial_state += line+"\n"
        else:
            instructions += line+"\n"

    return (initial_state, instructions)

def test_parse_state():
    """
    Our initial state is in an awkward format. We want to be able to treat each
    column of the text as a stack.
    """
    parsed_state = parse_state(test_initial_state)
    expected_state = {
            "1": ["Z", "N"],
            "2": ["M", "C", "D"],
            "3": ["P"]
            }
    assert parsed_state == expected_state

def parse_state(state_string):
    parsed_lines = []
    for line in state_string.splitlines():
        parsed_lines.append(parse_state_line(line))

    parsed_state = {}
    keys = parsed_lines.pop()
    for key in keys:
        parsed_state[key] = []

    for line in parsed_lines:
        i = 0
        while i < len(line):
            if line[i] != " ":
                parsed_state[keys[i]].append(line[i])
            i += 1

    for stack in parsed_state.values():
        stack.reverse()

    return parsed_state

def test_parse_state_line():
    """
    Each line of an input state consists of columns of potentially useful data,
    surrounded by distracting stuff that make humans feel better.

    The useful data is either the contents of a column, or the ID of a column.
    We want an array of those things.

    Helpfully, *every data entry and column ID is precisely one ASCII character
    long*.

    The input data also consistently uses letters for data, and digits for
    column IDs. We will not rely on this.

    Note that a column can potentially be empty -- for now let's represent that
    as a space.
    """
    assert parse_state_line("    [D]    ") == [" ", "D", " "]
    assert parse_state_line("[N] [C]    ") == ["N", "C", " "]

def parse_state_line(state_line):
    i = 1
    parsed_line = []

    while i < len(state_line):
        parsed_line.append(state_line[i])
        i += 4

    return parsed_line

def test_parse_instruction():
    """
    We will represent each instruction as a triple:

    `(quantity, from, to)`

    ...which means that we wish to move `quantity` items from stack `from` to
    stack `to`.

    As with the state lines, the column IDs are precisely one character long,
    and they are all digits.

    However the number of items to move can be any natural number.
    """
    assert parse_instruction("move 1 from 2 to 1") == (1,"2","1")
    assert parse_instruction("move 38 from 9 to 4") == (38,"9","4")

def parse_instruction(instruction_string):
    instruction_tokens = instruction_string.split(" ")
    return (int(instruction_tokens[1]),instruction_tokens[3],instruction_tokens[5])

def test_parse_instruction_list():
    assert_iteraters_are_equal(parse_instruction_list(test_instructions) , [
            (1, "2", "1"),
            (3, "1", "3"),
            (2, "2", "1"),
            (1, "1", "2")
        ])

def parse_instruction_list(instructions_string):
    return map(parse_instruction, instructions_string.splitlines())

def test_execute_instruction():
    initial_state = {
            "1": ["Z", "N"],
            "2": ["M", "C", "D"],
            "3": ["P"]
            }
    expected_state = {
            "1": ["Z", "N"],
            "2": ["M"],
            "3": ["P", "D", "C"]
            }

    end_state = execute_instruction((2, "2", "3"), initial_state)

    assert end_state == expected_state

def execute_instruction(instruction, state):
    quantity = instruction[0]
    source = instruction[1]
    destination = instruction[2]
    for _ in range(quantity):
        data = state[source].pop()
        state[destination].append(data)

    return state

def test_run_program():
    initial_state = {
            "1": ["Z", "N"],
            "2": ["M", "C", "D"],
            "3": ["P"]
            }
    program = [
            (1, "2", "1"),
            (3, "1", "3"),
            (2, "2", "1"),
            (1, "1", "2")
        ]

    expected_state = {
            "1": ["C"],
            "2": ["M"],
            "3": ["P", "D", "N", "Z"]
            }
    end_state = run_program(program, initial_state)

    assert end_state == expected_state

def run_program(program, state):
    for instruction in program:
        state = execute_instruction(instruction, state)

    return state

def test_prettify_state():
    state = {
            "1": ["Z", "N"],
            "2": ["M", "C", "D"],
            "3": ["P"]
            }

    expected_output = """\
    [D]    
[N] [C]    
[Z] [M] [P]
 1   2   3 
"""

    actual_output = prettify_state(state)

    assert actual_output == expected_output

def prettify_state(state):
    output = ""
    keys = state.keys()
    for i in range(length_of_longest_list_in_map(state) - 1 , -1, -1):
        for key in keys:
            current_list = state[key]
            if len(current_list) > i:
                output += "[" + str(current_list[i]) +"] "
            else:
                output += "    "

        output = output[:-1]
        output += "\n"

    for key in keys:
        output += " " + str(key) + "  "

    output = output[:-1]
    output += "\n"

    return output

def test_length_of_longest_list_in_map():
    map_of_lists = {
            "a": [1,2],
            "b": [1,2,3],
            "c": [1]
        }

    assert length_of_longest_list_in_map(map_of_lists) == 3

def length_of_longest_list_in_map(map_of_lists):
    longest_so_far = 0
    for key in map_of_lists.keys():
        current_length = len(map_of_lists[key])
        if current_length > longest_so_far:
            longest_so_far = current_length

    return longest_so_far

def test_solve_puzzle():
    expected_solution = """\
        [Z]
        [N]
        [D]
[C] [M] [P]
 1   2   3 
"""
    actual_solution = solve_puzzle(test_data)

    assert actual_solution == expected_solution

def solve_puzzle(input_text):
    (state_text, program_text) = separate_inputs(input_text)
    initial_state = parse_state(state_text)
    program = parse_instruction_list(program_text)
    final_state = run_program(program, initial_state)
    return prettify_state(final_state)


def main():
    input_text = sys.stdin.read()
    print(solve_puzzle(input_text))

if __name__ == "__main__":
    main()


def assert_iteraters_are_equal(it1, it2):
    list1 = []
    list2 = []
    for x in it1:
        list1.append(x)
    for x in it2:
        list2.append(x)

    assert list1 == list2

If you save this code in a file called day5.py, then you can run the tests with pytest day5.py or you can solve the puzzle with ./day5.py < input.txt

The tests look like you'd expect:

$ pytest day5.py
==================================================================================== test session starts =====================================================================================
platform linux -- Python 3.8.10, pytest-7.2.0, pluggy-1.0.0
rootdir: /home/gds/workspace/advent-of-code/5
collected 10 items

day5.py ..........                                                                                                                                                                     [100%]

===================================================================================== 10 passed in 0.02s =====================================================================================
$

If you're using the same example input data we've been using throughout, then running the program looks like this:

$ ./day5.py < input.txt
        [Z]
        [N]
        [D]
[C] [M] [P]
 1   2   3

$

What's next?

If I were writing this program as a module in a larger program that was meant to be maintained, I would likely make at least the following changes:

Reverse the order in which I present the procedures. Start with the most high-level function (which might be all the reader needs to know), and get into increasing detail as I go.
Program more defensively, and add way better error handling.
Add way more hyperlinks in the prose sections. If my description of the problem doesn't match the latest online description, that's a good indication that the code needs updating.
Add way more dates in the prose sections. Statements like "The oojamaflip needs an inverted polarity, so we do that here" can easily go stale. However, statements like "On 2023-01-15 the oojamaflip needed an inverted polarity, so we handled that in procedure foo()" are much harder to falsify as the code evolves. Even if the oojamaflip changes behaviour, the code moves with it, and the dev forgets to update the comment ; the comment might still provide useful historical context to someone down the line.

If anyone's interested, I might do these refactors for a future blog.

I'm also interested in thinking about how this pattern might work in other languages. I have a hacky minimal version in haskell, and I'd like to try out rust doc comments as tests.

I also realise that while this style makes a lot of sense with imperative or procedural programs, it might be trickier with object oriented ones. I'd like to think more about this.

If you're working in a language that definitely won't support this kind of thing, but really wish it did, you might be interested in literate programming wrapper systems like noweb or org mode.

Updates and Thanks

Thanks to David Laing for suggesting a clearer "What's Next" section, and for asking for an extra docstring.

Thanks to Mike for pointing me to rust comments-as-tests, and to Jay for experimenting with how to inline tests in C#.

Thanks to glyn for pointing out that it's possible to do this kind of thing with even very unsupportive languages, if you have a literate programming wrapper for your language.