TurtleWare

Tagged as tutorial, lisp, fiveam, 5am, tests

Written on 2017-09-05 by Tomek "uint" Kurcz

What is FiveAM?

FiveAM is a simple-yet-mature test framework. It makes test suites for your project easy to implement, maintain, organize and run.

Motivation

While it can't be said that there are no learning materials provided for FiveAM, it feels like they are lacking in both clarity and detail. Beginners are in need of gentle, friendly guidance. Experienced Lisp hackers are able to make do without it, but even they probably spend a little extra time tinkering, experimenting and skimming source code to "get" the framework. This shouldn't be necessary.

This tutorial assumes familiarity with Common Lisp and a basic understanding of ASDF system definitions.

Our building blocks

We will start with a bit of theorizing. Be not afraid, however - there won't be too much of it.

The essential terms you will need to be familiar with are:

Checks
Tests
Test suites

Checks

A check is, essentially, a single assertion - a line of code that makes sure something that should be true is indeed true. FiveAM tries to make assertions as simple as possible. The form of a basic check definition looks like this:

(is test &rest reason-args)

In this case, test is the assertion we want to make. A function (or special operator) application with any number of arguments can be used as the assertion. If it returns a true value, the assertion succeeds; if it returns NIL, it fails.

If the test parameter matches any of the 4 "templates" below, FiveAM will try to reason a little about what is what and attempt to print the explanations of failures in a more readable way. Arguably.

(predicate value)
(predicate expected value)
(not (predicate value))
(not (predicate expected value))

The logic FiveAM follows when reasoning is thus:

The first expression checks whether value satisfies predicate.

In the second one, the predicate is usually some form of equality test. The assertion makes sure the value we got (by calling some function we're testing) matches the expected value according to the predicate.

The last two tests are the same things, only negated.

In practice, these declarations look like this:

(is (listp (list 1 2)))   ; is (list 1 2) a list?
(is (= 5 (+ 2 3)))        ; is (+ 2 3) equal to 5?

Simple, right? If we were implementing standard Lisp functions, we could use the above to test whether list generates a list as it should, and whether + sums properly. Or, well, at least we'd ascertain that for the above cases.

And if we wanted to negate:

(is (not (listp (list 1 2))))  ; is (list 1 2) not a list?
(is (not (= 5 (+ 2 3))))       ; is (+ 2 3) not equal to 5?

As you may have noticed, we haven't used the optional reason-args argument. It's used to specify what's printed as the reason for a failed check. Sometimes FiveAM's reasoning just isn't good enough. We will get back to it when we start hacking away.

Tests

We know how to write checks, but there's not much we can actually do with just this knowledge. The is syntax is only available in the context of a test definition.

A test, as defined by FiveAM, is simply a collection of checks. Each such collection has a name so that we can easily run it later. Defining one is easy:

(test test-+
  "Test the + function"     ;optional description
  (is (= 0 (+ 0 0)))
  (is (= 4 (+ 2 2)))
  (is (= 1/2 (+ 1/4 1/4))))

We're sticking to the basics for now, but you should know there are some additional keyword parameters you can pass in order to declare dependencies, explicitly specify the parent suite, specify the fixture, change the time of compilation and/or collect profiling information.

A fixture is something that ensures a test is run in a specific context. Sometimes it's necessary to reproduce results consistently. For example, if you had a pathfinding algorithm, you'd probably have to load some sort of a map before you could test it. Apparently, using FiveAM's fixture functionality isn't recommended by the current maintainer. Perhaps it's best to just set up macros for those.

As for profiling information, this functionality doesn't seem to actually be implemented yet. Instead, Metering is a good option if needed.

You'll most likely end up defining a single test for a single function, but nothing stops you from slicing the pie up differently. Maybe a particularly complex function requires a lot of checks that are best divided into categories? Maybe a set of simple, related functions can be covered by a single test for simplicity? Your common sense is the best advisor here.
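To make the test form a little more concrete, here is a minimal sketch of defining test-+ and running it on its own. It assumes Quicklisp is installed, and the fiveam-scratch package is a throwaway name made up for this example. Evaluate the forms one by one at the REPL (or load the file), since the package has to exist before the later forms are read.

```lisp
;; Assumption: Quicklisp is available. Without it, (asdf:load-system :fiveam)
;; should do the same job once FiveAM is somewhere ASDF can find it.
(ql:quickload :fiveam)

(defpackage #:fiveam-scratch          ; hypothetical scratch package
  (:use #:cl #:fiveam))
(in-package #:fiveam-scratch)

(test test-+
  "Test the + function"
  (is (= 0 (+ 0 0)))
  (is (= 4 (+ 2 2)))
  (is (= 1/2 (+ 1/4 1/4))))

;; RUN! takes the name of a test (or suite), runs it and prints a report.
(run! 'test-+)
```

The printed report should list three checks, all passing.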

Suites

The final piece of the puzzle. These are not obligatory, but very useful. Suites are containers for tests, good if you need more hierarchy - which, honestly, you will. Speaking of hierarchy: suites can parent other suites, so you can have plenty of that.

The way suites are defined and used is roughly analogous to packages.

(def-suite tutorial-suite
    :description "A poor man's suite"
    :in some-parent-suite)

(in-suite tutorial-suite)

The first form defines a test suite called tutorial-suite. The :in keyword is used to set the parent suite.

Just like in-package sets the *package* special variable, in-suite sets the *suite* one. Test definitions pick up on it when provided. Thanks to that, any test definitions after (in-suite tutorial-suite) will be included in tutorial-suite. Other suite definitions, however, won't be automagically contained in the suite pointed to by *suite*. For that reason, you always need to explicitly set the :in keyword when defining a child suite.

And that's actually all there is to suites.
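Here is a small sketch of how a parent suite, a child suite and their tests fit together. The suite and test names (parent-suite, child-suite and so on) and the suite-scratch package are made up for illustration, and Quicklisp is assumed for loading FiveAM.

```lisp
(ql:quickload :fiveam)                ; assumption: Quicklisp is installed

(defpackage #:suite-scratch           ; hypothetical scratch package
  (:use #:cl #:fiveam))
(in-package #:suite-scratch)

(def-suite parent-suite
  :description "Top-level suite.")

;; The child has to name its parent explicitly with :in.
(def-suite child-suite
  :description "Nested suite."
  :in parent-suite)

(in-suite parent-suite)
(test parent-test
  (is (= 2 (+ 1 1))))

(in-suite child-suite)
(test child-test
  (is (string= "ab" (concatenate 'string "a" "b"))))

(run! 'child-suite)    ; runs only CHILD-TEST
(run! 'parent-suite)   ; runs PARENT-TEST and, through CHILD-SUITE, CHILD-TEST
```

Running the parent reaches every test below it, which is what makes a single top-level suite handy as an entry point.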

The story so far

Time for a quick summary - from the top, our tests are organized like this:

(optional) Top-level test suites defined with (def-suite)
(optional) Child test suites defined with (def-suite) with :in set
Tests defined with the (test) macro
Checks (assertions) defined with (is) expressions within a (test) form

A practical example

Now that all that is clear, let's try doing something with it. Imagine you are building an RPG game according to some existing pen-and-paper system. One day, it will surely rival the likes of AAA+ titles out there.

...for now, though, you only have the character generation facility down. Oh well, got to start somewhere. According to the specification of the system you're using, the stats of a character are generated randomly, but prior to the generation, a player can choose two stats they wish to "favor". Unfavored stats are decided on with a roll of two 8-sided dice, while favored ones - a roll of three 8-sided dice. You've defined a little utility function for rolling an arbitrary number of dice with an arbitrary number of sides.

You've written this basic functionality, wrapped it up in a package, defined an ASDF system, checked that everything compiles without warnings... So far, so good. But now you want to go the extra mile to make sure this is going to be a well-built piece of software. You want to integrate tests.

If you'd like to follow all the outlined steps and integrate FiveAM with me, just clone the master branch of the quasirpg repository.

git clone https://github.com/uint/quasirpg.git

Ideally, if you have quicklisp, do that in ~/quicklisp/local-projects/. Otherwise, clone the repository to either ~/common-lisp/ or ~/.local/share/common-lisp/source/.

If you wish, you can also look through the commit history of the test branch to see exactly how I've done all the work detailed in the following sections. It might come in useful if you get stuck.

If you want to see the code in action, try these:

CL-USER> (ql:quickload 'quasirpg)
CL-USER> (in-package #:quasirpg)
QUASIRPG> (roll-dice 3 6) ; throw three 6-sided dice
QUASIRPG> (make-character)
QUASIRPG> (make-character "Bob" '("str" "int"))

In case you don't have quicklisp, you can use this to load the system:

CL-USER> (asdf:load-system 'quasirpg)

Keep in mind that without quicklisp, you will also have to download FiveAM by hand. In the same directory you cloned quasirpg to, try:

git clone https://github.com/sionescu/fiveam.git

Groundwork

Tests shouldn't be a part of your software's main system. Why would they be? People who simply want to download your application and use it don't need them. Neither do they need to pull FiveAM as a dependency. So let's define a new system for tests. We could create a separate .asd file, but I like to have just one .asd file around. In this case, any additional systems defined after the main quasirpg one should be named quasirpg/some-name. So we append this to our quasirpg.asd:

(asdf:defsystem #:quasirpg/tests
  :depends-on (:quasirpg :fiveam)
  :components ((:module "tests"
                :serial t
                :components ((:file "package")
                             (:file "main")))))

We also create the new files tests/package.lisp and tests/main.lisp to stay true to the above declaration. As you might guess, we're planning to define a separate package for tests. This isn't as important as separate systems, but it's always good to keep namespaces separate. Nice and tidy.

;;;; tests/package.lisp

(defpackage #:quasirpg-tests
  (:use #:cl #:fiveam)
  (:export #:run!
           #:all-tests
           #:test-quasi)) ; exported so (quasirpg-tests:test-quasi) can be called

And finally the star of the show:

;;;; tests/main.lisp

(in-package #:quasirpg-tests)

(def-suite all-tests
    :description "The master suite of all quasiRPG tests.")

(in-suite all-tests)

(defun test-quasi ()
  (run! 'all-tests))

(test dummy-tests
  "Just a placeholder."
  (is (listp (list 1 2)))
  (is (= 5 (+ 2 3))))

Defining a simple, argument-less test runner for the whole system (test-quasi here) isn't strictly necessary, but it's going to spare us some potential headaches with ASDF.

We define a meaningless test just so we can check whether the whole setup works. If you've done everything correctly, you should be able to load the test system in your REPL

CL-USER> (ql:quickload 'quasirpg/tests)

and run the test runner

CL-USER> (quasirpg-tests:test-quasi)

Running test suite ALL-TESTS
 Running test DUMMY-TESTS ..
 Did 2 checks.
    Pass: 2 (100%)
    Skip: 0 ( 0%)
    Fail: 0 ( 0%)

T
NIL

So far, so good!

ASDF integration

Integrating the tests with ASDF is a good idea. That way we get hooked up to the standard, abstracted way of triggering system tests. First, we add this somewhere to our quasirpg/tests system definition.

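:perform (test-op (o s)
            ;; symbol-call looks the function up by name at run time, so the
            ;; .asd file can be read before the quasirpg-tests package exists.
            (uiop:symbol-call :quasirpg-tests :test-quasi))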

From now on, we can run all-tests with:

CL-USER> (asdf:test-system 'quasirpg/tests)

Next, we tell ASDF that when someone wants to test quasirpg, they really want to run the quasirpg/tests test-op. Somewhere in the quasirpg system definition:

:in-order-to ((test-op (test-op "quasirpg/tests")))

Now all we need to do to test our game is:

CL-USER> (asdf:test-system 'quasirpg)
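Put together, the .asd file could look roughly like the sketch below. The component list of the main quasirpg system is an assumption (the file names "package" and "quasirpg" are placeholders, check the repository for the real ones). This version calls the test-quasi runner through uiop:symbol-call, which resolves the symbol only when the test operation actually runs.

```lisp
;;;; quasirpg.asd (sketch; the main system's components are assumed)

(asdf:defsystem #:quasirpg
  :serial t
  :components ((:file "package")      ; hypothetical file names
               (:file "quasirpg"))
  ;; Testing quasirpg really means testing quasirpg/tests.
  :in-order-to ((test-op (test-op "quasirpg/tests"))))

(asdf:defsystem #:quasirpg/tests
  :depends-on (:quasirpg :fiveam)
  :components ((:module "tests"
                :serial t
                :components ((:file "package")
                             (:file "main"))))
  ;; Resolved at run time, after tests/package.lisp has been loaded.
  :perform (test-op (o s)
             (uiop:symbol-call :quasirpg-tests :test-quasi)))
```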

Adding real tests

Most of the character generation system's math is within the dice-rolling function - it's probably a good idea to tackle that one. The only problem is it's not a very predictable one. We can, however, still do some useful things.

(defun test-a-lot-of-dice ()
  (every #'identity (loop for i from 1 to 100
                       collecting (let ((result (quasirpg::roll-dice 2 10)))
                                    (and (>= result 2)
                                         (<= result 20))))))

(test dice-tests
  "Test the `roll-dice` function."
  (is (= 1 (quasirpg::roll-dice 1 1)))
  (is (= 3 (quasirpg::roll-dice 3 1)))
  (is-true (test-a-lot-of-dice)))

The first two checks simply provide arguments for which the function should always spew out the same values - we're throwing one-sided dice. Just... try not to think too hard about it.

The function test-a-lot-of-dice returns true only if every one of 100 throws of two 10-sided dice is within the expected bounds, that is 2-20. All we have to do is check whether that function returns true. We could just write (is (test-a-lot-of-dice)), but I recommend using is-true instead, since the way it prints failures is more readable in cases like this.

In all honesty, test-a-lot-of-dice could be improved in terms of optimization (for example by making it a macro that wraps the 100 checks in an and) or functionality (the parameters passed to roll-dice could be random). But this version is simple and sufficient for this tutorial.
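For the curious, here is one way the "random parameters" idea could look. This is only a sketch: rolls-within-bounds-p is a made-up name, and it takes the rolling function as an argument so that it can be tried out without the game loaded. In the test suite you would pass it #'quasirpg::roll-dice.

```lisp
(defun rolls-within-bounds-p (roll-fn &optional (trials 100))
  "Return true if TRIALS calls of ROLL-FN with random N and SIDES
all give a result between N and N*SIDES."
  (loop repeat trials
        always (let* ((n (1+ (random 10)))       ; 1 to 10 dice
                      (sides (1+ (random 10)))   ; 1 to 10 sides each
                      (result (funcall roll-fn n sides)))
                 (<= n result (* n sides)))))

;; Deterministic stand-ins instead of a real dice roller:
(rolls-within-bounds-p (lambda (n sides) (* n sides)))
;; => T   (always the maximum roll, which is within bounds)

(rolls-within-bounds-p (lambda (n sides) (declare (ignore n sides)) 0))
;; => NIL (0 is below the minimum for any number of dice)
```

The always clause stops at the first out-of-bounds result, so there's no list of 100 booleans to build and scan.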

Now let's see this thing in action.

Running test suite ALL-TESTS
 Running test DICE-TESTS fff
 Did 3 checks.
    Pass: 0 ( 0%)
    Skip: 0 ( 0%)
    Fail: 3 (100%)

And there we go. We've just detected a bug that would never be caught by the compiler. A look at the first fail gives us a hint:

(QUASIRPG::ROLL-DICE 1 1)

 evaluated to 

0

 which is not 

=

 to 

1

A look at the function in question should be enough to see the problem.

  (let ((result (loop for i from 1 to n summing (random sides))))

What (random sides) does is generate a number from 0 to (sides - 1). That's not what we want.

  (let ((result (loop for i from 1 to n summing (1+ (random sides)))))
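The off-by-one is easy to confirm at the REPL with nothing but standard Common Lisp. The snippet below is just an illustration and isn't taken from the quasirpg sources.

```lisp
;; (random 6) gives 0 to 5, so a bare call can never roll a 6...
(loop repeat 1000 always (<= 0 (random 6) 5))        ; => T

;; ...while adding one shifts the range to 1 to 6, like a real die.
(loop repeat 1000 always (<= 1 (1+ (random 6)) 6))   ; => T

;; With a one-sided die there is only one possible outcome either way:
(random 1)        ; => 0, hence the failing (= 1 (roll-dice 1 1)) check
(1+ (random 1))   ; => 1
```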

And now we re-run the tests:

Running test suite ALL-TESTS
 Running test DICE-TESTS ...
 Did 3 checks.
    Pass: 3 (100%)
    Skip: 0 ( 0%)
    Fail: 0 ( 0%)

The true power of tests, however, is that if we now ever decide to modify our dice-throwing facility, any bugs we introduce by accident will most likely be caught by the tests already in place. And so we'll avoid nasty, hard-to-debug consequences further down the line. All that without having to test things by hand each time we make changes.

Handling invalid parameters

What happens when someone passes a non-positive integer to roll-dice? Or a fractional one? We should probably control that behavior. And we should probably test to make sure when the unexpected happens, it's handled as expected.

Let's say our specification tells us that when any of the arguments is fractional, it should just be rounded down. So we append two additional checks to dice-tests:

  (is (= 3 (quasirpg::roll-dice 3.8 1)))
  (is (= 3 (quasirpg::roll-dice 3 1.9)))

The first one actually passes. It just so happens that loop is responsible for repeating the random number generation N times, once for each die. Given a fractional upper bound, loop simply stops once its counter exceeds that bound, which has the same effect as rounding N down.

The second test requires our attention. It fails. The problem is that random is passed a fractional argument, and it thinks it's meant to give a fractional number in response. Simple fix and we're back on track:

  (let ((result (loop for i from 1 to n summing (1+ (random (floor sides))))))

Now for something more interesting. Let's say our specification tells us that if any argument is not a positive number, we should get a SIMPLE-TYPE-ERROR. It's time to introduce yet another kind of check.

(signals condition &body body)

Not a lot to explain here. BODY is expected to cause CONDITION to be signaled. Our check only succeeds if it does. We can use this:

  (signals simple-type-error (quasirpg::roll-dice 3 -1))
  (signals simple-type-error (quasirpg::roll-dice 3 0))
  (signals simple-type-error (quasirpg::roll-dice -1 2))
  (signals simple-type-error (quasirpg::roll-dice 0 2))
  (signals simple-type-error (quasirpg::roll-dice -1 1))

Again, some of the work is already done for us. random will signal a SIMPLE-TYPE-ERROR in response to a non-positive arg. What's left to do is to handle the number of throws, so we add the appropriate code to the beginning of roll-dice:

(when (< n 1)
  (error 'simple-type-error
         :expected-type '(integer 1)
         :datum n
         :format-control "~@<Attempted to throw dice ~a times.~:>"
         :format-arguments (list n)))

And voila. Once more, all checks pass.
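For reference, here is roughly what the whole function might look like once the fragments above are put together. This is a sketch assembled from the snippets in this tutorial, not a copy of the code in the repository, which may well differ. Note that it rounds sides down before handing it to random.

```lisp
(defun roll-dice (n sides)
  "Roll N dice with SIDES sides each and return the sum (sketch)."
  ;; Reject a non-positive number of throws up front.
  (when (< n 1)
    (error 'simple-type-error
           :expected-type '(integer 1)
           :datum n
           :format-control "~@<Attempted to throw dice ~a times.~:>"
           :format-arguments (list n)))
  ;; A fractional N is effectively rounded down by LOOP; a fractional
  ;; SIDES is rounded down explicitly. A non-positive SIDES is left for
  ;; RANDOM to reject, as described in the text.
  (loop for i from 1 to n
        summing (1+ (random (floor sides)))))

(roll-dice 3 1)      ; => 3
(roll-dice 3.8 1)    ; => 3, three throws of a one-sided die
(roll-dice 3 1.9)    ; => 3, the die still has only one side

(handler-case (roll-dice 0 2)
  (simple-type-error () :rejected))   ; => :REJECTED
```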

Random number generators

So far, we've used specific numbers. We can do better, though: we can run a large number of checks based on random data. This is where the fiveam:for-all check comes in. It runs its body 100 times, randomizing the specified variables each time.

(for-all bindings &body body)

bindings is a list of forms of this type:

(variable generator)

generator is a function (or function-bound symbol) that returns random data. variable is the variable binding that stores the results from generator.

body can contain other kinds of checks.

For example, let's try replacing (is-true (test-a-lot-of-dice)) with something more comprehensive.

(for-all ((n (gen-integer :min 1 :max 10))
          (sides (gen-integer :min 1 :max 10)))
  "Test whether calls with random positive integers give results within expected bounds."
  (let ((min n)
        (max (* n sides))
        (result (quasirpg::roll-dice n sides)))
    (is (<= min result))
    (is (>= max result))))

(gen-integer :min 1 :max 10) is a function provided by FiveAM that returns a random integer generator with the specified bounds. We keep the numbers small here so that the tests don't take forever trying to throw a lot of dice, and so that there's a reasonable chance of edge cases getting tested.
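If the word "generator" sounds mysterious, it helps to poke at one directly. The sketch below (assuming Quicklisp for loading FiveAM) shows that gen-integer hands back an ordinary function that you can funcall as many times as you like. That is all for-all does with it behind the scenes.

```lisp
(ql:quickload :fiveam)                ; assumption: Quicklisp is installed

;; GEN-INTEGER returns a function of no arguments...
(defparameter *small-int* (fiveam:gen-integer :min 1 :max 10))

;; ...and every call produces a fresh random integer within the bounds.
(funcall *small-int*)

;; Five samples in a row, each between 1 and 10:
(loop repeat 5 collect (funcall *small-int*))
```

Any function of no arguments that returns random data can play the same role, which is why writing your own generators is straightforward.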

We can also replace the rounding checks. Since FiveAM doesn't provide a suitable generator, we have to write our own. It's not difficult, though, thanks to CL's ease of creating higher-order functions:

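(defun gen-long-float (&key (max (/ most-positive-long-float 2))
                            (min (/ most-negative-long-float 2)))
  ;; Coerce the range to a long float so RANDOM returns a float even for
  ;; integer bounds; the halved defaults keep (- max min) from overflowing.
  (lambda () (+ min (random (float (- max min) 1L0)))))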

With that definition in place, we can write the new checks:

  (for-all ((valid-float (gen-long-float :min 1 :max 100)))
    "Test whether floats are rounded down."
    (is (= (floor valid-float) (quasirpg::roll-dice valid-float 1)))
    (is (>= (floor valid-float) (quasirpg::roll-dice 1 valid-float))))

Finally, we can replace our condition checking too:

  (for-all ((invalid-int (gen-integer :max 0))
            (invalid-int2 (gen-integer :max 0))
            (valid-int (gen-integer :min 1)))
    "Test whether non-positive numbers signal SIMPLE-TYPE-ERROR."
    (signals simple-type-error (quasirpg::roll-dice valid-int invalid-int))
    (signals simple-type-error (quasirpg::roll-dice invalid-int valid-int))
    (signals simple-type-error (quasirpg::roll-dice invalid-int invalid-int2))))

If you run these tests, you'll notice only a few checks in the results. That's because FiveAM treats each for-all declaration as a single check, regardless of the contents or the hundreds of tests that actually get run.

REASON-ARGS

When the tests we've written failed, the output we got was mostly descriptive enough. That's not always the case. It's hard to expect the testing framework to know what sort of information is meaningful to us, or what the concept behind the functions we write is.

So let's say when we make-character, we want the name to be automatically capitalized. We care about orthography and won't allow our players to get sloppy with it. Pshaw.

We add a new test:

(test make-character-tests
  "Test the `make-character` function."
  (let ((name (quasirpg::name (quasirpg::make-character "tom" '("str" "dex")))))
    (is (string= "Tom" name))))

Obviously, it fails.

 Failure Details:
 --------------------------------
 MAKE-CHARACTER-TESTS []: 
      
NAME

 evaluated to 

"tom"

 which is not 

STRING=

 to 

"Tom"

..
 --------------------------------

We can understand it, but put yourself in the position of someone who isn't all that familiar with the make-character function. Imagine that person just got the above output while testing the entire game. They're probably really scratching their head trying to piece this together. Let's make life easy for them. Attempt number 2:

(test make-character-tests
  "Test the `make-character` function."
  (let ((name (quasirpg::name (quasirpg::make-character "tom" '("str" "dex")))))
    (is (string= "Tom" name)
        "MAKE-CHARACTER should capitalize the name \"tom\", but we got: ~s" name)))

We use the &rest reason-args parameter of the is check. You can use format directives and pass it arguments, just like in a format call. Now the test result is much easier to interpret:

 Failure Details:
 --------------------------------
 MAKE-CHARACTER-TESTS []: 
      MAKE-CHARACTER should capitalize the name "tom", but we got: "tom".
 --------------------------------
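Since reason-args work like the arguments of a format call, you can try a message out with plain format before putting it into a check. A quick sketch:

```lisp
;; ~s prints its argument the way the reader would need it, quotes included.
(format nil "MAKE-CHARACTER should capitalize the name \"tom\", but we got: ~s" "tom")
;; => "MAKE-CHARACTER should capitalize the name \"tom\", but we got: \"tom\""

;; ~a prints the same argument without the quotes.
(format nil "got: ~a" "tom")
;; => "got: tom"
```

For strings, ~s is usually the better pick in failure messages, because it makes stray whitespace and empty strings visible.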

Reorganizing

Let's imagine what happens when the project grows. For one thing, we'll probably write many more tests, until having all of them in one file looks rather messy.

We'll also probably end up reorganizing the code. roll-dice might become part of a collection of utilities for generating randomized results in random-utils.lisp, while make-character could get moved to chargen.lisp. It would be good if the hierarchy of our tests reflected those changes and let us test only random-utils.lisp or chargen.lisp if we want to.

So above all of our dice-testing code we tuck this in:

(def-suite random-utils-tests
    :description "Test the random utilities."
    :in all-tests)

(in-suite random-utils-tests)

Now all-tests contains random-utils-tests, which in turn contains dice-tests.

Let's do the same for character generation:

(def-suite character-generation-tests
    :description "Test the character generation."
    :in all-tests)

(in-suite character-generation-tests)

(test make-character-tests
  "Test the `make-character` function."
  (let ((name (quasirpg::name (quasirpg::make-character "tom" '("str" "dex")))))
    (is (string= "Tom" name)
        "MAKE-CHARACTER should capitalize the name \"tom\", but we got: ~s" name)))

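You can check that running (asdf:test-system 'quasirpg) still runs all of our tests, since it launches the parent suite all-tests. But we can now also run a single suite, for example with (fiveam:run! 'quasirpg-tests::character-generation-tests), or a single test with (fiveam:run! 'quasirpg-tests::make-character-tests).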

The next logical step is moving the test suites to separate files. If you wish to see how I've done it, just look at this commit or at the end result in the test branch.

What else is there?

A few different kinds of checks and a way to customize the way test results and statistics are presented.

So far, we've always used run! to run all the tests, which is really a wrapper for (explain! (run 'some-test)). You can, therefore, replace the explain! function with your own.
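As a taste of what that could look like, here is a sketch of a tiny replacement for explain! that prints a single line of totals. The names summarize, sample-test and report-scratch are made up, Quicklisp is assumed, and the code leans on FiveAM's test-passed result class, which I reference with a double colon because I'm not sure it is part of the exported interface. Treat it as an experiment, not as the official way of doing things.

```lisp
(ql:quickload :fiveam)                ; assumption: Quicklisp is installed

(defpackage #:report-scratch          ; hypothetical scratch package
  (:use #:cl #:fiveam))
(in-package #:report-scratch)

(test sample-test
  (is (= 4 (+ 2 2)))
  (is (evenp 3) "3 is odd, so this check fails on purpose"))

(defun summarize (results)
  "Hypothetical stand-in for EXPLAIN!: print totals, return true if all passed."
  ;; RUN returns one result object per check.
  (let ((passed (count-if (lambda (r) (typep r 'fiveam::test-passed))
                          results)))
    (format t "~&~d of ~d checks passed.~%" passed (length results))
    (= passed (length results))))

;; RUN collects the results without explaining them; we do that part.
(summarize (run 'sample-test))
```

With the deliberately failing check in place, this should report that 1 of 2 checks passed and return NIL.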

How can you learn about those things? The best I can do is point you to the FiveAM documentation and possibly source code.

Happy hacking!

Tutorial: Working with FiveAM