# The Philosophy of Testing Code

Recently at work I ran into an interesting situation... I refactored some tests with the sole purpose of documenting requirements and making them more stateless. This not only offended people, it caused a rift over some of the finer points of testing code. I can't speak to the offensive aspect of the situation. For reasons I am not willing to share quite yet, it is not surprising to me that I said things in a manner that might have offended others. But I do want to speak to the philosophy of testing that I have adopted over the course of my career, and why.

I think this awkward situation finally motivated me to write about this topic. It's something that I haven't really had to justify in most of the places I have worked before, so I am glad for the recent experience. In light of that, let's begin...

## But, why?

Before even deciding on what a good testing philosophy is, I think it stands to reason that we must first ask the following: why would I test my code at all? When we ask this, I think it's crucial not to lean to either side in your reasoning. Who cares if it's industry practice... Who cares if smarter people than you do it... Or the inverse: who cares if the smartest engineer you know doesn't test code... who cares if you've never tested code and you're doing just fine...

I think in order to jump into that balanced mindset you have to make a mental switch. You have to live in possible worlds, rather than the actual one. You have to reason with abstractions and not particulars. To clarify the point, Aristotle once wrote:

> It is the mark of an educated mind to be able to entertain a thought without accepting it.

So, let's again ask, "Why would I test my code at all?"

Naturally, I think the most logical response would be to verify that the code works. But when we say works, what do we mean? What does it mean for a piece of code to work? It seems to me that when most people say 'works' they mean that the code does what it is intended to do: what the writer or the original acceptance criteria intended. This raises an interesting point that can be overlooked...

Under that definition, it doesn't matter what the code will do in any alternate scenario. The key is that the intended scenario works and everything else is left out. It's the difference between stating "the car should be able to speed up" and "the car should only be able to speed up." One will leave you in an accident and the other will simply allow you to speed up.

I believe this is why Dijkstra said the following:

> Program testing can be used to show the presence of bugs, but never to show their absence.

So, restated, it seems like this: one would test code if one wanted to identify when the code experiences bugs while running the originally intended scenario. But this still leaves some ambiguity. The problem can be seen if we analyze the word 'bugs' in that previous sentence. How would we define a bug? What exactly is a bug?

This can seem a little obtuse, but I see value in it. When we identify bugs, we often say something like, "oh, it shouldn't do that." This is often uttered while someone is reviewing the output of the code. And that bit of context is key: bugs are usually determined by the output of the code.

To use a bit of mathematical lingo (you might remember f(x)), our code can often be abstracted into a function that takes certain inputs and produces certain outputs. Given that, whenever we receive an unexpected output that is not in line with the intended goal of the code, we say that the code has a bug.

That output can be an exception, a different UI, or a different charge. And yet the abstraction holds: we landed on an output that we shouldn't have been expecting. So, defined this way, a bug is when the code produces an unexpected output while running the intended scenario with predetermined input.
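To make the f(x) framing concrete, here is a minimal sketch (the `late_fee` function and its numbers are hypothetical, purely for illustration): the test fixes a predetermined input and the expected output, so any other output from the intended scenario is, by our definition, a bug.

```ruby
require "minitest/autorun"

# Hypothetical f(x): a predetermined input goes in, an output comes out.
def late_fee(amount_due, days_late)
  days_late.positive? ? (amount_due * 0.05).round(2) : 0.0
end

class LateFeeTest < Minitest::Test
  def test_overdue_charge_gets_a_fee
    # The intended scenario with predetermined input...
    fee = late_fee(100.00, 3)

    # ...and the expected output. Anything else is a bug.
    assert_equal 5.0, fee
  end

  def test_on_time_charge_gets_no_fee
    assert_equal 0.0, late_fee(100.00, 0)
  end
end
```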

Crucially, under this definition an output that is unexpected but correct, one we simply didn't anticipate from how we wrote the code, is not a bug. It is a successful run of the program. This highlights one of the hardest aspects of testing and software engineering: when we write our code, we are outlining the intended scenario. We are writing code that will achieve our intentions.

This translation of the requirements in our heads into actual code is one of the hardest things we do as software engineers.

In this process of translation, requirements can often gain ambiguity. They can even become vague. Sometimes it's that we don't understand the intention from a product manager; sometimes we don't realize the logical implications of what we have written in the code. Either way, we can deviate from the intention.

So, let me restate things holistically.

You would want to test your code in order to determine whether your code satisfies the original intention under the expected scenarios. The code is deemed satisfactory if, under the expected scenarios, it does not produce any unexpected output given a predetermined input. These unexpected outputs are known as bugs, and they can occur if the code has logical implications we did not intend or if we misunderstood the requirements.

## Given our definition, then...

Given the above restatement, now that we know the end, what virtues can we adopt to ensure we achieve that end? Well, this is probably where the interesting things start to arise if you are a software engineer.

Before addressing the virtues, a few words on the restatement. In the restatement we identified the following points:

### We test to ensure the expected scenarios, not all scenarios

Given a piece of code, we only want to verify the original intention of the code, not all scenarios. To some it's obvious why we wouldn't want to do this... we'd have to test that every unintended scenario doesn't occur. That kind of exhaustive testing results in bloated test suites that are hard to manage and often require much effort to keep up for little reward.
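As a minimal sketch of the contrast (the `overdue?` helper below is hypothetical, for illustration only): the focused tests pin down the intended scenarios, while the exhaustive style tries to enumerate non-scenarios, a list that never ends.

```ruby
require "minitest/autorun"

# Hypothetical helper, for illustration only.
def overdue?(due_at, now: Time.now)
  due_at < now
end

class OverdueTest < Minitest::Test
  # Focused: one test per intended scenario.
  def test_past_due_date_is_overdue
    assert overdue?(Time.now - 60)
  end

  def test_future_due_date_is_not_overdue
    refute overdue?(Time.now + 60)
  end

  # The exhaustive style would keep going: nil due dates, string due dates,
  # dates a thousand years out... each test adds upkeep for little reward,
  # because the scenarios that should *not* occur can never be fully listed.
end
```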

At times this is even symptomatic of code that needs to be refactored. The code has become so vague and gained so much technical debt that the original intention is no longer clear. Or the code has gained so many use cases that it is now extremely difficult to verify the logical implications of the code we write. I've seen this countless times. Countless...

The argument against refactoring always seems to follow some variant of the sunk cost fallacy, or seems circular, in my opinion. "We can't refactor this hard-to-understand code, because it's so hard to understand. It will be faster to not understand it, add on top of it, and simply try to squash all the resulting issues..." Unfortunately, that has been uttered many times in rooms I have stood in. Just before some poor soul is sent out to try to accomplish it...

### We test to validate the logical implications and requirements of a piece of code

Now that we are testing only the intended scenarios and dialing our tests in on the intended path, we need to verify that the logical implications and requirements make sense. We need to find a way to express those requirements. Think of your tests as a CO2 detector: they are silent as long as the intended scenarios work, and they only ever speak up when an intended scenario produces an unexpected output.

This creates confidence in your test suite. There is nothing worse than false positives, and there is nothing more annoying than false negatives. After a while of these, engineers start to ignore the test suite altogether. It becomes a secondary application to maintain rather than a tool that provides value and speeds up work.

### We test to discover bugs in the expected scenario, not prevent them

I am sort of stating this without justification, but you'll have to forgive me for that. You shouldn't test how the requirements in the intended scenarios are achieved, but that they are achieved. This has to do with black-box testing and abstracting software into the f(x) paradigm. I might write on this at some other point...

But if you outline exactly how the scenarios are supposed to be achieved, then it will be extremely easy for bugs to arise in your intended scenarios. Sometimes they will arise even while the code is actually producing the right output; the intended scenario simply happened in a different manner than the test was expecting.

In that case we are no longer focusing on the outputs; we are validating the process as well. And validating the process makes the test suite harder to maintain and only demands more effort from you.
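A minimal sketch of that failure mode, assuming a tiny hypothetical service (`ChargeTotaler` is invented for illustration): the first test validates the process by mocking the internals, so it breaks the moment the implementation changes, even though the output stays correct; the second validates only the output.

```ruby
require "minitest/autorun"

# Hypothetical service, for illustration only.
class ChargeTotaler
  def self.call(charges)
    charges.map { |charge| charge[:amount] }.sum
  end
end

class ChargeTotalerTest < Minitest::Test
  CHARGES = [{ amount: 5 }, { amount: 10 }].freeze

  # Validates the process: asserts that #map is called on the input.
  # Rewrite .call to use #sum or #reduce and this test fails, even
  # though the output is still 15.
  def test_how_the_total_is_computed
    charges = Minitest::Mock.new
    charges.expect(:map, [5, 10]) # breaks if the implementation changes
    assert_equal 15, ChargeTotaler.call(charges)
    charges.verify
  end

  # Validates the output: survives any refactor that preserves the result.
  def test_what_the_total_is
    assert_equal 15, ChargeTotaler.call(CHARGES)
  end
end
```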

## Then, what are the virtues?

So, what are the virtues that can keep us on the path to this end? Well, they are actually quite simple and easy to list out. Note: all of this will be written with Ruby & Minitest, but I am sure it can be translated to any other language or tool.

### Virtue #1: Tests should be stateless

If we are going to follow the analogy of a function that takes input and returns output, we can't depend on external state and still expect reliability and stability.

Auxiliary benefits:

- Tests are easier to understand because everything is listed within the block
- Tests are more readable and require no other pieces of knowledge to understand the requirements they aim at

First, a bad example...

```ruby
def subject
  described_class.call(args: @args, person: @person, charges: @charges)
end

def person
  @person ||= people(:jonny) # fixture used in many other tests
end

def args
  @args ||= { org_due_at: organizations(:blue_settings).due_at, ... }
end

# ...

test "charges are linked to the person and due_at is in the past" do
  result = subject
  expect(result.charges.count).must_equal 3
  expect(result.org_due_at).must_equal 2.days.ago
end
```

The above snippet unfortunately makes a lot of stateful assumptions... It assumes things about the fixtures and that none of those instance variables will be changed. And it assumes that the magic number values will remain constant and not error out. As well, it's nearly impossible to look at the test block and determine what's going on...

Let's adjust the example so that it exemplifies this virtue...

test "charges are linked to the person and due_at is in the past" do
  person = people(:jonny)
  organization = organizations(:blue_settings)
  service_args = { org_due_at: organization.due_at, ... }

  result = described_class.call(args: service_args, person: person, charges: person.charges)

  expect(result.charges.count).must_be 3
  expect(result.org_due_at).must_be 2.days.ago
end

### Virtue #2: Tests should be self-documenting

Tests that are self-documenting are verbose. They express their exact intent. They express the acceptance criteria. They give you all the necessary information and nothing else.

First, a bad example...

test "charges are linked to the person and due_at is in the past" do
  person = people(:jonny)
  organization = organizations(:blue_settings)
  service_args = { org_due_at: organization.due_at, ... }

  result = described_class.call(args: service_args, person: person, charges: person.charges)

  expect(result.charges.count).must_be 3
  expect(result.org_due_at).must_be 2.days.ago
end

Let's adjust the example so that it exemplifies this virtue...

describe "when charges are due" do
  context "and the person is part of a blue organization" do
    it "should ensure that only charges before the due date are counted" do
      person = people(:jonny)
      blue_org = organizations(:blue_settings)
      args = { org_due_at: blue_org.due_at, ... }

      result = unitUnderTest.call(args: args, person: person, charges: person.charges)

      expect(result.charges.count).must_be 3
      expect(result.org_due_at).must_be_same_as 2.days.ago       
    end
  end
end

The difference between the examples is, I think, fairly apparent, but for the sake of clarity: I can read the second example and understand what our requirements are. When charges are due for blue orgs, only those before the due date should be counted. Boom. No ambiguity. Are they more verbose? Yes. Are they repetitive? Yes.

But when you are outlining the requirements and attempting to translate them from your head into zeros and ones, I can think of no better practice to ensure that they are crystal clear. At some point you might forget them and need to freshen up on them. This kind of test makes that very easy and clear.

### Virtue #3: Tests should only use the public API

Tests that use only the public API abstract all implementation details away and focus on the outputs of the code. They zero in on the relevant information. Going back to the function analogy, they show only the function, the input, and the output. This guarantees that the how doesn't matter, but the what does. It doesn't matter how something is achieved... only what the input and the output are.

This means you can refactor and evolve the code to your heart's content. It doesn't matter if the how changes, as long as the what doesn't.
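There were no snippets for this virtue above, so here is a minimal, self-contained sketch (the `ChargeSummary` class and its private `filter` helper are hypothetical, invented for illustration). The first test reaches past the public API with `send`; the second uses only the public entry point.

```ruby
require "minitest/autorun"

# Hypothetical service, a sketch for illustration only.
class ChargeSummary
  def self.call(charges:, due_at:)
    filter(charges, due_at).count
  end

  # Private implementation detail; tests should never touch this.
  def self.filter(charges, due_at)
    charges.select { |charge| charge[:due_at] < due_at }
  end
  private_class_method :filter
end

class ChargeSummaryTest < Minitest::Test
  CHARGES = [
    { due_at: Time.now - 3600 },
    { due_at: Time.now - 60 },
    { due_at: Time.now + 3600 },
  ].freeze

  # Bad: reaches past the public API with #send. Renaming or inlining
  # .filter breaks this test even though .call's output never changes.
  def test_the_how
    past_charges = ChargeSummary.send(:filter, CHARGES, Time.now)
    assert_equal 2, past_charges.count
  end

  # Good: only the public entry point, its input, and its output.
  def test_the_what
    assert_equal 2, ChargeSummary.call(charges: CHARGES, due_at: Time.now)
  end
end
```

Rename or inline `filter` and the first test breaks while the second keeps passing, which is exactly the freedom to refactor described above.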

## This post is quite long already...

I don't expect anyone to be reading this, but in case someone is, I probably better stop... I might need to write a series on this for my own edification...