The Beauty of Test Pyramids

I hate bugs. Please, don't misunderstand me: I don't necessarily despise looking into errors and fixing them. In fact, investigating bugs can sometimes be exceptionally rewarding and make me feel like a hard-boiled detective. No, what I hate is that they break my illusion of being some kind of magician, muttering powerful spells in the shape of lines of code and bending reality to my will. According to my ideal, microservices should run together in harmony, with no hiccups, no breaks, and no dead-letter queues.

This is, of course, a deeply flawed - and quite arrogant - vision. The reality is that software soon loses its heavenly halo in the mud of crappy workarounds, fuzzy requirements, and unmalleable architectures. Its authors are most certainly imperfect, too. Throughout my programming career, there has always been a looming side quest: the search for better practices and sophistication that would attenuate these shortcomings. One of the most valuable tools I have found is the Test Pyramid.

The Test Pyramid is an ancient concept (as ancient as a book written in 2009 can be) but still holds valuable treasures. The Pyramid illustrates what the different types of tests are and what their relative weight in a codebase should be. Every half-decent developer knows what the three types of tests featured are:

Unit Tests: their purpose is to focus on a single unit of the codebase, often a single function. They are narrow, straight to the point, and by far the most abundant in a project.
Integration Tests: how will the application behave when interacting with other systems? These tests target the interfaces between the application and other services, such as a database or an external endpoint.
End-to-End Tests: finally, it is wise to test an application from the point of view of its user, typically through a UI like those offered by mobile and web apps. These automated tests are very slow and traverse the whole tech stack, but they validate the application in a way that is otherwise impossible, and more conveniently than manual QA.

However, I like working with a slightly different mental model: the Correctness Pyramid. This enhanced version - originally presented by Christina Lee at KotlinConf 2019 - encompasses broader activities and different stakeholders, including end-users. For example, it also takes into account the detailed specification phase, PRs, manual QA testing, and - at the very top - even the normal application activity in the Production environment.

The Correctness Pyramid by Christina Lee

The message is crystal-clear: the higher you go, the more painful it is to be wrong about something. It could be a misunderstood requirement, an unexplored corner case, or an algorithm that scales poorly: anything that compromises the proper functioning of the software and jeopardizes its effectiveness. To see why this is true, let's take a silly example and consider the worst possible case.

Conezilla is a home delivery app that serves delicious ice-creams to its customer base. To ensure that the product delivered is always of the highest quality and doesn't melt, any order that takes more than 1 hour automatically expires and the ice-cream cannot be sold anymore. Developers take this requirement to heart and, without thinking too much about the consequences, implement a filter that hides all the expired orders from the UI. However, disgruntled users soon start complaining that they are finding credit card charges even though they don't remember ordering anything.
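
To make the failure concrete, here is a minimal Python sketch of what such a filter might look like. Everything in it is hypothetical: the `Order` class, its `charged` flag, and the `visible_orders` function are names invented for illustration, since the story says nothing about how Conezilla is actually built.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_DELIVERY_TIME = timedelta(hours=1)  # the 1-hour expiry rule from the story


@dataclass
class Order:
    order_id: int
    placed_at: datetime
    charged: bool  # hypothetical flag: the customer's card was billed


def is_expired(order: Order, now: datetime) -> bool:
    return now - order.placed_at > MAX_DELIVERY_TIME


def visible_orders(orders: list[Order], now: datetime) -> list[Order]:
    # The hasty implementation: expired orders are hidden from the UI,
    # and nothing else happens to them.
    return [o for o in orders if not is_expired(o, now)]


now = datetime(2024, 7, 1, 15, 0)
orders = [
    Order(1, now - timedelta(minutes=20), charged=True),
    Order(2, now - timedelta(hours=2), charged=True),  # expired
]

shown = visible_orders(orders, now)
hidden_but_charged = [o for o in orders if o not in shown and o.charged]

print([o.order_id for o in shown])               # [1]
print([o.order_id for o in hidden_but_charged])  # [2]
```

Order 2 disappears from the screen, yet its charge is still there: the filter answers the question "what should the user see?" while nobody asked "what should happen to the money?".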

The cost of not being correct is at its maximum here: not only did we cause a disruption in the production environment, but now we need to go back to the whiteboard to untangle the mess, redesign a better process, develop the fix, push it through all the necessary test phases and, finally, release it. We would also need to make sure that the impacted users are aware of the update and willing to use the App again. You really don't want to be wrong when transitioning to the upper part of the pyramid.

As a consequence, the attentive (and self-interested) developer will devote a good deal of time and focus to designing a new feature. Linters and compilers will also automatically highlight a huge number of trivial errors that nobody wants to carry upwards and discover only later, when performing unit and integration tests. It goes without saying that the software will be written in a way that facilitates testability, flexibility, and decoupling. Good choices made at the foundation of the Pyramid will yield compounding benefits as we travel to the top. «Give me a lever» said Archimedes «and I will move the world». The Correctness Pyramid may give us developers a different type of leverage, but it is still a very powerful one.

What is the actionable insight here? The main takeaway is that the Pyramid nudges us into down-shifting issues: the earlier I find out I am wrong, the easier and less painful it will be to solve the problem. It's obviously cheaper to verify an assumption while still at the whiteboard than when opening a PR, and it's way better to write one more test than to improvise as a firefighter in Production. I believe that the majority of best practices in software engineering help us to down-shift problems, and healthy team dynamics help us stay on track.
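
As a sketch of what down-shifting could look like for the Conezilla story, consider a unit test that states the business rule directly: an expired order must be hidden and refunded. The names used here (`Order`, `expire_orders`, `refunded`) are hypothetical, and the refund is reduced to a boolean flag standing in for whatever a real payment integration would do.

```python
import unittest
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_DELIVERY_TIME = timedelta(hours=1)  # the 1-hour expiry rule from the story


@dataclass
class Order:
    order_id: int
    placed_at: datetime
    charged: bool = True
    refunded: bool = False


def expire_orders(orders, now):
    """Return the still-deliverable orders and mark expired ones as refunded."""
    active = []
    for order in orders:
        if now - order.placed_at > MAX_DELIVERY_TIME:
            if order.charged:
                order.refunded = True  # stand-in for a real refund call
        else:
            active.append(order)
    return active


class ExpiredOrderTest(unittest.TestCase):
    def test_expired_order_is_hidden_and_refunded(self):
        now = datetime(2024, 7, 1, 15, 0)
        fresh = Order(1, now - timedelta(minutes=20))
        stale = Order(2, now - timedelta(hours=2))

        active = expire_orders([fresh, stale], now)

        # The expired order disappears from the list shown to the user...
        self.assertEqual([o.order_id for o in active], [1])
        # ...and, crucially, the customer gets the money back.
        self.assertTrue(stale.refunded)
        self.assertFalse(fresh.refunded)
```

Saved to a file, this can be run with `python -m unittest`. Had the implementation only filtered the list, the `assertTrue(stale.refunded)` line would have failed on the developer's machine, a far cheaper place to be wrong than a customer's credit card statement.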

In other words, it's best to squish bugs even before they are born. Why? Less work, fewer emergencies, satisfied customers, and happier business owners. Additionally, I can continue to be lulled by my reassuring illusions about perfect and harmonious software.

