Integration testing used this way should reveal many bugs in the system, especially those that depend on the system being wired together correctly. It is a lazy but smart way of testing the whole system, because you can test performance/scalability and behavior at the same time. We can also test both the things we know should work and the things that shouldn’t work.

Magic. This article doesn’t require any special functional programming techniques and can be replicated everywhere, but using a functional programming language makes this easier. For example, using Arbitrary (from QuickCheck) in Haskell makes this very nice. We used this approach in “real code” at IOHK and while it may take a while for it to become fully integrated with CI, it’s something that will make our application safer and our users happier. This idea probably isn’t original, like most stuff isn’t, but I did come to this idea on my own and don’t remember reading about anything similar, so apologies if I didn’t mention anything related.


I have seen several comments about Haskell testing and how it’s something optional since the type system and the pure functions make sure the code is “perfect”. Reading comments like these, a red flag pops up in my mind. I really don’t know if there will ever be a language that can simply exempt itself from software engineering practices. I’m not saying that there is real value in creating UML diagrams, or that we should all refresh our knowledge of activity diagrams.

I think there is genuine value in stating what users want “on paper”, and I’m sure you use some similar type of reasoning when you think about user activity, even though you maybe don’t use “The Diagrams” to do so. Similarly, testing is making sure that the code does what we intend it to do, and while there is a never-ending list of things a developer must deliver, correct code is at the top. Saying that a language fixes that for you is just sweeping the dirt under the rug. And then pointing to the rug when a bug is discovered. And I’m not talking about dependently-typed languages here or formal verification. I’m talking about good old testing.

I’m not saying we should test everything, but that we should have a healthy dose of paranoia. Yes, I’m talking to you. Martin.

Also, I really think that making your code testable makes it more flexible and easier to maintain. It forces you to think about how the input (value) will connect to the code you want to test, and that makes the code easier to understand. It will force a modular design even if you don’t know what you are doing, like me. Even if you don’t write a single line of test code. And I’ll be honest, testing stuff isn’t easy if you don’t plan for it, and I think that’s one of the reasons why I don’t write a lot of tests. Or I’m just lazy.


Ok, we went through all the basic examples. Or imagine that we did, because I presume you can write a simple test. We can write nice checks and tests if we plan to test the code beforehand. So if we didn’t plan for this, it’s all over? We just give up and say “testing is hard”? Hell, no! We waste more of our lives and our time going around the problem! I thought about this for some time (which is sad, I guess) and found a very nice idea that fits into all of this. Let’s say you have a web application written in Haskell. You didn’t test a line of your code because you are cool and Haskell doesn’t need tests. What would you even test? The checkout? Ha, the worst thing that can happen there is… well, it’s… money… actually… maybe I should have tested that…

We have these two very simple REST API calls (let’s say we banned payment since we don’t want to go bankrupt):

The easiest way we can use some integration tests is to run the web application and write some tests that perform some checks. How would we write those tests?


1. User “Martin” adds an item “sleeping pills” to the cart
2. User “Martin” adds an item “rope” to the cart
3. User “Martin” adds an item “big bag” to the cart
4. User “Martin” adds an item “shovel” to the cart


1. User “Martin” should have some sleeping pills, a rope, a big bag and a shovel in his cart.
The result should be correct and no alarms should go off. For the computer part, at least; if a user really bought these items, all my alarms would be going off.
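As a rough sketch, the hand-written scenario above could look like the following. The running web application is replaced here by a hypothetical in-memory stand-in (an `IORef` holding the cart); `newCart`, `addItem` and `getItems` are names made up for illustration, and in a real integration test they would be HTTP requests against the server.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

type Item = String

-- Hypothetical in-memory stand-in for the server-side cart.
type Cart = IORef [Item]

newCart :: IO Cart
newCart = newIORef []

-- Append the item so the cart keeps insertion order.
addItem :: Cart -> Item -> IO ()
addItem cart item = modifyIORef' cart (++ [item])

getItems :: Cart -> IO [Item]
getItems = readIORef

-- The fixed scenario from the text: four adds, then one check.
scenario :: IO Bool
scenario = do
  cart <- newCart
  mapM_ (addItem cart) ["sleeping pills", "rope", "big bag", "shovel"]
  items <- getItems cart
  pure (items == ["sleeping pills", "rope", "big bag", "shovel"])

main :: IO ()
main = do
  ok <- scenario
  putStrLn (if ok then "cart matches" else "cart mismatch")
```

Note how much of the test is tied to one specific sequence of inputs.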

But what are we testing exactly? We are testing how a specific sequence of inputs behaves. We want to use more general checks which will ensure our code does the right thing. We want to say less and do more. And how do we do that? We write more general code, of course. We want to be able to express something like “when a user adds N items we should see all of them, in the correct order, in the cart”. Well, we can do that! That’s property-based testing, and QuickCheck gives it to us. But how do we do that with our whole application? How do we test all the code, the whole thing running?
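The “N items, in order” statement can be written down as a QuickCheck property. This is a minimal sketch against a pure, hypothetical model of the cart (`addItem` is an illustrative name, not the application's API); against a live server the same check would wrap the requests in IO, for example with QuickCheck's `ioProperty`.

```haskell
import Test.QuickCheck (quickCheck)

type Item = String

-- Hypothetical pure model of the cart: adding appends at the end.
addItem :: Item -> [Item] -> [Item]
addItem item cart = cart ++ [item]

-- Property: after adding N arbitrary items to an empty cart,
-- the cart holds exactly those items, in the order they were added.
prop_allItemsInOrder :: [Item] -> Bool
prop_allItemsInOrder items = foldl (flip addItem) [] items == items

main :: IO ()
main = quickCheck prop_allItemsInOrder
```

QuickCheck generates the item lists itself, so the test no longer depends on one hand-picked shopping list.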


Let’s step back. What is happening here? No, really. Who am I? Where am I? And what is this? Ah, I’m buying some things on the web.

If I add “rope” and “big bag”, I should have a “rope” and a “big bag” in my cart. Wait a minute. That sounded like a check. That is a check! If only I could generalize this. But I can. I can create arbitrary items and send them to the server and check if they are there. Brilliant. But wait. Real users don’t just order 5 items at once and then checkout. They might do some other (weird and totally unpredictable) actions before or after. They might delete an item, add the same one again and delete it again! That’s the reason they are called users and you are called a developer – they don’t know what they can’t do – they are free and you are confined within the boundaries of your mind. There is no spoon.

If only I could track the items between requests on my end. I can do that – I can track the state on the client side and modify my requests based on that. This is still not the “real” user, though. I want to test how the system behaves when we have a “real” user and, maybe even better, many “real” users simultaneously. And that’s the final piece of the puzzle – we randomize user actions according to the action distribution of our “real” users.
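Tracking the state on the client side can be as simple as a pure function that replays each accepted action on a local model. The sketch below is one possible shape; the `Action` constructors and the decision that checkout leaves the user without a cart are assumptions made for illustration.

```haskell
import Data.List (delete)

type Item = String

-- What the test client believes the server holds for this user.
-- Nothing means no cart has been created yet.
type ClientState = Maybe [Item]

data Action = NewCart | AddItem Item | RemoveItem Item | GetItems | Checkout
  deriving (Show, Eq)

-- Update the client-side model after the server accepted an action.
step :: ClientState -> Action -> ClientState
step _           NewCart        = Just []
step (Just cart) (AddItem i)    = Just (cart ++ [i])
step (Just cart) (RemoveItem i) = Just (delete i cart) -- drops the first match only
step (Just _)    Checkout       = Nothing              -- assumption: checkout closes the cart
step st          _              = st                   -- GetItems, or anything without a cart

main :: IO ()
main = print (foldl step Nothing
  [NewCart, AddItem "rope", AddItem "big bag", RemoveItem "rope"])
-- prints: Just ["big bag"]
```

The model is what lets the test decide what the next server response ought to be.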

Simplified, that means a “real” user is:

  • creating a new cart 10% of the time
  • adding an item 50% of the time
  • removing an item 30% of the time
  • getting the items of the cart 5% of the time
  • checking out 5% of the time

We have a distribution. We have the user’s “action distribution”, which we can use to drive the actual requests, and we know what is allowed and what is not since we see what the users see – we have the state on the client side.
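With QuickCheck, a weighted distribution like the one listed above maps directly onto `frequency`. This is a sketch under assumptions: the `Action` type and the small fixed item catalogue in `genItem` are hypothetical, and a fuller version would pick the item to remove from the tracked client state.

```haskell
import Test.QuickCheck (Gen, elements, frequency, generate, vectorOf)

type Item = String

data Action = NewCart | AddItem Item | RemoveItem Item | GetItems | Checkout
  deriving (Show, Eq)

-- Hypothetical catalogue; a real test might use arbitrary item names.
genItem :: Gen Item
genItem = elements ["sleeping pills", "rope", "big bag", "shovel"]

-- Weights mirror the percentages from the text (they sum to 100).
genAction :: Gen Action
genAction = frequency
  [ (10, pure NewCart)
  , (50, AddItem <$> genItem)
  , (30, RemoveItem <$> genItem)
  , ( 5, pure GetItems)
  , ( 5, pure Checkout)
  ]

main :: IO ()
main = do
  actions <- generate (vectorOf 20 genAction)
  mapM_ print actions
```

Each run prints a different sequence of twenty actions, drawn roughly in the listed proportions.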

This way, we can test both the things we know should work and the things that shouldn’t! The latter is also important but rarely tested. If you want to connect this to a mathematical concept, you can model it as a Markov chain. An important part that we left out is the fact that some of the actions depend on each other. For example, we can’t add an item if we don’t have a cart, and we are getting a new cart just 10% of the time, which means that 9/10 times we won’t be able to do anything but retry. How you do the retrying logic is up to you; using guard seems like a nice abstraction in Haskell, but a plain if … then … will suffice. What does that look like?

[Figure: Action distribution]

Notice that we displayed just the valid actions; you can always test that the checkout is impossible if you don’t have a cart. So, now you should be able to test like this:

  • given an arbitrary action – “add an item”
  • with a current user state – a “rope” item in a cart
  • produce an arbitrary request – add a “big bag” and a “shovel”
  • check its validity – we should now have a “rope”, a “big bag” and a “shovel”

So each action should have a check for the result of the action. Each action also has a precondition that says when it is valid to execute: as in the example above, “add an item” is not valid if we don’t have a cart.
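The precondition check and the retry can be sketched with `guard` in the `Maybe` monad. The names here (`precondition`, `validate`, `firstValid`) and the exact rules are assumptions for illustration: a new cart is always allowed, removing requires the item to be in the cart, and everything else requires an existing cart.

```haskell
import Control.Monad (guard)
import Data.Maybe (listToMaybe, mapMaybe)

type Item = String
type ClientState = Maybe [Item]  -- Nothing: no cart yet

data Action = NewCart | AddItem Item | RemoveItem Item | GetItems | Checkout
  deriving (Show, Eq)

-- When is an action valid in the current client-side state?
precondition :: ClientState -> Action -> Bool
precondition _           NewCart        = True
precondition (Just cart) (RemoveItem i) = i `elem` cart
precondition (Just _)    _              = True
precondition Nothing     _              = False

-- Keep the action only if its precondition holds.
validate :: ClientState -> Action -> Maybe Action
validate st act = act <$ guard (precondition st act)

-- Retry: walk through proposed actions and take the first valid one.
firstValid :: ClientState -> [Action] -> Maybe Action
firstValid st = listToMaybe . mapMaybe (validate st)

main :: IO ()
main = do
  print (firstValid Nothing [AddItem "rope", Checkout, NewCart])
  -- prints: Just NewCart
  print (firstValid (Just ["rope"]) [RemoveItem "shovel", AddItem "big bag"])
  -- prints: Just (AddItem "big bag")
```

The rejected actions are not wasted: each one is also a candidate for a “this should fail” test against the server.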


So an action has:

  • precondition
  • request/interaction with the server/application
  • postcondition
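One way to capture these three parts in code is a record with one field per part. This is only a sketch: `Step`, `runStep` and `addItemStep` are hypothetical names, the server is again an in-memory `IORef`, and the cart is assumed to exist already.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

type Item = String

-- One user action, split into the three parts listed above.
-- The type parameter r is the response type of the request.
data Step r = Step
  { precondition  :: [Item] -> Bool       -- may we run it in this client state?
  , request       :: IO r                 -- talk to the application
  , postcondition :: [Item] -> r -> Bool  -- does the response match expectations?
  }

-- Run one step. Nothing means the precondition failed and nothing was sent.
runStep :: [Item] -> Step r -> IO (Maybe Bool)
runStep st s
  | precondition s st = Just . postcondition s st <$> request s
  | otherwise         = pure Nothing

-- Hypothetical "add an item" step; the response is the cart after the add.
addItemStep :: IORef [Item] -> Item -> Step [Item]
addItemStep server item = Step
  { precondition  = const True
  , request       = modifyIORef' server (++ [item]) >> readIORef server
  , postcondition = \before after -> after == before ++ [item]
  }

main :: IO ()
main = do
  server <- newIORef ["rope"]
  result <- runStep ["rope"] (addItemStep server "big bag")
  print result
  -- prints: Just True
```

Keeping the request separate from the two checks means the same checks can run against a fake and against the real application.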


The real beauty of this approach is that you will explore random parts of your application’s state space, check all the constraints you write down, and you can do that concurrently!
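To give a feel for the concurrent part, here is a small sketch using only `forkIO` and `MVar` from base. Everything in it is an assumption for illustration: the shared `MVar` stands in for the server, each simulated user only adds items, and the check is that every user sees exactly their own items in order.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (MVar, modifyMVar_, newEmptyMVar, newMVar, putMVar, readMVar, takeMVar)

type UserId = Int
type Item = String

-- Hypothetical shared server state: a log of (user, item) additions.
type Server = MVar [(UserId, Item)]

addItem :: Server -> UserId -> Item -> IO ()
addItem server uid item = modifyMVar_ server (pure . (++ [(uid, item)]))

getItems :: Server -> UserId -> IO [Item]
getItems server uid = map snd . filter ((== uid) . fst) <$> readMVar server

-- One simulated user: add the items, then check the cart.
simulateUser :: Server -> UserId -> [Item] -> IO Bool
simulateUser server uid items = do
  mapM_ (addItem server uid) items
  (== items) <$> getItems server uid

-- Fork one thread per user and wait for every result.
runUsers :: Server -> [(UserId, [Item])] -> IO [Bool]
runUsers server users = mapM spawn users >>= mapM takeMVar
  where
    spawn (uid, items) = do
      done <- newEmptyMVar
      _ <- forkIO (simulateUser server uid items >>= putMVar done)
      pure done

main :: IO ()
main = do
  server <- newMVar []
  results <- runUsers server [(1, ["rope", "shovel"]), (2, ["big bag"]), (3, [])]
  print (and results)
  -- prints: True
```

Against a real application the interesting failures are exactly the ones where this check stops being true under load.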

As a bonus, you can define several user profiles with different action distributions, which will allow you to focus on specific actions. That can be useful if, for example, you recently changed the checkout logic and you want to test it in more depth. You simply switch from your regular user profile to one that checks out 50% of the time, adjust the rest of the actions accordingly, and you are good to go.
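Profiles can be plain weight tables that feed the same generator. In this sketch the regular profile uses the percentages from earlier in the article, while the numbers in `checkoutHeavy` (apart from the 50% checkout) are made up for illustration, as are all the names.

```haskell
import Test.QuickCheck (Gen, frequency, generate, vectorOf)

data ActionKind = NewCart | AddItem | RemoveItem | GetItems | Checkout
  deriving (Show, Eq)

-- A profile is a list of (weight, action kind) pairs.
type Profile = [(Int, ActionKind)]

regularUser :: Profile
regularUser =
  [(10, NewCart), (50, AddItem), (30, RemoveItem), (5, GetItems), (5, Checkout)]

-- Hypothetical profile for stressing the checkout logic:
-- checkout at 50%, the other weights scaled down to keep the total at 100.
checkoutHeavy :: Profile
checkoutHeavy =
  [(10, NewCart), (25, AddItem), (10, RemoveItem), (5, GetItems), (50, Checkout)]

-- The same generator works for any profile.
genKind :: Profile -> Gen ActionKind
genKind profile = frequency [(w, pure k) | (w, k) <- profile]

main :: IO ()
main = do
  kinds <- generate (vectorOf 20 (genKind checkoutHeavy))
  mapM_ print kinds
```

Switching the focus of a test run is then a one-word change at the call site.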

Well, that should be it, thanks for reading this, hope it helps you out. And apologies to you Martin, it’s just a random name I picked 😛