

This post (series) will be about testing – about testing Git, seeing if its behavior holds and whether we can find any bugs. This was conceived as a series of blog posts, in which I will, over time, demonstrate the complete (core) behavior of Git and (hopefully) verify that it’s working. The tests will be written in Haskell using property-based testing (QuickCheck/Hedgehog), but this kind of testing is possible in other programming languages as well. Also, I will test the behavior of Git using just the git command and its output to verify it works – this approach is a real “black box” testing method. I will not go as deep as binding to and testing the C code underneath, while in some other applications that would be possible.

Let’s start! Yay, fun. If we take a look at the scientific method (https://en.wikipedia.org/wiki/Scientific_method) under “Testing”, we find: “This (Testing) is an investigation of whether the real world behaves as predicted by the hypothesis. Scientists (and other people) test hypotheses by conducting experiments. The purpose of an experiment is to determine whether observations of the real world agree with or conflict with the predictions derived from a hypothesis.”

So, the behavior of our program is a hypothesis and we write tests (our experiments) to investigate whether it actually behaves like that in the “real world”. We can ignore the fact that we developers don’t really live in the “real world” – it’s something that bothers me from time to time, as I think that exposure to this amount of virtual concepts eventually buries you deep away from the “real world” – but we can presume that the “real world” just means the way the application behaves when the user is using it/interacting with it. Without tests, we can just cross our fingers and hope it works like we imagined. Most of the time there is a bug lurking there, but we rarely find it.

Why Git? I was trying to find a popular application which I could test. And I don’t mean test like – run the application with different parameters and see if it works. git add -A followed by git commit. It works! Next!
I mean, set up a test which can detect if there are any inconsistencies in the application itself. Git seems like a popular application that most of us use in our everyday work, and it’s something that’s not too complex to test (I hope, we will see). And no, I don’t know why ⟨insert name of your favorite VCS⟩ isn’t more popular, and it’s not my fault – the point of this series is to use something that’s documented, easily accessible and popular, so everyone can get a glimpse of how this is done.

The plan is to start with some basic model – a simple specification that describes how Git works – and add more and more details as we go along. We will enrich the model until it contains a description of how Git actually works, and in the process we can discover whether the model is a correct representation of Git or whether Git is buggy. Do note, I will be using some existing libraries, but I will introduce them slowly, so you should be able to pick up what I’m talking about.

Anyway, once we have enough infrastructure in place, I will try to find an existing bug in a specific version of Git. Don’t worry, you will be informed and you will get your chance to vote for your favorite Git bug you want me to find.
I’m kind of hoping that I will be able to find a bug and demonstrate the effectiveness of this approach to you.
This kind of testing has been used in many different places, but I have to acknowledge the people who created QuickCheck – the library that was the first one to use this approach (as far as I know). I also acknowledge all the other folks responsible for this and other testing approaches – thank you, you saved people and millions of $$ – metaphorically and otherwise.

Why even test?

I didn’t always like testing. I thought that testing was something difficult and awkward, something done only by the elite programmers. That was almost 10 years ago, when I started working with Java. I remember the first project that I tried to test – it was horrible. I started with JUnit, as you do, and I had a feeling that it was nearly impossible to test my code. That’s because I tried to test my application when it was already finished!
Over time, I learned to use Mockito and I still felt that it was impossible to test most of the code. Then I discovered PowerMock, a library that enables you to, for example, mock private methods. Then I switched to JMockit. Then things started to fall into place, bit by bit. I was testing using a combination of the aforementioned libraries and I felt that, even though I was able to test my code, it was somehow “unnatural”. Kind of hard to test, kind of hard to change once the tests were written – brittle.
Finally, I found Guice about 7 years ago and started to understand Dependency Injection. Little by little, things fell into place. I’m not going to lie, it was hard figuring these things out and I learned them all by myself. No mentors, no co-workers to help – just Google and work.

Over time, I realized that using anything besides JUnit is overkill. You don’t need all these libraries if you know how to write good code. And writing good code means you need to know how to write code that you can test (seems paradoxical, I know). Design patterns – they almost fall out of DI if you use it. If you can test it easily, it’s good code. No discussion. You don’t need Mockito if you have interfaces in place. In fact, not using Mockito will force you to have small, understandable interfaces that you can stub out anytime.

Forget all the things about SOLID principles, about UML diagrams, about application architecture – cohesion, code coupling, any possible metric of nice code. The amount of bullshit going around is staggering, and you have to keep your balance not to fall in there. There is a simpler metric, let me repeat it once again – if your application can be easily tested, it’s good enough. If it can’t be, I don’t care what principles you used, I know it’s not good code. I don’t care how smart you are – the underlying principle is that code which is easy to understand, easy to compose and easy to test can be discovered by writing tests 100 times more easily than by following any other guidelines.

There are a ton of people who just repeat the popular principles and have no real clue how to write good code. That’s a fact. You want proof – you want to see beautiful Java code? Take a look at Square’s code, for example. At their test code, more specifically. Seems easy, right? Small tests, fast, small interfaces, decoupled. Beautiful. That’s rare.
What would you say the percentage of such code is? 10%? Less? Maybe I’m pessimistic, but I would go for under 5%.

So, why should you test? Other than having an application that runs correctly, is easy to compose, has high cohesion and little coupling, and can grow to be very large without any of the issues associated with that? Yeah, that should be plenty.

When I switched to Haskell, I was trying to find a way to easily test an application. It was hard. It’s simple to test a pure function, but testing impure functions is where my focus was. I want to know how a function works in combination with the outside world. Over time I found ways to do that, and I may even write about it in some future post. For now, trust me, it’s quite possible.

What if I didn’t test it?

This post is about taking a finished application, Git, and testing it. That’s also a possible way of testing an application. You can have a finished application and still be able to test it. I don’t recommend it, but it’s possible. The problem with this approach appears when the application is very large and has a lot of IO communication (it “talks” to a lot of stuff).

In any case, the version of Git I will be using will be:
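On my machine, git reports:

```
$ git version
git version 2.7.4
```
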

Basic Git commands

We can find the basic commands here. In this post, we will focus on simple commands. In the following posts, we will focus on the more complicated behavior.
For starters, let’s see the basic ones. We have:
git version
git init
git status
git add [FILENAME]
git commit -m "[SOME_MESSAGE]"

We also have a function to remove a git repository with a simple rm -rf .git. We can use the process library for calling Git. We don’t want anything fancy for now, so a simple readProcess will do. For example, this is how the simple functions look:
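Something like this – note that the helper names here (runGit and friends) are just my sketch of how the wrappers could look, with readProcess from the process library doing the actual work:

```haskell
import           Data.Text        (Text)
import qualified Data.Text        as T
import           System.Directory (removeDirectoryRecursive)
import           System.Process   (readProcess)

-- run git with the given arguments and capture its stdout as Text;
-- readProcess throws on a non-zero exit code, which is fine for now
runGit :: [String] -> IO Text
runGit args = T.pack <$> readProcess "git" args ""

gitVersion :: IO Text
gitVersion = runGit ["version"]

gitInit :: IO Text
gitInit = runGit ["init"]

gitStatus :: IO Text
gitStatus = runGit ["status"]

gitAdd :: FilePath -> IO Text
gitAdd filename = runGit ["add", filename]

gitCommit :: String -> IO Text
gitCommit msg = runGit ["commit", "-m", msg]

-- the equivalent of `rm -rf .git` in the current directory
removeGitRepo :: IO ()
removeGitRepo = removeDirectoryRecursive ".git"
```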

Also, at some point we are going to need to interact with the filesystem, so let’s see how we can generate files and filenames. Seems like the directory library might be a good fit for now.
We want to keep directories and filenames fully portable and simple, so we picked a simple set of symbols for their names – A–Z a–z 0–9 . _ -. For reference, this matches the POSIX portable filename character set.
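As a small sketch, that character set can live in one place so the file and directory generators can share it:

```haskell
-- the portable characters we allow in generated file and directory names
allowedNameChars :: String
allowedNameChars = ['A'..'Z'] ++ ['a'..'z'] ++ ['0'..'9'] ++ "._-"
```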

Parsing the response from Git

We need to parse the output from Git somehow so we know what the result of an operation is. For example, git status returns something like this when we have untracked files:
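Something like this (the exact wording is from git 2.7.x; other versions phrase it differently):

```
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	filename.txt

nothing added to commit but untracked files present (use "git add" to track)
```
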

Or something like this when we add a file (using git add filename.txt):
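Again on git 2.7.x, something like:

```
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   filename.txt

```
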

We need to know what state Git is in so we can test its functionality.

Parsec and how to apply it

For that, we can use Parsec (or one of its derivatives) and parse the results back. For example, if we add parsec as a Cabal dependency and use stack ghci to drop into GHCi, we can play around with it. You import the library first. We need the Text module since we are going to parse… integers! No. Text. We are going to parse text.

And then you can start playing around with Parsec and seeing how it works. Let’s start simple.
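For example (I’ll stick to plain String literals in GHCi here to keep the session short):

```
ghci> import Text.Parsec
ghci> parse anyChar "" "ab"
Right 'a'
```
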

So, if we parse anyChar and we call it on “ab”, the result is the first char – in this case, the char 'a'. Makes sense, no?
If we run it on an empty string, what do we get?
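Let’s see:

```
ghci> parse anyChar "" ""
Left (line 1, column 1):
unexpected end of input
```
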

We get an error – there is no char for anyChar to consume, the string is empty! You can play around and test some of these yourself by looking at the Hackage documentation on Parsec.

But, we can see that the first line that we get back from git status, On branch master, can tell us what branch we are on. That is an important piece of information and is going to be important to us later!
We can play with that to see what we can extract:
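For example:

```
ghci> parse (string "On branch " *> many anyChar) "" "On branch master"
Right "master"
```
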

Bingo! So there is the string “On branch ” and after it many characters whose value we are interested in. That does sound like what we need. Remember, the initial model doesn’t have to be perfect, we need to iterate until we have a good enough model which can realistically model Git behavior.

But let’s add a full example in the file so we can see if it’s working:
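For example, the untracked-file response from earlier (the exact wording matches git 2.7.x):

```haskell
-- the git status response, joined back into one string with unlines
gitStatusOutput :: String
gitStatusOutput = unlines
  [ "On branch master"
  , ""
  , "Initial commit"
  , ""
  , "Untracked files:"
  , "  (use \"git add <file>...\" to include in what will be committed)"
  , ""
  , "\tfilename.txt"
  , ""
  , "nothing added to commit but untracked files present (use \"git add\" to track)"
  ]
```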

We use unlines to join the strings into a single string back, so we can get the original string – Haskell doesn’t support multi-line text blocks so we work with what we have.

If we run it on the whole Git response, we get:
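Assuming the example response is bound to gitStatusOutput, something like this (abbreviated):

```
ghci> parse (string "On branch " *> many anyChar) "" gitStatusOutput
Right "master\n\nInitial commit\n\nUntracked files:\n  (use \"git add <file>...\" ..."
```
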

Wow. That’s a lot of text. Seems to me we didn’t include a newline in our parser. Let’s try that:
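One attempt is to require a newline right after the branch name (the error’s exact position is elided here):

```
ghci> parse (string "On branch " *> many anyChar <* newline) "" gitStatusOutput
Left (line ..., column ...):
unexpected end of input
```
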

Seems like Parsec swallowed the rest of our text with anyChar, since a newline is a char as well. What can we do?
Hopefully, we have a solution:
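Instead of anyChar, we take characters that are not newlines – noneOf "\n":

```
ghci> parse (string "On branch " *> many (noneOf "\n")) "" gitStatusOutput
Right "master"
```
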

Ah, yes, here we go.
So we can put our latest working version into a function, export the function and test it out in GHCi.
The function can be:
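Something like this (branchNameParser is my guess at a name):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- parse the branch name out of the first line of `git status`
branchNameParser :: Parser String
branchNameParser = string "On branch " *> many (noneOf "\n")
```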

And we can test it by:
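Assuming we exported the function as branchNameParser and still have gitStatusOutput around:

```
ghci> parse branchNameParser "" gitStatusOutput
Right "master"
```
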

It works! Yaay. Unfortunately, it’s not quite that simple. We have several cases where we get a different response back from Git. For example, these are (some of) the statuses we get back that we have to distinguish and parse:
clean – the repo has been initialized and is empty
untracked – there is a new file but it’s not added to the repo, it’s not tracked
tracked – the new file has been added
clean – the file has been committed
unstaged – the file has been changed and was not added
staged – the file was changed and added

For example, git status right after git init returns something like:
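On git 2.7.x, note the “Initial commit” section:

```
On branch master

Initial commit

nothing to commit (create/copy files and use "git add" to track)
```
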

But when you create a new (untracked) file, you get:
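Still with the “Initial commit” section, since nothing has been committed yet:

```
On branch master

Initial commit

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	filename.txt

nothing added to commit but untracked files present (use "git add" to track)
```
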

But if you commit and then create a new (untracked) file:
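Now the “Initial commit” section is gone:

```
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	another_file.txt

nothing added to commit but untracked files present (use "git add" to track)
```
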

The status we get back from each of these is different. So much for the Git specification – there isn’t a standard way to return a git status response back to the user. This is the first place where we can see a (very small, but still real) inconsistency – the commands don’t have a common response.
While this is probably intentional, since the outputs differ when there is an “Initial commit” (nothing has been committed to the repository yet), it still begs the question – if this is such an important piece of output, why don’t we see it in the other responses? Why is it important in one case, but not in the others? This is a rhetorical question; I’m really not interested in the possible interpretations of why it works like this. Be that as it may, it seems inconsistent, that’s all – not saying it’s the end of the world. Yet.
This makes things harder than they should be and shows that it’s not as clean as I hoped it would be.
This actually gives me hope that I will be able to find a bug. Buahaha. Hahaha. Ha.
Anyway, the state machine (the way state changes) should be something like in the next image.

Git state machine

So, we need to split the parsing tasks into different types of responses we get back. How can we do that? No problem for Parsec. Let’s see if we can define the common portions of the response and we can then merge them back into a single parser that can distinguish between the different types of responses.

The try combinator we use in the code tells Parsec – “try this and if it fails, try the Alternative“.

So we can create parsers for individual parts, and then combine them all into one big parser.
Ignoring the details that you can find in the Git repository, we can see all the choices here:
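Here is my sketch of how those pieces can fit together – the GitStatus constructors match the states listed earlier, but the helper names and the simplified section matching are mine:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The states from our model. Distinguishing Tracked from Staged
-- ("new file:" vs "modified:" in the section body) needs more detail
-- than this sketch parses.
data GitStatus = Clean | Untracked | Tracked | Unstaged | Staged
  deriving (Eq, Show)

-- one line of arbitrary text, consuming the trailing newline if present
restOfLine :: Parser String
restOfLine = many (noneOf "\n") <* optional newline

-- the common first line of every response
branchLine :: Parser String
branchLine = string "On branch " *> restOfLine

-- the section that only appears before the first commit
initialCommit :: Parser ()
initialCommit = () <$ (newline *> string "Initial commit" *> restOfLine)

-- merge the portions into one parser over all the choices
statusParser :: Parser GitStatus
statusParser =
  branchLine *> optional (try initialCommit) *> spaces *> section
  where
    section =
          try (Untracked <$ string "Untracked files:")
      <|> try (Tracked   <$ string "Changes to be committed:")
      <|> try (Unstaged  <$ string "Changes not staged for commit:")
      <|>     (Clean     <$ string "nothing to commit")
```

Each alternative is wrapped in try so that a failed match backtracks instead of consuming input and breaking the next alternative.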

There might be some performance concerns about the duplication of parsing work, but let’s try to keep things clean and simple here. So we have a parser that can handle the output of the git status command. Sounds to me like we made progress and can now focus on writing simple tests.

Testing simple scenarios

In this section, we will test very simple scenarios. The point of this is mostly to introduce and set up the test harness so we can change it and add more tests quickly. This section won’t roam outside QuickCheck territory for now and will use hspec, even though I have an idea of what I could use in the future to make this kind of testing more obvious. But let’s take this step by step and see where we end up.
Ok, we need to write a test. What’s the easiest test we can write? Adding a file? No. Too complicated. Initializing the empty repository? Might be good, but is there anything even more simple? Something stupid simple.
Ah, we have a version we can check. Yeah, sounds stupid, but we need to start somewhere and picking the simplest possible starting point is a good way forward. I had plenty of situations where I thought “oh, this test will be simple”. 3 days later I would regret such rash decisions and would be reminded of my limited intelligence.
Let’s see how we can do that.

Very simple test

Since we are going to use IO – which basically means we will interact with Git by running git commands – we might want to use monadicIO to test it out. We open our Spec.hs, add all the Cabal dependencies and imports and add:
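A sketch of how that spec can look (I use a plain String gitVersion here for brevity; modifyMaxSuccess and the assertion are explained below):

```haskell
import System.Process         (readProcess)
import Test.Hspec
import Test.Hspec.QuickCheck  (modifyMaxSuccess)
import Test.QuickCheck        (property)
import Test.QuickCheck.Monadic (assert, monadicIO, run)

-- the wrapper around `git version` from earlier in the post
gitVersion :: IO String
gitVersion = readProcess "git" ["version"] ""

spec :: Spec
spec =
  modifyMaxSuccess (const 1000) $
    describe "git version" $
      it "returns the expected version" $
        property $ monadicIO $ do
          version <- run gitVersion
          assert (version == "git version 2.7.4\n")
```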

Duh. Obvious. Not really. The command we used to test is gitVersion. What is that?
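It might look something like this (the Text conversion is my assumption about how the post wraps readProcess):

```haskell
import           Data.Text      (Text)
import qualified Data.Text      as T
import           System.Process (readProcess)

-- run `git version` and hand its stdout back as Text
gitVersion :: IO Text
gitVersion = T.pack <$> readProcess "git" ["version"] ""
```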

It’s a thing that returns an IO action containing a Text. It runs git version and returns the output of that command to us as Text.
My current git version output is:
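As asserted in the test:

```
git version 2.7.4
```
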

So naturally, the tests will crash on machines where that is not the case. I’m joking. It serves to illustrate how we can write a simple test – you can simply comment that test out, it’s not something you would add to your test suite unless you knew that other versions might not have the same behavior.

In any case, we use run to be able to run an IO action in a test. What’s run?
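Its type, from Test.QuickCheck.Monadic, is:

```
run :: Monad m => m a -> PropertyM m a
```
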

Erm. It’s something that runs our IO action. After that we simply assert that the version output is correct:
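Assuming the version string includes the trailing newline that readProcess captures:

```
assert (version == "git version 2.7.4\n")
```
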

We also have another important part in the code, which is modifyMaxSuccess (const 1000). That simply tells the test to run 1000 times. Yes, I’m not kidding. I want to be sure that something funny doesn’t happen – we should get the same version all 1000 times.

And there we have it, folks. Our first test is written and explained. Well, kinda explained. We can’t delve too deeply into the details, but the basic idea should be clear – we run git version and its result should be git version 2.7.4.

More tests?

Can we add something a bit more… useful? Let’s see. What’s the next simplest thing we can test? I would say that git init would be a nice candidate now. But testing it on its own seems kind of… empty. We need to ask ourselves – what are the environmental factors for running git init and what might affect its result?
Well, we always create a new repository in new directories, right? That might be a useful thing to test – we should always get the same result (which is “success”, also known as the Clean state in our model) no matter in which directory we run it, right?
And also, there isn’t just one directory, right? We can have Arbitrary directories. For anyone not familiar, Arbitrary is a class in QuickCheck which allows you to create arbitrary data out of any data type. The most basic ones are included, but when you think about more complex data types – they always seem to be constructed out of:
– other data types
– primitive types

And all data types are ultimately constructed out of primitive types, which means that we can create an Arbitrary instance for any data type we have.
Let me show you what that looks like for a simple directory:
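A sketch of the type and its instance – the length limit of 10 is my assumption:

```haskell
import Test.QuickCheck

-- the portable characters we picked earlier for generated names
nameChars :: String
nameChars = ['A'..'Z'] ++ ['a'..'z'] ++ ['0'..'9'] ++ "._-"

newtype DirectoryName = DirectoryName String
  deriving (Eq, Show)

instance Arbitrary DirectoryName where
  arbitrary = do
    len <- choose (1, 10)
    DirectoryName <$> vectorOf len (elements nameChars)
```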

So we just defined the DirectoryName type with its instance. We can check how that works; here are some of the outputs:
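The values are random, of course, so running sample in GHCi gives something like:

```
ghci> sample (arbitrary :: Gen DirectoryName)
DirectoryName "R"
DirectoryName "t3"
DirectoryName "_x0-Q"
DirectoryName "gY.7pA"
...
```
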

Ok, so we have a simple situation to test first, which is to test whether git init works for a single subdirectory:
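A simplified sketch of the helper involved – the post’s version parses the output into Either ParseError GitStatus, while this one returns the raw String to keep the sketch short:

```haskell
import Data.List        (intercalate, isInfixOf)
import System.Directory (createDirectoryIfMissing, withCurrentDirectory)
import System.Process   (readProcess)

-- create the nested directories, run `git init` inside the deepest one,
-- and return the `git status` output
createDirectoryInitStatus :: [String] -> IO String
createDirectoryInitStatus names = do
  let path = intercalate "/" names
  createDirectoryIfMissing True path
  withCurrentDirectory path $ do
    _ <- readProcess "git" ["init"] ""
    readProcess "git" ["status"] ""
```

The test then calls this with arbitrary directory names and checks that the resulting status is Clean.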

So we have a function createDirectoryInitStatus which creates a directory composed of the list of directory names we pass in and returns the git status back. The result we get back is an Either ParseError GitStatus and it should be a Right, which is a success (a Left would be an error).
If it is, git status must report that the repository is in the Clean state, which we check using the isClean function:
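Which is as simple as it sounds:

```haskell
-- GitStatus repeated here so the snippet stands alone
data GitStatus = Clean | Untracked | Tracked | Unstaged | Staged
  deriving (Eq, Show)

isClean :: GitStatus -> Bool
isClean Clean = True
isClean _     = False
```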

Spoiler alert: the test passes. What could be a more complex case? Did I hear you say “adding multiple subdirectories in the repository before calling git init”? Yes, that’s it.

Well, we get a failed test with that. The problem seems to be uncaught exception: IOException of type InvalidArgument. At the end of the output we see – createDirectory: invalid argument (File name too long). Ah! We hit a boundary! That means either we found a bug or we need to change our model to better represent the correct state of the world.
listOf1 arbitrary ensures we get a non-empty list of directories; otherwise the test fails. We add a precondition to the test, saying: let’s limit the number of directories to 4. How? We use pre $ length directoryNames < 5. Even with this precondition, the test fails after 100–200 tries. That simply means we need to go back and change our Arbitrary instance. But wait. We don’t have an Arbitrary instance for multiple DirectoryNames, just for a single one. See? The test is telling us that our code requires a type that describes multiple directories. Our tests are not there just to tell us if something is working, but also to warn us if we missed some important details in our model.

When we define that data type and its Arbitrary instance:
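A sketch of both, with DirectoryName repeated so the snippet stands alone; truncating at 100 characters is one way to encode the path-length rule:

```haskell
import Test.QuickCheck

nameChars :: String
nameChars = ['A'..'Z'] ++ ['a'..'z'] ++ ['0'..'9'] ++ "._-"

newtype DirectoryName = DirectoryName String
  deriving (Eq, Show)

instance Arbitrary DirectoryName where
  arbitrary = do
    len <- choose (1, 10)
    DirectoryName <$> vectorOf len (elements nameChars)

-- a non-empty list of directory names forming one nested path
newtype DirectoryNames = DirectoryNames [DirectoryName]
  deriving (Eq, Show)

instance Arbitrary DirectoryNames where
  arbitrary = DirectoryNames . takeShort <$> listOf1 arbitrary
    where
      -- keep the joined path (names plus separators) under 100 characters
      takeShort = go 0
      go _ [] = []
      go acc (d@(DirectoryName n) : rest)
        | acc + length n + 1 < 100 = d : go (acc + length n + 1) rest
        | otherwise                = []
```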

So we encode the rules of the data type in the Arbitrary instance, and now we don’t have to add them to our tests. When we run the test:
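QuickCheck reports something like:

```
+++ OK, passed 1000 tests.
```
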

We get success! This means we can currently guarantee that it works as long as the file path stays below 100 characters. Of course, this depends on how deep your folder is – it’s not the same if you have a prefix of 200 characters before it. For more details on the file path limit, see here.


This article is getting pretty big, so I decided to stop this first part here. In the next parts, we can continue with different tests, including creating arbitrary files and adding them and seeing how that behaves.
I would say that we did most of the hard work by now, and the rest of the series should be more focused on the tests themselves and on enriching our model until we get up to the full (core) behavior of Git.

Until next time, and if you have any comments, ask away!