Today we are going to look into inventory prediction; future posts will probably touch upon price elasticity. Together with knowing how many items you are going to sell, that makes a helpful model for anybody selling physical goods: anyone who adjusts prices to maximize profit and wants to keep some sort of inventory.

Intro

Today, in the world of quantum computers (Google just released TensorFlow Quantum) and very capable AI/ML models, it seems that there is a huge divide between the researchers and the rest of the IT users.

It seems the researchers are busy pondering (very fundamental and difficult) questions like interpretability and causality, while the rest of the IT users, primarily the companies that don’t know a lot about using “AI”, are stuck looking at models that can distinguish cats from dogs. It’s a cat. Move on.

Very few end users actually understand the impact of this technology and how much it can do for them without their being experts in it. While I certainly don’t advocate jumping from classifying your pets to trying to cure cancer, it would surely help if people started understanding the practical benefits of this technology and how it can be applied to their business.

Not everything interesting in AI/ML has to do with deep learning (which is a very silly name anyway). My own interest points me in the direction of probabilistic programming, and while there is still a ton of stuff for me to learn, I think that field is essentially going to dominate ML, since reasoning under uncertainty is exactly what we need from a machine.

This post is going to use an existing dataset that can be found here and there will be some code snippets provided for the more “technically inclined”.

Let’s start with the most basic concept: the time series.

Time series

A time series is simply a list of values ordered in time. If you can imagine a list of numbers that occurred or changed over time, that is a time series.

Looking at the temperature starting from today (12.03.2020) here in Pula, where I live, we get a list of temperatures:

Day    Thu    Fri    Sat    Sun    Mon    Tue    Wed    Thu
Temp.  16 °C  14 °C  13 °C  12 °C  13 °C  15 °C  16 °C  16 °C

So, it’s a list of values in time. That is a time series. If we just want to look at the values, they are:

[16, 14, 13, 12, 13, 15, 16, 16]

And if we want to graph this, we can do that using Python:

import seaborn as sns
import matplotlib.pyplot as plt

values = [16, 14, 13, 12, 13, 15, 16, 16]

# Plot the temperatures against the day index and keep the Y-axis anchored at zero.
g = sns.lineplot(x=list(range(len(values))), y=values, marker="o")
g.set(ylim=(0, None))
plt.show()

On the X-axis we have the days of the week and on the Y-axis we have the temperature. It is a two-dimensional graph that displays exactly the table above.

What are other examples of time series? Any values that change over time: stock exchange prices, gas prices, the number of people in stores depending on the time of day, the number of cars on the roads…

So what?

Well, you can take the values you have and let an ML model “learn” them so it can estimate future values. For example, if we take temperature readings for the last 20 years, the model will be pretty accurate about what the weather is going to be tomorrow.

It doesn’t “see” into the future, but it can learn that in some periods the temperature was lower, so when we predict for that “period” we get a sensible prediction.

The same thing works for somebody working in a store. If they work there long enough, they eventually know that because today is Friday, people are going to come into the store and buy more booze (or something else).

That is how ML works. It learns from the data and can then use what it learned in new situations.

What does that bring me?

If you know what people buy often and in what periods, you can adjust your whole selling strategy around that.

For example, if you were a large general store and the ML model showed you that you sell a ton of candles in February (Valentine’s Day and such, I’m making this up as I go), then a few things follow:

  • We might need to restock specific candles, since those go quickly in February
  • We might buy them in advance, in larger quantities and when they are cheaper, and keep them in stock until February
  • We might try to sell more candles + flowers packages as an upsell
  • We might adjust our price to maximize profit (I mentioned this in the plan for a future blog post; the subject is price elasticity)

While one single strategy might not seem like much, with all of these ideas implemented your February profit suddenly goes up.

And none of this is terribly difficult; you just need somebody who knows what they are doing and can show you how it works.

Did I mention that we do this for free for a single store? Yep, you can try this out for free, given that you have other stores you might use it in; just drop us an email.

Data used

So the data we use here is, as mentioned, the Walmart Recruiting – Store Sales Forecasting dataset, which contains historical sales data from 45 Walmart stores.

In the dataset we have 45 stores, each store has its own departments, and the value we care about is the “Weekly_Sales” column. Each store also has a type.

The dataset contains around three years of data, and we will try to predict the next 180 days of sales for a department of a specific store. A history this short is not really going to give us very precise results for the next 180 days, but it is a nice way to show how this works.

We load the data using Pandas.
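A minimal sketch of that load, assuming the Kaggle train.csv file and its usual column names (Store, Dept, Date, Weekly_Sales):

import pandas as pd

# Load the historical sales; parse Date so we can treat the column as a time series.
# The file name and column names are assumed to match the Kaggle download.
sales = pd.read_csv("train.csv", parse_dates=["Date"])

print(sales.head())
print(sales["Store"].nunique(), "stores,", sales["Dept"].nunique(), "departments")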

The kind of work we do is typical in a “ML lifecycle”:

  • Gather data
  • Transform and clean
  • Explore
  • Analyze & Build Models
  • Communicate results

We have gathered the data, which is quite a feat unto itself. The “transform and clean” step is somewhat simpler here, since we already have things formatted the way we need them.

Onto the exploration next!

Data visualized

So we can now load up the data and look at the weekly sales for all departments of each store.
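A rough sketch of how such a per-store view could be produced, reusing the sales frame loaded above (plot_store is just an illustrative helper, not the exact code behind the charts below):

import seaborn as sns
import matplotlib.pyplot as plt

def plot_store(store_id):
    # Weekly sales for one store, with one line per department.
    store = sales[sales["Store"] == store_id]
    sns.lineplot(data=store, x="Date", y="Weekly_Sales", hue="Dept", legend=False)
    plt.title(f"Store {store_id} - weekly sales per department")
    plt.show()

plot_store(1)   # and plot_store(2), plot_store(3) for the charts below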

We can do that for example, for “Store 1”:

For “Store 2”:

And for “Store 3”:

And so on…

After the initial visualization, we can observe a few things. First of all, not all stores do equally well; it’s obvious once you visualize it. Second, there seems to be a huge jump between some of the points: most of them sit in the bottom part, where the weekly sales are not that significant, but some sit far above, with a huge gap in between.

Now, the hypothesis is that the gap is simply a jump from one department to the next. For example, we could be seeing electronics departments versus more general departments.

Let’s take a look at “Store 3” and see how it does for a specific department. But first of all, let’s see a box plot.
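A sketch of how that box plot could be drawn, continuing from the snippets above (same imports, same assumed column names):

# Distribution of weekly sales per department in Store 3.
store_3 = sales[sales["Store"] == 3]
sns.boxplot(data=store_3, x="Dept", y="Weekly_Sales")
plt.title("Store 3 - weekly sales per department")
plt.show()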

When we select all the data points above 60,000 in weekly sales and then group them by department (a sketch of that selection follows the list), we get two departments:

  • 38
  • 72 
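The selection itself is a couple of lines with pandas (still looking at Store 3, using the store_3 frame from above):

# Which departments ever exceed 60,000 in weekly sales?
big = store_3[store_3["Weekly_Sales"] > 60_000]
print(sorted(big["Dept"].unique()))
print(big.groupby("Dept")["Weekly_Sales"].describe())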

So just two departments are causing this discrepancy in the data, since they jump out from the rest of the departments. If we visualize department 38 itself, which is by far the most successful (judging by weekly sales), we get:

Most of the deviation we see there comes from how the weekly sales spread out across the years.

Data learned

So let’s now step into the actual ML algorithm and the prediction. Let’s use the three stores above and see how far into the future we can usefully predict.

There are a lot of choices here.

People generally start with something simpler like ARIMA or SARIMA and move on from there. We are not going to jump into deep learning and LSTMs, even though that might be an interesting topic for people interested in the subject.

First of all, let’s use something simple we don’t have to do a lot of work on.

We are going to use Prophet. It’s simple and works nicely out of the box. I first saw it used at a nice local meetup, Istra Programming Languages.

The alternatives are the more advanced STS modeling in tensorflow-probability, or using probabilistic programming directly (pymc3 or the aforementioned tensorflow-probability).

In any case, all of these work similarly. They use something called “generalized additive models”, which means they combine separate sub-models for the trend, seasonal patterns, holiday effects, corona (yes, things like that do affect reality and thus the model), and so on.

y(t) = Trend(t) + Seasonality(t) + Holiday(t) + Noise(t) + …
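To make that concrete, here is a minimal sketch of fitting Prophet to a single store and department and forecasting 180 days ahead. Prophet wants the input as two columns named ds and y; the file and column names are assumptions carried over from before, and the import was called fbprophet in older releases.

from prophet import Prophet   # "from fbprophet import Prophet" on older installs

# Store 3, department 38, reshaped into the two columns Prophet expects.
dept = sales[(sales["Store"] == 3) & (sales["Dept"] == 38)]
df = dept[["Date", "Weekly_Sales"]].rename(columns={"Date": "ds", "Weekly_Sales": "y"})

model = Prophet()                                   # default 80% uncertainty interval
model.fit(df)

future = model.make_future_dataframe(periods=180)   # 180 days past the last observation
forecast = model.predict(future)
model.plot(forecast)                                # the kind of chart discussed below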

If we run it on “Store 3” and “Dept 38” and forecast the next 180 days, we see the following:

The points we see are the actual data points we have. The darker blue line is the best fit the model found, while the lighter blue area is the region in which the model is more than 80% sure the result will end up.

We might not need perfect information, just information we can be sure of. It doesn’t have to be very precise for us to draw conclusions we might use to increase our profits.

For example, if we want to be 95% sure, we can trade off some of the precision for certainty and get something like this:
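In Prophet, widening the interval is just a constructor argument; a sketch continuing from the code above:

# Same data, but ask for 95% uncertainty intervals instead of the default 80%.
model_95 = Prophet(interval_width=0.95)
model_95.fit(df)
forecast_95 = model_95.predict(model_95.make_future_dataframe(periods=180))
model_95.plot(forecast_95)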

We can see that the uncertainty around the endpoint is higher, but we can now be more confident that the real value will land inside the shaded area. It’s all about tradeoffs. And what do we get for the additive model output?
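Prophet can split the forecast back into its additive pieces (trend plus the seasonal terms) with a single call:

# Decompose the forecast into trend and seasonality plots.
model.plot_components(forecast)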

It seems the trend is going downward, which might suggest that the store is losing customers. Or that a financial crisis is looming. We could model that too; after all, we have an additive model to add to, right?

Let’s look into some other stores. How about department 72?

Wow, nice! Seems like it’s going up. Let’s see what might be causing it.

The trend is going up! It might be that this is becoming a more popular store. How could you check that? Well, you could simply entice customers to fill in a survey and give them some points for it. Again, you could do that periodically and find out what is causing the difference.

 How about department 1?

Could we predict the sales of all three stores mentioned above? Sure, let’s see it, but just for “Dept 1”.
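A sketch of looping the same procedure over the three stores, keeping only department 1:

# One model per store for department 1; collect the 180-day forecasts.
forecasts = {}
for store_id in (1, 2, 3):
    dept_1 = sales[(sales["Store"] == store_id) & (sales["Dept"] == 1)]
    df_1 = dept_1[["Date", "Weekly_Sales"]].rename(columns={"Date": "ds", "Weekly_Sales": "y"})
    m = Prophet()
    m.fit(df_1)
    forecasts[store_id] = m.predict(m.make_future_dataframe(periods=180))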

Ouch. Not that nice, but we get some results in the end. Let’s see the components:

Again, we see some uncertainty in the trend, and the uncertainty grows the further we “go into the future”, which is natural: the further we try to see into the future, the less certain that future is.

 Conclusion

First of all, I should probably have used a Jupyter Notebook for all of this. I probably will in the future; this was sort of a “never mind, it will be quicker this way” situation.

Second of all, we have predicted the sales of stores and their departments. 

Yay! 

It wasn’t as fun as you hoped it would be, right? 

That’s because none of this stuff is dark magic, just a bunch of big words that sound scary (that is sort of my definition of IT – it’s all “cool” until you realize how it works).

There is legitimate complexity in these models, and if I were to build one from scratch it would take me quite some time and the blog post would become huge, but to jumpstart the actual work, we don’t need to dig that deep.

But it’s still a pain in the ass to write all of this.

Happy learning, folks, enjoy!