Hi, folks.


The last article I wrote was about an application for recognizing parking spaces with the help of artificial intelligence – https://exact-byte.com/recognizing-parking-spots-with-artificial-intelligence/


The program I’m talking about today was developed for free. It automatically detects an impending heating system failure and notifies the responsible persons before the heating system actually shuts down. In other words, it reacts when the capacity of the hotel's heating system drops and the system no longer works at an optimal level, which would leave people with cold water.


I offer my program and my time for free to anyone who has data this program could be used with. It can be applied to automation of any kind.


The program recognizes when the system capacity drops and informs the responsible persons that the system has stopped working optimally and that intervention is needed. The algorithm that recognizes this is a general one: without additional expert input, it can recognize when the system stops behaving optimally, so that someone can intervene before it stops working.


A program that does this uses artificial intelligence and “learns” the behavior of the system on its own. Once the program has learned from a large amount of data, it can recognize which behavior is “normal” and which is not, and it returns the likelihood that the system is behaving normally. In other words, it does not only report that the system is behaving abnormally; it reports that the system is behaving abnormally with a probability of 90% (9 out of 10 times this is an error) or 80% (8 out of 10 times this is an error).


The program that was created learned from the data of one heating system (hotel) that has central heating and rooms which require hot water (for showers). The program was able to recognize the decline in the capacity of the heating system and clearly show when the system stops working optimally.


But first, let’s start with a concrete example.

For all interested, the video presentation below is mine and covers the same topic; it may be somewhat easier to follow.

Unfortunately, the video presentation is in Croatian, so unless you are really eager to learn a difficult foreign language, do follow the article.

What problem are we solving?

The problem we are solving can be shown in the picture below.

We have one boiler that we heat up to a certain temperature, S0. This boiler is a central part of central heating as it serves to transfer heat to other rooms. The other rooms contain showers, and, hopefully, people who need hot water.


This central boiler contains water heated to a temperature S0, which electric motors pump to the other rooms. The water reaches room 1 with the help of an electric motor whose power is PWM1 (Pulse Width Modulation) and arrives at the shower with temperature S1. In other words, each room has its own electric motor, a sensor for the power of the electric motor and a sensor for the temperature of the shower. Room 2 has electric motor power PWM2 and shower temperature S2. Room 3 has electric motor power PWM3 and shower temperature S3. And so on, all the way to room 16, which is as many rooms as one such system with one boiler serves.

This is a typical example of a system that works automatically. The temperature on the boiler is set and the system maintains that temperature. The electric motor automatically starts to distribute that hot water when the shower temperature in each room drops. Thus, the system automatically tries to maintain a comfortable temperature that comes out of the shower so that hotel guests can take a shower with hot water.


Of course, the system, like any other, is made up of individual parts. Each part of that system works for a certain amount of time and after that it starts to degrade. As parts of the system stop working as well as they did in the beginning, the system itself stops working as well as it did in the beginning.


System degradation is mostly gradual, since parts of large systems are made to be robust and fault-tolerant. Therefore, if sensors give us enough data to see how the system degrades, we can also predict when the failure will occur.
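A minimal sketch of that idea, assuming the degradation is roughly linear: fit a straight line to a daily indicator and extrapolate to the day it crosses a failure threshold. The function names, the sample values and the 45 °C threshold are illustrative assumptions, not taken from the hotel data or from the program described here.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def estimate_failure_day(days, values, threshold):
    """Extrapolate the fitted line to the day it reaches `threshold`.

    Returns None when the fitted trend is not decreasing.
    """
    slope, intercept = fit_line(days, values)
    if slope >= 0:
        return None
    return (threshold - intercept) / slope


# Hypothetical daily average shower temperatures (°C), slowly degrading.
days = [0, 1, 2, 3, 4, 5]
temps = [55.0, 54.5, 54.0, 53.5, 53.0, 52.5]

print(estimate_failure_day(days, temps, threshold=45.0))
```

A straight line is the simplest possible trend model; real degradation may well be non-linear, in which case the extrapolation would need a richer model.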

Example of data

The data we received came directly from a hotel facility that had been running for 4 years. An example of such data can be seen in the figure below.

We can see that we have the time (date and time) and the measurements for room 1. So we have the boiler temperature S0, the shower temperature S1, and the power of the electric motor, PWM1, which regulates the amount of hot water flowing from the boiler to the shower in room 1.


We can see in the example above that the electric motor only starts to work, slowly, at 15:42, since the temperature we maintain in the showers is 55 °C and the system tries to maintain that temperature. It doesn’t matter whether the temperature in the boiler drops or the temperature in the shower starts to drop: the system tries to keep the shower temperature at 55 °C.


Of course, limescale can form, the heaters can lose some capacity, or the ambient temperature can drop. In all of these cases the system still keeps trying to hold the temperature at the given 55 °C.
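For readers who want to experiment with similar measurements, here is a minimal sketch of loading such readings in Python. The CSV layout, the header names, the percent unit for PWM1 and the sample rows are all assumptions made up for illustration; the original export format of the hotel data is not shown in this article.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Reading:
    timestamp: datetime
    s0: float    # boiler temperature, °C
    s1: float    # shower temperature in room 1, °C
    pwm1: float  # motor power for room 1 (unit assumed: percent)


def parse_readings(text):
    """Parse CSV text with the assumed header: time,S0,S1,PWM1."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        Reading(
            timestamp=datetime.strptime(row["time"], "%Y-%m-%d %H:%M"),
            s0=float(row["S0"]),
            s1=float(row["S1"]),
            pwm1=float(row["PWM1"]),
        )
        for row in rows
    ]


# Made-up sample rows, only to show the shape of the data.
SAMPLE = """time,S0,S1,PWM1
2017-04-29 15:41,62.0,55.2,0
2017-04-29 15:42,61.8,54.9,12
"""

readings = parse_readings(SAMPLE)
print(readings[1].pwm1)
```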


Let’s look at the hotel data on a graph with a time axis.

The data above is from 2017 and covers the beginning of the season, from April 29, 2017 until July 23, 2017. The X axis (left to right) shows time, with the past on the left and the future on the right. The Y axis (up and down) shows the shower temperature in room 1, S1, which visibly varies over time. Color represents the third dimension of this graph, which is the boiler temperature. The color scale on the right side of the graph shows the boiler temperature: darker colors represent lower temperatures and lighter colors higher temperatures.

With this we can see all the temperature readings for room 1: S0, the boiler temperature, and S1, the room's shower temperature.

What can we improve in the existing system?

What we can improve is the quality of service, because we can intervene before a failure and complete shutdown of the system. Before the failure happens, the program informs us that the capacity of the system has dropped and that it is necessary to intervene.

Also, we can save time, which is a pretty important factor, especially if parts of the system happen to stop working at the same time. We can buy spare parts ahead of time or hire technicians to be on standby as the system stops performing as well as it did.

And of course, all this translates into something tangible to everyone – money.

What are the possible implementations of the system?

A program that can automatically report when the system is no longer behaving properly can be implemented in a number of ways, of course.

What we are interested in is a general algorithm that can, without details about the system itself, estimate when the system will stop working properly. In this case, the algorithm “learns” on its own when the shower temperature is no longer good, and can inform people that an error has occurred.


Also, it is necessary to have an algorithm that can predict the future based on the past, and such an algorithm belongs to the group of algorithms for time series.


We want this algorithm to be able to recognize the anomaly on its own without additional human intervention. But again, we want to have the opportunity to add the knowledge of a human expert to the problem and that is not difficult to do. So we want the algorithm to make the prediction of the anomaly itself, and then we can further refine that prediction with the knowledge of a human expert – in this case a technician.


What we want is not just the result. We do not only want an algorithm that can tell whether a system is working or not, but one that also tells us how likely it is that the system is working or not. We want to know whether there was an error in the system with a probability of 90% or 80%, and thus let our superiors know proportionally how serious the situation is.


Some of the implementations that could be used here are:

  • VAR (vector autoregression)
  • Autoencoders
  • LSTM (long short-term memory networks)


There are many algorithms to choose from, but none of those listed above meets all of the criteria we listed before.


So in the end I decided to use probabilistic programming, a branch of artificial intelligence, and implemented the model in PyMC3 as a generalized additive model (GAM) built from Gaussian processes.
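The exact model is not spelled out in this article, so the following is only a generic sketch of what an additive Gaussian process model looks like. The split into a trend and a seasonal component, and all of the symbols, are assumptions for illustration.

```latex
\begin{aligned}
y(t) &= f_{\text{trend}}(t) + f_{\text{season}}(t) + \varepsilon_t,
  & \varepsilon_t &\sim \mathcal{N}(0, \sigma^2), \\
f_{\text{trend}} &\sim \mathcal{GP}\bigl(0,\, k_{\text{trend}}(t, t')\bigr),
  & f_{\text{season}} &\sim \mathcal{GP}\bigl(0,\, k_{\text{season}}(t, t')\bigr).
\end{aligned}
```

Here y(t) would be the shower temperature at time t. Because a sum of independent Gaussian processes is again a Gaussian process whose kernel is the sum of the component kernels, a model of this form yields a full predictive distribution for y(t) rather than a single value.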

But before we get into (some) details, let’s focus a little on the very idea behind probabilistic programming.

Probabilistic programming

What is important to keep in mind here is that the key difference between this branch and others in ML is that each result is a probability, not just a number.


In other words, if we ask the program what the temperature will be in an hour given the data we have, we won’t only get a result of 30 °C, but also a probability of 84% that the temperature will be 30 °C. The key idea behind this is “reasoning under uncertainty”. One of the winners of the Turing Award (often called the Nobel Prize of computing) focused his artificial intelligence research on exactly this and talks about how important it is in the world of artificial intelligence. I would agree with him. It’s nice to have a super fancy large LSTM neural network architecture, but it does not show us how confident the network itself is in the answer it gives. Of course, we can always retrieve something similar, but it is not necessary to use a neural network for every problem. Every problem requires its own tool, and the less complex the algorithm is, the easier it is to interpret.
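To make "a prediction plus a probability" concrete, here is a minimal sketch that assumes the prediction is normally distributed. The mean of 30 °C matches the example above, while the standard deviation of 0.7 °C, the ±1 °C tolerance and the function name are assumptions chosen for illustration.

```python
from statistics import NormalDist


def probability_within(mean, std, low, high):
    """Probability that a normally distributed prediction lands in [low, high]."""
    dist = NormalDist(mu=mean, sigma=std)
    return dist.cdf(high) - dist.cdf(low)


# Hypothetical prediction: 30 °C with a standard deviation of 0.7 °C.
p = probability_within(mean=30.0, std=0.7, low=29.0, high=31.0)
print(f"P(29 °C <= T <= 31 °C) = {p:.0%}")
```

For a continuous quantity such as temperature, a probability only makes sense for a range of values, which is why the sketch asks about 29 °C to 31 °C rather than exactly 30 °C.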


It is not only important whether the temperature will be 30 °C, but also how confident the system is in predicting that. Here we are talking about numerical data, meaning regression, not classification. In neural networks, for example, the softmax activation function can be used to obtain the probability of each class in classification.

If we look at, say, a linear regression over the existing hotel data, the shower temperature measurements are on the Y axis and the points in time are on the X axis.


Each X on the graph is one measurement in time. The yellow line is a linear regression obtained by “pure calculation” and is the only “correct” linear regression. But, as we have said, we are not only looking for an answer, but also for certainty in the answer itself.


If we instead use algorithms from this branch to predict the regression line (we will not go into distributions now), we get these black lines. Each of these black lines is a potential linear regression. This is how algorithms in this branch work. They do not give one correct answer, but many answers (a distribution of answers) and with them the probability that a particular answer is more or less correct. Taken together, these black lines form a credible interval: the denser the lines are in a region, the more likely it is that the answer lies there. The sparser they are, the less likely the answer is to be found there.
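The "many possible lines" picture can be imitated with a few lines of plain Python. Note that this sketch swaps in bootstrap resampling instead of the posterior sampling that a probabilistic programming library performs, so it illustrates the idea rather than the method used in this program. The synthetic data and all function names are assumptions.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def bootstrap_lines(xs, ys, n_lines=200, seed=0):
    """Refit the line on resampled data to get many plausible lines."""
    rng = random.Random(seed)
    n = len(xs)
    lines = []
    while len(lines) < n_lines:
        idx = [rng.randrange(n) for _ in range(n)]
        if len({xs[i] for i in idx}) < 2:
            continue  # a line needs at least two distinct x values
        lines.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return lines


def prediction_band(lines, x, coverage=0.95):
    """Central interval of the lines' predictions at point x."""
    preds = sorted(slope * x + intercept for slope, intercept in lines)
    tail = (1 - coverage) / 2
    low = preds[int(tail * (len(preds) - 1))]
    high = preds[int((1 - tail) * (len(preds) - 1))]
    return low, high


# Synthetic data: a slow upward trend plus noise (not the hotel data).
rng = random.Random(42)
xs = list(range(30))
ys = [50 + 0.1 * x + rng.gauss(0, 1) for x in xs]

lines = bootstrap_lines(xs, ys)
low, high = prediction_band(lines, x=15)
print(f"95% band at x=15: {low:.2f} to {high:.2f}")
```

Each entry in `lines` plays the role of one black line, and the band is narrow where the lines agree and wide where they spread out.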

What is the result?

And what is the end result of all this? Well, a system that managed to detect an anomaly at the time the sensors reported the information. If we look at the graph, it is possible to see black dots. These black dots are shower temperature measurements and show a smaller subset of data. We have measurements every minute for the last 4 years, and that would be really too much data to show on the graph. What we can show is a smaller subset of the data.


What is the shaded green area? It is the credible interval that covers the region in which, according to the program, 99% of normal measurements fall. What’s outside of it? Rare events (in terms of a normal distribution, a 99% interval corresponds to points more than about 2.6 sigma from the mean).
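A minimal sketch of flagging points outside a 99% interval follows. It assumes a plain normal distribution fitted to a period of normal operation, which is a strong simplification of a time-dependent model; the temperatures and function names are made up for illustration.

```python
from statistics import NormalDist, mean, stdev


def fit_interval(normal_values, coverage=0.99):
    """Central interval that should contain `coverage` of normal readings."""
    mu, sigma = mean(normal_values), stdev(normal_values)
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 2.58 for 99%
    return mu - z * sigma, mu + z * sigma


def find_anomalies(readings, interval):
    """Return (index, value) for every reading outside the interval."""
    low, high = interval
    return [(i, v) for i, v in enumerate(readings) if not low <= v <= high]


# Made-up shower temperatures (°C) from a period assumed to be normal.
history = [50, 51, 49, 52, 50, 48, 51, 50, 49, 50,
           51, 52, 49, 50, 48, 51, 50, 49, 50]
interval = fit_interval(history)

new_readings = [50.5, 61.0, 49.0, 24.0]
print(find_anomalies(new_readings, interval))
```

With these made-up numbers the 61.0 °C and 24.0 °C readings are flagged, while 50.5 °C and 49.0 °C fall inside the interval.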


These yellow dots are anomalies, and we can see the first one appearing towards the end of 2018, near the end of the season. The temperature was suddenly at a high 60+ °C, while the temperatures that season were otherwise between 40 and 55 °C.

I agree with the system, this is weird.


To show the real “magic”, let’s look at the 2020 season and how the program immediately reacted when it found strange measurements. Look at the yellow dots: temperature measurements between 20 °C and 30 °C. The program flagged exactly these as anomalies, and had the technician had this program, he could have intervened before a guest noticed the anomaly by turning on a shower and getting cold water.


In other words, the program correctly finds anomalies, even on a smaller subset of data. If we start to include more data, the program will be more and more precise, and if we include domain knowledge (knowledge of experts / technicians), then the program will become even more precise.


So why not have a program like this in your systems that could let you know when the system loses capacity and starts to malfunction?


I offer my program and my time for free to anyone who has data that this could be used with, and it could be used with automation of any kind.