Hello everyone!

A recent article of ours dealt with analyzing the results of digital advertising and comparing them using data from the Google Analytics API. It gives a good general idea of how much can be done with automation (a lot).

In honor of the paranoid people whom this corona crisis has washed ashore to comment publicly on every possible article on the internet, this post will talk about the loss of privacy and about technology that already allows people to be tracked 24 hours a day, 7 days a week.

Yes, it wasn’t enough that 5G antennas cause the coronavirus, that the coronavirus has traces of the AIDS virus in it, and that Bill Gates has nothing better to do than chip everyone and… I don’t know anymore. It seems to me that some people just look for articles they haven’t commented on yet and, without reading anything in them, start talking about how armed robots will soon control the world. There is one guy who is obsessed with guns and robots, and he has to share that obsession in every comment, regardless of what the article is about.

These ideas have become so bizarre that my mind just can’t handle them.

People, I understand that you are afraid and trying to rationalize the situation. You cannot rationalize it because the situation is terrible, strange, and incomprehensible. So you’re trying to explain the world in such a way that it still seems you have control and are in a way above it all. It’s OK. I’m scared too. I also feel as if I have no control. As if the world is not a safe place and as if everything I know is slowly disappearing. I’m not going to say I’ve come to terms with that, but I know enough to understand that nature is more complicated than we can even imagine. Ideas about what is happening are not even a small part of the truth, because that truth is incomprehensible, and reality is stranger than you can imagine.

Recognizing people and their movements around the city is our topic for today! From my point of view, this topic is scarier than 5G antennas, chipping, and all sorts of nonsense that you keep blabbering about.



ML (Machine Learning) has taken off in recent years and today it is possible to create “artificial intelligence” that can recognize people via cameras with relatively little effort.

Today I will show the result of just that: a model that recognizes people on public cameras, which you can find on livecamcroatia. These cameras are available to everyone on the internet, and we were permitted to use them. Still, we only recorded people who agreed to it, which in practice meant ourselves, because the laws around this are anything but simple. If you happen to appear somewhere in the video, we can “blur” you out.

Future posts could explain the code behind this and show a complete project made in Keras, as well as face (and identity) detection.

The goal of this post is the same as with the others: to show people the possibilities of today’s technology. This does not mean that someone is following you through the cameras, because probably no one cares about you (or about me), but it is possible.

If I wanted to, I would connect to the city’s public cameras and launch a more elaborate program that would detect your faces, emotions, and interactions. I would follow your movements and your habits. I would analyze your internet profiles and detect your personality type. I would find your preferences and sell you just what you want.

But I have no use for it. Which doesn’t mean someone else can’t do it. As I said, the reality is a lot scarier than we can imagine.

If you are not familiar with it, I recommend reading a little about what is happening in China and how they handle the control of society and journalists (real journalists, not what passes for journalism today).

What scares people is that someone will follow them all the time and have all the information about them. It’s not that scary to me if it applies to everyone equally.

I don’t have much to hide, and I would immediately agree to have the same measures apply to me as to politicians. If that were the case, how many politicians do you think would “fall”, and would the world be better or worse in the end? These possibilities of technology are neither good nor bad. They are just tools that put a lot of power into the hands of those who control them; the only question is how they will be used in the end.

The problem, of course, is that the people who have access to this data also control such programs, so the power of information is not evenly distributed. These are difficult problems, but it is interesting to see what happens when a government tries to do something that is “equal”.

Neural networks and object recognition architectures

We will use neural networks for such tasks. Neural networks are a fairly powerful way for a system to learn to recognize images, text, sound, and other media, also called “unstructured data.” It is called that because… Well, it has no structure.

The structure exists within this data, but it is not so clear. It is clear to humans because humans are incredibly intelligent and can see structures (patterns, but let’s not nitpick) in things that go beyond the perception of other beings (to some extent).

There is a particularly deep connection between humans and language, which makes us incredibly rich conscious beings. Imagine: people can look at objects in the world around them, see the invisible language of mathematics and numbers, and use it to predict the behavior of those same objects.

The study of language, and the search for the basic elements of computation (machines that perform the tasks we give them using “languages”), led to the idea of electronic computers and programming languages: precise languages that describe operations in order and in time and that define unambiguous meaning.

Although language seems to be the huge difference between us and animals, when we say language, we should not limit ourselves to the sounds we produce with our voice and the words we write (which are not sound, but visuals we interpret in a “sound” way).

People who are deaf (and mute) can still communicate, even though they may never have heard a sound. Language, then, is something much deeper than a single medium of communication, and it stands as the pinnacle of human evolution. Language seems to be the way we “understand” this world, and we have that ability because of our incredible brain capacity.

What I want to describe here is how rich our inner worlds are and how many things we see when we look at a particular picture. We take it for granted because it is how we function our whole life, but the blessing we have transcends human understanding.

For the same reason, it is very difficult for a computer to understand a single image and find meaning in that image. The computer does very specific and unambiguous things that we tell it and does not “understand” anything but that.

That is why neural networks, which are algorithms (small programs), loosely imitate how the human brain works, through the communication of neurons. That is where the name comes from: they are networks of artificial neurons that we have “translated” into the world of programs with a programming language (see, language again).

We do not program them precisely and specifically; instead, we give them pictures to learn from, similar to how people learn. Machine Learning (ML) is the field that does just that: it uses a certain group of algorithms to build computers that learn from examples (data).
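To make “learning from examples” concrete, here is a minimal sketch in Python (our illustration, not code from the project) of a single artificial neuron, a classic perceptron. We never tell it the rule for logical OR; it only sees four labeled examples and adjusts its weights whenever it gets one wrong. The function names, the number of epochs, and the learning rate are assumptions made for this example.

```python
# A single artificial neuron (perceptron) that learns from labeled examples.
# Illustrative sketch only: real image models stack millions of such units.

def step(z: float) -> int:
    """Fire (1) if the weighted input is positive, otherwise stay silent (0)."""
    return 1 if z > 0 else 0


def predict(weights: list[float], bias: float, x: list[float]) -> int:
    # Weighted sum of the inputs plus a bias, passed through the activation.
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(examples, epochs: int = 20, lr: float = 0.1):
    """Classic perceptron rule: nudge the weights toward every misclassified example."""
    n_inputs = len(examples[0][0])
    weights, bias = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for x, target in examples:
            error = target - predict(weights, bias, x)
            # error == 0 means the guess was right, so nothing changes.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# We never write the rule "output 1 if any input is 1"; the neuron infers it.
or_examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_perceptron(or_examples)
print([predict(w, b, x) for x, _ in or_examples])  # [0, 1, 1, 1]
```

A single neuron like this can only learn very simple rules; stacking many of them into layers is what lets a network recognize something as complex as a person in an image.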

From an image of a cat, we expect the ML algorithm to recognize the cat without our having to define what a cat is. Because once you try to answer the question “what is a cat, really?” in a very precise way, human capabilities start to collapse. As great as our understanding is, the easiest questions have the hardest answers. We take them for granted and base our understanding of the world on them. So our understanding of the world is nothing but one big illusion, even though it sometimes seems so real.

In more practical terms, we need some function that turns an image into a description of what is in that image, without knowing how that function works. This is one of the current problems with neural networks and more complex ML algorithms: even when we get the answer to the question “is it a cat?”, we don’t know how the algorithm came to that answer. Knowing that would be, to say the least, useful if the same kind of algorithm is driving you down the highway at 200 km/h, where a slight shift of the steering wheel can lead to death.

More generally, we can say that neural networks are approximations of (very complex) functions that we could not easily describe using human “manual” algorithms.
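As a small, hedged illustration of “approximating a function”, the sketch below trains a tiny network (one hidden layer of 16 tanh neurons; the sizes, seed, and learning rate are arbitrary choices for this example) with plain gradient descent in NumPy to imitate sin(x). Image models work the same way in principle, just with millions of parameters and whole images instead of a single number as input.

```python
import numpy as np

# Target: a function we pretend we cannot write down by hand.
rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

hidden = 16
W1 = rng.normal(0.0, 1.0, size=(1, hidden))  # input -> hidden weights
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.5, size=(hidden, 1))  # hidden -> output weights
b2 = np.zeros(1)


def forward(inputs):
    h = np.tanh(inputs @ W1 + b1)  # hidden activations, shape (N, hidden)
    return h, h @ W2 + b2          # network output, shape (N, 1)


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


lr = 0.02
initial_loss = mse(forward(x)[1], y)
for _ in range(5000):
    h, pred = forward(x)
    grad_out = 2.0 * (pred - y) / len(x)       # dLoss/dpred for the mean squared error
    grad_W2 = h.T @ grad_out
    grad_b2 = grad_out.sum(axis=0)
    grad_h = (grad_out @ W2.T) * (1.0 - h ** 2)  # backpropagate through tanh
    grad_W1 = x.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)
    # Gradient descent: move every weight a little against its gradient.
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

final_loss = mse(forward(x)[1], y)
print(f"MSE before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The exact numbers depend on the random seed; the point is only that the error shrinks as the network adjusts its weights, without anyone writing down the formula for sine.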

The type of neural network used in image recognition is called a CNN (Convolutional Neural Network). Convolutions are not convulsions, although they sound similar; they are mathematical operations that slide a small filter over an image and reuse the same weights at every position. That drastically reduces the number of parameters and the computing power a network needs, so good image recognition results are possible even on weaker computers.
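A minimal NumPy sketch of what a single convolution does (our illustration, not the project’s code): a 3x3 filter slides over a tiny image and responds strongly where brightness changes, i.e. at an edge. CNN libraries implement this far more efficiently, and strictly speaking they compute cross-correlation (the kernel is not flipped), which is also what this sketch does.

```python
import numpy as np


def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2D convolution as used in CNN layers (strictly, cross-correlation:
    the kernel is not flipped). The same small kernel is applied at every position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the kernel with the patch under it and sum the result.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# A 5x5 grayscale "image": dark left part (0), bright right part (1).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# A 3x3 vertical-edge detector: left column minus right column.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

print(conv2d(image, kernel))
# [[-3. -3.  0.]
#  [-3. -3.  0.]
#  [-3. -3.  0.]]

# Parameter count: one shared 3x3 filter vs. a fully connected layer
# mapping a 224x224 image to an output of the same size.
print(kernel.size, (224 * 224) ** 2)  # 9 2517630976
```

The filter fires (value -3) wherever its window covers the jump from dark to bright and stays at 0 over the flat bright area. Because the same 9 weights are reused everywhere, the filter needs 9 parameters no matter how big the image is, while the fully connected alternative needs about 2.5 billion.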


The project you can use to run this yourself is here.

Alternatively, you can build this architecture yourself in Keras/TF 2.0 or PyTorch.

There are many different architectures that you can use to identify objects. But before we get into the list of these architectures and what they are for, we can look at different types of object recognition.

We can recognize objects via bounding boxes (the rectangle surrounding the object).

We can also recognize objects by segmenting (coloring) them. There are two kinds of segmentation: “semantic segmentation” and “instance segmentation”. Semantic segmentation does not tell apart multiple objects of the same class in an image, while instance segmentation does: if there are multiple people in the image, instance segmentation recognizes each of them as a separate person.
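To show the difference in code, here is a toy example with made-up masks (not real network output): two people in an instance-segmentation mask, the semantic mask we get when only the class is kept, and the bounding box of each person computed from their pixels. The `bounding_box` helper is hypothetical, written just for this illustration.

```python
import numpy as np

# Toy 6x8 instance-segmentation output: 0 = background, 1 and 2 = two different people.
instance_mask = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 2, 2, 0],
    [0, 1, 1, 0, 0, 2, 2, 0],
    [0, 1, 1, 0, 0, 2, 2, 0],
    [0, 0, 0, 0, 0, 2, 2, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
])

# Semantic segmentation only says "person / not person" for each pixel,
# so both people collapse into the same class.
semantic_mask = (instance_mask > 0).astype(int)


def bounding_box(mask: np.ndarray) -> tuple[int, int, int, int]:
    """Smallest (x_min, y_min, x_max, y_max) rectangle containing every True pixel."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


print(np.unique(semantic_mask))  # [0 1] -> only one "person" class
for person_id in (1, 2):
    print(person_id, bounding_box(instance_mask == person_id))
# 1 (1, 1, 2, 3)
# 2 (5, 1, 6, 4)
```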

To give some concrete example, here is a picture of me after too much coffee and too many attempts to stay awake:

If we run it through Mask R-CNN, we can see both kinds of image tagging: a bounding box and instance segmentation:

As we can see, the network is 99.9% sure that this picture contains a person. I’m also almost sure it’s a person in the picture. But since I know myself, I have my doubts.

The rectangle around me is the bounding box, and the more precisely colored area over my face is an example of segmentation.

Mask R-CNN is based on the “Faster R-CNN” architecture, which returns these rectangles around detected objects. Mask R-CNN adds a branch that also predicts a segmentation mask for each detected object.

“Faster R-CNN” first uses a CNN to extract interesting image features.

These features are then passed to a separate, smaller network called the Region Proposal Network (RPN), which returns a list of candidate objects with their rectangles. Finally, these candidates are sent to the layers that classify each object and refine its output rectangle.
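One detail worth adding: the RPN proposes many overlapping rectangles for the same object, and Faster R-CNN prunes them with non-maximum suppression (NMS) based on intersection over union (IoU). The pure-Python sketch below shows the idea; the boxes, scores, and the 0.5 threshold are made-up values for illustration, and real implementations (for example `torchvision.ops.nms`) are vectorized.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


# Three candidates: two overlapping guesses for one person, one for another person.
candidates = [(10, 10, 50, 90), (12, 8, 52, 88), (100, 20, 140, 95)]
scores = [0.95, 0.80, 0.90]
print(non_max_suppression(candidates, scores))  # [0, 2]
```

The second box overlaps the first by about 86% IoU, so it is treated as a duplicate of the same person and dropped; the third box does not overlap at all and survives.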

The story behind it is much broader than this and we will leave it for another time when we can go into a little more detail. I know that people are already rolling their eyes and that they are not interested in too many details.

Video analysis

In the video, you can see two fools (Matija and me) walking through the forum in Pula, the busiest part of the city. We filmed the forum because a lot of people pass through it every day. It took us 40 minutes to capture this clip precisely because people are constantly passing by.

The rectangles in which people (Matija and I) are recognized are colored green, and above each rectangle is a percentage showing how sure the neural network is that it’s a person.

You will see that the recognition is not perfect and that there are moments when the rectangle disappears. This happens for several reasons. One is that an object recognition certainty higher than 90% is required: if the neural network is not more than 90% sure that a person is there, it will not mark it as a person. Also, the video resolution would ideally be higher, and we are quite far from the cameras in the footage (we are tiny). Finally, the neural network was not specially trained and fine-tuned to recognize only people, which would improve the results.
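The 90% rule described above boils down to a simple filter applied to every frame. Below is a hedged sketch with a hypothetical `Detection` type and invented detections (not output from our video); it shows how a person whose score drops to 88% in one frame simply loses their rectangle.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    score: float                    # network confidence, 0.0 to 1.0
    box: tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def people_to_draw(detections, min_score=0.9):
    """Keep only 'person' detections the network is more than min_score sure about."""
    return [d for d in detections if d.label == "person" and d.score > min_score]


# Hypothetical detections for two consecutive frames (small boxes: the people are far away).
frame_a = [Detection("person", 0.97, (410, 220, 425, 262)),
           Detection("person", 0.93, (440, 218, 454, 260))]
frame_b = [Detection("person", 0.96, (412, 221, 427, 263)),
           Detection("person", 0.88, (442, 219, 456, 261)),  # too unsure: no box drawn
           Detection("bicycle", 0.95, (300, 250, 320, 270))]

for name, frame in (("frame_a", frame_a), ("frame_b", frame_b)):
    kept = people_to_draw(frame)
    print(name, [f"{d.score:.0%}" for d in kept])
# frame_a ['97%', '93%']
# frame_b ['96%']
```

Lowering `min_score` would make the rectangles flicker less, at the cost of more false detections; a higher resolution or a network fine-tuned on people attacks the problem at its source instead.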

We received the link and consent to use the videos from:



We thank you for your help.


The future is closer than we can imagine.

Power is not as raw as people think, real power is elegant and subtle. It is not the one who screams the loudest that matters, but the one who says something smart. So it is with this technology as well – most ideas about how it is used are simply primitive.

Robots armed with rifles are not needed if I know how to change your mind. 5G towers are not nearly as terrible as the possibility that someone will be able to stream any video from any city at any time, allowing real-time tracking without pause or break.

Virus manipulation is not necessary if I can manipulate your lives, and I don’t need any chip to track you everywhere. Also, let me remind you that you voluntarily carry the tracking device in your pocket and use it to watch cat videos and spread your theories on Facebook.

Wake up, people, from your stupid dreams and take a look at the world of today, which is richer and scarier than any of your primitive ideas about reality.

Wake up and stop blabbering on like crazy. Armed robots, chipping, 5G towers, and virus manipulation are primitive ideas. The real power is in the hands of those who possess the information!



P.S. If you want to know more about how real-time recognition behaves and what its challenges are, we touched on this in our autonomous vehicle project “Mini Tesla”.