Intro

Hello everyone.

In the last article, Matija wrote a tutorial on how to use Jupyter Notebook. In that example, you can learn to visualize COVID-19 (Coronavirus) related data.

The COVID-19 recognition article has been popular. It was mentioned in Jutarnji list, I appeared on HRT4, and you may run into it elsewhere in the future. The article was posted long before this media coverage, on 03/21/2020.

Inspiration for the article was found here. We used the same information to arrive at similar results. Afterward, these results began appearing in various “papers” such as COVID-Net. Who “borrowed” the idea from whom is not my concern. People may have the same ideas at the same time. It is still strange that the articles did not even reference Adrian Rosebrock. His article was published 15 days before everyone else’s, and it included more than enough information to write those “papers”.

khm React / Redux khm Elm khm… This is a developer reference.

Today we have something different for you, because we imagine you have had enough of COVID.

For people familiar with ML / Artificial Intelligence, this article will be simple. But it also presents ML’s capabilities today in an easy-to-understand way. Today I am going to introduce you to one project I am involved in. An autonomous self-driving car!

Awesome, huh? The project has been going on for about 5 months, but first, the credits. This is going to be like a novel where the writer introduces 20 characters in the first 5 pages. I’m Kristijan Šarić. I guess. No, that’s my name. But who am I then?

Who is working on it?

This all began when the three of us started gathering around shared IoT (Internet of Things) interests. Aleksandar Vojnić (Aco), Roberto Brezac (Roberto), and I would meet in Rabac (a small town in Croatia) to play with Arduino components.

The plan was to build automatic plant irrigation, a digital shepherd, and more. These projects are easy (at least their basic versions). Likewise, it’s easy (and mandatory) to burn some Arduino components.

Rite of passage (the display text translates to “you have burned me”):

The ambitious idea was to make a “mini Tesla”, an autonomous self-driving car. And so we started.

The two official groups behind this project are:

  • Istria Programming Languages
  • Udruga za promicanje naprednih tehnologija Start IT

I represent the former while Zvonimir Mandekić represents the latter.

Aco was developing a self-driving car in an association in Rijeka, and they often had workshops in CTK. Roberto assisted Aco with the car, but his primary contribution was the car track and his diplomacy skills that prevented Aco and me from killing each other.

Aco, let me say this publicly, give up on the virtual 3D track.

My task is the “autonomous” in “autonomous driving”. Things are not so strictly divided, though, and we all consult each other, but I didn’t meddle in building the car itself.

What kind of car is that?

The car is awesome. This is its third version, and it is powered by a Raspberry Pi 4.

We also have:

  • 10,000 mAh Deltaco POWERBANK
  • SG90 micro servo motor
  • “Raspberry Pi 3 4 Wide Angle 130 Degree 1080P 5M Pixel Camera Module” camera

We have powered steering and a stepper motor for the drive. The stepper motor is there to give us more precise control of the car. Aco could go on a rant about why everything is the way it is. He tried to make a dedicated controller for the stepper and the steering, but Tiny’s memory was full, so it didn’t work, and he gave up on that venture (for now). We may still put in a dedicated controller, because we do not know how well the Raspberry Pi will drive the motors while processing images.

The car can be controlled via a joystick on a mobile phone if necessary. We found out that the servo jumps from point to point when we just move the mouse around the screen.
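For readers who want to poke at something similar, here is a minimal sketch of how an SG90-style steering servo can be driven from a Raspberry Pi with software PWM via the RPi.GPIO library. The GPIO pin and the duty-cycle values are assumptions for illustration, not our actual wiring:

```python
# Minimal sketch: sweep an SG90-style steering servo with software PWM.
# The pin number and the duty cycles are illustrative assumptions, not our wiring.
import time

import RPi.GPIO as GPIO

SERVO_PIN = 18                  # hypothetical GPIO pin (BCM numbering)

GPIO.setmode(GPIO.BCM)
GPIO.setup(SERVO_PIN, GPIO.OUT)

pwm = GPIO.PWM(SERVO_PIN, 50)   # SG90 servos expect a 50 Hz control signal
pwm.start(7.5)                  # ~7.5% duty cycle is roughly the centered position

try:
    for duty in (5.0, 7.5, 10.0, 7.5):   # left, center, right, center
        pwm.ChangeDutyCycle(duty)
        time.sleep(0.5)
finally:
    pwm.stop()
    GPIO.cleanup()
```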

3D car images:

And in reality:

A few more pictures with the battery (we’re missing a mini model to strike challenging poses in front of the car; maybe we should put a Barbie doll in the picture).

Goodyear tires. Not really a good year, but man, great tires.

How did we capture the data, and what kind of track is it?

The track was created by Roberto, and this is the second (major) version of the track we have.

The first version was made entirely of cardboard. The only things that survived that version were the traffic signs themselves.

The first sketches of the track. And no, I’m not ashamed to show them. I sketched them, so if you are looking for someone to blame for these wonderful scribbles, here I am. Also, I am available for creating abstract paintings.
Ground plan:

From a car POV:

The brilliant comments say “These are the lines on the floor, not the wall” and “Stick to the wall if it’s easier”

If you think projects get done in one sitting, you’re wrong. There’s a lot of bugs, scribbles, and nonsense until we come up with some usable version. That’s true about everything.

Fortunately, Roberto did a phenomenal job with traffic signs. That was the key – if the signs are good, then everything else will work. Artificial intelligence will be able to recognize the traffic signs properly.

Traffic signs:

The first version of the track made of cardboard.

And one more picture of a cluttered garage and complete improvisation:

Yes, it looks like shit, but it is better to have something bad but usable. We didn’t want to spend months on something perfect that may or may not work. There can always be something unforeseen. As they say at IOHK, the American company I work for: “monotonic progress”. It doesn’t matter how fast we go forward; it matters that we go forward. Five days of small steps forward are better than five days of big steps forward followed by three days of big steps back.

A common illusion is that success comes when people of extraordinary capabilities do things perfectly on the first try. Over time I learned that the most important thing is to keep working and moving on. Make something imperfect, something that serves a purpose, and improve it. And so on. To get rid of perfectionism is a difficult task, but worth the energy and effort.

“Better to do something imperfectly than to do nothing flawlessly.” – Robert H. Schuller

Outstanding people of extraordinary capabilities usually have tenacity, patience, and self-delusion: endless optimism, where you convince yourself the finish line is around the corner while in reality you have months of work lined up.

But life comes and goes one way or another, so why not spend it on something interesting to you? Start. Today!

Aco, the virtual track is a great idea ONCE we are done with the car, but not sooner.

Anyway, Aco made a virtual track:

How did the car learn to recognize the signs?

That’s the part I was in charge of.

Since I was going to go into a little more detail in this article, let me clarify a few things. First of all, people often talk about “artificial intelligence”.

But often they are talking about one subset called ML – machine learning.

Artificial intelligence is a broad field. You can say that everything that has the features of “intelligence” uses artificial intelligence. It’s a bit of a loose definition, and it’s questionable how useful it is. Some quite stupid things, in the sense that they are very far from how people think, are commonly called “intelligent” today.

Everything is “smart”. Smart TV, smartphone, smart fridge, smart home, smart car. Everything is so clever that one feels stupid. “I choose not to choose life”.

For all the people reading this, my opinion is that it’s all BS marketing. Your house is not smart because it has 5 sensors and can turn the lights on and off. A young child can do the same, while also solving a puzzle and talking to you.

Do you have a house that solves a puzzle and talks to you? No? Then you don’t have a smart home. You have a stupid home that someone sold you as smart. So who’s smart there?

Machine learning

A more useful definition is the definition of ML – machine learning. The emphasis here is on algorithms (programs) that “learn” from the data.

As a programmer who writes “normal” programs, you have to define every possible situation and how to handle it.

With ML, you give the algorithm a large amount of data (the images in our case) and you ask it to teach itself to recognize different data.

Turn the input data into numbers, then some magic happens and you get some number as output.

The upside is that you don’t have to write and define every possible situation.

The downside is that you can never know for sure why the program behaves a certain way, because you didn’t write it. This is even more evident with neural networks. We will use an ML “algorithm” to recognize traffic signs. The neural network learns by itself to recognize the elements of the images (the sample images, say). Because it learns to interpret a picture on its own, it is difficult to understand why it “sees” what it “sees”.

If that doesn’t sound so bad to you, consider how you would feel in a car that sees the road differently from you. A little closer to reality would be to say: imagine being driven by someone who is under the influence of (hallucinogenic) drugs.

Or better yet, look at what’s happening with GAN networks and how they can confuse “smart” cars.

Of course, networks are already being developed that are much better, but they need a lot more resources to work. One idea to circumvent these issues is the so-called “Capsule Network”, developed by one of the “fathers” of neural networks, Geoffrey Hinton (the Turing Award was shared by three people, not just him). The idea behind these networks is that they try to relate the spatial characteristics of the image. This means that you can’t have a dog with its head rotated around its body and still call it a dog.

Bonus question – is the dog still a dog if it has its head rotated?

To summarize, ML allows us to answer (in our example of traffic sign images) whether a traffic sign exists, where it is, and what it is, based on the input images. So all the ML algorithm does is approximate a function (create a mapping) from the track images to the sign locations and sign types.
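To make that mapping a little more concrete, here is a purely illustrative sketch of what such a function looks like from the outside. The names and types are hypothetical, not our actual code:

```python
# Purely illustrative: the "function" the ML model approximates for us.
# Names and types are hypothetical, not our actual code.
from typing import List, NamedTuple

import numpy as np


class Detection(NamedTuple):
    label: str        # e.g. "stop", "left_turn"
    score: float      # confidence between 0 and 1
    box: tuple        # (ymin, xmin, ymax, xmax) in relative image coordinates


def detect_signs(frame: np.ndarray) -> List[Detection]:
    """Map one track image (an array of pixel numbers) to the signs it contains."""
    ...  # the trained neural network "fills in" this body for us
```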

We need not go into more detail than this; we can deal with neural networks in another article. Now we focus on how the sign recognition is done.

What is available, what have we used?

Anyone following what is happening in the ML world, in recognizing images, sound, and text, is sure to be familiar with the big names mentioned there.

TensorFlow, the library from Google, stands at the top and is already quite famous. The alternative is PyTorch by Facebook.

Keras is a higher-level API that sits on top of TensorFlow, and fast.ai is a higher-level API that sits on top of PyTorch. They are not the only ones, but they are the most famous. There are many names and players here right now, but I don’t think you should worry too much about what to use. Choose one and work with it. In my opinion, for a beginner, PyTorch is a little less magical, but not as well supported as TensorFlow.

For example, training neural networks requires a lot of hardware resources (and thus a lot of time), so it is useful to have access to special hardware. Graphics cards are great, but there is something even better. As training is time-consuming, it would be great to have access to a TPU (Tensor Processing Unit). The TPU is specialized hardware for neural networks that shortens training time considerably. Keras can use it, PyTorch can’t (yet?). That makes sense, considering that the TPU was built by Google and is available in the Google Cloud.

Once you have selected the library to use, you need to check what is happening in the field you want to work in: images, sound, or text. There is a lot of innovation in all of these fields, and I hope that in the future I will be able to cover it in a little more detail.

For images, and for what we need, we can also choose the TensorFlow Object Detection API. It is an easier way to do the same job of recognizing images without so much coding. Don’t think you can gain an advantage without giving up something: you have a lot less flexibility and it’s hard to get it to work exactly the way you want, but in turn it’s a little easier to start.

What we find interesting here is image recognition. Looking at what’s new in image recognition over the last 10 years, there are a lot of interesting things.

The biggest revolution here is the CNN, the convolutional neural network. There are differences in how the networks are built, how many layers they have, and how everything is connected, and all these differences together are called neural network architectures.
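To make “architecture” a bit more tangible, here is a minimal sketch of a tiny CNN in Keras that classifies small images into a handful of sign classes. The image size and the class names are made up for illustration and have nothing to do with the architectures we actually evaluate below:

```python
# Minimal sketch of a tiny CNN classifier in Keras.
# The input size and the number/names of classes are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 4  # e.g. stop, left_turn, right_turn, crosswalk (hypothetical)

model = tf.keras.Sequential([
    layers.Input(shape=(96, 96, 3)),          # small RGB images
    layers.Conv2D(16, 3, activation="relu"),  # convolutional layers learn local features
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),  # one probability per sign class
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```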

How did we train sign recognition?

We will focus on two architectures, as we are looking for a specific result: how can a neural network run on a Raspberry Pi, which doesn’t have a very strong processor and can’t process data as fast as an average laptop or desktop computer?

We are looking for a neural network that is not too big, that can run on weaker hardware, and that is precise enough to recognize which signs are on the track and where they are. We also want to make sure that the neural network recognizes the sign itself, and not the whole surrounding image that it happens to associate with the sign.

In other words, we want to check that the neural network is “seeing” the right thing:

Also, we need to keep in mind that once a neural network has been trained, it has to be transferred to the Raspberry Pi and run there.

Of course, you have to put it all together first: the Raspberry Pi must be able to run the TensorFlow version we use. I’m not going to say it’s difficult, but if you’re not lucky (or knowledgeable) enough to pick the right version of something, it will cost you a lot of time. Be persistent, consult the internet, and I’m sure it will work.

But how does the neural network “know” where a sign is? Well, this is the magic part. The video the car “sees” is essentially a sequence of pictures. When you read FPS somewhere, it doesn’t always mean First Person Shooter; sometimes it means Frames Per Second.

Frames per second is all we see: we look at the frames as they change and we get the fourth dimension of the images, which is time. Spacetime, anyone?

Before we start training, we need to go through all the frames of one video manually and tag each picture that contains a traffic sign.

Let’s see what the track looks like from the perspective of the car as it drives around. In this case, we see the view from the camera on the car as Roberto pushes it.

We take a video like that and turn it into pictures (frames). Then we take those pictures, open them in, say, LabelImg, and mark where the traffic signs are in each picture.
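Splitting a video into frames is a few lines with OpenCV. This is a rough sketch of how one might do it; the file and directory names are made up:

```python
# Rough sketch: split a track video into individual frames with OpenCV.
# File and directory names are illustrative, not our actual files.
from pathlib import Path

import cv2

Path("frames").mkdir(exist_ok=True)        # output directory for the extracted frames

cap = cv2.VideoCapture("track_run.mp4")    # hypothetical input video
frame_index = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break                              # no more frames
    # Save every 5th frame so consecutive images are not nearly identical.
    if frame_index % 5 == 0:
        cv2.imwrite(f"frames/frame_{frame_index:05d}.jpg", frame)
    frame_index += 1

cap.release()
```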

Example annotation (a big word meaning “to mark”) of pictures with a “stop” sign:

Example of annotating an image with a “left_turn” sign from another video:

Train / Test split

Once we have annotated each image, we divide the images into a “train” set and a “test” set.

This is a common procedure, and you can imagine it working like this: we have 5 pictures of cats. Let’s put one picture in the test set and the other four in the train set.

So one in five, or 20% (⅕ * 100) is for the test set, and four out of five, or 80% (⅘ * 100) is for the train set.

When training the neural network, we use those four images from the train set. When we want to check whether the neural network has learned to recognize the cat, we will not give it an image it has already seen. We will give it a picture it has never seen, because we want the neural network to learn to recognize all cats, not just the four we have in the pictures.

In other words, we are trying to get real accuracy, and we can only get it if the neural network doesn’t “see” the image we will use for testing. The story is a little more complicated than this, and it would be okay to mention overfitting as a topic for research if anyone is interested.
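As a minimal sketch, this is roughly how one might shuffle the annotated images into an 80/20 train/test split; the directory name is an assumption:

```python
# Minimal sketch: 80/20 train/test split of annotated images.
# The directory name is an illustrative assumption.
import random
from pathlib import Path

images = sorted(Path("annotated_frames").glob("*.jpg"))
random.seed(42)              # fixed seed so the split is reproducible
random.shuffle(images)

split = int(len(images) * 0.8)
train_set = images[:split]   # 80% used for training
test_set = images[split:]    # 20% held back for evaluation

print(f"train: {len(train_set)} images, test: {len(test_set)} images")
```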

Neural network architectures

The first architecture is pretty standard. This is the “SSD Inception v2 (COCO)” architecture. “COCO” is a standard object recognition dataset, and stands for “Common Objects in Context”.

We can see how it behaves on the track:

Very precise: trained on only one run around the track, it can recognize the traffic signs. Its performance when we run it on the Raspberry Pi 3, however, is not so great. We get an average of 0.73 FPS (Frames Per Second). This means the Raspberry Pi 3 needs about 1.37 seconds to capture an image with the camera and analyze it, so roughly every 1.4 seconds we get a new image.
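For the curious, these numbers are simply the average inference time per frame. A rough sketch of how one could measure them looks like this; detect_signs() is a placeholder for whatever model is being evaluated, and the paths are assumptions:

```python
# Rough sketch: measure average inference time per frame (and thus FPS).
# detect_signs() is a placeholder for the real model call; paths are assumptions.
import time
from pathlib import Path

import cv2


def detect_signs(frame):
    """Placeholder for the actual detection model call."""
    pass


times = []
for path in sorted(Path("frames").glob("*.jpg"))[:50]:
    frame = cv2.imread(str(path))
    start = time.perf_counter()
    detect_signs(frame)
    times.append(time.perf_counter() - start)

if times:
    avg = sum(times) / len(times)
    print(f"avg {avg:.3f} s per frame, {1 / avg:.2f} FPS")
```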

It’s not realtime and it’s not fast enough to drive around the track. Imagine having a driver who needs more than a second to react. Yes, I’m looking at you, retirees. The Raspberry Pi’s CPU stays at about 75-80%, which is not good.

Let’s try a smaller architecture, assuming it will be accurate enough and much faster. An architecture that meets the criteria and should run faster on a Raspberry Pi 3 is “SSDLite Mobilenet v2 (COCO)”.

 

Let’s see how it behaves on the track:

This is not good. Although we only have one run, we can see that the architecture struggles and that it would take quite a few more examples to achieve the same accuracy. And what about its performance?

A bit faster. We get 0.8 seconds per frame, which translates to 1.25 FPS, and the CPU sits around 60-70% on the RPi 3. It works marginally faster, with much worse accuracy.

Let’s look at another lightweight architecture in the hope of finding something much faster. The architecture we will look at here, and then never use again, is “SSD mobilenet v3 small (COCO)”. It reaches almost 3 FPS, which is much better. The CPU is at about 40%, which is acceptable, but its accuracy is…

Unacceptable.

And then what? That’s it? We’re stopping here, we’re not gonna do any detection at all? No, the car needs something that can detect not only the traffic signs but also the line along which it drives. So how do we do that? Will we have a “Retirees Derby” where the cars pile into each other? No!

We will use a new technology.

How did we speed up the traffic sign recognition?

Remember how I mentioned above that there are hardware accelerators for neural networks, the TPU (Tensor Processing Unit)?

Well, well, there’s one that you plug into a USB port and that works as an external unit. Its name is the “Google Edge TPU”.

You can see the pictures of how I unpacked it and how it connected to the Raspberry Pi 3.

(Coral) Edge TPU has extremely good performance for neural network evaluation. You can see more in detail here.

It arrived within a couple of days after I ordered it. What we are interested in is: is it possible to run the more accurate architecture in real time, with good performance, on the Raspberry Pi 3?

The answer is yes.

What we need to do first is convert the existing model into a special format called TensorFlow Lite. It is, as the name implies, a lighter version of the standard model format, intended for smaller devices like a smartphone or an IoT device.
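The conversion itself is short. A minimal sketch looks something like this; the paths are assumptions, and for the Edge TPU the resulting file additionally has to be quantized and compiled with Google’s Edge TPU compiler:

```python
# Minimal sketch: convert a trained TensorFlow SavedModel to TensorFlow Lite.
# Paths are assumptions; the Edge TPU additionally requires a quantized model
# passed through Google's edgetpu_compiler afterwards.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # shrink the model for small devices
tflite_model = converter.convert()

with open("sign_detector.tflite", "wb") as f:
    f.write(tflite_model)
```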

Maybe we can go into a little more detail about the device itself and its final performance in another article. One of the plans is to investigate whether it is possible to recognize the signs and follow the line at the same time.

What next, what are the next steps?

Road monitoring is another major problem that still needs to be addressed. We have started to work on it, but we have not yet reached a level where we can say we are satisfied with it.

Some cars that move around a track like this have Lidar as well, but our setup is much simpler.

My idea was to add three ultrasonic sensors to the car and use them to detect its location, but monitoring the road might be a more realistic example of an autonomous vehicle.

Aco plans to do manual line tracking using OpenCV and rough contrast images. I plan to do it with neural network regression.
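As a rough illustration of the OpenCV contrast idea, a sketch like the one below thresholds the lower part of the frame and finds where the line sits horizontally, which could then be turned into a steering command. The threshold value, the crop, and the file name are assumptions, not Aco’s actual code:

```python
# Rough sketch of contrast-based line tracking with OpenCV.
# The threshold value, crop, and file name are illustrative assumptions.
import cv2

frame = cv2.imread("frames/frame_00005.jpg")           # hypothetical frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Look only at the bottom third of the image, where the line is right in front of the car.
roi = gray[int(gray.shape[0] * 2 / 3):, :]

# A dark line on a light floor: threshold and invert so the line becomes white.
_, mask = cv2.threshold(roi, 100, 255, cv2.THRESH_BINARY_INV)

# The horizontal centroid of the white pixels tells us where the line is.
moments = cv2.moments(mask)
if moments["m00"] > 0:
    line_x = moments["m10"] / moments["m00"]
    offset = line_x - mask.shape[1] / 2                # negative: steer left, positive: steer right
    print(f"line offset from center: {offset:.1f} px")
else:
    print("no line found in this frame")
```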

The current track is likely to get a proper ground surface. The surface is there to make the car easier to navigate and to provide a line in the middle of the “road” for the car to follow.

An example of how rough a track would look:

When all this is done, expect the second part. Also, expect a presentation.

We will announce the presentation of the car when the time is right. The presentation will be in Pula, at the “Istria Programming Languages” Meetup and in Rijeka, at the “Udruga za promicanje naprednih tehnologija Start IT” Meetup.

You are all invited. I also invite you to send us your ideas, suggestions, and improvements. And above all, I invite you to take part in this project.

Join “Istria Programming Languages” and “Udruga za promicanje naprednih tehnologija Start IT” and write to us about what you want to see in the article/presentation. Also, if you get stuck, let us know where.

Conclusion

You can do all of this by yourself, and you should not be afraid to try; it’s not that difficult (if you persist).

What has been on my mind for a long time is to put the camera in a “real” car and then evaluate other drivers (who are mostly idiots, BTW): whether they cross the center line (yes, people, it’s common), or whether they use signal lights (turn signals).

A big problem today is that people don’t even know what the technology can do, so even when we contact some companies with “we have products that use artificial intelligence”, the people who work there often do not even understand what we are talking about. Maybe they think we’re trying to rip them off or something?

Everyone is sadly waiting for some “best practice” BS, where managers listen to salespeople selling them things that were on the market 15 years ago.

One of the reasons we started writing this is not that it is fun, but that we hope it spreads the word about us and that people will finally begin to realize that things like this are neither difficult nor 10 years in the future. They are here. Today.

We hope you will join this as well.
It’s a short life. Do something you will be proud of.