The last article written by Kristijan Šarić was about the detection of breast cancer and how we can use artificial intelligence to save more lives.
You can find the full article here:
It is hard for me to hide my disappointment. After only 1,000 examples, the project was already quite accurate and agreed closely with the findings of radiology specialists. Of course it did not work perfectly with so few examples, but it could be turned into something that really helps to save a lot of lives, especially with the right cooperation between doctors and other medical staff in healthcare.
Anyway, today we will continue with another topic, and I am finally fulfilling the promise that I made 2 years ago.
I created a free app that recognizes the letters of a sign language alphabet using a smartphone. It is available for Android only, and you can download it from the Google Play Store.
The app recognizes letters from hand gestures. Artificial intelligence detects the position of your hands and tries, based on a small number of examples, to recognize which letter your gesture shows.
The app was trained on a limited number of examples, so of course it won’t work perfectly. It is, however, a good example of an app that could work very well if many more people got involved in the project and helped improve it.
The first email with the original idea for a sign language recognition application for the hearing-impaired was sent on July 16, 2019, and the app has only now been completed.
It’s hard to do the job you are paid for and, at the same time, find the time to make free apps and everything that goes with them, so I hope the reader/user will have a little understanding.
I was able to finish the application after ending my collaboration and contract with IOHK this June. I no longer work with the 5th largest cryptocurrency in the world. I was there for over 4 years and I wish them all the success.
On the other hand, I now have more time to work on some interesting new projects that I will present to you over the next few months.
Who is involved?
In this project, the Association of the Deaf of the Istrian County (called “Udruga gluhih i nagluhih Istarske županije”) has been involved from the beginning, and you can find out more about the association here – https://ugniz.hr/
Dina Ivas started collaborating with me back then (2 years ago), but in the meantime she found another job. It is important to mention that she helped organize the recording of the two-handed sign language alphabet, which I’ll talk more about later. The person I am currently in contact with is Alma Zulim, who saw the project through to its end after I contacted the association again early this year.
How did it start?
My first idea was to try something simple that could be working within a month or two, so I decided to start with the sign language alphabet. The sign alphabet is the same as the normal alphabet, but it is performed by hand and is intended for the hearing-impaired.
Sign “speech” itself is much more dynamic and complicated than this, as it involves movements of the head, lips, and the whole body in general. Such a project would take a lot of time and, therefore, a lot of money. The apps that I build for free are mostly for promotion, but I also have to make a living from something, so I cannot work on a project for a year without receiving any money.
In other words, I wanted to make a prototype to see whether a mobile app could recognize sign language in real time on a smartphone, using only the camera and no extra tools.
My answer, for now, would be – yes, that seems possible.
I will start with the alphabet and then look at where it would be possible to go from there and what could be done for now, as I already have a couple of nice ideas on how to solve some of the problems.
To get started, we first need examples with which to “teach/train” the artificial intelligence to recognize letters.
So in the end we agreed with the Association of the Deaf that a couple of its members would get together and record videos against the same background, showing the letters in alphabetical order with their hands in different positions.
Here we have an example of a member in a video showing the sign alphabet in alphabetical order, with the hand in different positions. Why in different positions? Because we want the artificial intelligence to learn to generalize the hand shape for each letter: it should “understand” that what matters is the shape of the hand, regardless of how the hand is rotated or moved.
Image augmentation (https://en.wikipedia.org/wiki/Data_augmentation) would do something like this if we work with images.
After recording videos of 6 members separately (not many, but good enough to start with), we split these videos into segments, one for each letter. This gave us, for each member, recordings of the individual letters. That job was done by Linda Štaba, who was on occupational training at EXACT BYTE.
Then we loaded these recordings individually so we could get pictures from them by capturing the screen.
Example of two different images:
After we got these pictures from the recorded shots, we randomly divided the pictures into a training set and a test set.
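As a rough illustration of the random split described above, here is a minimal Python sketch. The file names, the 80/20 ratio, and the `split_train_test` helper are assumptions made up for this example; they are not the code that was actually used in the project.

```python
import random

def split_train_test(items, test_fraction=0.2, seed=42):
    """Randomly split a list of items into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)                 # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical file names, one per captured picture.
frames = [f"member1_A_{i:03d}.png" for i in range(10)]
train, test = split_train_test(frames, test_fraction=0.2)
print(len(train), "training pictures,", len(test), "test pictures")
```

Pictures taken from the same video look very much alike, so a purely random split can put near-identical pictures into both sets. Splitting by person instead of by picture is one option worth considering.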
Now we already have a set of data to start with.
We always talk about how we managed to do something, but we rarely mention which attempts failed and what exactly went wrong along the way. If someone else is planning to go down this path, it’s important to mention which things are definitely not going to work, or which are going to require a huge amount of data.
First of all, my idea was to just feed a bunch of images into the neural network with a little bit of augmentation. I thought that would work, but it didn’t go as planned. It had great accuracy on the test data, but in the real world it was really unusable.
Secondly, the idea of detecting the hands first, then cropping them out, and then feeding the crops into the neural network was equally unsuccessful, even though it seems closer to reality.
I tried different datasets to get this working, and even tried segmentation using the EgoHands data. In the end, it was still not enough. I wasn’t satisfied with the results, so I had to give up on the idea.
The conclusion is that a neural network would need a lot of examples to “learn” to generalize something like this directly from pictures. In my case, it was completely unsuccessful.
How does the application work?
In the end, I realized that the best approach would be to split the problem into two mappings: first from the photo to the coordinates of the hand joints, and then from those coordinates (the input) to an output value, the letters themselves.
Unfortunately, to get this first mapping, we would need to record it with special gloves that can send such coordinates automatically, and we would also need to have plenty of nice examples.
So in the end I shamelessly stole/borrowed it from an existing Google project, which you can find here – https://mediapipe.dev/
So, how does the hand tracking work in that project?
Well, the application first detects the palm (we will not go into the details for now), then the hand, and then returns the 3D coordinates (x, y, z) of the 21 dots on the hand, shown in red. Take a look at the picture below.
How does this recognition work? You can see an example here.
Images were copied from this link:
Once we have the first part of the mapping, the next part of the mapping is relatively easy.
For each image, we detect 3 coordinates for each dot on both hands. That is 3 * 21 * 2 = 126 values. This is our input. The output is, of course, the letter that we detect. Since we only count “static letters” (letters that do not need any hand movement), the result is 22 letters.
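The input described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names, the left-then-right ordering, and filling a missing hand with zeros are choices made for this example, and the landmark values are made up.

```python
from typing import List, Optional, Sequence, Tuple

Landmark = Tuple[float, float, float]   # (x, y, z) of one dot
NUM_LANDMARKS = 21                      # dots per hand
NUM_FEATURES = 3 * NUM_LANDMARKS * 2    # 126 values for two hands

def hand_to_values(hand: Optional[Sequence[Landmark]]) -> List[float]:
    """Flatten one hand into 63 values; a missing hand becomes zeros (an assumption)."""
    if hand is None:
        return [0.0] * (3 * NUM_LANDMARKS)
    if len(hand) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(hand)}")
    return [coord for point in hand for coord in point]

def build_feature_vector(left: Optional[Sequence[Landmark]],
                         right: Optional[Sequence[Landmark]]) -> List[float]:
    """Concatenate both hands into one input vector (left first, then right)."""
    return hand_to_values(left) + hand_to_values(right)

# Made-up landmark values, only to show the shapes involved.
left = [(i / 100, i / 50, 0.0) for i in range(NUM_LANDMARKS)]
right = [(1 - i / 100, i / 50, 0.0) for i in range(NUM_LANDMARKS)]
features = build_feature_vector(left, right)
print(len(features))  # 126
```

A classifier then only has to map this short list of numbers to one of the 22 letters, which is a much smaller problem than learning from raw pixels.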
There are also “dynamic letters”, letters that do require movement. These are, for example, letters with diacritical marks, such as č, ć and others. Why don’t we count such letters?
Because these letters require movement, which means a group of images over time: an animation.
An animation (a film or a cartoon) is a set of images that change over time, and a large group of images is needed to understand the movement. This is a big problem for several reasons.
First of all, it requires a lot more examples, and secondly, it needs a much more powerful smartphone. I am sure this can be solved with a larger number of examples, but then the problem changes from classifying a single image to classifying a sequence of images. That can be handled by treating the data as a time series.
That’s why I say that this is a prototype of a real sign language application. It shows that recognition runs fast enough on a smartphone that dynamic movements could be implemented there as well. And yes, with enough resources (like money and time), I believe I could do it.
So, motion recognition should be possible by treating the data as a time series and using a model such as an LSTM (“long short-term memory”) or similar. I’m also offering another interesting and useful idea in case someone goes in this direction.
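To make the time series idea a little more concrete, here is a small Python sketch that groups consecutive per-frame inputs into overlapping windows, which is the kind of sequence an LSTM-style model could be given. The `sliding_windows` helper, the window length of 4, and the step of 2 are hypothetical choices for this example; nothing like this exists in the app yet.

```python
from typing import List, Sequence

def sliding_windows(frames: Sequence[Sequence[float]],
                    window: int,
                    step: int = 1) -> List[List[List[float]]]:
    """Group consecutive per-frame feature vectors into overlapping windows."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        [list(frame) for frame in frames[start:start + window]]
        for start in range(0, len(frames) - window + 1, step)
    ]

# Ten made-up frames with 126 values each (one vector per camera frame).
frames = [[float(t)] * 126 for t in range(10)]
windows = sliding_windows(frames, window=4, step=2)
print(len(windows), "windows of", len(windows[0]), "frames each")
```

Each window would then get one label (the dynamic letter being shown), instead of one label per picture.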
Image augmentation can be used to improve/generalize the object recognition model in an image.
In the same way, we can do model-based data augmentation for hand gesture recognition: for individual letters, we can rotate the “fingers” around the wrist or zoom in on the hands, which can produce a much richer dataset.
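A minimal sketch of this kind of augmentation, working directly on the dot coordinates, could look like the Python below. It assumes that the wrist is the dot at index 0 and that x and y use the same scale, so a rotation in the (x, y) plane is meaningful; the `rotate_and_scale` helper and the toy values are made up for the example.

```python
import math
from typing import List, Sequence, Tuple

Landmark = Tuple[float, float, float]   # (x, y, z) of one dot

def rotate_and_scale(hand: Sequence[Landmark],
                     angle_deg: float,
                     scale: float = 1.0,
                     wrist_index: int = 0) -> List[Landmark]:
    """Rotate the dots in the (x, y) plane around the wrist, then scale about the wrist."""
    wx, wy, wz = hand[wrist_index]
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    result = []
    for x, y, z in hand:
        dx, dy, dz = x - wx, y - wy, z - wz   # position relative to the wrist
        rx = dx * cos_a - dy * sin_a          # standard 2D rotation
        ry = dx * sin_a + dy * cos_a
        result.append((wx + scale * rx, wy + scale * ry, wz + scale * dz))
    return result

# Toy "hand" with only two dots: the wrist and one fingertip.
toy_hand = [(0.5, 0.5, 0.0), (0.6, 0.5, 0.0)]
variants = [rotate_and_scale(toy_hand, angle, scale)
            for angle in (-15.0, 0.0, 15.0)
            for scale in (0.9, 1.0, 1.1)]
print(len(variants), "augmented copies of one example")
```

Because this only touches a handful of numbers per example, many variants can be generated very cheaply compared with augmenting whole images.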
Where can I try the app?
The application is available for Android users on the Google Play Store. You can download it here:
If you have two cameras (front and rear), you can test it with either one.
As I have already written, do not expect miracles. Try to move away from the camera so that your upper body is visible (from the waist up) and make sure that your hands are in the frame. Then try to form the letters with your hands. Some letters are very easy and work right away, while others are more complicated.
For other letters, a lot of additional examples are needed to make the app work better.
I know people will look at it and say “this is stupid”. But if you take into account that this is an application that recognizes the sign alphabet (quite well) without any special tools or gadgets, and that it points toward recognizing sign language as a whole, movements included, it is quite impressive!