Here, you will find a brief explanation of computer vision, some real-life use cases we are experiencing today, and some of the existing data annotation techniques supporting the advance of computer vision. I want to highlight upfront that I won't cover any computer vision algorithms in this post. My main goal is to bring knowledge to people with little or no understanding of CV and data annotation.
What is computer vision?
Computer vision is a field of Artificial Intelligence that enables computers to understand images, videos, and other digital assets. Through that understanding of visual content, computers can make decisions based on a set of predetermined rules.
Computer vision has been and will likely be one of the hottest fields in AI for the next few years. Many of the recent advances in healthcare, security, retail, agriculture, and many other fields are tied to how computers are gaining a higher level of understanding of digital images and videos.
That is cool, right? How does it translate to the real world?
Here are some good and simple examples of computer vision that we are experiencing in our lives today or will experience in a few years:
Autonomous Vehicles

The automotive industry, one of the fields with heavy investment in the past years, is a huge user of computer vision techniques. When you see a Tesla, or in the future many other autonomous vehicles driving around, that is, among other beautiful technologies, a sharp computer vision system detecting, classifying, and processing thousands of images in real time and sending the results to the main system for a decision and action.
If you want to dig deep into autonomous vehicles technologies and how computer vision impacts them, I recommend checking here
Motion Controlled Games
Have you ever played Kinect, Nintendo Wii, or any other motion-controlled game?
The camera on those devices tracks movements in 3D space, and the computer vision system processes them, recognizing human joints, positioning you in an equivalent place in the game, and replicating your every move in real time.
Healthcare

The advance of computer vision has helped doctors with the analysis and diagnosis of medical conditions. Especially through static image processing and classification, medical software systems are impacting radiology, pathology, and other specialties.

Among many other things, computer vision helps medical systems classify images, identifying, for example, whether the image in question shows cancer or a false positive.
You can read more details about computer vision in healthcare here
Many other cases are out there; most enterprises are using or will be using computer vision in the coming years. A lot of development has happened in the past years, much more is yet to come, and the impact on the real world will be huge!
Ok, we saw a few cases of computer vision, but how does it happen? Well, there are a lot of techniques involved, and in general these systems are anything but simple. That said, I won't navigate through the machine learning algorithms, frameworks, and other technical pieces. Let's take the first step and understand how these systems get the inputs they need to start learning and "seeing" things as they are supposed to.
That is where Data Annotation comes into the game. What is it?
Well, remember all those computer vision systems we spoke about before? They all need a good amount of data containing the patterns they need to identify, with correct and incorrect answers for the situations they will face, like the example of cancer or not in a medical image.

Data annotation is the technique of collecting raw data and identifying and labeling the objects in it to give them meaning and make them recognizable to a machine learning algorithm. In simpler words, it provides these algorithms with situations similar to the ones they will face in real life, along with the correct answer for each one. Nowadays you will find automated data annotation methods, as well as companies such as iMerit that provide, among other services, a data annotation workforce with expertise in many industries.
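To make this concrete, an annotated example is simply raw data paired with the correct answer. The sketch below shows a minimal, hypothetical record structure; the field names are illustrative and not taken from any specific annotation tool.

```python
# Hypothetical sketch of a single annotated record: raw data plus its label.
from dataclasses import dataclass

@dataclass
class AnnotatedImage:
    image_path: str  # location of the raw image file
    label: str       # the answer the algorithm should learn to reproduce

# A tiny labeled dataset, following the cancer-or-not example above.
training_data = [
    AnnotatedImage("scan_001.png", "cancer"),
    AnnotatedImage("scan_002.png", "benign"),
]
```

A real system would hold thousands of such records, but the principle is the same: each input comes with the answer the model is expected to learn.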
Let’s take a look at the most common techniques and some examples.
Image Classification

This technique consists of analyzing a set of images and classifying each one into a category. The group of categories is predetermined and changes according to the system you are building.
Imagine you are trying to develop a system that reads images of animals and, first of all, identifies which animal is in each picture. Your categories would be cat, dog, horse, lion, etc. Sounds very simple, right? Yes, but it also has its challenges.

How would you classify a picture that suddenly has no animals, or more than one? What if the image is so pixelated you can't identify the animal, or only half of the animal is in the picture? Those are all edge cases annotators face, and they need to be defined for the right data to be provided. Usually, whoever is developing the system will support the annotators and decide what action to take in each case.
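One common way to handle those edge cases is to agree on a fixed category set that includes explicit labels for them, so an annotator always has a valid choice. This is an illustrative sketch; the category names are hypothetical.

```python
# Hypothetical category set for the animal-classification example, including
# explicit edge-case labels agreed with whoever is developing the system.
CATEGORIES = {
    "cat", "dog", "horse", "lion",
    "no_animal",          # picture contains no animal at all
    "multiple_animals",   # more than one animal in the picture
    "unrecognizable",     # too pixelated or cropped to identify
}

def validate_label(label: str) -> str:
    """Reject any annotation outside the agreed category set."""
    if label not in CATEGORIES:
        raise ValueError(f"Unknown category: {label}")
    return label
```

Validating labels against a closed set like this keeps the annotated data consistent even when annotators hit unexpected images.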
Object Detection

As the name suggests, this technique identifies specific and multiple objects in an image, video, etc. Once an object is identified, the annotator draws a boundary around it, usually a bounding box (as in the example image below) or a polygon, and finally labels that area accordingly.
Face recognition and video surveillance systems heavily rely on object detection.
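A bounding-box annotation can be sketched as a small record holding the box geometry and its label. The example below loosely follows the common (x, y, width, height) convention; the field names are illustrative.

```python
# Minimal sketch of a bounding-box annotation in the common
# (x, y, width, height) convention; field names are illustrative.
from dataclasses import dataclass

@dataclass
class BoundingBox:
    x: int       # top-left corner, pixels from the left edge of the image
    y: int       # top-left corner, pixels from the top edge of the image
    width: int
    height: int
    label: str   # category of the enclosed object

def area(box: BoundingBox) -> int:
    """Area of the box in pixels, sometimes used to filter tiny detections."""
    return box.width * box.height

# e.g. one detected face in a surveillance frame
face = BoundingBox(x=40, y=25, width=120, height=160, label="face")
```

An image usually carries a list of such boxes, one per detected object.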
Semantic Segmentation

This is probably one of the hottest techniques, especially because of its wide use in autonomous vehicles and healthcare.

Like object detection, this technique also delimits objects in a scene. However, instead of relying on a square, rectangle, or polygon, it analyzes each pixel. It is easy to see that this technique requires much more time and effort from an annotator to define the boundaries.

Using semantic segmentation, all pixels that belong to the same category receive the same pixel value during annotation. As you can see in the example below, data annotation tools use different colors to facilitate the annotation process and understanding.

Annotation using semantic segmentation is complex, since a scene will often contain hundreds of objects and categories, and some systems require very high precision when defining the boundaries of each object.
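The per-pixel idea can be sketched with a tiny mask array, assuming a made-up category-to-ID mapping. Every pixel stores the ID of its category, so two cars end up indistinguishable in the mask.

```python
# Sketch of a semantic segmentation mask on a tiny 4x6 "image".
# The category-to-ID mapping is hypothetical.
import numpy as np

CATEGORY_IDS = {"background": 0, "car": 1, "road": 2}

mask = np.zeros((4, 6), dtype=np.uint8)   # everything starts as background
mask[3, :] = CATEGORY_IDS["road"]         # bottom row of pixels is road
mask[1:3, 1:3] = CATEGORY_IDS["car"]      # one car
mask[1:3, 4:6] = CATEGORY_IDS["car"]      # a second car, same pixel value

# Both cars share the value 1: the mask says "car pixels" but cannot
# tell you how many cars there are.
```

This is exactly the limitation instance segmentation addresses next.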
Instance Segmentation

This method is often confused with semantic segmentation, because the boundaries and objects are also mapped at the pixel level. The difference is the value you attribute to each pixel.

Remember how, in semantic segmentation, every pixel of the same category receives the same value, as in the example with multiple cars sharing the same color and label? Here, every single object receives a different value, even when the objects belong to the same category. This technique is useful if, for example, you have a system that needs to count the number of objects in a scene or image.
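Revisiting the two-car mask with per-instance values makes the counting use case concrete. This is an illustrative sketch; the instance IDs are assiged arbitrarily by hand.

```python
# Sketch of an instance segmentation mask: same two cars as before,
# but each object gets its own distinct pixel value.
import numpy as np

mask = np.zeros((4, 6), dtype=np.uint8)
mask[1:3, 1:3] = 1   # first car  -> instance ID 1
mask[1:3, 4:6] = 2   # second car -> instance ID 2

def count_instances(m: np.ndarray) -> int:
    """Number of distinct objects, ignoring the background value 0."""
    return len(np.unique(m[m != 0]))
```

Because each object carries a unique value, counting objects reduces to counting distinct non-background values in the mask.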
Remember, I listed just a few techniques to give you some understanding, but many other methods exist, such as object tracking, pattern detection, lidar annotation, etc.
And here we finish. I hope this gives you a good overview of computer vision, data annotation, and some of the best-known techniques.
Originally posted here. Reposted with permission.
About the author
Rafael Oliveira is a Solutions Architect at iMerit, helping clients in different industries develop their solutions using data annotation techniques. He holds a Bachelor's in Computer Science from the Universidade Estadual Paulista (UNESP) and a Master's in IT Management from Fundacao Getulio Vargas (FGV), both in Sao Paulo, Brazil, as well as an MBA from Hult International Business School, San Francisco.