Developing accurate, regulated, and purpose-driven facial recognition software is the challenge of computer vision engineering. It is also the future of how people do their everyday tasks: from unlocking a phone to signing into a bank app or logging the time they arrive at the office. Facial recognition keeps surprising us with new applications. Today, people can go to a concert without the tickets they bought online; they just have to bring…their face and show it at the gate.
Computer vision science has been working on facial and object recognition software for several decades now. From the beginning, the main goal of this engineering discipline, a branch of AI, has been to give computers a visual understanding of the world.
This goal is achieved through a three-step process: image acquisition, image processing, and image understanding. Each step relies on different kinds of technology. For example, image acquisition can be done with a web camera, an embedded phone camera, or a 3D camera, among other types of lenses.
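As a rough illustration, the three steps can be sketched as a pipeline of functions. The stage implementations below are hypothetical placeholders, not a real API: an actual system would read frames from a camera driver and run a trained model in the understanding stage, but the overall structure is the same.

```python
# Toy sketch of the three-stage computer vision pipeline:
# acquisition -> processing -> understanding.

def acquire_image(source):
    """Stage 1: image acquisition (webcam, phone camera, 3D camera...).
    Here we just return a tiny dummy grayscale frame."""
    return [[0, 128], [255, 64]]

def process_image(frame):
    """Stage 2: image processing, e.g. normalizing pixel values to [0, 1]."""
    return [[px / 255 for px in row] for row in frame]

def understand_image(frame):
    """Stage 3: image understanding. A real system would run a trained
    model here; we just classify overall brightness as a stand-in."""
    mean = sum(sum(row) for row in frame) / sum(len(row) for row in frame)
    return "bright" if mean > 0.5 else "dark"

label = understand_image(process_image(acquire_image("webcam")))
```

Each stage can be swapped independently, which is exactly why the choice of acquisition hardware matters so much for the later stages.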
With the help of algorithms, computer vision is currently capable of recognizing and tracking objects. But it can go much further than that. Neural networks are now at such a high level that they can restore and even create new images. All of these features that once seemed available only in science fiction movies are now on the market thanks to AI algorithms such as Generative Adversarial Networks (GANs).
Today, the big cloud providers have released their own AI and vision models, including facial recognition APIs. Nevertheless, these models are built on rather general-purpose algorithms that can be sensitive to specific environmental details, and they don't offer the possibility of retraining the models with data gathered in a production environment.
The good news is that some companies can retrain AI models in a customer's environment to accommodate any need or requirement, including the use of depth cameras or ToF (time-of-flight) sensors. But of course, these developments bring along some challenges. The way to succeed is to be prepared and thoughtful at every step. This way, any surprises along the process stay under control and won't destabilize the project offered to a client.
Last year at intive, we built a cognitive department with our internal AI experts as well as the best people we could find on the market. Eleven people shaped the core team that developed advanced facial recognition software in only one year of work. Our objective was to quickly create AI models suited to the specific environmental conditions of small, medium-sized, and large companies. The models we created can work with RGB cameras, depth cameras, and ToF sensors.
Depth cameras enable the API to distinguish a real human face from flat images such as screens, tablets, and paper. intive's new API can estimate age, detect sex, and recognize a fraudulent face. It identifies the eyes, the nose, and the ratios of the lengths and distances of individual facial elements. It can even discern the differences between identical twins. The only facial characteristics not taken into account are hair, mustache, and beard; this ensures that shaving or changing hairstyle won't impact the recognition capabilities.
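The idea of comparing ratios of distances between facial elements can be sketched with a toy example. The landmark coordinates and the 0.05 tolerance below are invented for illustration; the point is that a ratio of distances is invariant to scale, so the same face photographed closer to the camera still produces the same signature.

```python
import math

# Hypothetical landmark coordinates (x, y) in pixels. In a real system
# these would come from a facial landmark detector.
enrolled = {"left_eye": (30, 40), "right_eye": (70, 40), "nose": (50, 60)}
# Same face, photographed at twice the scale:
probe = {"left_eye": (60, 80), "right_eye": (140, 80), "nose": (100, 120)}

def dist(a, b):
    """Euclidean distance between two landmarks."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def ratio_signature(lm):
    """Scale-invariant ratio: eye-to-eye span over eye-to-nose distance."""
    eye_span = dist(lm["left_eye"], lm["right_eye"])
    eye_nose = dist(lm["left_eye"], lm["nose"])
    return eye_span / eye_nose

# The illustrative 0.05 tolerance is an assumption, not a real threshold.
same_face = abs(ratio_signature(enrolled) - ratio_signature(probe)) < 0.05
```

A production system compares many such measurements at once, which is what makes it possible to separate even very similar faces.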
Here are the three major challenges for computer vision development.
Legal and Compliance
First, in any computer vision project, it is crucial to keep the General Data Protection Regulation (GDPR) in mind. You must ensure you are allowed to manage or process certain biometric data. Imagine the city government installs cameras in the city parks; these are useful not only for recording but are also capable of tracking people, thus enhancing street security. In this case, people's identities have to be recognized somehow, and the way to do this is by processing biometric data. If the people being tracked do not know this is happening, a GDPR violation occurs.
Anytime a system needs to process biometric data that can identify people, those people need to give consent to the processing. To solve this, intive added a button to our API that triggers the biometric calculations and the person's identification. By pressing this button, the person explicitly gives consent to start the process. Additionally, when enrolling in the system, employees need to sign the consent form.
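The consent-gating pattern can be sketched as follows. The class and method names are illustrative, not intive's actual API; the essential property is that biometric processing is impossible until an explicit opt-in has been recorded.

```python
# Minimal sketch of consent-gated biometric processing (GDPR).
# All names here are hypothetical placeholders.

class BiometricService:
    def __init__(self):
        self._consented = set()  # user IDs with recorded consent

    def give_consent(self, user_id):
        """Called when the user presses the consent button or signs
        the enrollment consent form."""
        self._consented.add(user_id)

    def process_biometrics(self, user_id):
        """Refuses to run any biometric calculation without consent."""
        if user_id not in self._consented:
            raise PermissionError("no consent on record for " + user_id)
        return "processed biometrics for " + user_id

svc = BiometricService()
svc.give_consent("alice")
result = svc.process_biometrics("alice")
```

Keeping the consent check inside the service, rather than in each caller, means no code path can reach the biometric calculations by accident.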
Same Environment, Always
In every computer vision case, the environment is an aspect you will always have to deal with. This refers to the technology used for acquiring images during the development, testing, and production stages of the project: the type of camera and its calibration, the drivers, libraries, settings, and the lighting conditions.
To ensure the API's best accuracy, all stages must run under the same environmental conditions. Otherwise, the system loses accuracy and can produce worse results in the production environment than in the development one. When environments change, the error rate rises.
To ensure success, it is crucial to collect all possible data from the final environment and retrain the developed model with this data. For our computer vision project, we created several different production environments and gathered all the data that came out of them. There will always be differences; for example, lighting conditions are hard to control at every stage. But the more data you have, the fewer errors will appear.
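Why retraining on production data helps can be shown with a deliberately tiny toy model. The brightness values below are made up, and a midpoint threshold stands in for real model training; the point is only that a decision rule fitted on lab data degrades under dimmer production lighting, and refitting on production data recovers the performance.

```python
# Toy illustration: a brightness threshold tuned on lab data
# misclassifies dimmer production frames; retraining on data
# gathered in the production environment fixes it.

lab_faces  = [0.80, 0.90, 0.85]   # mean brightness of face crops (lab)
lab_bg     = [0.20, 0.30, 0.25]   # mean brightness of background crops
prod_faces = [0.50, 0.55, 0.60]   # dimmer lighting in production
prod_bg    = [0.10, 0.15, 0.20]

def fit_threshold(pos, neg):
    """Midpoint between class means; a stand-in for real training."""
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def error_rate(th, pos, neg):
    """Fraction of samples on the wrong side of the threshold."""
    misses = sum(1 for v in pos if v <= th) + sum(1 for v in neg if v > th)
    return misses / (len(pos) + len(neg))

lab_th = fit_threshold(lab_faces, lab_bg)            # fitted in the lab
before = error_rate(lab_th, prod_faces, prod_bg)     # errors in production
retrained_th = fit_threshold(prod_faces, prod_bg)    # refit on prod data
after = error_rate(retrained_th, prod_faces, prod_bg)
```

In this toy run the error rate drops to zero after retraining; in practice the gain is smaller, but the direction is the same.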
Set Acceptance Criteria and Meet Them with the Data You Have
From the beginning of the project, it is very important to set the accuracy you are aiming for. This means you need to decide how often the system must recognize an enrolled user correctly. To put it concretely: if you have 10 users and you set a classification accuracy of 90%, nine out of ten are going to be recognized correctly.
The false positive rate (FPR) is linked to this, and it needs to be decided too. This value represents how often the system wrongly accepts a person whose characteristics differ from those of an enrolled user. For example, if you are using the system to control employees' entrance to an office, how often will a person who is not an employee (and not enrolled in the system) gain access to the building?
Accuracy and FPR are tightly coupled. Tuning the system to accept more matches improves accuracy but also increases the FPR. You need to decide whether you want higher accuracy with a higher FPR, or lower accuracy with a lower FPR. This trade-off can be controlled smoothly over the course of development if you set it according to the available data.
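The coupling between accuracy and FPR comes from the single match threshold both depend on, which a tiny numeric sketch makes visible. The similarity scores below are invented for illustration; loosening the threshold accepts more genuine users (better accuracy) but also more impostors (higher FPR).

```python
# Toy sketch of the accuracy/FPR trade-off driven by a match threshold.
# Scores are hypothetical similarity values in [0, 1].

genuine  = [0.90, 0.80, 0.75, 0.60, 0.55]  # enrolled users vs. their own templates
impostor = [0.50, 0.45, 0.40, 0.65, 0.30]  # non-enrolled people vs. templates

def rates(threshold):
    """Fraction of genuine scores accepted (accuracy) and of
    impostor scores wrongly accepted (FPR) at a given threshold."""
    acc = sum(s >= threshold for s in genuine) / len(genuine)
    fpr = sum(s >= threshold for s in impostor) / len(impostor)
    return acc, fpr

loose_acc, loose_fpr = rates(0.5)    # permissive: high accuracy, high FPR
strict_acc, strict_fpr = rates(0.7)  # strict: lower accuracy, low FPR
```

Sweeping the threshold over all values traces out the familiar ROC curve; the acceptance criteria pick one operating point on it.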
Accuracy and FPR will also change based on the quality of the data, i.e., the photos used. The more pictures (data) per enrolled person, the more accurate the system is. Pictures from different angles give the system a better chance of being effective. In our case, we took a minimum of five pictures of every enrolled person.
Working on a computer vision project with facial recognition software can seem intimidating. But it is just a matter of being well organized. Take into account the three challenges highlighted above and you will be on the right track to success.