Rate it: NIST standards and quality assessment of biometric algorithms based on facial recognition

The global facial recognition market, already considered well developed and sizeable, is predicted to grow faster than the IT industry average.
According to a recent Future Market Insights forecast, the market will grow from $5.2 billion in 2022 to $22.5 billion by 2032, an average annual growth rate of 15.7%. Analysts expect the market for facial recognition software solutions to grow faster than both the market for specialized law enforcement software and the market for standalone algorithms used to build custom biometric systems.
No wonder so many facial recognition developers from different countries are chasing this “tidbit” of a rapidly growing market. Against this background, the emergence of a generally recognized independent organization is not surprising. It develops widely accepted standards and an assessment methodology by which solutions from different vendors are ranked in a periodically updated summary rating.
There are many comparative tests for facial recognition technologies, but the generally accepted international standard is the FRVT (Face Recognition Vendor Test) package of specialized benchmarks, developed and periodically updated by NIST (National Institute of Standards and Technology). NIST, one of the oldest US federal agencies, is part of the Department of Commerce and develops standards and specifications for various industries, including defense.
The first FRVT test package was developed at NIST back in 2000 and has since been refined many times, especially after the September 11 attacks in the United States. In fairness, the NIST effort to assess facial recognition quality began well before 2000, and certainly long before the modern hype around AI: the relevant studies within the FERET (Face Recognition Technology) project, run under the auspices of the US government, started back in 1993.
Be that as it may, FRVT went through a number of revisions in 2002, 2006, 2010, 2013 and 2017. Throughout, the benchmark developers have worked hard to create a transparent testing methodology and a rating system that end users can understand, while keeping the evaluation independent. It also matters that developers from any country can submit face recognition algorithms for testing at any time (though no more than once every four months). And yes, testing is free.
In this article, we briefly review the format and methodology of the FRVT tests, as well as their main metrics. In conclusion, we’ll say a few words about how these tests relate to RecFaces products.
Contents
- What is measured with FRVT
- FRVT rating evaluation criteria for different applications
- How does it work?
- Integrated RecFaces strategy: not by algorithm alone
What is measured with FRVT
FRVT includes several evaluation tracks. From the standpoint of practical use of facial recognition technology, however, the most popular tests are FRVT 1:1 Verification and FRVT 1:N Identification.
FRVT 1:1 Verification assesses the quality of one-to-one verification, where the algorithm must determine whether two data samples belong to the same person. This is the verification principle used, for example, in the “box” (ready-to-use) RecFaces Id-Logon software product, which provides biometric access control to operating or information systems by comparing a user’s image with their photo in the database.
FRVT 1:N Identification tests the quality of an identification algorithm that compares data on a one-to-many (1 to N) basis, for example, recognizing faces in a video stream by comparing them with faces in an existing database. The database can hold stop lists that restrict access for undesirable persons or, on the contrary, lists of clients for personalized service, with whom, after recognition, you can work in a targeted way based on the history of previous interactions (purchases, etc.).
One illustrative example of such personalization is the biometrics in the RecFaces Id-Line queue management system (QMS), designed to identify a person by face in electronic queue solutions and self-service kiosks. Another is the RecFaces Id-Target software product for targeted interaction with clients through biometric identification in retail, multifunctional service centers, public catering establishments and other organizations with similar tasks.
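The difference between the two modes can be sketched in a few lines of Python. This is only an illustration, not how FRVT or any RecFaces product works internally: it assumes faces have already been converted to numeric templates, uses cosine similarity as the comparison score, and picks a threshold of 0.8 arbitrarily. The function names, template values and person IDs are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(probe, reference, threshold=0.8):
    """1:1 verification: do the two templates belong to the same person?"""
    return cosine_similarity(probe, reference) >= threshold


def identify(probe, gallery, threshold=0.8):
    """1:N identification: return the best-matching gallery ID, or None."""
    best_id, best_score = None, -1.0
    for person_id, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score > best_score:
            best_id, best_score = person_id, score
    # Reject the best candidate if even it is not similar enough.
    return best_id if best_score >= threshold else None


# Toy 3-dimensional templates (hypothetical values, for illustration only).
gallery = {
    "alice": [0.9, 0.1, 0.2],
    "bob": [0.1, 0.8, 0.3],
}
probe = [0.88, 0.15, 0.18]

is_alice = verify(probe, gallery["alice"])          # one comparison
matched_id = identify(probe, gallery)               # one comparison per gallery entry
stranger_id = identify([0.0, 0.1, 0.99], gallery)   # nobody in the gallery is close
```

Verification makes a single comparison against a claimed identity, while identification has to search the whole database and also decide whether the best candidate is good enough to report at all.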
In addition, FRVT includes tests for face recognition in a video stream (Face in Video Evaluation, FIVE), detection of morphed face images (MORPH), and demographic effects such as age, gender, and race (Demographic Effects). In 2020, when the coronavirus pandemic began, the Face Mask Effects test was added to assess the accuracy of recognizing faces in medical masks.
Before submitting an application, all participants must check their software with a special validation package. These packages are available on GitHub for each test. Validation ensures that data exchange between the participant’s algorithm and the NIST test harness works consistently.
The parameters of the datasets used to evaluate algorithms in each NIST benchmark, extensive sets of photos of varying quality taken from different angles, are described in detail in the FRVT specifications. The photo sets themselves are closed: developers have no access to them, which puts all participants on an equal footing.
In particular, the NIST datasets include fairly high-quality Visa images from the US immigration service, taken full face against a white background; lower-quality Immigration Lane images, taken with a webcam during border crossing; and medium-quality Kiosk shots, taken at self-service terminals at border crossings.
Another set of photos is supplied by law enforcement agencies. It includes high quality frontal Mugshot images, a combination of full-face and profile photos of a person (Profile), as well as medium and low quality webcam images (Webcam).
The most difficult category is Wild. These photos, taken from photojournalism, vary enormously in quality, lighting and camera angle, and the face may be only partially visible.
FRVT rating evaluation criteria for different applications
Counting the errors made during recognition is the most logical way to evaluate the quality of an algorithm. Different types of errors matter in different applications, and the cost of an error differs in each case. In biometric identification, the key error is a false mismatch, which happens when the algorithm denies access to an authorized user. In verification, the probability of a false match matters more: an outsider gains access to a facility or a banking service using someone else’s biometric template from the database of authorized users.
Verification testing is performed on many pairs of images, each showing either the same person or two different people. In FRVT benchmarks, each pair of potential matches is accompanied by dozens of false pairs. In identification tests, the algorithm searches a large database for matches with a set of control (probe) photos.
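A small sketch shows why false pairs naturally outnumber true ones. It assumes a labelled set of images and simply enumerates every possible pair; the file names, identity labels and the `build_pairs` helper are hypothetical, and NIST's actual pair selection is not described here.

```python
from itertools import combinations


def build_pairs(labelled_images):
    """Split all image pairs into same-person and different-person pairs."""
    mated, non_mated = [], []
    for (img_a, person_a), (img_b, person_b) in combinations(labelled_images, 2):
        if person_a == person_b:
            mated.append((img_a, img_b))
        else:
            non_mated.append((img_a, img_b))
    return mated, non_mated


# Hypothetical file names and identity labels: three people, two photos each.
images = [
    ("a1.jpg", "alice"), ("a2.jpg", "alice"),
    ("b1.jpg", "bob"), ("b2.jpg", "bob"),
    ("c1.jpg", "carol"), ("c2.jpg", "carol"),
]
mated_pairs, non_mated_pairs = build_pairs(images)
print(len(mated_pairs), len(non_mated_pairs))  # prints: 3 12
```

Even with three people there are four false pairs for every true one, and the imbalance grows quickly as more identities are added.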
The accuracy of an algorithm is assessed by two rates: the false-negative rate, FRR (False Rejection Rate), which reflects the probability of failing to recognize the right person, and the false-positive rate, FAR (False Acceptance Rate), which shows the probability of granting access to an unregistered person.
FRR is calculated as the ratio of false rejections to the total number of attempts by registered persons, while FAR is the ratio of false acceptances to the total number of attempts by unregistered persons. Both can be expressed as percentages or as fractions. These values are relative and depend on the algorithm settings, and they are interconnected: a higher FRR means a lower FAR, and vice versa.
To make results comparable, a similarity threshold is used as the reference point. The false non-match rate, FNMR (False Non-Match Rate), is the share of comparisons between images of the same person whose similarity score falls below the threshold. The false match rate, FMR (False Match Rate), is the share of comparisons between images of different people whose score reaches or exceeds the threshold.
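As a minimal sketch, the two threshold-based rates can be computed as follows. It assumes the common convention that FNMR is taken over same-person comparisons and FMR over different-person comparisons, and that a score at or above the threshold counts as a match. The scores are invented for illustration and are not FRVT data.

```python
def fnmr(mated_scores, threshold):
    """Share of same-person comparisons scoring below the threshold."""
    return sum(s < threshold for s in mated_scores) / len(mated_scores)


def fmr(non_mated_scores, threshold):
    """Share of different-person comparisons scoring at or above the threshold."""
    return sum(s >= threshold for s in non_mated_scores) / len(non_mated_scores)


# Hypothetical similarity scores in [0, 1], for illustration only.
mated_scores = [0.95, 0.91, 0.88, 0.72, 0.65]
non_mated_scores = [0.10, 0.22, 0.35, 0.41, 0.55, 0.68, 0.30, 0.15, 0.05, 0.47]

# Sweep the threshold to see how the two error rates move against each other.
for threshold in (0.4, 0.6, 0.8):
    print(threshold, fnmr(mated_scores, threshold), fmr(non_mated_scores, threshold))
```

On this toy data, raising the threshold from 0.4 to 0.8 drives FMR from 0.4 down to 0.0 while FNMR rises from 0.0 to 0.4, which is the trade-off between the two kinds of error.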
How does it work?
Algorithm testing results are published in NIST’s regular FRVT reports. Notably, the tables report each vendor’s results in separate columns for the different photo sets (Visa, Mugshot, Kiosk, Webcam and others), but the final position in the rating is determined by the algorithm’s average performance.
In other words, an algorithm can be extremely successful on high-quality photos and still lose overall to an algorithm that scores well on photojournalism-style images.
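A toy example makes this concrete. The numbers below are invented and are not real FRVT results, the algorithm names are placeholders, and a plain arithmetic mean is an assumption made for illustration, since the exact way NIST aggregates results is not detailed here.

```python
from statistics import mean

# Hypothetical error rates per photo set (lower is better); not real FRVT data.
results = {
    "algo_A": {"visa": 0.001, "mugshot": 0.004, "wild": 0.090},
    "algo_B": {"visa": 0.003, "mugshot": 0.005, "wild": 0.030},
}


def rank_by_mean_error(results):
    """Order algorithm names by mean error across all photo sets, best first."""
    return sorted(results, key=lambda name: mean(results[name].values()))


best_on_visa = min(results, key=lambda name: results[name]["visa"])
overall_ranking = rank_by_mean_error(results)
print(best_on_visa, overall_ranking)
```

Here `algo_A` wins on Visa images, but its weak result on Wild photos pulls its average down, so `algo_B` comes first overall.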
In a way, it is like sending an F1 car off-road, but this in no way detracts from the neutrality of the NIST tests. A better comparison than Formula 1 racing is supercomputer testing, where completely different test packages are run for different rankings, such as TOP500, HPCG or Green500.
At present, about 200 facial recognition algorithms from different companies are being tested in FRVT. Tests run on at least six photo collections, which together include photos of more than 8 million people.
Integrated RecFaces strategy: not by algorithm alone
The main advantage of the NIST biometric benchmarks is free and regular participation in a global software evaluation run by an independent organization under transparent conditions. This has made FRVT popular all over the world, so that today a place in the rating is prestigious for any organization, including commercial enterprises, scientific and academic centers, and even fintech companies.
Market leaders try to submit improved algorithms and update their positions in the ranking as often as possible, thus confirming their leading position in the industry.
The best algorithms in FRVT 1:1 achieve a false non-match rate (FNMR) of about 0.0003 at a false match rate (FMR) of 0.0001 on high-quality visa images. Modern leading biometric algorithms have reached such heights that the difference between them is sometimes insignificant.
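To put these rates in perspective, here is a back-of-the-envelope calculation. The workload of one million comparisons of each kind is an assumption chosen for round numbers, and `expected_errors` is a hypothetical helper.

```python
def expected_errors(rate, comparisons):
    """Expected number of errors for a given error rate and comparison count."""
    return round(rate * comparisons)


FNMR = 0.0003            # false non-match rate quoted in the text
FMR = 0.0001             # false match rate quoted in the text
COMPARISONS = 1_000_000  # assumed workload, chosen for round numbers

false_non_matches = expected_errors(FNMR, COMPARISONS)  # same-person pairs rejected
false_matches = expected_errors(FMR, COMPARISONS)       # different-person pairs accepted
print(false_non_matches, false_matches)  # prints: 300 100
```

At these rates, a million same-person comparisons would produce about 300 wrongly rejected pairs, and a million different-person comparisons about 100 wrongly accepted ones.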
The high-precision algorithms currently used in a number of RecFaces box products are no exception. In recent years, these algorithms have regularly appeared at the top of the NIST FRVT ranking in the Visa category and in the Top-70 of the overall NIST ranking.
However, the choice of algorithms in the case of RecFaces solutions is only one of many steps in the process of creating a biometric product for solving problems in a particular industry.
Moreover, RecFaces products are not tied to any particular biometric “engine”: the company’s box products are integrated application solutions in which the accuracy and speed of face recognition are complemented by RecFaces’ expertise.
Thus, there is always the possibility of switching to another algorithm, provided that it shows better results and is more convenient for solving end users’ applied problems. Meanwhile, the basic advantages of RecFaces products, such as fast installation, deep full-featured integration with equipment, built-in “privacy protection” compliance mechanisms, a user-friendly interface with flexible settings and others, remain at a consistently high competitive level.
In addition, in contrast to hard-wired facial recognition software and hardware solutions, RecFaces box products offer flexible, seamless integration with access control systems from a variety of partners, including Bosch Security System (BIS), DormaKaba Exos, Honeywell Pro-Watch, Kone Access, Lenel OnGuard and Schneider Electric EcoStruxure Security Expert, and the list of available integrations is constantly expanding. For example, all necessary integration adapters are available free of charge to Id-Gate users on all types of licenses, including demo.
Summing up, a high NIST rating for the algorithm is today only one of many criteria for a RecFaces box application product. Such a product provides not only high performance and accurate face recognition, but also smooth, trouble-free functioning of the entire identification and/or verification system, with access to the full functionality of modern hardware and software.
In other words, the NIST algorithm competition is a topic for narrow specialists and technical enthusiasts rather than a decisive factor in choosing a facial biometrics system with the real set of functions needed to solve current and future business problems.
RecFaces box products are proven solutions based on a powerful algorithm from a great team of experts, analysts and developers. These solutions, ready for immediate installation and operation, are designed to improve security, staff management, targeted communication and other specific industry business tasks.
A company interested in the rapid deployment, trouble-free operation and prompt technical support of such a system hardly needs a deep dive into biometric technologies or the additional cost of programming its own custom product, especially when ready-made RecFaces box solutions are on the market, with all the nuances already thought out and taken into account.