Updated: Oct 23, 2018
Face recognition as a technique needs no introduction, for it has become quite commonplace. Smartphone manufacturers have greatly improved this technology in recent years, and even mid-range phones today use facial recognition as an accessibility tool. Face recognition is also steadily gaining popularity outside people's personal spaces as technological limitations wane. Facial recognition systems are being deployed at scale in residences, offices, airports and, despite sounding Orwellian, on the streets.
Many algorithms are commonly used to recognize faces in images and videos. This article provides an overview of the most popular ones.
The first step in any face recognition system is to detect a face in an image. The main objective of face detection is to determine whether the image contains any faces and, if so, to return the location and extent of each one. Pre-processing is done to remove noise and to reduce reliance on precise registration. Several factors make face detection a challenging task: pose, the presence or absence of structural components, facial expression, occlusion, and image orientation.
Facial feature detection is the process of detecting the presence and location of features such as the nose, eyebrows, eyes, lips, nostrils, mouth, and ears. This is usually done under the assumption that there is only a single face in the image.
In the recognition step, the input image is compared with a database of known faces. The input image is called the probe and the database is called the gallery. The system produces a match report, and classification then assigns the probe to the identity (class) it belongs to.
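The probe-versus-gallery comparison can be sketched as a nearest-neighbour search. This is only an illustration: it assumes each face has already been reduced to a numeric feature vector, and the names `match_report` and `gallery`, as well as the toy vectors, are hypothetical and not taken from any particular system.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_report(probe, gallery, top_k=3):
    """Rank gallery identities by distance to the probe (closest first).

    `gallery` maps an identity label to a list of enrolled feature vectors.
    Each identity is scored by its closest enrolled vector.
    """
    scores = []
    for identity, vectors in gallery.items():
        best = min(euclidean(probe, v) for v in vectors)
        scores.append((best, identity))
    scores.sort()
    return [(identity, dist) for dist, identity in scores[:top_k]]

# Toy, made-up feature vectors; a real system would use extracted features.
gallery = {
    "alice": [[0.0, 0.0], [0.1, 0.0]],
    "bob": [[1.0, 1.0]],
    "carol": [[3.0, 4.0]],
}
report = match_report([0.2, 0.0], gallery)
```

The first entry of `report` is the best candidate; keeping the top few candidates rather than a single answer mirrors the idea of a match report.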
Woody Bledsoe and two other researchers carried out the first research on using computers to recognize faces during 1964 and 1965. In the decades since, a number of approaches to recognizing human faces have been adopted, and a large body of literature has been published on them. A brief review of some of the important ones follows:
1. Eigenfaces
At the heart of Eigenfaces, a popular technique, is an unsupervised dimensionality reduction method called Principal Component Analysis (PCA). PCA discards information that is not useful, thereby reducing the dimensionality of the data, and decomposes the face structure into orthogonal principal components known as eigenfaces. The term eigenface is essentially the name given to these eigenvectors when they are used for the human face recognition problem. The approach of using eigenfaces for recognition was first developed by Sirovich and Kirby (1987) and was used by Matthew Turk and Alex Pentland in 1991 for face classification.
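A minimal sketch of the PCA step, assuming NumPy is available and that every face image has been flattened into a row of pixel values. The function names `fit_eigenfaces` and `project` are hypothetical, and random data stands in for real face images.

```python
import numpy as np

def fit_eigenfaces(images, n_components):
    """Compute the mean face and the top principal components.

    `images` has shape (n_samples, n_pixels): one flattened face per row.
    """
    mean_face = images.mean(axis=0)
    centered = images - mean_face
    # Rows of vt are orthonormal principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:n_components]

def project(images, mean_face, eigenfaces):
    """Map flattened faces to low-dimensional eigenface weights."""
    return (images - mean_face) @ eigenfaces.T

rng = np.random.default_rng(0)
faces = rng.random((10, 64))  # 10 made-up 8x8 "faces"
mean_face, eigenfaces = fit_eigenfaces(faces, n_components=4)
weights = project(faces, mean_face, eigenfaces)
```

Each 64-pixel face is now described by only 4 weights, and recognition can compare these weight vectors instead of raw pixels.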
Advantage: The algorithm behind the construction of eigenfaces is simple and efficient in terms of both time and storage. PCA reduces the dimensionality of an image in a short amount of time, which makes it a very practical approach for facial recognition.
Drawback: The Eigenface approach has a major drawback: its accuracy degrades with varying light intensity and head position. While the problem of head positioning can (at least theoretically) be solved by head pose estimation and reorientation of the bounding box to move a person’s head to a standard location in the image, the amount of preprocessing required to achieve satisfactory results calls for a better solution to the problem of face recognition.
L. Sirovich; M. Kirby (1987). “Low-dimensional procedure for the characterization of human faces”. Journal of the Optical Society of America A.
2. Fisherfaces
Another face recognition technique based on the idea of dimensionality reduction is the Fisherfaces algorithm, which builds on Linear Discriminant Analysis (LDA). PCA finds a linear combination of features that maximizes the total variance in the data. While this is a very powerful way of representing data, it does not consider class labels, so a lot of discriminative information may be lost when some components are thrown away. This can yield very bad classification results. To find the combination of features that best separates the classes, Linear Discriminant Analysis instead maximizes the ratio of between-class scatter to within-class scatter. The idea is that samples of the same class should cluster tightly together, while different classes should lie far apart.
The Fisherface method for face recognition, originally described by Belhumeur et al., uses both principal component analysis and linear discriminant analysis to produce a subspace projection matrix, similar to the one used in the Eigenface method. However, the Fisherface method is able to take advantage of within-class information, minimizing variation within each class while still maximizing class separation.
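The scatter ratio can be made concrete with a small NumPy sketch. It is an illustration rather than the published method: it skips the PCA step and assumes the within-class scatter matrix is invertible, which the toy two-dimensional data guarantees. The function name `fisher_directions` is hypothetical.

```python
import numpy as np

def fisher_directions(X, y, n_components):
    """Directions maximizing between-class over within-class scatter."""
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    s_w = np.zeros((d, d))  # within-class scatter
    s_b = np.zeros((d, d))  # between-class scatter
    for label in np.unique(y):
        xc = X[y == label]
        mean_c = xc.mean(axis=0)
        centered = xc - mean_c
        s_w += centered.T @ centered
        diff = (mean_c - overall_mean).reshape(-1, 1)
        s_b += len(xc) * (diff @ diff.T)
    # Solve S_w^{-1} S_b w = lambda w; assumes S_w is invertible.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_w, s_b))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs.real[:, order[:n_components]]

# Two made-up classes: separated along the first axis,
# but with most of their spread along the second axis.
X = np.array([[-1.2, 0.0], [-0.8, 0.0], [-1.0, 3.0], [-1.0, -3.0],
              [0.8, 0.0], [1.2, 0.0], [1.0, 3.0], [1.0, -3.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
w = fisher_directions(X, y, n_components=1)[:, 0]
```

In this toy data the largest total variance lies along the second axis, which is the direction PCA would rank first, whereas `w` points along the first axis, the one that actually separates the two classes.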
Advantage: The major advantage of the Fisherfaces technique is that it is largely insensitive to variations in lighting. Further, although it is similar to Eigenfaces, it separates different classes of images better.
Drawback: Fisherfaces is more complex than Eigenfaces when it comes to finding the projection onto face space. Calculating the ratio of between-class scatter to within-class scatter requires a lot of processing time. Additionally, in pursuit of better classification, the projection can be larger than the Eigenface representation, which makes it more expensive to store and increases the time needed for recognition.
Peter N. Belhumeur, João P. Hespanha, and David J. Kriegman (1997). “Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection”. IEEE Transactions on Pattern Analysis and Machine Intelligence.
3. Geometrical Feature Matching
Geometrical feature matching techniques are based on computing a set of geometrical features from the picture of a face. The overall configuration can be described by a vector that represents the position and size of the main facial features, such as the eyes and eyebrows, nose, mouth, and the outline of the face. The first attempt at automated face recognition using geometrical features was made in 1973. That system achieved a 75% recognition rate on a database of 20 people using two images per person, one as the model and the other as the test image. In 1993 another approach was developed that automatically extracted a set of geometrical features from the picture of a face, such as nose width and length, mouth position, and chin shape. Typically, 35-45 feature points per face were generated. The correct person was the best match 86% of the time, and in 94% of cases the correct person's face was among the top three candidate matches. In summary, geometrical feature matching based on precisely measured distances between features may be useful for finding matches in a large database. However, it depends on the accuracy of the feature location algorithms.
Drawback: Current automated facial feature location algorithms do not provide a high degree of accuracy and require considerable computation time.
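As a rough illustration of such a feature vector, the sketch below turns a handful of landmark coordinates into pairwise distances. The landmark names, the coordinates, and the choice to normalize by the distance between the eyes are all assumptions made for this example, not details of the systems described above.

```python
import math
from itertools import combinations

def geometric_features(landmarks):
    """Turn named (x, y) landmarks into a scale-normalized distance vector."""
    # Normalize by the distance between the eyes so image scale cancels out.
    scale = math.dist(landmarks["left_eye"], landmarks["right_eye"])
    names = sorted(landmarks)
    return [
        math.dist(landmarks[a], landmarks[b]) / scale
        for a, b in combinations(names, 2)
    ]

face = {"left_eye": (30, 40), "right_eye": (70, 40),
        "nose": (50, 60), "mouth": (50, 80)}
# The same made-up face, photographed at twice the size.
face_2x = {name: (2 * x, 2 * y) for name, (x, y) in face.items()}

features = geometric_features(face)
features_2x = geometric_features(face_2x)
```

Both calls yield the same vector, so a face is described consistently regardless of how large it appears in the image. The result is only as good as the landmark positions fed into it, which is exactly the dependency noted above.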
4. Neural Networks and FaceNet
Neural networks are used in many applications, such as pattern recognition, character recognition, object recognition, and autonomous robot driving. The appeal of neural networks for face recognition is the feasibility of training a system to capture the complex class of face patterns. To get the best performance, a neural network has to be extensively tuned (number of layers, number of nodes, learning rates, etc.). Because neural networks are non-linear, they are widely used for face recognition, and their feature extraction step may be more effective than Principal Component Analysis.
FaceNet is a system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Published in March 2015, it is a deep learning architecture consisting of convolutional layers based on GoogLeNet-inspired Inception models. The system returns a 128-dimensional vector embedding for each face. Because the model is trained with a triplet loss over different classes of faces to capture the similarities and differences between them, the embeddings it returns effectively cluster faces: they lie closer together for faces of the same person (low intra-class distance) and farther apart for faces of different people (high inter-class distance). The FaceNet architecture is trained on a dataset with a very large number of faces belonging to numerous classes.
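The triplet loss can be written down in a few lines. The sketch below, assuming NumPy, evaluates it for a single triplet of unit-length embeddings; the two-dimensional toy vectors and the margin value are illustrative assumptions, and a real model would compute this over batches of 128-dimensional embeddings during training.

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(||a - p||^2 - ||a - n||^2 + margin, 0) for one triplet."""
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    return max(pos_dist - neg_dist + margin, 0.0)

# Made-up embeddings: anchor and positive are the same person.
anchor = l2_normalize(np.array([1.0, 0.0]))
positive = l2_normalize(np.array([1.0, 0.0]))
negative = l2_normalize(np.array([0.0, 1.0]))

easy = triplet_loss(anchor, positive, negative)  # already well separated
hard = triplet_loss(anchor, negative, positive)  # roles swapped: penalized
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so training effort goes to the triplets that still violate that ordering.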
It is now possible to train an SVM classifier, or any other simple multi-class classifier, over the vector embeddings obtained for faces from different classes. Every time a new person’s face is added to your set, you just need to add another class and retrain the final classifier rather than the entire FaceNet model. This way, it is possible to use the FaceNet architecture very effectively for real-time face recognition applications as well.
Advantage: The major benefit of this approach is its representational efficiency. State-of-the-art accuracy levels are achieved using only 128 bytes per face. In addition, this approach works well for real-time face recognition applications due to its efficient architecture.
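To show why enrolling a new person is cheap, here is a small sketch that uses a nearest-centroid classifier as a simple stand-in for the SVM mentioned above. The class name `CentroidClassifier` and the three-dimensional vectors are hypothetical; in practice the inputs would be 128-dimensional embeddings produced by the trained model.

```python
import numpy as np

class CentroidClassifier:
    """Nearest-centroid classifier over precomputed face embeddings."""

    def __init__(self):
        self.centroids = {}

    def add_person(self, name, embeddings):
        """Enroll a person from one or more embedding vectors."""
        self.centroids[name] = np.mean(embeddings, axis=0)

    def predict(self, embedding):
        """Return the enrolled name whose centroid is closest."""
        return min(
            self.centroids,
            key=lambda name: np.linalg.norm(embedding - self.centroids[name]),
        )

clf = CentroidClassifier()
# Made-up 3-D vectors standing in for 128-D embeddings.
clf.add_person("alice", np.array([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]]))
clf.add_person("bob", np.array([[0.0, 1.0, 0.0]]))
first = clf.predict(np.array([0.8, 0.2, 0.0]))

# Enrolling someone new only touches the classifier, not the embedding model.
clf.add_person("carol", np.array([[0.0, 0.0, 1.0]]))
second = clf.predict(np.array([0.1, 0.0, 0.9]))
```

An SVM would draw sharper decision boundaries than class centroids, but the workflow is the same: the expensive embedding network stays fixed and only the lightweight classifier on top of it changes.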
This post does not cover every approach and solution (Facebook's DeepFace, for example, is not included). It was meant not as a comparison but as an overview of the various techniques that have evolved over decades of research in this area.
I hope this was informative. Do let me know in the comments in case you have any questions.