Pattern Recognition Advances through High Performance Computing

Over the past 40 years, University Distinguished Professor Anil Jain has worked on the design and application of pattern recognition systems. Currently, Jain and his students are focusing on three challenging problems: automatic fingerprint recognition, automatic face recognition, and large-scale data clustering.

Latent Fingerprint Identification – At the Speed of Crime
Automatic and accurate comparison of latent prints – partial impressions of fingers found at crime scenes – to rolled fingerprints (exemplars) in law enforcement databases is critical in forensics. In this research [1], Jain’s research group uses feedback from the exemplar to refine the features extracted from a latent fingerprint, which improves identification accuracy. The experiments involved comparing 700 latent prints to 100,000 rolled prints. Michigan State University’s High Performance Computing Center resources allowed the group to run its matcher [2] in parallel on 144 single-core machines, reducing the comparison time from about 250 days to about 20 days, a speedup of roughly 12x.
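The article does not say how the 700 × 100,000 = 70 million comparisons were divided among the 144 machines. The following is a minimal sketch of one plausible scheme, assuming the rolled-print gallery is cut into contiguous slices, each worker matches all 700 latents against its own slice, and each worker's best-scoring candidates are then merged into a global rank list. The helpers `shard_ranges`, `merge_candidates`, and `speedup` are hypothetical and are not part of the group's matcher [2].

```python
import heapq


def shard_ranges(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous, near-equal [start, end) slices."""
    base, extra = divmod(n_items, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        end = start + base + (1 if w < extra else 0)  # first `extra` slices get one more item
        ranges.append((start, end))
        start = end
    return ranges


def merge_candidates(partial_lists, top_k):
    """Combine per-worker lists of (score, exemplar_id) into one global top-k list."""
    return heapq.nlargest(top_k, (c for part in partial_lists for c in part))


def speedup(serial_time, parallel_time):
    return serial_time / parallel_time


# Hypothetical split: every worker matches all 700 latents against its own gallery slice.
slices = shard_ranges(100_000, 144)
print(min(e - s for s, e in slices), max(e - s for s, e in slices))  # 694 695
print(speedup(250, 20))  # 12.5
```

Because any one latent's comparisons against different gallery slices are independent, the only coordination needed is the final merge of each latent's candidate lists.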

Longitudinal Study of Face Recognition Improves Accuracy
Jain’s research group [3] is conducting a large-scale longitudinal study of how facial aging affects the performance of state-of-the-art face recognition systems. The study uses statistical models to analyze how face comparison scores vary with covariates such as elapsed time, age, gender, and race, with the goal of determining the trend in face recognition accuracy over time. To obtain reliable parameter estimates for the models, the researchers rely on bootstrapping. Because the study is large (~148K face images of 18K subjects), this is expensive: bootstrapping involves fitting a statistical model to each of 1,000 random samples of the 18K subjects, drawn with replacement, and each fit can take more than 1 hour. Running the 1,000 bootstrap fits in parallel on Michigan State University’s High Performance Computing Center significantly reduces the processing time.
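Run serially, 1,000 fits at more than an hour each would need over 1,000 hours (roughly six weeks) of compute. The replicates share no state, which is what makes them easy to parallelize. The following is a minimal sketch of a subject-level bootstrap as described above, where subjects, not individual images, are resampled with replacement. The model is a stand-in, an ordinary least-squares slope of comparison score on elapsed time, and not the statistical models the study actually uses; all function names are hypothetical. Giving each replicate its own seed means replicate b can run as an independent cluster job and still be reproduced exactly.

```python
import random
from collections import defaultdict


def ols_slope(pairs):
    """Least-squares slope of score on elapsed time (stand-in for the real model)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx


def bootstrap_replicate(by_subject, subject_ids, seed):
    """One replicate: resample subjects with replacement and refit on all their pairs."""
    rng = random.Random(seed)  # private seed, so this replicate can run as its own job
    sample = rng.choices(subject_ids, k=len(subject_ids))
    pairs = [p for sid in sample for p in by_subject[sid]]
    return ols_slope(pairs)


def subject_bootstrap(records, n_boot=1000, base_seed=0):
    """records: iterable of (subject_id, elapsed_years, comparison_score)."""
    by_subject = defaultdict(list)
    for sid, elapsed, score in records:
        by_subject[sid].append((elapsed, score))
    ids = sorted(by_subject)
    # Serial loop here; on a cluster each iteration would be a separate task.
    return [bootstrap_replicate(by_subject, ids, base_seed + b) for b in range(n_boot)]


def percentile_interval(estimates, level=0.95):
    """Simple percentile bootstrap interval (floor index, no interpolation)."""
    s = sorted(estimates)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi


# Toy data: 10 subjects, 4 comparisons each, score falls 0.5 per year elapsed.
records = [(s, t, 2.0 - 0.5 * t) for s in range(10) for t in range(4)]
print(percentile_interval(subject_bootstrap(records, n_boot=200)))
```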

Face Image Clustering Helps Speed Processing of Overwhelming Data Inputs
Investigations that require exploiting large volumes of face imagery are increasingly common in forensic work, due to the prevalence of surveillance video and the video and image recording capabilities of cell phones. Effective solutions for triaging such imagery (i.e., sorting it into low importance, moderate importance, and critical interest) are not available in the literature. Investigators in these scenarios face two general problems: a lack of systems that can scale to large volumes of images, say 100M, and a lack of established methods for clustering face images into an unknown number of identities. Jain’s research group is therefore investigating the problem of clustering large databases of face images, grouping images of the same individual together by identity. The computational requirements for handling these databases are large; simply extracting descriptive features from 1 million face images could take ~20 hours on a single machine [4]. Beyond feature extraction, computing a list of the most similar faces for every image in a large database (a prerequisite for some clustering methods) is costly. A single machine may take on the order of a week to process a single database, but with Michigan State University’s High Performance Computing Center resources, the task can be accomplished in less than a day.
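The neighbor lists are costly because a brute-force search compares each image with every other one: for 1 million images that is on the order of 10^12 similarity computations (about 5 × 10^11 distinct pairs). The following is a minimal sketch, assuming features have already been extracted as fixed-length vectors and are compared by cosine similarity. The article does not specify the features, the similarity measure, or whether [4] uses an exact or approximate neighbor search, and the function names here are hypothetical. Splitting the queries into blocks is what lets the work be spread across cluster nodes.

```python
import heapq
import math


def normalize(v):
    """Scale a feature vector to unit length so dot product = cosine similarity."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def knn_block(feats, start, end, k):
    """Top-k most similar other faces for queries feats[start:end] (brute force)."""
    out = []
    for qi in range(start, end):
        q = feats[qi]
        sims = ((sum(a * b for a, b in zip(q, f)), j)
                for j, f in enumerate(feats) if j != qi)
        out.append([j for _, j in heapq.nlargest(k, sims)])
    return out


def knn_lists(features, k, block_size=1024):
    feats = [normalize(f) for f in features]
    lists = []
    # Query blocks are independent: each call could be a separate job on the cluster,
    # with the partial lists concatenated in block order afterwards.
    for start in range(0, len(feats), block_size):
        lists.extend(knn_block(feats, start, min(start + block_size, len(feats)), k))
    return lists


# Toy 2-D "features": two pairs of near-duplicate directions.
print(knn_lists([(1, 0), (0.9, 0.1), (0, 1), (0.1, 0.9)], k=1, block_size=2))
# [[1], [0], [3], [2]]
```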

New Algorithms Speed Large-Scale Kernel-Based Clustering
Every day, massive amounts of data are generated by sensor-equipped devices, websites, social networks and financial transactions. Analyzing such large quantities of data can lead to useful insights and important decisions. Clustering is an exploratory learning technique that can be used to analyze data in an unsupervised manner. Jain’s research group focuses on developing efficient and accurate clustering algorithms that can group tens of millions of high-dimensional data points. Kernel-based clustering algorithms can achieve high clustering accuracy by using a non-linear similarity between points, but they have high runtime and memory complexity: clustering data sets containing billions of points with these algorithms would take many weeks and would need petabytes of memory. Jain’s group has designed efficient approximate variants of these kernel-based clustering algorithms that can cluster large data sets in a few hours [5]. These algorithms use random sampling and matrix approximation techniques to make the runtime of kernel-based clustering linear in the number of data points and to reduce memory requirements. “We have been able to parallelize our algorithms and further reduce their running time,” Jain says. “For instance, we were able to reduce the time taken by our approximate kernel clustering algorithm to cluster 80 million images from the Tiny image data set [6] from about 9 hours on a single core to just about two minutes, using 100 cores in Michigan State University’s High Performance Computing Center. This task would have taken several weeks using the classical kernel-based clustering algorithms.”
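The following is a minimal sketch in the spirit of the approximate kernel k-means of [5], assuming an RBF kernel. The m randomly sampled points are the only allowed basis for the cluster centers, so the algorithm stores an n × m kernel matrix K_B and an m × m matrix K_hat instead of the full n × n matrix, and each assignment pass costs O(nmk), linear in n. For fixed assignments, the best coefficients for cluster k solve K_hat α_k = K_Bᵀ u_k / n_k. The ridge term, farthest-first initialization, and function names are our own assumptions rather than details from [5], and the sketch runs on a single core with no attempt at the parallelization Jain describes.

```python
import math
import random


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy; A is not modified
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def approx_kernel_kmeans(X, k, m, gamma=1.0, ridge=1e-3, iters=20, seed=0):
    """Kernel k-means with each center restricted to the span of m sampled points."""
    rng = random.Random(seed)
    landmarks = rng.sample(range(len(X)), m)
    # Only an n x m and an m x m kernel matrix are stored, never the full n x n one.
    KB = [[rbf(x, X[j], gamma) for j in landmarks] for x in X]
    Kh = [[rbf(X[i], X[j], gamma) + (ridge if a == b else 0.0)
           for b, j in enumerate(landmarks)] for a, i in enumerate(landmarks)]
    # Farthest-first start: each initial center is a single landmark (alpha = e_j).
    chosen = [0]
    while len(chosen) < k:
        chosen.append(min(range(m), key=lambda j: max(Kh[j][c] for c in chosen)))
    alphas = [[1.0 if j == c else 0.0 for j in range(m)] for c in chosen]
    labels = None
    for _ in range(iters):
        # ||c_k||^2 = alpha_k^T Kh alpha_k; K(x_i, x_i) = 1 for RBF, so it is dropped.
        norms = [sum(a[p] * Kh[p][q] * a[q] for p in range(m) for q in range(m))
                 for a in alphas]
        new = [min(range(k), key=lambda c: norms[c] - 2 * sum(
                   row[j] * alphas[c][j] for j in range(m))) for row in KB]
        if new == labels:
            break  # assignments stopped changing
        labels = new
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                rhs = [sum(KB[i][j] for i in members) / len(members) for j in range(m)]
                alphas[c] = solve(Kh, rhs)  # alpha_k = Kh^{-1} K_B^T u_k / n_k
    return labels


# Toy data: two tight rings of 20 points, centered at (0, 0) and (10, 10).
ring = [(0.5 * math.cos(t), 0.5 * math.sin(t)) for t in range(20)]
X = ring + [(10 + x, 10 + y) for x, y in ring]
print(approx_kernel_kmeans(X, k=2, m=16, gamma=2.0, seed=1))
```

The ridge term keeps the m × m solve well conditioned when sampled points are close together; it amounts to adding a small penalty on the size of each center's coefficients to the k-means objective.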


[1] S. S. Arora, E. Liu, K. Cao, and A. K. Jain, “Latent Fingerprint Matching: Performance Gain via Feedback from Exemplar Prints”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 36, No. 12, pp. 2452-2465, December 2014.
[2] A. A. Paulino, J. Feng, and A. K. Jain, “Latent Fingerprint Matching Using Descriptor-Based Hough Transform”, IEEE Transactions on Information Forensics and Security, Vol. 8, pp. 31-45, January 2013.
[3] L. Best-Rowden and A. K. Jain, “A Longitudinal Study of Automatic Face Recognition”, 2015. (In Submission).
[4] C. Otto, A. K. Jain, and B. Klare, “An Efficient Approach For Clustering Face Images”, 2015. (In Submission).
[5] R. Chitta, R. Jin, T. C. Havens, and A. K. Jain, “Scalable Kernel Clustering: Approximate Kernel k-means”, arXiv preprint arXiv:1402.3849, 2014.
[6] A. Torralba, R. Fergus, and W. T. Freeman, “80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30, No. 11, pp. 1958-1970, November 2008.

Institute for Cyber-Enabled Research (iCER)

Comments are closed.