davidsandberg · shoyo-su · May 7, 2017 · May 31, 2017 · Jun 5, 2017 · Jun 22, 2017
diff --git a/.travis.yml b/.travis.yml
@@ -2,6 +2,7 @@ language: python
 sudo: required
 python:
   - "2.7"
+  - "3.5"
 # command to install dependencies
 install:
 # numpy not using wheel to avoid problem described in 

diff --git a/README.md b/README.md
@@ -1,47 +1,55 @@
-# Face Recognition using Tensorflow
+# Face Recognition using Tensorflow [![Build Status][travis-image]][travis]
+
+[travis-image]: http://travis-ci.org/davidsandberg/facenet.svg?branch=master
+[travis]: http://travis-ci.org/davidsandberg/facenet
+
 This is a TensorFlow implementation of the face recognizer described in the paper
-["FaceNet: A Unified Embedding for Face Recognition and Clustering"](http://arxiv.org/abs/1503.03832). The project also uses ideas from the paper ["A Discriminative Feature Learning Approach for Deep Face Recognition"](http://ydwen.github.io/papers/WenECCV16.pdf) as well as the paper ["Deep Face Recognition"](http://www.robots.ox.ac.uk/~vgg/publications/2015/Parkhi15/parkhi15.pdf) from the [Visual Geometry Group](http://www.robots.ox.ac.uk/~vgg/) at Oxford.
+["FaceNet: A Unified Embedding for Face Recognition and Clustering"](http://arxiv.org/abs/1503.03832). The project also uses ideas from the paper ["Deep Face Recognition"](http://www.robots.ox.ac.uk/~vgg/publications/2015/Parkhi15/parkhi15.pdf) from the [Visual Geometry Group](http://www.robots.ox.ac.uk/~vgg/) at Oxford.
 
-## Tensorflow release
-Currently this repo is compatible with Tensorflow r1.0.
+## Compatibility
+The code is tested using Tensorflow r1.7 under Ubuntu 14.04 with Python 2.7 and Python 3.5. The test cases can be found [here](https://github.com/davidsandberg/facenet/tree/master/test) and the results can be found [here](http://travis-ci.org/davidsandberg/facenet).
 
 ## News
 | Date     | Update |
 |----------|--------|
+| 2018-04-10 | Added new models trained on Casia-WebFace and VGGFace2 (see below). Note that the models use fixed image standardization (see [wiki](https://github.com/davidsandberg/facenet/wiki/Training-using-the-VGGFace2-dataset)). |
+| 2018-03-31 | Added a new, more flexible input pipeline as well as a bunch of minor updates. |
 | 2017-05-13 | Removed a bunch of older non-slim models. Moved the last bottleneck layer into the respective models. Corrected normalization of Center Loss. |
 | 2017-05-06 | Added code to [train a classifier on your own images](https://github.com/davidsandberg/facenet/wiki/Train-a-classifier-on-own-images). Renamed facenet_train.py to train_tripletloss.py and facenet_train_classifier.py to train_softmax.py. |
 | 2017-03-02 | Added pretrained models that generate 128-dimensional embeddings.|
 | 2017-02-22 | Updated to Tensorflow r1.0. Added Continuous Integration using Travis-CI.|
 | 2017-02-03 | Added models where only trainable variables has been stored in the checkpoint. These are therefore significantly smaller. |
 | 2017-01-27 | Added a model trained on a subset of the MS-Celeb-1M dataset. The LFW accuracy of this model is around 0.994. |
-| 2017&#8209;01&#8209;02 | Updated to code to run with Tensorflow r0.12. Not sure if it runs with older versions of Tensorflow though.   |
+| 2017&#8209;01&#8209;02 | Updated to run with Tensorflow r0.12. Not sure if it runs with older versions of Tensorflow though.   |
 
 ## Pre-trained models
 | Model name      | LFW accuracy | Training dataset | Architecture |
 |-----------------|--------------|------------------|-------------|
-| [20170511-185253](https://drive.google.com/file/d/0B5MzpY9kBtDVOTVnU3NIaUdySFE) | 0.987        | CASIA-WebFace    | [Inception ResNet v1](https://github.com/davidsandberg/facenet/blob/master/src/models/inception_resnet_v1.py) |
-| [20170512-110547](https://drive.google.com/file/d/0B5MzpY9kBtDVZ2RpVDYwWmxoSUk) | 0.992        | MS-Celeb-1M      | [Inception ResNet v1](https://github.com/davidsandberg/facenet/blob/master/src/models/inception_resnet_v1.py) |
+| [20180408-102900](https://drive.google.com/open?id=1R77HmFADxe87GmoLwzfgMu_HY0IhcyBz) | 0.9905        | CASIA-WebFace    | [Inception ResNet v1](https://github.com/davidsandberg/facenet/blob/master/src/models/inception_resnet_v1.py) |
+| [20180402-114759](https://drive.google.com/open?id=1EXPBSXwTaqrSC0OhUdXNmKSh9qJUQ55-) | 0.9965        | VGGFace2      | [Inception ResNet v1](https://github.com/davidsandberg/facenet/blob/master/src/models/inception_resnet_v1.py) |
+
+NOTE: If you use any of the models, please do not forget to give proper credit to those providing the training dataset as well.
 
 ## Inspiration
 The code is heavily inspired by the [OpenFace](https://github.com/cmusatyalab/openface) implementation.
 
 ## Training data
 The [CASIA-WebFace](http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html) dataset has been used for training. This training set consists of total of 453 453 images over 10 575 identities after face detection. Some performance improvement has been seen if the dataset has been filtered before training. Some more information about how this was done will come later.
-The best performing model has been trained on a subset of the [MS-Celeb-1M](https://www.microsoft.com/en-us/research/project/ms-celeb-1m-challenge-recognizing-one-million-celebrities-real-world/) dataset. This dataset is significantly larger but also contains significantly more label noise, and therefore it is crucial to apply dataset filtering on this dataset.
+The best performing model has been trained on the [VGGFace2](https://www.robots.ox.ac.uk/~vgg/data/vgg_face2/) dataset consisting of ~3.3M faces and ~9000 classes.
 
 ## Pre-processing
 
 ### Face alignment using MTCNN
-One problem with the above approach seems to be that the Dlib face detector misses some of the hard examples (partial occlusion, silhouettes, etc). This makes the training set to "easy" which causes the model to perform worse on other benchmarks.
+One problem with the above approach seems to be that the Dlib face detector misses some of the hard examples (partial occlusion, silhouettes, etc). This makes the training set too "easy" which causes the model to perform worse on other benchmarks.
 To solve this, other face landmark detectors has been tested. One face landmark detector that has proven to work very well in this setting is the
 [Multi-task CNN](https://kpzhang93.github.io/MTCNN_face_detection_alignment/index.html). A Matlab/Caffe implementation can be found [here](https://github.com/kpzhang93/MTCNN_face_detection_alignment) and this has been used for face alignment with very good results. A Python/Tensorflow implementation of MTCNN can be found [here](https://github.com/davidsandberg/facenet/tree/master/src/align). This implementation does not give identical results to the Matlab/Caffe implementation but the performance is very similar.
 
 ## Running training
-Currently, the best results are achieved by training the model as a classifier with the addition of [Center loss](http://ydwen.github.io/papers/WenECCV16.pdf). Details on how to train a model as a classifier can be found on the page [Classifier training of Inception-ResNet-v1](https://github.com/davidsandberg/facenet/wiki/Classifier-training-of-inception-resnet-v1).
+Currently, the best results are achieved by training the model using softmax loss. Details on how to train a model using softmax loss on the CASIA-WebFace dataset can be found on the page [Classifier training of Inception-ResNet-v1](https://github.com/davidsandberg/facenet/wiki/Classifier-training-of-inception-resnet-v1).
 
-## Pre-trained model
+## Pre-trained models
 ### Inception-ResNet-v1 model
-Currently, the best performing model is an Inception-Resnet-v1 model trained on CASIA-Webface aligned with [MTCNN](https://github.com/davidsandberg/facenet/tree/master/src/align).
+A couple of pretrained models are provided. They are trained using softmax loss with the Inception-Resnet-v1 model. The datasets have been aligned using [MTCNN](https://github.com/davidsandberg/facenet/tree/master/src/align).
 
 ## Performance
-The accuracy on LFW for the model [20170512-110547](https://drive.google.com/file/d/0B5MzpY9kBtDVZ2RpVDYwWmxoSUk) is 0.992+-0.003. A description of how to run the test can be found on the page [Validate on LFW](https://github.com/davidsandberg/facenet/wiki/Validate-on-lfw).
+The accuracy on LFW for the model [20180402-114759](https://drive.google.com/open?id=1EXPBSXwTaqrSC0OhUdXNmKSh9qJUQ55-) is 0.99650+-0.00252. A description of how to run the test can be found on the page [Validate on LFW](https://github.com/davidsandberg/facenet/wiki/Validate-on-lfw). Note that the input images to the model need to be standardized using fixed image standardization (use the option `--use_fixed_image_standardization` when running e.g. `validate_on_lfw.py`).
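
Review note on the fixed image standardization mentioned in the Performance paragraph: the difference from per-image prewhitening can be sketched as below. This is a minimal illustration, not the repository's code. Both helper names are hypothetical, and the constants `127.5` and `128.0` are an assumption about what "fixed image standardization" means here; check them against `facenet.py` before relying on them.

```python
import numpy as np


def per_image_standardization(x):
    """Scale one image to zero mean and unit variance (hypothetical helper).

    The statistics are computed from the image itself, so two images with
    different brightness end up on the same scale.
    """
    x = np.asarray(x, dtype=np.float32)
    mean = x.mean()
    # Lower bound on the divisor so that a constant image does not divide by zero.
    std_adj = max(float(x.std()), 1.0 / np.sqrt(x.size))
    return (x - mean) / std_adj


def fixed_standardization(x):
    """Apply the same affine map to every image (hypothetical helper).

    ASSUMPTION: the offset 127.5 and the scale 128.0 are not stated in this
    diff; they map 8-bit pixel values into roughly [-1, 1].
    """
    x = np.asarray(x, dtype=np.float32)
    return (x - 127.5) / 128.0


if __name__ == '__main__':
    rng = np.random.RandomState(0)
    image = rng.randint(0, 256, size=(160, 160, 3)).astype(np.uint8)
    print(per_image_standardization(image).mean())
    print(fixed_standardization(image).min(), fixed_standardization(image).max())
```

If the new models do expect fixed standardization, scripts that still call `facenet.prewhiten` on their inputs (as `contributed/cluster.py` does in this change) would feed them differently scaled images, which may be worth checking.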
diff --git a/contributed/__init__.py b/contributed/__init__.py
diff --git a/tmp/batch_represent.py b/contributed/batch_represent.py
@@ -76,7 +76,7 @@
 import numpy as np
 from sklearn.datasets import load_files
 import tensorflow as tf
-
+from six.moves import xrange
 
 def main(args):
 

diff --git a/contributed/cluster.py b/contributed/cluster.py
@@ -0,0 +1,193 @@
+# MIT License
+#
+# Copyright (c) 2017 PXL University College
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+# Clusters similar faces from the input folder into output folders, based on a Euclidean distance matrix of their embeddings
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from scipy import misc
+import tensorflow as tf
+import numpy as np
+import os
+import sys
+import argparse
+import facenet
+import align.detect_face
+from sklearn.cluster import DBSCAN
+
+
+def main(args):
+    pnet, rnet, onet = create_network_face_detection(args.gpu_memory_fraction)
+
+    with tf.Graph().as_default():
+
+        with tf.Session() as sess:
+            facenet.load_model(args.model)
+
+            image_list = load_images_from_folder(args.data_dir)
+            images = align_data(image_list, args.image_size, args.margin, pnet, rnet, onet)
+
+            images_placeholder = sess.graph.get_tensor_by_name("input:0")
+            embeddings = sess.graph.get_tensor_by_name("embeddings:0")
+            phase_train_placeholder = sess.graph.get_tensor_by_name("phase_train:0")
+            feed_dict = {images_placeholder: images, phase_train_placeholder: False}
+            emb = sess.run(embeddings, feed_dict=feed_dict)
+
+            nrof_images = len(images)
+
+            matrix = np.zeros((nrof_images, nrof_images))
+
+            print('')
+            # Print distance matrix
+            print('Distance matrix')
+            print('    ', end='')
+            for i in range(nrof_images):
+                print('    %1d     ' % i, end='')
+            print('')
+            for i in range(nrof_images):
+                print('%1d  ' % i, end='')
+                for j in range(nrof_images):
+                    dist = np.sqrt(np.sum(np.square(np.subtract(emb[i, :], emb[j, :]))))
+                    matrix[i][j] = dist
+                    print('  %1.4f  ' % dist, end='')
+                print('')
+
+            print('')
+
+            # DBSCAN is used because it does not require the number of clusters to be specified in advance.
+            db = DBSCAN(eps=args.cluster_threshold, min_samples=args.min_cluster_size, metric='precomputed')
+            db.fit(matrix)
+            labels = db.labels_
+
+            # Number of clusters, not counting the noise label (-1)
+            no_clusters = len(set(labels)) - (1 if -1 in labels else 0)
+
+            print('No of clusters:', no_clusters)
+
+            if no_clusters > 0:
+                if args.largest_cluster_only:
+                    largest_cluster = 0
+                    for i in range(no_clusters):
+                        print('Cluster {}: {}'.format(i, np.nonzero(labels == i)[0]))
+                        if len(np.nonzero(labels == i)[0]) > len(np.nonzero(labels == largest_cluster)[0]):
+                            largest_cluster = i
+                    print('Saving largest cluster (Cluster: {})'.format(largest_cluster))
+                    cnt = 1
+                    for i in np.nonzero(labels == largest_cluster)[0]:
+                        misc.imsave(os.path.join(args.out_dir, str(cnt) + '.png'), images[i])
+                        cnt += 1
+                else:
+                    print('Saving all clusters')
+                    for i in range(no_clusters):
+                        cnt = 1
+                        print('Cluster {}: {}'.format(i, np.nonzero(labels == i)[0]))
+                        path = os.path.join(args.out_dir, str(i))
+                        if not os.path.exists(path):
+                            os.makedirs(path)
+                            for j in np.nonzero(labels == i)[0]:
+                                misc.imsave(os.path.join(path, str(cnt) + '.png'), images[j])
+                                cnt += 1
+                        else:
+                            for j in np.nonzero(labels == i)[0]:
+                                misc.imsave(os.path.join(path, str(cnt) + '.png'), images[j])
+                                cnt += 1
+
+
+def align_data(image_list, image_size, margin, pnet, rnet, onet):
+    minsize = 20  # minimum size of face
+    threshold = [0.6, 0.7, 0.7]  # thresholds for the three detection stages
+    factor = 0.709  # scale factor
+
+    img_list = []
+
+    for x in range(len(image_list)):
+        img_size = np.asarray(image_list[x].shape)[0:2]
+        bounding_boxes, _ = align.detect_face.detect_face(image_list[x], minsize, pnet, rnet, onet, threshold, factor)
+        nrof_samples = len(bounding_boxes)
+        if nrof_samples > 0:
+            for i in range(nrof_samples):
+                if bounding_boxes[i][4] > 0.95:
+                    det = np.squeeze(bounding_boxes[i, 0:4])
+                    bb = np.zeros(4, dtype=np.int32)
+                    bb[0] = np.maximum(det[0] - margin / 2, 0)
+                    bb[1] = np.maximum(det[1] - margin / 2, 0)
+                    bb[2] = np.minimum(det[2] + margin / 2, img_size[1])
+                    bb[3] = np.minimum(det[3] + margin / 2, img_size[0])
+                    cropped = image_list[x][bb[1]:bb[3], bb[0]:bb[2], :]
+                    aligned = misc.imresize(cropped, (image_size, image_size), interp='bilinear')
+                    prewhitened = facenet.prewhiten(aligned)
+                    img_list.append(prewhitened)
+
+    if len(img_list) > 0:
+        images = np.stack(img_list)
+        return images
+    else:
+        return None
+
+
+def create_network_face_detection(gpu_memory_fraction):
+    with tf.Graph().as_default():
+        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=gpu_memory_fraction)
+        sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options, log_device_placement=False))
+        with sess.as_default():
+            pnet, rnet, onet = align.detect_face.create_mtcnn(sess, None)
+    return pnet, rnet, onet
+
+
+def load_images_from_folder(folder):
+    images = []
+    for filename in os.listdir(folder):
+        img = misc.imread(os.path.join(folder, filename))
+        if img is not None:
+            images.append(img)
+    return images
+
+
+def parse_arguments(argv):
+    parser = argparse.ArgumentParser()
+
+    parser.add_argument('model', type=str,
+                        help='Either a directory containing the meta_file and ckpt_file or a model protobuf (.pb) file')
+    parser.add_argument('data_dir', type=str,
+                        help='The directory containing the images to cluster into folders.')
+    parser.add_argument('out_dir', type=str,
+                        help='The output directory where the image clusters will be saved.')
+    parser.add_argument('--image_size', type=int,
+                        help='Image size (height, width) in pixels.', default=160)
+    parser.add_argument('--margin', type=int,
+                        help='Margin for the crop around the bounding box (height, width) in pixels.', default=44)
+    parser.add_argument('--min_cluster_size', type=int,
+                        help='The minimum number of pictures required for a cluster.', default=1)
+    parser.add_argument('--cluster_threshold', type=float,
+                        help='The maximum distance between two faces for them to be placed in the same cluster', default=1.0)
+    parser.add_argument('--largest_cluster_only', action='store_true',
+                        help='Save only the largest cluster.')
+    parser.add_argument('--gpu_memory_fraction', type=float,
+                        help='Upper bound on the amount of GPU memory that will be used by the process.', default=1.0)
+
+    return parser.parse_args(argv)
+
+
+if __name__ == '__main__':
+    main(parse_arguments(sys.argv[1:]))
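
Review note on `contributed/cluster.py`: the nested loop that fills `matrix` and the loop that picks the largest cluster could each be written as a few NumPy operations. The sketch below is a suggestion under stated assumptions, not part of the change. The helper names `pairwise_euclidean` and `largest_cluster` are hypothetical, and the embeddings are assumed to be a 2-D array with one row per face.

```python
import numpy as np


def pairwise_euclidean(emb):
    """Return the (n, n) matrix of Euclidean distances between rows of emb."""
    emb = np.asarray(emb, dtype=np.float64)
    # Uses ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b for all pairs at once.
    sq_norms = np.sum(emb ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * emb.dot(emb.T)
    # Rounding can leave tiny negative values; clamp them before the sqrt.
    np.maximum(sq_dists, 0.0, out=sq_dists)
    np.fill_diagonal(sq_dists, 0.0)
    return np.sqrt(sq_dists)


def largest_cluster(labels):
    """Return the label of the largest cluster, or None if all points are noise.

    Points labelled -1 (the DBSCAN noise label) are ignored. Ties go to the
    lowest label, which matches a loop that only replaces on a strict '>'.
    """
    labels = np.asarray(labels)
    valid = labels[labels >= 0]
    if valid.size == 0:
        return None
    return int(np.argmax(np.bincount(valid)))


if __name__ == '__main__':
    rng = np.random.RandomState(0)
    emb = rng.rand(4, 128)  # stand-in for the embeddings returned by sess.run
    print(pairwise_euclidean(emb).shape)
    print(largest_cluster([0, 1, 1, -1, 1, 0]))
```

The resulting matrix can be passed to `DBSCAN(..., metric='precomputed').fit(matrix)` exactly as the script does now. The printed distance table would still need its own loop if it is kept.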