Face alignment for increased TAR@FAR (after training) and a couple more thoughts #9
Comments
I think properly aligning the faces in the VGGFace2 dataset would be a good idea, and also for aligning the faces in a facial recognition system. I am assuming face alignment requires that the face detection model predicts the landmarks of the eyes to use as a reference for alignment, as described by Adrian Rosebrock in this blogpost. Unfortunately, the MTCNN model I have used from David Sandberg's repository only predicts the bounding box coordinates of the faces and does not predict actual facial landmarks like the eyes, if I am not mistaken, so another face detection model would need to be used. Could you explain more about how you implemented face alignment?

Maintaining the aspect ratio of the face might yield better results, and it might not. I have personally seen that maintaining the aspect ratio of retinal images did not really improve classification performance in another project, but of course it depends on the data itself.

As far as I know, PyTorch's ToTensor transform scales the pixel values to [0, 1]. I have used a script from a YouTube video by Aladdin Persson to calculate the mean and standard deviation values for each of the RGB channels, in order to normalize the input images according to the mean and standard deviation of the VGGFace2 dataset (the mean and std of the PyTorch tensors of the images). I am not sure if the resulting range is the best one either, but it is based on the training dataset, which is good practice.
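For reference, a minimal sketch of how those per-channel statistics can be computed and then used in `transforms.Normalize`; the dataset path, image size and batch size are placeholders, and this is an approximation of the approach rather than the exact script from the video:

```python
# Hedged sketch: estimate per-channel mean/std of a training set, then
# normalize with them. Dataset path and batch size are placeholders.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

dataset = datasets.ImageFolder(
    "datasets/vggface2_train",            # hypothetical path
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor()             # scales pixels to [0, 1]
    ])
)
loader = DataLoader(dataset, batch_size=64, num_workers=4)

channel_sum = torch.zeros(3)
channel_sq_sum = torch.zeros(3)
num_batches = 0
for images, _ in loader:
    # mean over batch, height and width, keeping the channel dimension
    channel_sum += images.mean(dim=[0, 2, 3])
    channel_sq_sum += (images ** 2).mean(dim=[0, 2, 3])
    num_batches += 1

mean = channel_sum / num_batches
std = (channel_sq_sum / num_batches - mean ** 2).sqrt()

# With mean around 0.5 and std around 0.25 per channel, Normalize maps [0, 1]
# inputs to roughly [(0 - 0.5) / 0.25, (1 - 0.5) / 0.25] = [-2, 2], which is
# where the observed tensor value range comes from.
normalize = transforms.Normalize(mean=mean.tolist(), std=std.tolist())
```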
Yes, I can post the code. It is integrated within a copy of your validate_on_LFW.py script. For the aligner tests, we use this notebook:

So far:
Hello AGenchev,

It is unfortunate the aligner did not perform so well. Does the Jetson TX2 share its memory between the CPU and the CUDA cores? If so, that would be problematic, since the generated triplets would also be loaded into memory; lowering the number of iterations per epoch would help. But even if it managed to fit into memory, the low number of CUDA cores would make the training speed too low in my opinion.

Thank you for the triplet generation enhancement. However, I have modified it to bypass the computationally heavy dataframe.loc() method by instead looking up dictionaries of the columns used in the .loc operation; the triplet generation process will now use more memory but finish in a matter of seconds instead of hours (see the sketch below).

Note: I have used the glint360k dataset for a preliminary experiment on two Quadro RTX 8000 GPUs and managed to get 99% LFW accuracy on a ResNet-34 model, but that required a batch size of 840 and a lot of computation time (around two weeks). I may switch the training dataset from VGGFace2 to glint360k for this repository because of the better results, but then I will unfortunately only be able to utilize my own hardware instead of the hardware of the institution I work for if I want to share a trained model. I will do an experiment with reduced image size to try to squeeze more triplets into the batch, since I haven't noticed significant differences in LFW performance in relation to image size (at least for images not above 224x224).
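A rough sketch of the dictionary-lookup idea; this is not the actual modification in the repository, and the column names ("class", "id") and helper functions are illustrative:

```python
# Hedged sketch: pre-build per-identity lookup dictionaries once, instead of
# calling df.loc[...] inside the triplet generation loop.
import numpy as np
import pandas as pd

def build_lookup_tables(df: pd.DataFrame):
    # Map each identity (class) to the list of its image ids, computed once.
    images_by_class = df.groupby("class")["id"].apply(list).to_dict()
    classes = list(images_by_class.keys())
    return images_by_class, classes

def generate_triplets(df: pd.DataFrame, num_triplets: int, rng=np.random):
    images_by_class, classes = build_lookup_tables(df)
    triplets = []
    while len(triplets) < num_triplets:
        pos_class, neg_class = rng.choice(classes, size=2, replace=False)
        pos_images = images_by_class[pos_class]
        if len(pos_images) < 2:
            continue  # need at least two images for anchor + positive
        anchor, positive = rng.choice(pos_images, size=2, replace=False)
        negative = rng.choice(images_by_class[neg_class])
        triplets.append((anchor, positive, negative, pos_class, neg_class))
    return triplets
```

The trade-off is exactly as described above: the dictionaries hold the whole identity-to-image mapping in memory, but each lookup becomes a constant-time dictionary access instead of a dataframe scan.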
That's good news! I'll check your faster triplet code; your solution is a bit different from mine, yet it still uses dictionaries. For the hobbyist in me, the newer GPUs are out of stock, so I am considering buying an old K80, though I am unsure whether I'll manage to run recent stuff on its very old CUDA capabilities. The Jetson performed worse than a 6 GB GPU because it seems to need much more host memory besides the GPU memory.
I didn't manage to download the whole Glint360k (no seed), so I'm working with my modified/filtered/aligned version of VGGFace2. It seems the Glint360k authors merged whatever face databases they found, so validating on LFW is more or less compromised. I think there is overlap between Glint360k and LFW, because the LFW identities are also present in MS-Celeb-1M and its derivatives (Emore, RefineMS1M, etc.). I am also curious: did you pre-process the faces to stretch them (as in the preprocessed VGGFace2 version you provided to me)?
The current GPU shortage is indeed really annoying. However, be careful about getting an older GPU, as I think some GPUs are no longer supported by newer NVIDIA CUDA toolkits after some time, but I might be wrong about that.

I got lucky with the torrent and managed to download glint360k. Here is a link for the 224x224 MTCNN version of glint360k: link. I will add the link to the README later once the current experiment is done, so I can change the repository to focus on glint360k instead of VGGFace2. Unfortunately, I am not able to upload the original dataset since my Google Drive subscription's storage is used for other things as well.

That is a good point regarding glint360k. However, I was not able to find a metadata file for the dataset to figure out which overlapping identities exist in glint360k and LFW in order to remove the overlapping folders (a hypothetical sketch of that removal step is below).

I will look into EfficientNet in the future. I tried training a MobileNet-V2 model, but unfortunately, even though it has a relatively low number of parameters, during training it takes a lot more GPU memory than ResNet-34, which forced a lower batch size, so I decided to continue with ResNets instead.
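If an identity-name mapping for glint360k ever turns up, the removal step itself would be simple. A purely hypothetical sketch, assuming a tab-separated `folder<TAB>name` metadata file that, as far as I know, does not actually exist yet:

```python
# Hypothetical sketch: remove glint360k identity folders whose names also
# appear in LFW. The mapping file "glint360k_id_to_name.txt" is an assumption.
import os
import shutil

lfw_root = "datasets/lfw"                   # placeholder path
glint_root = "datasets/glint360k"           # placeholder path
mapping_file = "glint360k_id_to_name.txt"   # assumed metadata file

# LFW folder names double as identity names, e.g. "Aaron_Eckhart".
lfw_names = {d.lower() for d in os.listdir(lfw_root)}

with open(mapping_file) as f:
    for line in f:
        folder, name = line.strip().split("\t")
        if name.lower() in lfw_names:
            overlap_dir = os.path.join(glint_root, folder)
            if os.path.isdir(overlap_dir):
                shutil.rmtree(overlap_dir)  # drop the overlapping identity
```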
I am still exploring what can be achieved with the old version on filtered VGGFace2 and ResNet-34 with constrained GPU memory. I had to reduce the batch size to 144 with 16 identities to be able to run on a 16 GB Tesla P100 in the cloud. This gave me only 77197 training triplets on the 5th epoch out of 1152000 generated, but at least I'm making progress. I decided not to buy a 24 GB K80, because it is a dual-GPU card and torch might see it as 2x12 GB GPUs, which would be unusable. Also, its CUDA capability is low, so to run recent frameworks on it, one needs to compile them from source with older CUDA libraries.
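For context on why only a fraction of the generated triplets end up being trained on, here is a hedged sketch of margin-based triplet filtering after the forward pass; this follows the general FaceNet recipe and is not a copy of the repository's actual training loop:

```python
# Hedged sketch: keep only triplets that still violate the margin, which is
# why far fewer "training triplets" are counted than were generated.
import torch
import torch.nn.functional as F

def select_hard_triplets(anchor_emb, pos_emb, neg_emb, margin=0.2):
    # Squared L2 distances between embeddings (shape: [batch]).
    pos_dist = (anchor_emb - pos_emb).pow(2).sum(dim=1)
    neg_dist = (anchor_emb - neg_emb).pow(2).sum(dim=1)
    # A triplet is still "useful" if the negative is not yet far enough away.
    violating = (pos_dist + margin) > neg_dist
    return anchor_emb[violating], pos_emb[violating], neg_emb[violating]

def triplet_loss(anchor_emb, pos_emb, neg_emb, margin=0.2):
    # Standard triplet margin loss on the surviving triplets.
    return F.triplet_margin_loss(anchor_emb, pos_emb, neg_emb, margin=margin)
```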
Hopefully the RTX 3090 shortage will get better soon.
For the 99% model, did you change the optimizer from "adagrad" to something that helps it converge faster? I see that after the 6th epoch, the accuracy improvements from epoch to epoch became very small (but it already got 91% on my version of LFW).
I used the same adagrad settings as the currently available model. It took until epoch 80+ to reach 99%, but it then kept fluctuating between 99% and 98.8% accuracy. It would get a bit costly if running on a cloud virtual machine instance.
OK, to share: I am training with 128-D vectors to see the limits of lower-dimensional embeddings. At epoch 24, it achieved 95.9%. 98% seems unattainable for now, but we'll see. Also, it seems the Glint360k torrent came alive, so I'll look at what's inside.
Good luck
It went up to 97.6% (on my VGG2).
Hello @AGenchev

Glad to see it improved. I was manually decreasing the learning rate because I was running the experiments on my own PC, which I was also using for other things. So I would run 4 or 5 epochs while I was at work, stop the training to do other stuff in the evening, and then continue the training by newly constructing the optimizer object. That is why I haven't added a learning rate scheduler like the MultiStepLR scheduler to the training script.

I am assuming (based on my memory, which most likely will be incorrect) that saving the optimizer state dict and then reloading it wouldn't cause too many issues (maybe in the first epoch). I am not sure about the specific details of the Adagrad optimizer, to be honest, so it might be the case that, since each parameter group would have a different learning rate value to avoid too much oscillation of the loss function gradient, the naive manual setting of the learning rate by reconstructing the optimizer object and loading the saved state dict would not be the most optimal way to do it (see the sketch below). To be clear, I have only tried SGD and Adagrad so far.

Edit: I have added a download link to the raw unpacked glint360k dataset that I recently uploaded to my Google Drive to the README, if you are interested: link
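For what it's worth, a minimal sketch of the two options discussed above: manually reconstructing the Adagrad optimizer with a new learning rate and reloading its state dict, versus letting a MultiStepLR scheduler handle the decay. The tiny stand-in model, checkpoint keys, learning rates and milestones are all placeholders, not the repository's actual setup:

```python
# Hedged sketch of both approaches; the Linear model and checkpoint keys are
# placeholders for the actual ResNet-34 and checkpoint format.
import torch
from torch.optim import Adagrad
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(128, 128)   # stand-in for the face embedding network

# Option 1: manual restart with a new learning rate.
optimizer = Adagrad(model.parameters(), lr=0.05)
torch.save({"model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict()}, "checkpoint.pt")

checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer = Adagrad(model.parameters(), lr=0.01)            # new lr chosen by hand
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
# load_state_dict restores the per-parameter Adagrad accumulators *and* the
# saved lr, so the new lr has to be re-applied afterwards:
for group in optimizer.param_groups:
    group["lr"] = 0.01

# Option 2: let a scheduler decay the lr at fixed epochs instead.
optimizer = Adagrad(model.parameters(), lr=0.05)
scheduler = MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)
for epoch in range(100):
    # train_one_epoch(model, optimizer)   # training loop elided
    scheduler.step()
```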
Thanks for Glint360k. I suggested it, but then I didn't train on it because of its bigger size. I did manage to download it. Maybe I will use it when I train for production.
Yes, it allowed bigger batches.
@tamerthamoqa
Hello again! Your pre-trained model is trained on the unaligned VGG2 dataset, so it performs well under pose variation. But many projects pre-process the images to obtain aligned faces, which helps them increase the TAR @ FAR score with a given CNN model.
So I wonder: are you interested in testing what we can get with face alignment?
I implemented face alignment as a transformation for torchvision.transforms (a rough sketch of the idea is below), which let me test your pre-trained model on the raw LFW with this transform. It obtained TAR: 0.6640+-0.0389 @ FAR: 0.0010 without training and without face-stretching, which I think is promising. Unfortunately, it cannot be used with the cropped VGG2 and LFW for training/testing, because the faces are deformed/stretched (although it can be made to stretch the faces as well) and some face detections fail.

The next thing I'm not sure about is whether we can obtain fewer false positives if the input faces are not stretched but preserve their shape. This leads to the next question: why is the input chosen to be square 224×224? Can't we change it to a rectangle (for example 208×240) to better fit the human face instead of stretching the (aligned) faces?
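To illustrate the idea (not the actual implementation from the notebook), here is a rough sketch of an alignment transform that plugs into a torchvision pipeline. It assumes a landmark detector that returns eye coordinates; facenet-pytorch's MTCNN, which can return five landmarks, is used here purely as an example, and the rotation follows the eye-angle approach from Rosebrock's post:

```python
# Hedged sketch of a face-alignment transform; the landmark detector and the
# fallback behaviour are assumptions, not the notebook's actual code.
import numpy as np
from PIL import Image
from facenet_pytorch import MTCNN
from torchvision import transforms

class FaceAlign:
    """Rotate a PIL image so the line between the eyes is horizontal."""

    def __init__(self, image_size=224):
        self.detector = MTCNN(select_largest=True, post_process=False)
        self.image_size = image_size

    def __call__(self, img: Image.Image) -> Image.Image:
        _, _, landmarks = self.detector.detect(img, landmarks=True)
        if landmarks is None:
            # Detection failed: fall back to a plain (stretching) resize.
            return img.resize((self.image_size, self.image_size))
        left_eye, right_eye = landmarks[0][0], landmarks[0][1]
        dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
        angle = np.degrees(np.arctan2(dy, dx))
        aligned = img.rotate(angle, center=tuple(left_eye), resample=Image.BILINEAR)
        return aligned.resize((self.image_size, self.image_size))

# Usage inside a standard torchvision pipeline:
pipeline = transforms.Compose([
    FaceAlign(image_size=224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.25, 0.25, 0.25]),  # placeholder stats
])
```

The final resize is where the stretching happens; cropping to a fixed-aspect box around the detected face instead would preserve the face shape, which is the variation discussed above.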
I also see that the normalized tensors' RGB values have a range of roughly [-2, 2]; is this the best range?