Reverse image search 2.0 #401
Conversation
If this is meant to be a replacement for the deduper, I require that this be tested extensively to ensure there are no false negatives. I'd rather have 10 dupe reports, 1 of which is correct, than fewer reports while duplicate images happily live on the site.
I can see that it successfully detects mirrored images and variations of the same image. Can it detect slightly cropped images, images with different brightness levels, etc.? How does it perform on very thin and tall images (webtoon/manhwa-like comics)?
This PR replaces the old "image intensities" reverse image search, and has come about due to the confluence of several key factors within the past year:
Together, these factors are used to implement a reverse image search system that identifies images by their semantic content rather than their overall appearance. To illustrate what is meant by this, here are some examples of an original image and the matches found when running the search on Derpibooru:
That DINOv2 extracts semantic features can be seen from attention maps generated for these images. The code to generate the attention maps can be found in this repository; the maps were reprocessed at a higher scale for visibility.
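As a rough sketch of how a CLS-token attention map can be recovered from the model, assuming the public torch.hub entry point for DINOv2 with registers (the model size, input resolution, and hook-based approach here are illustrative assumptions, not the code from the linked repository):

```python
import torch
from PIL import Image
from torchvision import transforms

# Assumption: the small DINOv2-with-registers backbone from the official hub.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14_reg").eval()

# Resize so both sides are multiples of the 14 px patch size.
size = 518  # 37 x 37 patches
preprocess = transforms.Compose([
    transforms.Resize((size, size), interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
img = preprocess(Image.open("input.png").convert("RGB")).unsqueeze(0)

# The model does not expose attention weights directly, so capture the qkv
# projection of the final block and recompute the attention matrix from it.
captured = {}
attn_mod = model.blocks[-1].attn
handle = attn_mod.qkv.register_forward_hook(lambda m, i, o: captured.update(qkv=o))
with torch.inference_mode():
    model(img)
handle.remove()

B, N, _ = captured["qkv"].shape
qkv = captured["qkv"].reshape(B, N, 3, attn_mod.num_heads, -1).permute(2, 0, 3, 1, 4)
q, k = qkv[0], qkv[1]
attn = ((q @ k.transpose(-2, -1)) * attn_mod.scale).softmax(dim=-1)

# CLS attention over image patches, skipping CLS itself and the 4 register tokens.
n_patches = size // 14
cls_attn = attn[0, :, 0, 1 + 4:].reshape(attn_mod.num_heads, n_patches, n_patches)
```

Each of the resulting per-head maps can then be upscaled and overlaid on the input image to visualize which regions the model attends to.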
The system works as follows:
Indexing the classification vector in a nested field leaves open the possibility of extracting and storing multiple vectors per image, and the database table has been set up to allow this should it be desired in the future.
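For illustration, a nested `knn_vector` mapping and query might look like the following sketch using opensearch-py. The index name, field names, vector dimension, and HNSW settings are assumptions for this example, not the PR's actual configuration:

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Hypothetical index with a nested field holding one or more vectors per image.
client.indices.create(
    index="images",
    body={
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                "features": {
                    "type": "nested",
                    "properties": {
                        "vector": {
                            "type": "knn_vector",
                            "dimension": 384,  # assumption: ViT-S/14 embedding size
                            "method": {
                                "name": "hnsw",
                                "engine": "lucene",
                                "space_type": "cosinesimil",
                            },
                        }
                    },
                }
            }
        },
    },
)

# Nested k-NN query: each image is scored by its best-matching vector.
query_vector = [0.0] * 384  # replace with a real DINOv2 embedding
results = client.search(
    index="images",
    body={
        "query": {
            "nested": {
                "path": "features",
                "query": {
                    "knn": {"features.vector": {"vector": query_vector, "k": 10}}
                },
            }
        }
    },
)
```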
I have pre-computed DINOv2-with-registers features for ~3.5M images on Derpibooru, ~400K images on Furbooru, and ~35K images on Tantabus. Batch inference was run on a 3060 Ti using code from this repository, with the entire process heavily bottlenecked by memory-copy bandwidth and image decode performance rather than by GPU execution itself. However, the inference code is efficient enough to run on a CPU in less than 0.5 seconds per image, and this is what is implemented in the repository (with the expectation that there will be no GPU requirement on the server).
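For context, here is a minimal sketch of extracting a single classification vector with the public hub weights. The model size (ViT-S/14 with registers) and the preprocessing pipeline are assumptions for this example; the PR's actual inference code lives in the linked repository:

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

# Assumption: the small registers variant; the PR does not state the model size.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14_reg").eval()

preprocess = transforms.Compose([
    transforms.Resize(256, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

img = preprocess(Image.open("input.png").convert("RGB")).unsqueeze(0)
with torch.inference_mode():
    vec = model(img)  # (1, 384) CLS-token embedding
vec = F.normalize(vec, dim=-1)  # unit length, suitable for cosine-similarity k-NN
```

Runs comfortably on CPU, which is consistent with the stated goal of avoiding any GPU requirement on the server.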
This PR must not be merged until OpenSearch releases version 2.19, as 2.18 contains a critical bug that prevents the system from working in all cases. Other bugs relating to filtering may or may not also be fixed in the 2.19 release, but have been worked around for now.
This PR must also not be merged until the PRs it depends on, #389 and #400, are merged.
Fixes #331 (method outdated)