
Reverse image search 2.0 #401

Draft: wants to merge 7 commits into base: master

Conversation

liamwhite
Contributor

This PR replaces the old "image intensities" reverse image search, and has come about due to the confluence of several key factors within the past year:

  • Computer vision utilities like those in PyTorch have become more accessible than ever, with native language bindings like tch-rs removing the need for a Python server
  • The self-distillation vision transformers DINOv2 and DINOv2 with registers have been released, which come with pretrained weights that extract semantic features from images without the need for a finetuned head. The authors claim that these models can extract robust features for any type of downstream task as-is. I believe they are underselling how good the features are, and found the recall to be excellent during model selection.
  • The OpenSearch project has released the k-NN plugin, which enables nearest neighbor search over dense vectors, like the kind representing the CLS token of a ViT.

Together, these factors are used to implement a reverse image search system that identifies images by their semantic content rather than by their overall appearance. To illustrate what this means, here are some examples of an original image and the matches found when running the search against Derpibooru:

Demo results (screenshots): line art, hamburger, Trixie, and scenery queries.

That DINOv2 extracts semantic features can be seen in the attention maps generated for these images. The code to generate the attention maps can be found in this repository. The maps below have been reprocessed at a higher scale for visibility:

Attention maps (images): scaled originals 442297, 1110529, 1188964, and 3515313 alongside their corresponding attention maps.

The system works as follows:

  1. Image/video is previewed into a raw RGB bitmap
  2. Bitmap is resampled to model target dimensions
  3. Classification vector is retrieved from model
  4. Classification vector is normalized to convert the k-NN search into one ordered by cosine similarity, and delivered back to the application (see the sketch after this list)
  5. For indexing, the normalized vector is stored as a nested field in the image search index; for search, the nearest neighbors are retrieved using an HNSW index
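
As a rough illustration of steps 2 through 4, here is a minimal tch-rs sketch. It assumes the DINOv2 model has been exported as a TorchScript module whose forward pass returns the CLS embedding directly; the 518x518 input size and the ImageNet mean/std constants are illustrative assumptions rather than the exact values used in this PR.

```rust
use tch::{CModule, Kind, Tensor};

/// Resample a raw CHW u8 bitmap, run the ViT, and return a unit-length
/// CLS embedding suitable for cosine-similarity k-NN search.
fn embed(rgb_bitmap: &Tensor, model: &CModule) -> Result<Tensor, tch::TchError> {
    // Step 2: resample to the model's target dimensions (assumed 518x518,
    // a multiple of DINOv2's 14-pixel patch size).
    let resized = tch::vision::image::resize(rgb_bitmap, 518, 518)?;

    // Scale to [0, 1] and apply standard ImageNet normalization.
    let mean = Tensor::from_slice(&[0.485f32, 0.456, 0.406]).view([3, 1, 1]);
    let std = Tensor::from_slice(&[0.229f32, 0.224, 0.225]).view([3, 1, 1]);
    let input = ((resized.to_kind(Kind::Float) / 255.0 - mean) / std).unsqueeze(0);

    // Step 3: forward pass; assumed to return the CLS embedding of shape [1, dim].
    let cls = model.forward_ts(&[input])?;

    // Step 4: L2-normalize so that nearest-neighbor ordering by inner product
    // or L2 distance is the same as ordering by cosine similarity.
    Ok(&cls / cls.norm())
}
```

The normalization in step 4 is what lets a plain k-NN index return cosine-similarity-ordered results without any special scoring configuration.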

Indexing the classification vector using a nested field allows for the possibility of extracting multiple vectors from each image, and the database table has been set up to allow this should it be desired in the future.
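
To make step 5 concrete, here is a sketch of the nearest-neighbor query sent to OpenSearch, built with serde_json. The field names ("features" and "features.v") and the result count are hypothetical placeholders for illustration; only the nested/knn query shape comes from the k-NN plugin.

```rust
use serde_json::{json, Value};

/// Build an OpenSearch k-NN query over a nested knn_vector field.
/// Field names are placeholders, not the actual index layout of this PR.
fn knn_query(embedding: &[f32], k: usize) -> Value {
    json!({
        "size": k,
        "query": {
            "nested": {
                "path": "features",
                "query": {
                    "knn": {
                        "features.v": {
                            // Unit-length CLS vector; with normalized vectors
                            // the neighbor ordering matches cosine similarity.
                            "vector": embedding,
                            "k": k
                        }
                    }
                }
            }
        }
    })
}
```

Because the vectors live under a nested field, the same query shape keeps working if multiple embeddings per image are indexed later.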

I have pre-computed the DINOv2 with registers features for ~3.5M images on Derpibooru, ~400K images on Furbooru, and ~35K images on Tantabus. Batch inference was run on a 3060 Ti using code from this repository, with the entire process heavily bottlenecked by memory copy bandwidth and image decode performance rather than the GPU execution itself. However, the inference code is efficient enough to run on a CPU in less than 0.5 seconds per image, and this is what is implemented in the repository (with the expectation that there will be no GPU requirement on the server).

This PR must not be merged until OpenSearch releases version 2.19, as 2.18 contains a critical bug that prevents the system from working in all cases. Other bugs relating to filtering may or may not also be fixed in the 2.19 release, but have been worked around for now.

This PR must also not be merged until its dependencies #389 and #400 are merged.

Fixes #331, "Reverse search improvement: store non-transparent intensities of transparent images" (the method described there is now outdated).

liamwhite marked this pull request as draft on January 13, 2025 00:42
@Meow
Member

Meow commented Jan 13, 2025

If this is meant to be a replacement for the deduper, I require that this be tested extensively to ensure there are no false negatives. I'd rather have 10 dupe reports, 1 of which is correct, than fewer reports while duplicate images happily live on the site.

@VcSaJen
Contributor

VcSaJen commented Jan 16, 2025

I can see that it can successfully detect mirrored images and variations of the same image. Can it detect slightly cropped images, images with different brightness levels, etc.? How does it perform on very thin and tall (webtoon/manhwa-like comic) images?

@liamwhite
Contributor Author

liamwhite commented Jan 16, 2025

@VcSaJen

slightly cropped images

Yes, it can find images which are quite a bit more than slightly cropped. In this case the overall scores are much lower (around 0.7 cosine similarity, which is not indicative of a duplicate, vs >0.9 for actual duplicates) and the original may not score the highest in the result list.
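
As a minimal sketch of how such scores could be interpreted, assuming the unit-length embeddings described above: cosine similarity reduces to a dot product, and the 0.9 cutoff below simply restates the "actual duplicates" figure quoted here, not a tuned constant from this PR.

```rust
/// Cosine similarity of two unit-length embeddings is just their dot product.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Illustrative check using the >0.9 "actual duplicate" figure quoted above.
fn looks_like_duplicate(a: &[f32], b: &[f32]) -> bool {
    cosine_similarity(a, b) > 0.9
}
```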

images with different brightness levels

Absolutely no problem handling this. Example with 0.98 cosine similarity:

Original vs. brightened comparison (images): 3518144_orig and 3518144.

very thin and tall (webtoon-manhwa-like comics) images

Performance is reasonably good if the exact same comic image is reverse searched, although the features are not terribly stable. It doesn't find individual panels or crops well, though I added enough flexibility that this could become possible in the future.
