For the analysis of the Geocities torrent released by the Archive Team, I managed to create a very primitive but effective way of doing image similarity search that works on the basis of fulltext indexing.
The main goal was to find related versions of classic graphics that web users copied and modified to include in their home pages. With this it is possible, for example, to find animated versions of previously static GIFs or the other way round, and to track different sizes, floodfill color changes, etc. It works well for graphical images and not so well for photographs, and it will not handle object recognition or cropping.
While I do not know what kind of images you are trying to tackle, my goal was to find those that were used like an alphabet. Think of "under construction" graphics with modified blinking rates, a rotating globe GIF that would also turn the other way round, an animation that was scaled, a cartoon outline that was slightly changed, a static image that was later animated: in short, how 1990s web culture worked.
It is not very fancy, because all of this has to work and be quick on a home computer, but it has enabled a lot of research.
What I did was to explode every image file into its individual frames and scale each frame to a standard size. After that, I reduced the bit depth per pixel and converted the result to a string (in the case of an animation, several strings, one per frame) that is attached to the image as metadata. Other important metadata are the pixel dimensions, which can be used for searching as-is or converted to a rounded edge length ratio.
In the case of Geocities, resizing to 4x4 pixels and reducing to 3bpp (= 8 possible colors or gray scales) yielded very good results. Since fulltext search works best with a reduced set of words, this did the trick quite well.
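The steps described above can be sketched roughly as follows. This is not the code I actually ran (that was done with ImageMagick), just a minimal pure-Python illustration. It assumes a frame is already decoded into rows of `(r, g, b)` tuples, that "3bpp" means one bit per color channel, and that a plain box average is good enough for scaling. The function names are made up for this example, and `ratio` reduces the dimensions exactly instead of rounding to a nearby ratio.

```python
from math import gcd


def downscale(pixels, size=4):
    """Box-average a frame (rows of (r, g, b) tuples) down to size x size."""
    height, width = len(pixels), len(pixels[0])
    out = []
    for ty in range(size):
        y0 = ty * height // size
        y1 = max((ty + 1) * height // size, y0 + 1)  # at least one source row
        row = []
        for tx in range(size):
            x0 = tx * width // size
            x1 = max((tx + 1) * width // size, x0 + 1)  # at least one source column
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(tuple(sum(p[c] for p in block) // len(block) for c in range(3)))
        out.append(row)
    return out


def frame_token(pixels, size=4):
    """Turn one frame into a single search 'word' such as 0f0-000-0ff-..."""
    cells = []
    for row in downscale(pixels, size):
        for channels in row:
            # Assumption: 1 bit per channel, written as '0' or 'f'.
            cells.append("".join("f" if ch >= 128 else "0" for ch in channels))
    return "-".join(cells)


def ratio(width, height):
    """Edge length ratio, reduced exactly (no rounding to a nearby ratio)."""
    d = gcd(width, height)
    return f"{width // d}:{height // d}"


def document(width, height, frames):
    """Build the fulltext 'document': dimensions, ratio, one word per frame."""
    words = [f"{width}x{height}", ratio(width, height)]
    words += [frame_token(frame) for frame in frames]
    return " ".join(words)


if __name__ == "__main__":
    # A made-up 8x8 frame: left half black, right half cyan.
    frame = [[(0, 0, 0)] * 4 + [(0, 255, 255)] * 4 for _ in range(8)]
    print(document(8, 8, [frame]))
```

Because every frame collapses into one word from a small vocabulary, two images whose frames look alike at 4x4 end up sharing that word, which is all the fulltext index needs.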
An example fulltext search 'document' for an image could look like this:
320x240 4:3 0f0-000-0ff-000-0f0-000-0ff-000-0f0-000-0ff-000-0f0-000-0ff-000 0f0-000-0ff-000-0f0-000-0ff-0f0-0f0-000-0ff-000-0f0-000-0ff-0f0
for an animation with two frames. A query to look for similar images would then look like this:
4:3 & (0f0-000-0ff-000-0f0-000-0ff-000-0f0-000-0ff-000-0f0-000-0ff-000 | 0f0-000-0ff-000-0f0-000-0ff-0f0-0f0-000-0ff-000-0f0-000-0ff-0f0)
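Deriving such a query from a stored document is mechanical. A small sketch, assuming the document layout shown above (dimensions first, then the ratio, then one word per frame) and simply reusing the `&` and `|` operators from the example; `similarity_query` is a hypothetical helper, and any quoting or escaping that a particular search engine needs is not handled here.

```python
def similarity_query(document):
    """Build 'ratio & (frame1 | frame2 | ...)' from a fulltext document string."""
    _dimensions, ratio, *frames = document.split()
    # Drop repeated frame words while keeping their original order.
    unique_frames = list(dict.fromkeys(frames))
    return f"{ratio} & ({' | '.join(unique_frames)})"


if __name__ == "__main__":
    print(similarity_query("320x240 4:3 aaa-bbb ccc-ddd aaa-bbb"))
```

Requiring the ratio while accepting any one of the frame words is what lets a static image match an animation that contains the same frame.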
I did this with a bit of ImageMagick (which crashes every few 100,000 images) for a few million image files. It integrates well with my existing way of categorizing files, didn't require a new technology stack, and kept the research inside one database, manageable by two people :)
Some of the results are kinda nice: