#Image reranking

40 messages · Page 1 of 1 (latest)

final dome
#

I want to build robust image retrieval for my dataset of images. I'm using a Qdrant cluster to store my image embeddings (from a fine-tuned DINOv2). My dataset has a lot of intricate artwork, mostly abstract and cubist. Using my embeddings I can run a similarity search for a query image and retrieve the top 5 similar images. The retrieved images aren't good enough, so reranking is still needed, and I find it difficult to rerank the retrieved images. Since text is easier to rerank, I generated captions for my dataset using the Google Gemini model. I'm not really happy with the captions: since these are abstract drawings and colours, it got a lot of the detail wrong, even after I refined the prompt. Can someone give me direction on how to proceed with reranking the images, using either text or image?

final dome
#

@tropic dawn

vocal nexus
#

Maybe create a small dataset of ranked images and fine-tune the model on it

tropic dawn
# final dome I want to create a robust image retrieval for my dataset of images. I'm using th...

You don't necessarily have to rerank based on content. If your application has a natural signal, such as clicks, you can just rerank based on the probability of being clicked. Basically, you can concatenate the user embedding and the image embedding and pass them to a small model that predicts the probability of a click. See more on Learning to Rank: https://towardsdatascience.com/learning-to-rank-a-complete-guide-to-ranking-using-machine-learning-4c9688d370d4

If the ranking is subjective, I would just annotate, for each query, what was clicked on, and train a model that learns, given a query and a list of images, which one is most likely to be clicked on

final dome
# tropic dawn You don't necessarily have to rerank based on content. If your application has n...

Hey Ian!! Thank you so much for replying! So mine isn't really a text-to-image search like Google Images; it's an image-to-image search. I'm given a query image and, based on its content, I need to pick the correct index image. The query is basically an instance of one of the index images. The last time we spoke you gave me the idea of captioning the images and working with the text embeddings instead, and you also told me how important reranking is. But I'm having trouble with that. The captions generated after prompting aren't working well. I tried the Gemini model for captioning (it takes over 3-4 seconds to generate a caption for an image), and the captions aren't that accurate. The worst part is when it returns no result because Gemini isn't able to describe the image. I'm stuck with this issue.

vocal nexus
#

I am talking about the rankings, not the images

vocal nexus
#

Of course, with 4 additional crops you would have to perform 5 forward passes to compute all the embeddings, and the combined embedding vectors would be 5 times bigger
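
A minimal sketch of that cost, assuming the 4 additional crops are the four quadrants of the image (the thread does not say which crops are meant). `five_crops`, `combined_embedding` and `toy_embed` are hypothetical names, and `toy_embed` is only a stand-in for the real embedding model:

```python
import numpy as np

def five_crops(image: np.ndarray) -> list:
    """Full image plus its four quadrants (an assumed cropping scheme)."""
    h, w = image.shape[:2]
    top, bottom = image[: h // 2], image[h // 2 :]
    return [
        image,
        top[:, : w // 2], top[:, w // 2 :],
        bottom[:, : w // 2], bottom[:, w // 2 :],
    ]

def combined_embedding(image: np.ndarray, embed_fn) -> np.ndarray:
    """One forward pass per view; L2-normalise each part, then concatenate."""
    parts = []
    for view in five_crops(image):
        v = np.asarray(embed_fn(view), dtype=np.float32)
        parts.append(v / (np.linalg.norm(v) + 1e-12))
    return np.concatenate(parts)

def toy_embed(view: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the real model: mean colour per channel (3 dims).
    return view.reshape(-1, view.shape[-1]).mean(axis=0)

image = np.random.default_rng(0).random((64, 64, 3))
vector = combined_embedding(image, toy_embed)
print(len(five_crops(image)), vector.shape)
```

With a 3-dimensional stand-in embedding the combined vector has 15 dimensions; with a real model the stored vector size, and the vector size configured in the index, grows by the same factor of 5.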

final dome
#

But again, that's just for candidate selection, right? I wasted a lot of time trying to fine-tune unsupervised models to give me "GOOD" embeddings, but what about the subjective context, that is, reranking based on the query image? Do you really think this will help rerank images based on the query image?

final dome
#

so this is the transform I used initially:

import torchvision.transforms as T

transform = T.Compose([
    T.Resize(256, interpolation=T.InterpolationMode.BICUBIC),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

I'll try what you're suggesting as well

vocal nexus
#

In any case, I think you have to identify first principles to guide the similarity first. Do you want the images to be ranked depending on the composition? the colors? the style? the artistic current? the content? the quality? something else?
This will guide you towards one method of ranking or another.

tropic dawn
# final dome Just pondered over your message again. No image has a higher chance of being cli...

@final dome You have a design flaw in your application then. On one hand, you're saying there's a "correct index image"; on the other hand, you're just randomly selecting from candidates and stating there's no higher chance of one being better than the other. Is there a correct index image or not? If there is, annotate it yourself. If not, redesign the application to show the top N (like top 10) candidates. I would do both annotation and redesign

final dome
# tropic dawn @final dome You have a design flaw in your application then. On one h...

The candidates are being selected by vector search, and there is a higher chance of one being better than the other SUBJECTIVELY. As in, given a query image, YES, there is a candidate better than the rest. But all this comes after retrieval. Right now I'm retrieving the top 10 candidates like you said, using cosine similarity search, with DINO embeddings representing my images. So among the retrieved images, YES, there is a candidate better than the rest. But how do I choose the best candidate out of the 10 retrieved ones? I didn't understand you the first time; I thought you meant one index image being the better candidate overall.
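
For reference, the retrieval step described in this message amounts to the following brute-force computation. This is a NumPy sketch, not the Qdrant API; `top_k_cosine` is a hypothetical helper and the random vectors stand in for DINO embeddings:

```python
import numpy as np

def top_k_cosine(query: np.ndarray, index: np.ndarray, k: int = 10):
    """Return the row ids and cosine scores of the k index vectors closest to the query."""
    q = query / np.linalg.norm(query)
    x = index / np.linalg.norm(index, axis=1, keepdims=True)
    sims = x @ q                      # cosine similarity to every index vector
    order = np.argsort(-sims)[:k]     # highest similarity first
    return order, sims[order]

rng = np.random.default_rng(0)
index_vectors = rng.normal(size=(100, 32))   # stand-in for stored embeddings
query_vector = 2.0 * index_vectors[3]        # same direction as row 3
ids, scores = top_k_cosine(query_vector, index_vectors, k=10)
```

The candidates come back ordered by embedding similarity alone, which is exactly the ordering a reranker is meant to revise.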

tropic dawn
# final dome The candidates are being selected by vector search, and there is a higher chance...

No, reranking is done after retrieval of candidates. To make it simpler, this is step by step:

  1. Create a bunch of sample queries yourself, so collect say 500 images
  2. Upload, retrieve, and select the image you would choose yourself. It's fine if it's subjective for now. This way, you basically have 2 inputs and 1 output. Say you retrieve 10 candidates, you basically have
# For the correct candidate
{"input_1": query_image_embedding, "input_2": candidate_image_embedding, "target": 1}

# For the other 9 incorrect candidates
{"input_1": query_image_embedding, "input_2": candidate_image_embedding, "target": 0}
  3. After going through 500 images, design and train a small model that takes both embeddings and outputs that binary classification. It should output the probability of the candidate being selected, given query_image_embedding and candidate_image_embedding.
  4. Add this model to your application, run it after retrieving the 10 candidates, and rerank the 10 candidates based on the prediction score
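
The steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the "small model" is swapped for a plain logistic regression in NumPy, the pair features (element-wise product and absolute difference) are one possible choice rather than something specified in the thread, and the data is synthetic, standing in for the roughly 500 hand-annotated queries. All function names are hypothetical.

```python
import numpy as np

def pair_features(query: np.ndarray, candidate: np.ndarray) -> np.ndarray:
    """Interaction features for a (query, candidate) pair (an assumed design choice)."""
    return np.concatenate([query * candidate, np.abs(query - candidate)])

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def train_reranker(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Logistic regression trained with full-batch gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        grad = sigmoid(X @ w + b) - y          # d(loss)/d(logit) per example
        w -= lr * (X.T @ grad / len(y) + l2 * w)
        b -= lr * grad.mean()
    return w, b

def rerank(query, candidates, w, b):
    """Score each candidate against the query and sort by descending score."""
    feats = np.stack([pair_features(query, c) for c in candidates])
    scores = sigmoid(feats @ w + b)
    order = np.argsort(-scores)
    return order, scores[order]

# Synthetic stand-in for the annotated data: one chosen candidate per query.
rng = np.random.default_rng(0)
dim, n_queries, n_cand = 16, 200, 10

def make_example():
    query = rng.normal(size=dim)
    candidates = rng.normal(size=(n_cand, dim))
    chosen = int(rng.integers(n_cand))
    candidates[chosen] = query + 0.05 * rng.normal(size=dim)
    return query, candidates, chosen

X, y = [], []
for _ in range(n_queries):
    q, cands, chosen_idx = make_example()
    for i, c in enumerate(cands):
        X.append(pair_features(q, c))
        y.append(1.0 if i == chosen_idx else 0.0)   # target 1 for the chosen candidate
X, y = np.array(X), np.array(y)
w, b = train_reranker(X, y)

# Rerank the 10 retrieved candidates of a new query by predicted score.
query, candidates, chosen = make_example()
order, scores = rerank(query, candidates, w, b)
```

With real annotations, `X` and `y` would be built from the stored query and candidate embeddings and the human choice for each query; a small MLP could replace the logistic regression if it underfits.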
vocal nexus
#

Wouldn't it be much more efficient to just better train the embedding model so that the top k retrieval already returns correctly ranked images?
Maybe it makes it harder to train the model using user interactions though. I would probably create an interface just for collecting ranking data in any case.

final dome
# tropic dawn No, reranking is done **after** retrieval of candidates. To make it simpler, thi...

Yes, I understand that. You actually told me how important reranking is back when I was blindly trying to figure out the best model. I wasted a lot of time training those. I looked into the Papers with Code links you shared earlier, and for image reranking the code and proof aren't legit; that's why I reached back out to see if you had worked on this before. This logic is fantastic!!! Thanks a lot, man!! Here I was, breaking my head trying to caption my images and fine-tune CLIP. This sounds much easier. Thanks a lot for everything