Image reranking | Learn AI Together | Page 1

final dome Nov 6, 2024, 7:58 AM

I want to build a robust image retrieval system for my dataset of images. I'm using a Qdrant cluster to store my image embeddings (from a finetuned DINOv2). My dataset has a lot of intricate artwork, abstract and cubist. Using my embeddings I can run a similarity search for a query image and retrieve the top 5 similar images, but the retrieved images aren't good enough. They still need to be reranked, and I find it difficult to rerank images. Since text is easier to rerank, I generated captions for my dataset using the Google Gemini model. I'm not happy with the captions, despite refining the prompt: since these are abstract drawings and colours, it got a lot of the detail wrong. Can someone give me direction on how to proceed with reranking the images, using either text or images?

final dome Nov 6, 2024, 10:21 AM

@tropic dawn

vocal nexus Nov 6, 2024, 10:42 AM

Maybe create a small dataset of ranked images and fine-tune the model on it

final dome Nov 6, 2024, 10:51 AM

vocal nexus Maybe create a small dataset of ranked images and fine-tune the model on it

But images need to be ranked based on a query image, right? When the query image is dynamic, how can we make a dataset?

final dome Nov 6, 2024, 10:53 AM

vocal nexus Maybe create a small dataset of ranked images and fine-tune the model on it

My task is image retrieval: I have 500 index images (these are HD DSLR pics), and there are query images which contain an instance of one of the index images. My goal is to pick the right index image given a query image. That's why I finetuned DINO to generate embeddings. Now I need to focus on reranking the retrieved images.

vocal nexus Nov 6, 2024, 11:30 AM

final dome But images need to be ranked based on a query image right. When the query image ...

You make a dataset by taking reference images at random and manually ranking other images, also chosen at random, against each of them, I would say.
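A rough sketch of how those annotation tasks could be drawn, in case it helps. The function name, the task layout, and the `img_000`-style IDs are all made up for illustration; the `ranking` field is left empty for a human to fill in.

```python
import random

def make_ranking_tasks(image_ids, n_tasks, n_candidates, seed=0):
    """Draw random (reference, candidates) annotation tasks.

    Each task pairs one reference image with a few other images that an
    annotator then orders from most to least similar to the reference.
    """
    rng = random.Random(seed)  # fixed seed so the same tasks can be regenerated
    tasks = []
    for _ in range(n_tasks):
        reference = rng.choice(image_ids)
        pool = [i for i in image_ids if i != reference]
        candidates = rng.sample(pool, n_candidates)  # sampled without replacement
        tasks.append({
            "reference": reference,
            "candidates": candidates,
            "ranking": None,  # to be filled in manually
        })
    return tasks

# Hypothetical IDs standing in for the 500 index images.
image_ids = [f"img_{i:03d}" for i in range(500)]
tasks = make_ranking_tasks(image_ids, n_tasks=3, n_candidates=5)
```

Newly registered images can simply be added to `image_ids` and sampled the same way, so only the new tasks need manual ranking.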

tropic dawn Nov 6, 2024, 12:17 PM

final dome I want to create a robust image retrieval for my dataset of images. I'm using th...

You don't necessarily have to rerank based on content. If your application has a natural signal, such as clicks, you can just rerank based on the probability of being clicked on. Basically, you can concatenate the user embedding and the image embedding and pass them to a small model that predicts the probability of a click. See more on Learning to Rank: https://towardsdatascience.com/learning-to-rank-a-complete-guide-to-ranking-using-machine-learning-4c9688d370d4

If the ranking is subjective, I would just annotate, for a given query, what was clicked on, and create a continuous model to learn, given a query and a list of images, which one is most likely to be clicked on.

final dome Nov 6, 2024, 4:10 PM

tropic dawn You don't necessarily have to rerank based on content. If your application has n...

Hey Ian!! Thank you so much for replying! So mine isn't really a text-image search like Google Images. Mine is an image-image search. I'm given a query image, and based on its content I need to pick the correct index image. The query is basically an instance of one of the index images. The last time we spoke you gave me the idea of captioning the images and working with the text embeddings instead, and you also told me how important reranking is. But I'm having trouble with that. The captions generated after prompting aren't working well. I tried the Gemini model for captioning (it takes over 3-4 secs to generate a caption for an image), and the captions aren't that accurate. The worst part is when it returns no result, because Gemini is not able to describe the image. I'm stuck with this issue.

final dome Nov 6, 2024, 4:11 PM

tropic dawn You don't necessarily have to rerank based on content. If your application has n...

So there is no click in my application. Rather just a query image taken by a user which must contain an instance of any index image.

final dome Nov 7, 2024, 4:13 AM

vocal nexus you make a dataset by taking reference images at random and manually ranking oth...

Hey, this is a really nice idea, and I was actually trying that out last night. BUT... later on I will have to register more images. Right now my dataset is small, but when I add more images to the mix, it makes no sense to keep ranking them manually, right?

final dome Nov 7, 2024, 4:18 AM

tropic dawn You don't necessarily have to rerank based on content. If your application has n...

Just pondered over your message again. No image has a higher chance of being clicked on. This application doesn't involve clicking. It's more like scanning an image and querying with it. The query image is random and could be an instance of ANY index image, or of NONE.

vocal nexus Nov 7, 2024, 7:39 AM

final dome Hey this is a really nice idea. And I was actually trying that out last night. B...

The more you have ground truth data about the ranking preference you want to have, the more the model will be able to generalize this ranking to other images. In deep learning, more is better.

final dome Nov 7, 2024, 7:45 AM

vocal nexus The more you have ground truth data about the ranking preference you want to hav...

See, that's the thing. I don't have a lot of ground truth data right now. And I'm sure that while registering there will be at most 100 different images.

final dome Nov 7, 2024, 7:45 AM

vocal nexus The more you have ground truth data about the ranking preference you want to hav...

And since my images are artwork, even I'm not sure how similar or dissimilar some works are.

vocal nexus Nov 7, 2024, 7:50 AM

final dome And since my images are artwork even I'm not sure how similar or dissimilar some...

If you don't know if the images are supposed to be similar or not, you should probably not expect the AI to do a better job

final dome Nov 7, 2024, 7:53 AM

vocal nexus If you don't know if the images are supposed to be similar or not, you should pr...

I mean, I have a vague idea. But the retriever did a good job finding similar images, and that's all AI as well. Why can't the reranking step be made to work like this?

vocal nexus Nov 7, 2024, 7:59 AM

final dome I mean I have a vague idea. But the retriever did a good job finding similar ima...

what is your ground truth reference exactly?

final dome Nov 7, 2024, 8:00 AM

vocal nexus what is your ground truth reference exactly?

the HD DSLR Artwork images

vocal nexus Nov 7, 2024, 8:01 AM

final dome the HD DSLR Artwork images

and it gives you good rankings?

I am talking about the rankings, not the images

final dome Nov 7, 2024, 8:01 AM

vocal nexus and it gives you good rankings?

rankings? I haven't done reranking yet. The retriever works fine

final dome Nov 7, 2024, 8:02 AM

vocal nexus I am talking about the rankings, not the images

I'm not able to do the ranking

vocal nexus Nov 7, 2024, 8:06 AM

final dome I'm not able to do the ranking

so if I understand correctly, you have a model for ranking, but you are not happy with it, and you have no reference ranking model or ground truth ranking data?

final dome Nov 7, 2024, 8:08 AM

vocal nexus so if I understand correctly, you have a model for ranking, but you are not happ...

Noo, I don't have a model for ranking. I want a model for image ranking. The ones on Papers with Code are either not under an MIT License or are not clear in their instructions, so I'm not able to rerank the images. Since reranking text is easier, I was looking into captioning the images. And yes, I have no ground truth ranking data.

vocal nexus Nov 7, 2024, 8:16 AM

final dome noo I don't have model for ranking. I want a model for image ranking. The ones i...

then maybe you can try other embedding similarity functions, L1 distance, cosine similarity, etc.
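A minimal sketch of what swapping the similarity function looks like on already-retrieved candidates. The embeddings here are random stand-ins (the size 384 is arbitrary), and the helper names are mine, not from any library.

```python
import numpy as np

def cosine_similarity(query, candidates):
    """Cosine similarity between one query (D,) and candidates (N, D)."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return c @ q

def l1_distance(query, candidates):
    """Sum of absolute differences per candidate (lower is closer)."""
    return np.abs(candidates - query).sum(axis=1)

def l2_distance(query, candidates):
    """Euclidean distance per candidate (lower is closer)."""
    return np.linalg.norm(candidates - query, axis=1)

# Made-up embeddings: 10 candidates, one of which is a near-duplicate of the query.
rng = np.random.default_rng(0)
query = rng.normal(size=384)
candidates = rng.normal(size=(10, 384))
candidates[3] = query + 0.05 * rng.normal(size=384)

# Similarities sort descending, distances sort ascending.
order_cos = np.argsort(-cosine_similarity(query, candidates))
order_l1 = np.argsort(l1_distance(query, candidates))
order_l2 = np.argsort(l2_distance(query, candidates))
```

On this easy toy case all three agree on the top candidate; the interesting comparison is on the hard queries where the top results are near-identical artworks.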

final dome Nov 7, 2024, 8:17 AM

vocal nexus then maybe you can try other embedding similarity functions, L1 distance, cosine...

I'm using cosine similarity to retrieve the embeddings. L1 didn't work well with retrieval

vocal nexus Nov 7, 2024, 8:20 AM

final dome I'm using cosine similarity to retrieve the embeddings. L1 didn't work well with...

then maybe you can try something like this:
do not simply embed the entire image, but also embed crops of the image (ex: top left, top right, bottom left, bottom right), concat the embeddings and compute the similarity on the concatenated vectors

of course, with 4 additional crops you would have to perform 5 forward passes to compute all embeddings and you would get combined embedding vectors that are 5 times bigger
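Roughly what I mean, as a sketch. `toy_embed` is a hypothetical stand-in for your model's forward pass (it just computes per-channel statistics so the example runs without any weights), and normalising each part before concatenating is my own assumption so that no single view dominates.

```python
import numpy as np

def five_views(image):
    """Return the full image plus its four quadrants (image is H x W x C)."""
    h, w = image.shape[:2]
    mh, mw = h // 2, w // 2
    return [
        image,                 # full image
        image[:mh, :mw],       # top left
        image[:mh, mw:],       # top right
        image[mh:, :mw],       # bottom left
        image[mh:, mw:],       # bottom right
    ]

def combined_embedding(image, embed_fn):
    """Embed each view separately (5 forward passes) and concatenate."""
    parts = []
    for view in five_views(image):
        e = np.asarray(embed_fn(view), dtype=np.float64)
        parts.append(e / np.linalg.norm(e))  # L2-normalise each part
    return np.concatenate(parts)

def toy_embed(view):
    """Hypothetical placeholder embedder: per-channel mean and std."""
    flat = view.reshape(-1, view.shape[-1])
    return np.concatenate([flat.mean(axis=0), flat.std(axis=0)])

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))            # made-up image
vec = combined_embedding(image, toy_embed)  # 5 parts x 6 dims = 30 dims
```

In practice `embed_fn` would wrap the finetuned model plus its preprocessing, and the combined vectors would be stored and searched in place of the single-image ones.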

final dome Nov 7, 2024, 8:28 AM

vocal nexus then maybe you can try something like this: do not simply embed the entire image...

I'm sorry I don't understand you, how will this help in reranking?

vocal nexus Nov 7, 2024, 8:28 AM

final dome I'm sorry I don't understand you, how will this help in reranking?

it might give you a better ranking in the first place

final dome Nov 7, 2024, 8:39 AM

But again, that's just for candidate selection, right? I wasted a lot of time trying to finetune unsupervised models to give me "GOOD" embeddings, but what about the subjective context, that is, reranking based on the query image? Do you really think this will help rerank images based on the query image?

vocal nexus Nov 7, 2024, 8:50 AM

final dome But again that's just for candidate selection right. I wasted a lot of time tryi...

Well, depending on how you crop your images, it might improve the re-ranking. For instance, with embeddings of the 4 quadrants, it might help rank the images based on the similarity of their composition, since corresponding quadrants should have similar embeddings.
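To make the quadrant idea concrete, here is a small sketch that compares corresponding parts one by one. The part embeddings are random stand-ins and `per_part_cosine` is a name I made up. One thing worth knowing: if every part is L2-normalised, the cosine similarity of the concatenated vectors is exactly the mean of these per-part cosines, so the breakdown below is mostly useful for seeing which part disagrees.

```python
import numpy as np

def per_part_cosine(query_parts, cand_parts):
    """Cosine similarity between corresponding parts.

    query_parts: (P, D) array, one embedding per view of the query.
    cand_parts:  (N, P, D) array, the same views for N candidates.
    Returns an (N, P) array of similarities.
    """
    q = query_parts / np.linalg.norm(query_parts, axis=1, keepdims=True)
    c = cand_parts / np.linalg.norm(cand_parts, axis=2, keepdims=True)
    return np.einsum("npd,pd->np", c, q)

rng = np.random.default_rng(0)
P, D = 5, 16  # full image + 4 quadrants; toy embedding size
query_parts = rng.normal(size=(P, D))
cand_parts = rng.normal(size=(3, P, D))
# Candidate 1: same composition as the query, with a little noise.
cand_parts[1] = query_parts + 0.05 * rng.normal(size=(P, D))
# Candidate 2: same content, but the quadrants are swapped around.
cand_parts[2] = query_parts[[0, 2, 1, 4, 3]]

sims = per_part_cosine(query_parts, cand_parts)
scores = sims.mean(axis=1)        # one score per candidate
best = int(np.argmax(scores))
```

Candidate 2 matches perfectly on the full-image part but poorly on the quadrants, so it scores below candidate 1, which is the composition sensitivity I had in mind.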

final dome Nov 7, 2024, 9:26 AM

so this is the transform I used initially:

import torchvision.transforms as T

transform = T.Compose([
    T.Resize(256, interpolation=T.InterpolationMode.BICUBIC),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

I'll try what you're suggesting as well

vocal nexus Nov 7, 2024, 9:32 AM

In any case, I think you have to identify first principles to guide the similarity first. Do you want the images to be ranked depending on the composition? the colors? the style? the artistic current? the content? the quality? something else?
This will guide you towards one method of ranking or another.

final dome Nov 7, 2024, 9:35 AM

vocal nexus In any case, I think you have to identify first principles to guide the similari...

For example, given a query image, retrieval sometimes returns another artwork instead of the preferred index artwork. This is because the two are very, very similar: they're from the same artist, use the same colours, and have similar sketches.

tropic dawn Nov 7, 2024, 11:35 AM

final dome Just pondered over your message again. No image has a higher chance of being cli...

@final dome You have a design flaw in your application then. On one hand, you're saying there's a "correct index image"; on the other hand, you're just randomly selecting from candidates and stating there's no higher chance of one being better than the other. Is there a correct index image or not? If there is, annotate it yourself. If not, redesign the application to show the top N (like top 10) candidates. I would do both annotation and redesign.

final dome Nov 7, 2024, 2:40 PM

tropic dawn @final dome You have a design flaw in your application then. On one h...

The candidates are being selected by vector search, and there is a higher chance of one being better than the other SUBJECTIVELY. As in, given a query image, YES, there is a candidate better than the rest. But all this comes after retrieval. Right now I'm retrieving the top 10 best candidates, like you said, using cosine similarity search. I'm using the DINO embeddings to represent my images. So among the retrieved images, YES, there is a candidate better than the rest. But how do I choose the best candidate out of the top 10 retrieved ones? I didn't understand you the first time; I thought you meant one index image being the better candidate overall.

tropic dawn Nov 8, 2024, 2:14 PM

final dome The candidates are being selected by vector search, and there is a higher chance...

No, reranking is done after retrieval of candidates. To make it simpler, this is step by step:

1. Create a bunch of sample queries yourself, so collect say 500 images.
2. Upload, retrieve, and select the image you would choose yourself. It's fine if it's subjective for now. This way, you basically have 2 inputs and 1 output. Say you retrieve 10 candidates, you basically have

# For the correct candidate
{"input_1": query_image_embedding, "input_2": candidate_image_embedding, "target": 1}

# For the other 9 incorrect candidates
{"input_1": query_image_embedding, "input_2": candidate_image_embedding, "target": 0}

3. After going through the 500 images, design and train a small model that takes both embeddings and outputs that binary classification. It should output the probability of a candidate being selected, given query_image_embedding and candidate_image_embedding.
4. Add this model to your application, run it after retrieving the 10 candidates, and rerank the 10 candidates based on the prediction score.
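A toy sketch of the steps above, under loud assumptions: the embeddings are synthetic (random vectors, with look-alike distractors), the "small model" is a plain logistic regression written in NumPy, and the pair feature is just the element-wise absolute difference. A real version would use your annotated retrievals, and a small MLP or richer pair features may work better.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # toy embedding size; real embeddings are much larger

def make_retrieval(n_distractors=9):
    """Synthetic stand-in for one annotated retrieval (made-up data)."""
    index = rng.normal(size=D)                      # the correct index image
    query = index + 0.1 * rng.normal(size=D)        # a photo of that artwork
    distractors = index + 0.6 * rng.normal(size=(n_distractors, D))  # similar artworks
    return query, index, distractors

def pair_features(q, c):
    """Features built from both embeddings of a (query, candidate) pair."""
    return np.abs(q - c)

# Step 2: one positive and nine negative examples per annotated query.
examples = []
for _ in range(200):
    q, correct, distractors = make_retrieval()
    examples.append({"input_1": q, "input_2": correct, "target": 1})
    for d in distractors:
        examples.append({"input_1": q, "input_2": d, "target": 0})

X = np.stack([pair_features(e["input_1"], e["input_2"]) for e in examples])
y = np.array([e["target"] for e in examples], dtype=np.float64)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, lr=0.5, epochs=500):
    """Step 3: logistic regression fitted by batch gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        grad = sigmoid(X @ w + b) - y       # gradient of the log loss w.r.t. the logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def rerank(q, candidates, w, b):
    """Step 4: score each retrieved candidate and sort by predicted probability."""
    feats = np.stack([pair_features(q, c) for c in candidates])
    scores = sigmoid(feats @ w + b)
    return np.argsort(-scores), scores

w, b = train(X, y)

# A fresh query whose correct image sits at position 4 of the 10 retrieved candidates.
q, correct, distractors = make_retrieval()
candidates = np.insert(distractors, 4, correct, axis=0)
order, scores = rerank(q, candidates, w, b)   # order[0] is the reranked top pick
```

The scores also give you a natural place to put a threshold if you later want to reject queries that match none of the index images.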

vocal nexus Nov 8, 2024, 3:45 PM

Wouldn't it be much more efficient to just train the embedding model better, so that the top-k retrieval already returns correctly ranked images?
Maybe it makes it harder to train the model using user interactions though. I probably would create an interface just for collecting ranking data in any case.

final dome Nov 8, 2024, 4:45 PM

tropic dawn No, reranking is done **after** retrieval of candidates. To make it simpler, thi...

Yes, I understand that. You actually told me how important reranking is when I was blindly trying to figure out the best model. I wasted a lot of time training those. So I looked into the Papers with Code links you shared earlier, and for image reranking the code and proof aren't legit; that's why I reached back to see if you had worked on this before. This logic is fantastic!!! Thanks a lot, man!! Here I was, breaking my head trying to caption my images and finetune CLIP. This sounds much easier. Thanks a lot for everything.
