#High relevancy for image selection with API

sharp glacier
Hi everyone,

I’m trying to automate the selection of different images to illustrate a voice over. Putting aside the cost for now, my goal is to come up with the most relevant image selection.

Despite many iterations, the results are still poor: in the best case, only about 50% of the selected images are relevant.

HERE IS THE PROCESS I CREATED SO FAR:

Step 1: Retrieve images from a database using 2 processes:

-Specific keywords: Generate very precise keywords for each paragraph of my voice over, hoping to retrieve images that are particularly relevant to that specific part of the voice over.

-Broad keywords: Generate general keywords related to the whole voice over, hoping to retrieve additional images that can serve as a backfill to illustrate the voice over when the specific images cannot be used.

→ Approximately 600 images retrieved
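To make Step 1 concrete, here is a rough sketch of how the two retrieval passes could be merged while remembering where each image came from. The `search` callable is a hypothetical stand-in for my database lookup, and the field names are only illustrative.

```python
def retrieve_candidates(paragraph_keywords, broad_keywords, search):
    """Merge specific and broad keyword results into one candidate pool.

    paragraph_keywords: list of keyword lists, one list per paragraph.
    broad_keywords: keyword list for the whole voice over.
    search: hypothetical callable, keyword -> iterable of image IDs.
    Returns {image_id: {"source": "specific" | "broad", "paragraphs": set}}.
    """
    candidates = {}
    for index, keywords in enumerate(paragraph_keywords):
        for keyword in keywords:
            for image_id in search(keyword):
                entry = candidates.setdefault(
                    image_id, {"source": "specific", "paragraphs": set()}
                )
                entry["paragraphs"].add(index)
    for keyword in broad_keywords:
        for image_id in search(keyword):
            # setdefault keeps the "specific" tag if the image was already found.
            candidates.setdefault(image_id, {"source": "broad", "paragraphs": set()})
    return candidates


# Toy in-memory "database" used only for illustration.
fake_db = {"glacier": ["i1", "i2"], "nature": ["i2", "i3"]}
pool = retrieve_candidates([["glacier"]], ["nature"], lambda k: fake_db.get(k, []))
```

Keeping the paragraph index on each specific image means Step 5 can later restrict the candidates to the paragraph they were retrieved for.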

Step 2: Apply a broad filter to remove bad images (blurry images, duplicates, images containing text, etc.).

→ Approximately 150 images remaining.
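For the duplicate part of Step 2, a minimal sketch, assuming each image is available as raw bytes. It only catches byte-for-byte duplicates; resized or re-encoded copies would need a perceptual hash, and the blur and text checks are not shown here.

```python
import hashlib


def drop_exact_duplicates(images):
    """Keep the first image for each distinct content hash.

    images: dict of image_id -> raw bytes (an assumed in-memory layout).
    """
    seen = set()
    kept = {}
    for image_id, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # same bytes as an image we already kept
        seen.add(digest)
        kept[image_id] = data
    return kept


unique = drop_exact_duplicates({"a": b"x", "b": b"x", "c": b"y"})
```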

Step 3: Send each image to GPT and retrieve a 2-3 sentence description of the image.

Step 4: Using a combination of the following elements:

-Image description (retrieved in step 3)

-Voice over script (2 pages)

-Contextual information related to the Voice over (short document, <20 pages, containing general info about the voice over)

I send batches of 10 images to GPT, asking it to exclude the most irrelevant / off-topic images.

→ Approximately 80 images remaining
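The batching in Step 4 looks roughly like the sketch below. The prompt wording and the helper names are illustrative assumptions, not my exact prompt, and the actual GPT call is left out.

```python
def batches(items, size=10):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_filter_prompt(script, context, batch):
    """Build one filtering prompt.

    batch: list of (image_id, description) tuples from Step 3.
    """
    lines = [f"{image_id}: {description}" for image_id, description in batch]
    return (
        "Voice-over script:\n" + script + "\n\n"
        "Context:\n" + context + "\n\n"
        "Image descriptions:\n" + "\n".join(lines) + "\n\n"
        "Return the IDs of the images that are irrelevant or off-topic."
    )


described = [(f"img{n}", f"description {n}") for n in range(25)]
prompts = [build_filter_prompt("script text", "context text", b) for b in batches(described)]
```

One thing this makes visible: the full script and context are repeated in every prompt, and each image is only ever judged against the 9 others that happen to share its batch.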

Step 5: Final image selection:

I go through each paragraph of the voice over:

-First, I focus on the images obtained with specific keywords (using their image descriptions) and ask GPT to select the top 3 most relevant images to illustrate the paragraph.

-Then, I focus on the images obtained with broad keywords (using their image descriptions) and ask GPT to review the selected images and determine whether a broad image would be more relevant and should replace one of the specific images.
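The two-pass logic of Step 5 can be sketched as follows. The `score` callable is a hypothetical stand-in for the GPT judgment, and `word_overlap` is only a toy scorer so that the example runs; neither is what I actually use for ranking.

```python
def select_for_paragraph(paragraph, specific, broad, score, top_k=3):
    """Pick up to top_k images for one paragraph.

    specific, broad: dicts of image_id -> description.
    score: callable (paragraph, description) -> number, higher is better.
    """
    # Pass 1: rank the specific images and keep the best top_k.
    scored = [(image_id, score(paragraph, text)) for image_id, text in specific.items()]
    chosen = sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
    # Pass 2: a broad image fills an empty slot, or replaces the weakest
    # pick only when it scores strictly higher.
    for image_id, text in broad.items():
        value = score(paragraph, text)
        if len(chosen) < top_k:
            chosen.append((image_id, value))
            continue
        weakest = min(range(len(chosen)), key=lambda k: chosen[k][1])
        if value > chosen[weakest][1]:
            chosen[weakest] = (image_id, value)
    chosen.sort(key=lambda pair: pair[1], reverse=True)
    return [image_id for image_id, _ in chosen]


def word_overlap(paragraph, description):
    """Toy scorer: number of lowercase words shared by both texts."""
    return len(set(paragraph.lower().split()) & set(description.lower().split()))


picked = select_for_paragraph(
    "glacier melting in the arctic",
    {"s1": "a glacier melting", "s2": "a city street", "s3": "arctic ice", "s4": "a dog"},
    {"b1": "melting glacier in the arctic sea"},
    word_overlap,
)
```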

What would you recommend to improve the relevancy?

Thanks for the help