#GPT Image 1 references?


astral belfry

I saw that your GPT Image 1 normally supports multiple reference images, but not when it is used via the official OpenAI API. Is this correct?
If so, why? Leonardo, for example, offers the GPT API with multiple references, I reckon.
Many thanks.

valid gull
# astral belfry I saw your GPT image 1 supports normally multiple reference images, but it doesn...

Hi Enrico,

Yes, you're correct—at the moment, the official OpenAI API for GPT-based image generation does not support supplying multiple reference images in a single request. The GPT Image 1 model can work with multiple image references in certain integrated environments (such as the ChatGPT interface), but via the public API, the functionality is currently limited to a single image input per generation.

The difference usually comes down to implementation scope. The ChatGPT product layer can add extra orchestration features—like handling multiple reference images—before passing information to the underlying model. The public API, on the other hand, tends to expose the model's raw capabilities without that additional layer of processing. In other words, it's not that GPT Image 1 can't conceptually incorporate multiple references—it's more that the API's current design doesn't yet bundle that particular convenience feature.

Platforms like Leonardo have presumably built their own middleware on top of the API, which lets them manage and blend multiple reference images before sending a composed request to the underlying generation system. Essentially, they handle that “multi-reference merge” step themselves.

If your workflow requires multi-reference capability right now, one workaround is to first manually combine or collage your reference images into a single source image and then supply that as the API input. While it's less flexible than sending separate references, it still allows you to retain some composite influence over the final result.

Hopefully, API parity will improve in future updates, but for now, single-image input is the official API limit.

If you'd like, I can outline an example of how to prepare and encode a composite reference image so it integrates cleanly into your API call without losing essential detail.

astral belfry
valid gull
# astral belfry i tried composition but doesn't work, many thanks anyway! Thankfully many plat...

Alright, Enrico — sounds like your “composition” attempt didn't produce the result you were hoping for, so let's untangle it and get you something that does work.

Since the OpenAI image generation API currently only accepts one reference image per request, “composition” in the sense of giving it multiple direct references just isn't possible in one shot. That's why it worked on some platforms (because they do that extra merging magic for you) but not here.

Here's a practical workaround to get as close as possible to true multi-image composition right now:


1. Combine your references into a single image

Think of it as making a mini mood board. You can:

  • Open your two (or more) references in Photoshop, GIMP, Krita, Photopea, or even Canva.
  • Place them side-by-side or in any layout that feels natural.
  • Make sure they're large enough so details are visible.
  • Save as a single PNG or high-quality JPG.

Tip: Don't pick wildly different resolutions — that can make the model “favor” one image more than the other.
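
If you'd rather script the layout than eyeball it, here is a minimal sketch of the arithmetic, assuming a simple left-to-right collage. The names plan_side_by_side and area_shares are made up for illustration, and the default height and gap are arbitrary. It only computes sizes and positions; the actual pasting would still be done in an image editor or an imaging library.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def plan_side_by_side(sizes: List[Size], target_height: int = 1024, gap: int = 32):
    """Scale every reference to the same height and lay them out left to right.

    Returns (canvas_size, placements), where each placement is
    (x_offset, scaled_width, scaled_height) for one reference.
    """
    placements = []
    x = 0
    for width, height in sizes:
        # Keep the aspect ratio while forcing a common height.
        scaled_width = round(width * target_height / height)
        placements.append((x, scaled_width, target_height))
        x += scaled_width + gap
    # Drop the gap that was added after the last image.
    canvas_width = x - gap if placements else 0
    return (canvas_width, target_height), placements


def area_shares(placements):
    """Fraction of the total reference area taken up by each image."""
    areas = [w * h for _, w, h in placements]
    total = sum(areas)
    return [a / total for a in areas]
```

Because every reference is scaled to the same height, none of them ends up tiny next to the others, and area_shares gives a rough number for how much room each one gets on the canvas.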

2. Encode it for your API request

The image needs to be either hosted online with a stable URL or embedded as base64 data.

For base64:

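  • Read the image in binary mode and base64-encode it, e.g. in Python (after import base64): with open("combined.png", "rb") as f: data = base64.b64encode(f.read()).decode()
  • Pass it in your API payload, for example as "image": {"name": "combined.png", "data": "<your_base64_here>"}. Treat that field layout as illustrative; the exact shape depends on the endpoint you are calling.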
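
As a slightly fuller sketch of the encoding step, using only the Python standard library: encode_image and to_data_url are hypothetical helper names, and whether your endpoint wants a bare base64 string or a data URL is an assumption you would need to check against its documentation.

```python
import base64
import mimetypes
from pathlib import Path


def encode_image(path: str) -> str:
    """Return the file's bytes as a base64 string (ASCII, no line breaks)."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def to_data_url(path: str) -> str:
    """Wrap the base64 string in a data URL, guessing the MIME type from the extension."""
    mime, _ = mimetypes.guess_type(path)
    # Fall back to a generic type if the extension is not recognized.
    return f"data:{mime or 'application/octet-stream'};base64,{encode_image(path)}"
```

Usage would look like data = encode_image("combined.png"), with the resulting string dropped into whichever field your request expects.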

3. Prompt it with intent

Because the merged reference can be visually “busy,” tell the model clearly what you want out of it.

Example:

Generate a single fantasy illustration in the style of [desired style].
Blend the character design from the left side of the reference with the environment from the right side.
Unify them in lighting, perspective, and mood.
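
If you end up testing several layouts, a tiny helper keeps the prompt in sync with the collage. This is only a sketch: build_collage_prompt is a made-up name, and the wording simply mirrors the example above rather than any required prompt format.

```python
def build_collage_prompt(style: str, region_roles: dict) -> str:
    """Compose a prompt that names which part of the collage supplies what.

    region_roles maps a collage region (e.g. "left side") to what should be
    taken from it (e.g. "character design").
    """
    lines = [f"Generate a single illustration in the style of {style}."]
    for region, role in region_roles.items():
        lines.append(f"Take the {role} from the {region} of the reference.")
    lines.append("Unify them in lighting, perspective, and mood.")
    return "\n".join(lines)
```

For example, build_collage_prompt("watercolor", {"left side": "character design", "right side": "environment"}) gives one instruction line per region, so swapping the layout only means changing the mapping.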

4. Iterative refinement

If the output leans too hard toward one of your images:

  • Make your collage less cluttered — give more space and scale to the underrepresented reference.
  • Use more explicit callouts in your text prompt like “exact same hairstyle from [character],” “keep mountain shapes from [landscape],” etc.

It's not a perfect substitute for native multi-image input, but with a carefully prepared composite and intentional prompting, you can get surprisingly close.
Think of it as teaching a very smart but slightly forgetful artist by giving them one annotated canvas of references instead of scattering multiple photos on the floor.

Want me to sketch out a ready-to-run API example in Python showing the composite image flow? That would make this a plug-and-play solution for you.

astral belfry