# Image Generation Problem with Specific Instructions


lethal pollen

The DALL-E model did not generate the image as specified, even though the instructions were detailed.
**Initial Instructions:**

Podium with Three Steps:
The left step is the lowest and has the number 3.
The middle step is the highest and has the number 1.
The right-hand step is of intermediate height and bears the number 2.

Three Competitors:
The competitor on the left step wears a bronze medal and a tracksuit.
The competitor on the middle step wears a gold medal and a tracksuit.
The competitor on the right step wears a silver medal and a tracksuit.

Competitors' appearance:
Each competitor has only one medal and looks happy.

Attempted Solutions:

Several detailed descriptions were provided.
The model either did not follow these instructions or returned technical errors. See an example.

Action Requested:

A technical correction so that the model follows the instructions given and generates the image as requested.

sour sandal BOT

@lethal pollen

Please follow the posting guidelines.

For better assistance from the community or an OpenAI team member, please follow the posting guidelines by clicking the button below.

remote sand

Hey! You might be interested in reading more about how they made DALL·E, and why that results in some of the limitations you're running into here. The DALL·E 3 research paper is available here: https://cdn.openai.com/papers/dall-e-3.pdf and it's not terribly long.

In short: DALL·E was trained by learning from a bunch of images that had written captions describing them. After processing a bunch of images of an apple, for example, that were captioned "an apple" (simplification!), it eventually got the ability to create an apple using what it learned from all the source images.

This method has limits, though! It doesn't allow for an indefinite amount of detail to be included in a prompt (see section 5 of the paper, Limitations & Risk), which is what you're running into here. The models of the future will surely improve in this regard, though!

For example, there's a lot of promise in the future image generation capacity of the GPT-4o model, which is not currently released. You can see some examples of the new things it's capable of, though, on this page: https://openai.com/index/hello-gpt-4o/ (see the Explorations of capabilities section).
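
In the meantime, one thing you could try is collapsing the requirements into a single compact description instead of a long list. Here's a minimal sketch in Python of what I mean. The `Step` class and the `build_podium_prompt` helper are hypothetical names I made up for illustration, they aren't part of any OpenAI library, and there's no guarantee the model will follow the resulting prompt any better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One podium step, as described in the original post (hypothetical helper type)."""
    position: str  # "left", "middle", or "right"
    height: str    # "lowest", "highest", or "intermediate"
    number: int    # number shown on the step
    medal: str     # medal worn by the competitor standing on this step


# The podium layout exactly as listed in the original instructions.
PODIUM = [
    Step("left", "lowest", 3, "bronze"),
    Step("middle", "highest", 1, "gold"),
    Step("right", "intermediate", 2, "silver"),
]


def build_podium_prompt(steps):
    """Join the per-step requirements into one compact prompt string."""
    parts = [
        f"on the {s.position} step ({s.height}, numbered {s.number}) "
        f"a happy competitor in a tracksuit wearing one {s.medal} medal"
        for s in steps
    ]
    return "A three-step winners' podium: " + "; ".join(parts) + "."


prompt = build_podium_prompt(PODIUM)
print(prompt)
```

The printed string is what you'd paste in as the prompt. Keeping the layout in one small data structure also makes it easy to tweak a single detail and retry without rewriting the whole description.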