# How to structure an OpenAI image prompt based on a dynamic quote (Discord bot use case)


faint hedge
#

Hello, I have the following question.
I fetch a quote via an API and pass it to the OpenAI API to generate an image, which is then posted to a Discord channel by a bot. How should a prompt be structured so that ChatGPT’s image generation produces images like the ones in the attachment?

I’ve tried several different prompts, but I can’t seem to get the desired result.

The quote doesn’t necessarily have to appear in the image—it would already be great if the generated images simply match the mindset quote.

Maybe someone has an idea how I could implement this. 👋

sharp kiln
#

Have you tried asking ChatGPT to analyze these images and write prompts?

faint hedge
#

Yeah, I've tried many times, but the results I get on Discord are always too similar and don't look the way they're supposed to. I might be using the wrong model, now that I think about it.

faint hedge
#

This is my Master Prompt (Version 2).
{quote} represents the daily quote fetched via an API call.


"{quote}"

Interpret the quote symbolically, not literally.
Focus on the underlying idea, emotion, tension, or inner conflict expressed by the quote rather than specific words.

Before generating the image, internally choose ONE strong visual interpretation that best fits the quote.
Deliberately vary the type of scene between generations to avoid repetition.

Possible visual interpretations may include (but are not limited to):
- Symbolic environments or spaces (empty rooms, paths, thresholds, landscapes, light-filled or shadowed places)
- Strong visual metaphors using objects, nature, or light instead of people
- Human presence only when it clearly adds meaning (may be absent, partially visible, or abstracted)
- Conceptual or atmospheric scenes expressing inner conflict, discipline, freedom, fear, hope, or clarity
- Moody, high-contrast or minimalistic compositions that challenge the viewer emotionally

Avoid defaulting to a single human figure seen from behind.
If a human appears, vary perspective, distance, framing, or abstraction.
Regularly generate scenes with no people at all.

Use lighting as a primary storytelling tool (natural light, shadow, contrast, neon glow, fog, inner light).
Cinematic composition with depth, atmosphere, and emotional weight.
Highly detailed, realistic yet painterly digital art.
Color grading may be natural, muted, dark, or stylized depending on the quote’s tone.

Avoid repeating identical compositions, camera angles, visual motifs, or scene structures.
Do NOT include any text, letters, quotes, logos, or watermarks in the image.
Vertical composition. Clean, expressive, emotionally resonant.

👉 It doesn’t feel fully coherent — the images don’t really match the quote. 🫤

sharp kiln
#

It seems as though you are asking GPT to interpret the quote and generate an image based on its interpretation of the meaning. It's an interesting approach if you are open to unexpected outcomes. But if you want a very specific outcome, your prompts need to describe exactly what you want to see.

sharp kiln
#

I'm looking at your initial post again and wonder if the key question here is that of a "match." You want this to be completely automated (I think) and you would like AI to generate an image that "matches" the quote every time. But you are leaving AI to decide how to "match" the quote with an image. Because you gave AI a free hand in deciding, the resulting images may not match **your mental image** of the quote. Does that about summarize your dilemma?

If that's the case, I don't see a perfect solution to your question. We can mitigate, but can't resolve 100%. A compromise that might work is to give AI a little more structure for its decision on what kind of image to create. For example, a good prompt for image generation requires the following elements: 1) Main Subject(s), 2) Context/Setting, 3) Mood/Feeling, 4) Visual Style, 5) Lighting and/or Color Palette, and 6) Composition/Perspective/Shot. Perhaps you can instruct GPT to select, based on its interpretation of the daily quote, the most artistically cogent option from several choices you provide for each element? Say you provide five options for each of these six elements: that creates over 15,000 combinations (5^6 = 15,625). That'll probably be sufficient to avoid the feeling of "same images every time."
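A minimal sketch of the option-list idea above, in Python. Everything here is an assumption for illustration: the option texts, the element names, and the helper functions `count_combinations`, `build_selection_prompt`, and `build_image_prompt` are hypothetical, and the actual calls to the language model and the image model are left out on purpose.

```python
import math

# Hypothetical option lists (five per element); swap in choices that fit your channel.
OPTIONS = {
    "main_subject": ["a lone tree", "a seedling", "an open door", "a mountain path", "a lit candle"],
    "setting": ["misty forest", "empty room", "desert at dawn", "city rooftop", "stormy coast"],
    "mood": ["calm", "tense", "hopeful", "melancholic", "defiant"],
    "visual_style": ["photorealistic", "painterly digital art", "minimalist", "film noir", "watercolor"],
    "lighting": ["golden hour", "hard shadows", "soft fog", "neon glow", "single beam of light"],
    "composition": ["wide shot", "low angle", "top-down view", "extreme close-up macro", "symmetrical framing"],
}


def count_combinations(options):
    """Number of distinct prompts when one option is picked per element."""
    return math.prod(len(choices) for choices in options.values())


def build_selection_prompt(quote, options):
    """Text prompt asking the language model to pick one option per element."""
    lines = [
        f'Quote: "{quote}"',
        "For each element below, choose exactly ONE option that best fits the quote.",
        "Reply with JSON only, using the element names as keys.",
    ]
    for element, choices in options.items():
        lines.append(f"- {element}: " + " | ".join(choices))
    return "\n".join(lines)


def build_image_prompt(selection):
    """Turn the chosen options into a concrete image prompt."""
    return (
        f"{selection['main_subject']} in a {selection['setting']}, "
        f"{selection['mood']} mood, {selection['visual_style']} style, "
        f"{selection['lighting']} lighting, {selection['composition']}. "
        "Vertical composition. No text, letters, logos, or watermarks."
    )


print(count_combinations(OPTIONS))  # 5 ** 6 = 15625
```

The bot would send the output of `build_selection_prompt` to a text model, parse the JSON reply, and pass the result of `build_image_prompt` to the image model. Splitting the work this way keeps the interpretation step separate from the rendering step, so the image prompt itself stays concrete.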

This is not going to be foolproof, obviously. AI can still choose an option that is different from what you think of as ideal. Let's take the "planting the seed" quote. Your sample image is an extreme close-up macro photograph of a small seedling emerging from rich, dark soil, with a bokeh background of greenery. Your bot also created a close-up shot, but it saw human agency in this process, so it included a pair of hands and a gardening tool. Also, AI generators go for what is "common" or "typical," and an extreme close-up is not. What if, though, you gave it as one of the five options under #6? You may have a better chance of getting it. I imagine you'll need to tweak the options a bit before you find the sweet spot, too.
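Listing the extreme close-up under #6 could look like the sketch below. It also guards against the model answering with something that is not on the list. The option texts, the reply format (a JSON object with a `composition` key), and the helper `pick_composition` are all assumptions for illustration, not part of any API.

```python
import json
import random

# Hypothetical composition options; "extreme close-up macro" is listed explicitly
# so the model can pick a framing it would rarely choose on its own.
COMPOSITION_OPTIONS = [
    "wide establishing shot",
    "low-angle view",
    "top-down view",
    "extreme close-up macro",
    "symmetrical centered framing",
]


def pick_composition(model_reply, options, rng=random):
    """Return the model's composition choice if it is an allowed option.

    model_reply is assumed to be a JSON string such as
    '{"composition": "extreme close-up macro"}'. Anything else falls back
    to a random allowed option so the bot still produces an image.
    """
    try:
        choice = json.loads(model_reply).get("composition")
    except (ValueError, TypeError, AttributeError):
        choice = None
    if choice in options:
        return choice
    return rng.choice(options)
```

The random fallback is one possible design choice; logging the rejected reply as well would make it easier to see which options need tweaking.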
