#Text in Dall E 3

viscid lark
#

Why is DALL·E 3's image generation so terrible with text? Are there technical limitations when including text in an AI-generated image? Not even the paid version can provide a satisfactory output.

mental scrollBOT
#

Hi @viscid lark. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

Yes, there are technical limitations when it comes to generating text within images using DALL·E 3. According to the DALL·E 3 paper (https://cdn.openai.com/papers/dall-e-3.pdf), the model can render text when prompted, but the capability is unreliable: words often come out with missing or extra characters. The authors suspect the T5 text encoder is the cause, because when the model encounters text in a prompt it sees tokens that represent whole words and must map those to individual letters in the image. They say they would like to explore conditioning on character-level language models in future work to improve this behavior.
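
To make the whole-word token point concrete, here is a toy sketch. It is not the T5 tokenizer: the vocabulary, the IDs, and both helper functions are made up for illustration. It only shows that a word-level view gives the model one opaque ID per word, while a character-level view keeps every letter visible.

```python
# Toy illustration only. WORD_VOCAB and both functions are hypothetical;
# they do not reproduce how T5 or DALL·E 3 actually tokenize prompts.
WORD_VOCAB = {"a": 0, "sign": 1, "that": 2, "says": 3, "bakery": 4}


def word_level_tokens(prompt):
    """Map each whitespace-separated word to a single opaque ID."""
    return [WORD_VOCAB[word] for word in prompt.lower().split()]


def char_level_tokens(prompt):
    """Map each character (spaces included) to its own ID."""
    return [ord(ch) for ch in prompt.lower()]


prompt = "a sign that says bakery"
word_ids = word_level_tokens(prompt)   # one ID per word, spelling is hidden
char_ids = char_level_tokens(prompt)   # one ID per character, spelling is explicit

print(len(word_ids), "word tokens vs", len(char_ids), "character tokens")
```

In the word-level view, nothing in the single ID for "bakery" says that it is spelled b-a-k-e-r-y, which is the mapping the image model has to get right on its own.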

Furthermore, DALL·E 3 is primarily designed to generate realistic and artistic images based on keywords or phrases, not to produce text. While you can request text in your image descriptions, the results might be distorted, unclear, or not as expected, as DALL·E does not have a specific understanding of writing, labels, or any other common text.

If you need to include text in an image, a suggested workaround is to render the text yourself with a combination of Python libraries. For example, you could format the text in HTML/CSS, convert it to a PDF using WeasyPrint, and then convert the PDF to an image using pdf2image. This approach uses web technologies for layout and PDF-to-image conversion for output, so the text appears exactly as typed instead of being generated by the model.
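
A minimal sketch of that workaround might look like the following. It assumes WeasyPrint and pdf2image are installed (pdf2image also needs Poppler on the system). The helper names, page size, font, and output path are illustrative choices, not something taken from the knowledge base.

```python
from html import escape


def build_caption_html(text, font_px=48, width_px=800, height_px=200):
    """Return a small HTML page that centers `text` on a fixed-size page.

    The page size and styling are arbitrary example values.
    """
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
  @page {{ size: {width_px}px {height_px}px; margin: 0; }}
  body {{ margin: 0; text-align: center; line-height: {height_px}px;
         font-family: sans-serif; font-size: {font_px}px; }}
</style>
</head>
<body>{escape(text)}</body>
</html>"""


def render_text_image(text, out_path="caption.png", dpi=200):
    """Render `text` to an image file: HTML -> PDF (WeasyPrint) -> image (pdf2image)."""
    # Third-party imports live inside the function so build_caption_html
    # can still be used when these packages are not installed.
    from weasyprint import HTML
    from pdf2image import convert_from_bytes

    pdf_bytes = HTML(string=build_caption_html(text)).write_pdf()
    pages = convert_from_bytes(pdf_bytes, dpi=dpi)  # list of PIL images, one per page
    pages[0].save(out_path)
    return out_path
```

Calling `render_text_image("GRAND OPENING")` would write `caption.png`, which you could then combine with a generated image in an image editor.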

#

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: