Why do AI-generated images have misspelle…

Have you ever prompted your favorite AI tool like ChatGPT or Copilot to generate an image for you? Maybe you want a feature image for a blog post or your PowerPoint presentation, and you type a well-thought-out prompt explaining everything you want in it.

It'll get almost everything right in the image. It'll stay true to the style you want, whether it's realistic, cartoon-like, or even futuristic. Look closely, though, and you will find some bizarre things, like an extra finger, an oddly positioned leg, or one wheel on a car that's way thinner than the others. Finding those oddities can be a fun activity, and in some ways they are like Easter eggs in a Disney animated film.

There is another shortcoming in most of these generated images, and this one drives people nuts: misspelled words, garbled text, or words and sentences that just don't make any sense. Many wonder out loud how an AI tool can be so smart and yet not know how to spell or add text to an image.

My friend Rory de Goede asked this very question in a LinkedIn post not too long ago. He hilariously wondered whether the cause was his Dutch accent coming through in his prompts, and shared his surprise at how bad Copilot was at spelling.

And he's not alone… I'm sure at some point you've had a similar experience or you know someone close to you who has. Let me explain how this all works.

While we think of everything in Copilot (or ChatGPT, because they use the same or similar underlying models) as being powered by a single model, that is not the case. All the generative text capabilities, along with the ability to converse or answer questions, come from its LLM, currently GPT-4o. But the image generation is driven by a different generative model, currently DALL-E 3. Unlike GPT-4o, an LLM designed to understand and generate text and to converse, DALL-E 3 is specifically designed to create images based on text descriptions. If you're using Google Gemini or another AI solution, it still works the same way. Under the hood, it will have an LLM for the generative text and conversational capabilities, a different model for image generation, and maybe yet another model for math or coding.
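If it helps to picture that split, here is a minimal sketch in Python. Everything in it is made up for illustration: the function names, the `Reply` type, and especially the keyword check are my own assumptions, and real products decide which model handles a request in ways I can't see. The only point is that one "assistant" on the surface can hand work to different models underneath.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    model: str    # which (hypothetical) backend handled the request
    kind: str     # "text" or "image"
    payload: str  # placeholder for the actual answer or pixels


def text_model(prompt: str) -> Reply:
    # Stand-in for the LLM that handles conversation and writing.
    return Reply(model="text-llm", kind="text", payload=f"(answer to: {prompt})")


def image_model(description: str) -> Reply:
    # Stand-in for the separate model that turns a description into an image.
    return Reply(model="image-generator", kind="image",
                 payload=f"(pixels for: {description})")


# Assumption: a crude keyword check. Real assistants do not route this way.
IMAGE_HINTS = ("draw", "image", "picture", "illustration")


def assistant(prompt: str) -> Reply:
    wants_image = any(hint in prompt.lower() for hint in IMAGE_HINTS)
    return image_model(prompt) if wants_image else text_model(prompt)


print(assistant("Create an image of a chair with HELLO on it").model)
print(assistant("What is the capital of France?").model)
```

The first request lands on the image stand-in and the second on the text stand-in, even though you typed both into the same chat box.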

When you ask it to create an image of a chair and overlay the text "HELLO" on it, the image model envisions what a chair would look like and "draws" a chair… then it envisions what "HELLO" would look like, not so much as text but as a shape, and then it "draws" that shape (as opposed to typing that text using a certain selected font). That's one explanation for why sometimes it's correct, sometimes it's messed up, and sometimes we get some weird artifacts on letters, like a bent-out-of-shape M or garbled letters.
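Here is a toy way to see the difference between "typing" and "drawing" text. To be clear, this is only an analogy I put together, not how DALL-E 3 or any real image model works: the tiny 5x5 letters, the `noise` setting, and both function names are invented for this example. "Typing" looks up each letter's exact shape, while "drawing" recreates the overall picture dot by dot, and each dot has a small chance of coming out wrong.

```python
import random

# Tiny 5x5 bitmaps standing in for a font ("#" = ink, "." = blank).
FONT = {
    "H": ["#...#", "#...#", "#####", "#...#", "#...#"],
    "E": ["#####", "#....", "####.", "#....", "#####"],
    "L": ["#....", "#....", "#....", "#....", "#####"],
    "O": [".###.", "#...#", "#...#", "#...#", ".###."],
}


def type_text(word):
    """'Typing': look up each letter's exact glyph and set them side by side."""
    return [" ".join(FONT[ch][row] for ch in word) for row in range(5)]


def draw_text(word, noise, rng):
    """'Drawing': copy the overall shape dot by dot; each dot has a
    `noise` chance of being flipped (ink becomes blank, or blank becomes ink)."""
    flipped = {"#": ".", ".": "#"}
    rows = []
    for line in type_text(word):
        rows.append("".join(
            flipped[px] if px in flipped and rng.random() < noise else px
            for px in line
        ))
    return rows


rng = random.Random(0)  # fixed seed so the "drawing" is repeatable
print("\n".join(type_text("HELLO")))
print()
print("\n".join(draw_text("HELLO", noise=0.1, rng=rng)))
```

The typed version comes out identical every time. The drawn version is close, but usually has a few stray or missing dots, which is roughly the kind of almost-right lettering we see in generated images.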

These inconsistencies can also be seen in the images it creates, like my examples near the top of this article: a car with four wheels where one is too thin, or a hand with an out-of-place finger. It's basically imagining what something would look like and then creating its best approximation of that imagination.

Another potential reason could be that the generative model is intentionally designed or fine-tuned to insert some inaccuracies as a safeguard. This would make it evident that the image was generated by AI and wouldn't pass for something done by a human.

Back to the post from Rory that inspired this article: one person who responded to him suggested that the bad text could be due to font copyright issues. I do not believe that to be the case at all. There are hundreds – if not thousands – of free fonts in the public domain, and the issue happens regardless of the font used. Just try asking it to create an image with the words "Happy Birthday" or "Good Morning" on a blackboard in bold, in italics, or in a certain style, and it will do it for you. The spelling may be off (or sometimes spot-on), but what you will see is that it is capable of writing in different "fonts" and even human-like handwriting styles.

I also don't believe the issue is related to copyrighted wordmarks, because it is just as likely to misspell "Microsoft" as it is the word "Birthday".

Personally, regardless of the reason, I find it all quite charming – just like the imperfections of a loved one.