The referent of "utters" (sic) is ambiguous, so I can imagine a model having more difficulty with it than usual. Regardless, the current SOTA does need more specific and sometimes repetitive prompting than a human artist would, but it's surprising how much better results you can get from SOTA models with a bit of experience at prompt engineering.
This is, in part, what I'm trying to point out: it's an obvious typo given the context, something you or I would pick up on immediately, yet it completely breaks the model (it spat out a bunch of weird confetti cats for me). Perhaps I'm being a little harsh, but if it requires word-perfect tuning and prompt engineering, that speaks to the 'stupidity' of these models. It's a neat trick, but calling it anything in the realm of artificial intelligence is a bit of a joke.