The prompt enrichment thing is pretty standard. Everyone does that bit, though some make it user-visible. On Grok it used to show up in the frontend via the image's download filename. The image editing is the interesting part.
All the Stable Diffusion software I've used names the files after some form of the prompt, probably because SD weights the first tokens more heavily than the later ones, likely as a side effect of how CLIP/BLIP works.
I doubt any of these companies have rolled their own interface to Stable Diffusion / transformers. It's copy-and-paste from Hugging Face all the way down.
I'm still waiting for a confirmed diffusion language model to be released as a GGUF that works with llama.cpp.
Auto1111 and co are using the prompt in the filename because it's convenient, not due to some inherent CLIP mechanism.
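For what it's worth, the "convenience" amounts to something like this minimal sketch (the function name, character set, and length cap are my assumptions, not Auto1111's actual code): strip filesystem-unsafe characters from the prompt and truncate, which is exactly why the leading tokens of the prompt are what survive in the filename.

```python
import re

def prompt_to_filename(prompt: str, max_len: int = 120) -> str:
    """Hypothetical sketch: turn a prompt into a filesystem-safe
    filename stem. Unsafe characters are removed and the result is
    truncated, so only the first part of a long prompt survives."""
    stem = re.sub(r'[\\/:*?"<>|]', "", prompt).strip()
    return stem[:max_len]

prompt_to_filename('a cat: "oil painting", highly detailed')
```

No CLIP mechanics required for this part; the truncation alone explains why early prompt words dominate the filename.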
If you think that companies like OpenAI (for all the criticisms they deserve) don't use their own inference harness and image models I have a bridge to sell to you.
I give less weight to your opinion than my own. I'm also not sure how you misunderstood what I said about CLIP/BLIP. I was replying to a comment about "populating the front end with the filename": the first tokens are weighted more heavily in the resulting image than the later tokens. So if you prompt correctly, the filename ends up being a very accurate description of the image. Especially with danbooru-style prompts, you can just split on spaces and use the pieces as tags, for all practical purposes.
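The split-on-space point works because danbooru-style tags join multi-word concepts with underscores, so spaces only ever separate tags. A minimal sketch, with a made-up example filename:

```python
# Assumed example filename derived from a danbooru-style prompt.
# Multi-word tags use underscores, so splitting on spaces
# recovers the individual tags cleanly.
filename = "1girl solo long_hair school_uniform cherry_blossoms"
tags = filename.split(" ")
# tags → ['1girl', 'solo', 'long_hair', 'school_uniform', 'cherry_blossoms']
```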
I guess the "convenience" just happened to get ported over from Auto1111, or it's a coincidence, or...