Given the bullet point stating that "DALL·E 3 is built natively on ChatGPT", the tight integration between ChatGPT and the image generation, and the fact that no research paper was released with the announcement, I strongly suspect that DALL·E 3 is a trial run of GPT-4's multimodal capabilities and may run on similar infrastructure.
GPT-4 can only do text-to-text and image-to-text; it can't generate images itself. So ChatGPT will simply make an API call to the image model. Really nothing special: Bing does the same thing.