We don't know, OpenAI refused to publish any details about the architecture in the technical report. We don't know parameters, we don't know depth, we don't know how exactly it's integrating image data (ViT-style maybe?), we don't even know anything about the training data. Right now it's a giant black box.
Yeah, I'm just reading the pdf and it's a bit suprising to me. I thought I missed something. They went from Open to "Model Closed, Tech Open" to "Everything Closed" this fast...? We're witnessing how much you can buy with Microsoft-level money.