You could run it on a single Mac Studio with M3 Ultra, or on two Mac Studios with M4 Max at even higher performance. And lightly quantizing this could give us modern dense models in the ~80 GB size range, which is a very compelling target.
It still wouldn't matter much. The M3 Ultra has 819 GB/s of unified memory bandwidth. That means the theoretical max token rate is 819/128 ≈ 6.4 t/s. At 80 GB (5-bit quantization), it's still only about 10 t/s ... far from a good coding experience. And those are theoretical maxima; real-world token generation rates would be at least 15-20% lower.
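The arithmetic above can be sketched quickly. This assumes single-stream decoding where every generated token streams the full set of weights from memory once, which is the usual back-of-the-envelope upper bound; the function name and efficiency factor are illustrative, not from any particular benchmark.

```python
# Bandwidth-bound decode rate: each token reads all weights from memory once,
# so the ceiling is bandwidth divided by model size in bytes.

def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float,
                       efficiency: float = 1.0) -> float:
    """Theoretical ceiling on tokens/sec, optionally derated for overhead."""
    return bandwidth_gb_s / model_size_gb * efficiency

M3_ULTRA_BW = 819  # GB/s unified memory bandwidth

print(max_tokens_per_sec(M3_ULTRA_BW, 128))      # ~6.4 t/s at 128 GB
print(max_tokens_per_sec(M3_ULTRA_BW, 80))       # ~10.2 t/s at 80 GB (5-bit)
print(max_tokens_per_sec(M3_ULTRA_BW, 80, 0.8))  # ~8.2 t/s with 20% overhead
```

The 15-20% derating covers KV-cache reads, scheduling, and other real-world losses on top of the pure weight-streaming bound.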
Assuming you logged in with OAuth, I am guessing you are trying to use gpt-5.5?
In my tests it worked with gpt-5.4, and I assumed gpt-5.5 isn't available to me because I'm on the free plan.
Do you have the subscription that allows 5.5? If so, I can look into what changed in the API. Sorry, I rarely use OpenAI, so it's a bit of an untrodden path.
There is no BF16. There is no FP8 for the instruct model. The instruct model at full precision is 160 GB (mixed FP4 and FP8). The base model at full precision is 284 GB (FP8). Almost everyone is going to use instruct. But I do love to see base models released.
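Those sizes are consistent with simple bytes-per-weight arithmetic. The parameter count below is an assumption inferred from the FP8 base size (~284 GB at 1 byte/weight implies roughly 284B parameters), and the ~4.5-bit average for the mixed FP4/FP8 instruct checkpoint is likewise an illustrative estimate, not a published figure.

```python
# Rough checkpoint-size arithmetic: GB ≈ params (billions) * bits_per_weight / 8.
# Parameter count and average bit width are assumptions for illustration.

def checkpoint_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate checkpoint size in GB for a dense model."""
    return params_b * bits_per_weight / 8

print(checkpoint_gb(284, 8))    # FP8 base: ~284 GB at 1 byte per weight
print(checkpoint_gb(284, 4.5))  # mixed FP4/FP8 averaging ~4.5 bits: ~160 GB
```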
I’ve been using Codex Pro since they lobotomized Opus 4.6. Codex is so much better, GPT 5.4 xhigh fast is definitely the smartest and fastest model available.
For a while there I had both Opus 4.6 and Codex access, and I frequently pitted them against each other; I never once saw Opus come out ahead. Opus was good as a reviewer, but as an implementer it just felt lazy compared to 5.4 xhigh.
One feature that I haven't seen discussed much is how Codex auto-reviews tool runs. No longer are you stuck with all-or-nothing confirmations or endless prompting; that's such a bad pattern.
Even in a week of heavy duty work and personal use I still haven’t been able to exhaust the usage on the $200 plan.
I'll probably change my mind when (not if) OpenAI rug-pulls, but for spring '26, Codex is definitely the better deal.
I also made the switch to OpenAI, the $20 plan, I dunno about "so much better" but it's more or less the same, which is great!
The models and tools levelling out is great for users because the cost of switching is basically nil. I'm reading people ITT saying they signed up for a year - big mistake. A year is a decade right now.
It really depends on what you're trying to do and what your skillset is.
But if you go information architecture first and have it codified in some way (especially if you already have the templates), then you can nudge any agent to go straight into CSS and it will produce something reasonable.