Interestingly, I’m pretty sure they mean they hit the limit with tokens on Claude.
There’s a daily 2.5 million token limit that you can use up fairly quickly with 100K context
So they may very well have completed the whole program with Claude. It’s just the machine literally stopped and the human had to do the final grunt work.
We’ve been hitting this in our work and in experimentation, and I can confirm that Claude sonnet 3.5 has gotten 100% of the way there, including working through errors and tricky problems as we tested the apps it built.
There’s a daily 2.5 million token limit that you can use up fairly quickly with 100K context
So they may very well have completed the whole program with Claude. It’s just the machine literally stopped and the human had to do the final grunt work.