> I wouldn't call o1 a "system". It's a model, but unlike previous models, it's trained to generate a very long chain of thought before returning a final answer.
That answer seems to conflict with "in the future we'd like to give users more control over the thinking time".
I've gotten mini to think harder by asking it to, but that didn't produce a better answer. Though now I've run out of usage limits for both of them, so I can't try any more…
Not in a way that's effectively used: in practice, all of the papers using CoT compare against a weak baseline, and the benefits level off extremely quickly.
Nobody except recent DeepMind research has shown test-time scaling like o1's.