That matches what we have found. "Thinking" models do a lot better, but with a huge speed drop, roughly 3-4x. For now we have chosen accuracy over speed, but given that cost we might move to an architecture where we "think" only sporadically.
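The "think only sporadically" idea can be sketched as a simple router that sends only queries flagged as hard to the slower reasoning model (all names and the difficulty heuristic here are hypothetical, not our actual system):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Router:
    fast: Callable[[str], str]       # cheap model, no chain-of-thought
    slow: Callable[[str], str]       # reasoning model, ~3-4x slower
    is_hard: Callable[[str], bool]   # heuristic or learned difficulty classifier

    def answer(self, prompt: str) -> str:
        # "Think" only sporadically: invoke the slow model only when flagged hard.
        return self.slow(prompt) if self.is_hard(prompt) else self.fast(prompt)

# Toy stand-ins for the two models and a trivial difficulty heuristic.
router = Router(
    fast=lambda p: f"fast:{p}",
    slow=lambda p: f"slow:{p}",
    is_hard=lambda p: len(p.split()) > 10,  # placeholder: word count as "hardness"
)

print(router.answer("short question"))               # takes the fast path
print(router.answer("a much longer question " * 4))  # takes the slow path
```

In practice the interesting part is `is_hard`: it could be a cheap classifier, a confidence signal from the fast model, or an explicit user toggle.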
Everything happening in the LLM space is so close to how humans think naturally.