riemannzeta 55 days ago \| on: AccountingBench: Evaluating LLMs on real long-hori...

Fascinating. Like there is some accuracy threshold beyond which they cannot converge, but instead run with the inaccuracy.