People making up their own benchmarks for these things has confirmed one thing for me: The bias that people think they mostly have original thoughts is extremely strong. I find if I have a “good” idea someone has probably already thought of it as well and maybe even written about it. About 0.01% of the time do I have an idea that one may consider novel and even that’s probably my own bias and overstated. This example just confirms that these models don’t really seem to reason and have a really hard time doing the basic generalization they can with fewer examples.