But still, the questions in that test are "solved" in the sense of "I can take a dictionary and answer these questions with full certainty". Beyond established knowledge, LLMs are monkeys with typewriters, at best.
I’d like to see you ace even a middle-school-level Spanish test with just a dictionary (swap Spanish for some other language if you happen to know Spanish).
Let's define "zeta" as a mathematical function ζ(s) that takes a statement "s" as input, where s describes a breakthrough in LLM capabilities relative to the current date and time, and ζ(s) is the probability that a given AI skeptic will honestly recognize "s" as a breakthrough.
Then our Riemann-Goalpost hypothesis is that ζ has zeros at every "s" that is a negative integer (every breakthrough that happened in the past is null in value) and takes positive values only where s is positive.
We can conclude from the above that, given a date far enough in the future, any given breakthrough can be spectacular, but once achieved, it will be derided as trivial.