jerpint 4 months ago | on: FrontierMath was funded by OpenAI

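You can still game a test set without training on it. That’s why you usually have both a validation set and a test set, and ideally you seldom touch the test set. Routinely running evaluations on the test set can lead the humans in the loop to overfit to it: each time they keep whichever change scored best, a little information about the test set leaks into their choices.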
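To make the selection effect concrete, here is a minimal sketch in Python. It is a toy simulation, not a claim about how any real benchmark was used: the labels are coin flips, each "model" is a random guesser standing in for one tweak a human might try, and the names `simulate`, `n_models`, and `n_examples` are made up for illustration. Nothing is ever trained on the test labels, yet picking the guesser with the best test score yields a test accuracy well above chance, while the same guesser on fresh data stays near 50%.

```python
import random


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def simulate(n_models=200, n_examples=100, seed=0):
    """Pick the best of n_models random guessers by test accuracy.

    Returns (test accuracy of the winner, its accuracy on fresh data).
    """
    rng = random.Random(seed)
    # Labels are coin flips, so no model can truly beat 50% accuracy.
    test_labels = [rng.randint(0, 1) for _ in range(n_examples)]
    fresh_labels = [rng.randint(0, 1) for _ in range(n_examples)]

    best_test_acc, best_fresh_acc = -1.0, None
    for _ in range(n_models):
        # Hypothetical "model": random guesses, never fitted to any labels.
        test_preds = [rng.randint(0, 1) for _ in range(n_examples)]
        fresh_preds = [rng.randint(0, 1) for _ in range(n_examples)]
        test_acc = accuracy(test_preds, test_labels)
        # The human in the loop keeps whichever attempt scored best on test.
        if test_acc > best_test_acc:
            best_test_acc = test_acc
            best_fresh_acc = accuracy(fresh_preds, fresh_labels)
    return best_test_acc, best_fresh_acc


if __name__ == "__main__":
    test_acc, fresh_acc = simulate()
    print(f"selected on test: {test_acc:.2f}, same model on fresh data: {fresh_acc:.2f}")
```

Doing the selection on a validation set instead, and looking at the test set only once at the end, keeps the final test number an honest estimate.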