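Eval sets are not an appropriate tool for measuring progress on security problems, since the bar here is 100% correctness in the face of sustained, targeted adversarial effort. An eval set measures average-case behavior on a fixed sample, whereas an attacker only needs to find one input that fails.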
This work largely resembles the politician's syllogism: we must do something, this is something, therefore we must do this. It's something, but it's not actually addressing the problem.
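
To put rough numbers on the gap between an eval-set score and the 100% bar, here is a minimal sketch. The 1% bypass rate and the attempt counts are hypothetical, and the function name is my own. It assumes each attempt is independent with a fixed bypass rate, which if anything is generous to the defender, since a targeted attacker adapts rather than sampling at random.

```python
def attacker_success_probability(per_attempt_bypass_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries gets through.

    Assumption (illustrative only): every attempt bypasses the defense with
    the same fixed probability, independently of the others.
    """
    return 1.0 - (1.0 - per_attempt_bypass_rate) ** attempts


if __name__ == "__main__":
    # Hypothetical: a defense that blocks 99% of attacks on some eval set.
    bypass_rate = 0.01
    for attempts in (1, 10, 100, 1000):
        p = attacker_success_probability(bypass_rate, attempts)
        print(f"{attempts:>5} attempts -> P(at least one bypass) = {p:.4f}")
```

Under these assumptions, a score that looks strong on an eval set still gives way quickly once the attacker is allowed to keep trying, which is the sense in which anything short of 100% does not address the problem.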