I don't think that's correct. They gave 400 people some of the questions each, and only kept the questions that were solved by at least 2 people. The 400 people didn't all receive all 120 questions (they'd probably have got bored).
If you go through the example problems you'll notice that most are testing the "aha" moment. Once you do a couple, you know what to expect, but with larger grids you have to stay focused and keep track of a few things to get it right.