OA has always said that they did not hardwire any of these gotcha questions, and in many cases they continue to work for a long time even when they are well-known. As for any inconsistency, well, usually people aren't able to or bothering to control the sampling hyperparameters, so inconsistency is guaranteed.
They may not have had to hardwire anything for known gotcha questions, because once a question goes viral, the correct answer may well show up repeatedly in the training data.