I'm fairly certain this was an example of overfitting and Freedman's Paradox, not deliberate cheating.
Let's say you have a completely random data set. You generate a bunch of random variables x1 through xn and a random dependent variable y. Then you poke around and see whether any of the x variables look like they might predict y, so you pick those variables and try to build a model on them. What you end up with is a model where, according to the standard tests of statistical significance, some of the xs predict the y, even though all the data is completely random.
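That procedure is easy to simulate. Here's a minimal sketch (NumPy only; the sample sizes, number of predictors, and screening cutoff are arbitrary choices, not anything from the FAA case): generate pure noise, screen for the predictors that happen to correlate with y, then fit a regression on just those.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 50           # 100 observations, 50 candidate predictors
X = rng.normal(size=(n, p))
y = rng.normal(size=n)   # y is pure noise, unrelated to every column of X

# Step 1: screen predictors by their marginal correlation with y,
# keeping only the 10 that "look like they might predict y"
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
selected = np.argsort(np.abs(corr))[-10:]

# Step 2: fit OLS on just the screened predictors and compute t-statistics
Xs = np.column_stack([np.ones(n), X[:, selected]])
beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
resid = y - Xs @ beta
sigma2 = resid @ resid / (n - Xs.shape[1])
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xs.T @ Xs)))
t = beta / se

# Because the screening step cherry-picked the strongest chance
# correlations, some |t| values will typically exceed ~2 (nominally
# "significant at p < .05") even though everything here is random noise.
print(np.abs(t[1:]))
```

The trick is that the significance tests in step 2 don't know about the selection in step 1, so the usual p-values are wildly overoptimistic.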
This is a much more likely explanation for why the answer weights on the biographical assessment were so weird than some conspiracy between the contractors who developed the test, the FAA staff, and the black employee organization.
They had a dataset that was very skewed because historically there have been very few black controllers, and so was very prone to overfitting. The FAA asked the contractor to use that dataset to build a test that would serve as a rough filter, screen in good candidates, and not show a disparate impact. The contractor delivered a test that fulfilled those criteria (at least in the technical sense that it passed statistical validation). Whether or not the test actually made any sense was not their department.
> I'm fairly certain this was an example of overfitting and Freedman's Paradox, not deliberate cheating.
The answers to the biographical questionnaire - which screened out 90% of applicants - were leaked to ethnic affinity groups. If a select group of people being provided with the correct answers isn't cheating, I don't know what is.
No, that's not what happened. The guy from the black affinity group CLAIMED that he knew the answers. But he's a completely unreliable source who was pretending to know things that he didn't actually know. He also claimed to have a list of magic buzzwords that would get your application moved to the top of the pile, but if you look at the list of magic buzzwords that he provided, it was just a list of dozens of generic action verbs like "make", "manage", "organize", "analyze", etc. from a resume writing book. I'm sure it's the same thing with the biographical assessment. He was just telling people what he THOUGHT were the right answers.
> As the hiring wave approached, some of Reilly’s friends in the program encouraged her to join the National Black Coalition of Federal Aviation Employees (NBCFAE), telling her it would help improve her chances of being hired. She signed up as the February wave started. Soon, though, she became uneasy with what the organization was doing, particularly after she and the rest of the group got a voice message from FAA employee Shelton Snow:
> “I know each of you are eager very eager to apply for this job vacancy announcement and trust after tonight you will be able to do so….there is some valuable pieces of information that I have taken a screen shot of and I am going to send that to you via email. Trust and believe it will be something you will appreciate to the utmost. Keep in mind we are trying to maximize your opportunities…I am going to send it out to each of you and as you progress through the stages refer to those images so you will know which icons you should select…I am about 99 point 99 percent sure that it is exactly how you need to answer each question in order to get through the first phase.”2
> The biographical questionnaire Snow referred to as the “first phase” was an unsupervised questionnaire candidates were expected to take at home. You can take a replica copy here. Questions were chosen and weighted bizarrely, with candidates able to answer “A” to all but one question to get through.
> After the 2014 biographical questionnaire was released, Snow took it a step further. As Fox Business reported (related in Rojas v. FAA), he sent voice-mail messages to NBCFAE applicants, advising them on the specific answers they needed to enter into the Biographical Assessment to avoid failing, stating that he was "about 99 point 99 percent sure that it is exactly how you need to answer each question."
I've read it. I've seen all the weightings. My point is that after reading the IG report, I think it's most likely that when he made the following statement he was exaggerating and claiming that he knew the right answers when he didn't:
> I am going to send it out to each of you and as you progress through the stages refer to those images so you will know which icons you should select…I am about 99 point 99 percent sure that it is exactly how you need to answer each question in order to get through the first phase
What do you think the point of such a questionnaire was?
Why would you want to filter for applicants who report that their worst high school subject was science and their lowest college grades were in history?
As to why the questionnaire exists - It's the equivalent of something that's very common in the private sector. A company gets thousands of applicants for a job. They only have the resources to interview some small percentage of that. So they develop a very rough filter to narrow down the pool to something manageable. For instance, if it's an entry level job they'll typically just categorically reject anyone who has an advanced degree or more than a few years of work experience because they figure that person will leave for a better job as soon as they can.
That's what the questionnaire was designed to do. The other steps in the hiring process take a lot of time and resources (proctored exam, referrals, medical testing) so they wanted to put a rough filter in front of that to reduce the numbers to something manageable.
As to why they would give a higher weight if you said your worst high school subject was science - that's the part that I think was just an overfit model producing nonsensical results. That kind of statistically-significant-but-nonsensical parameter is exactly what Freedman's Paradox describes.
> I think was just an overfit model producing nonsensical results. That kind of statistically-significant-but-nonsensical parameter is exactly what Freedman's Paradox describes.
You just completely made this up. There isn't even evidence that a "model" exists or was fitted to.
One of the reasons why I think he was bullshitting is that according to the testimony, he said to answer the question about how many sports you played in high school honestly, but that was the wrong information because that was one of the questions where some answers would give you more points than others. The other reason is that it's just painfully obvious from the testimony that this guy was not reliable - he took a generic resume writing guide that he had been given years ago and passed it off as inside information.
> he said to answer the question about how many sports you played in high school honestly, but that was one of the questions where some answers would give you more points than others.
That's exactly what is alleged: Snow told applicants which answers were worth the most points. This is what Snow himself claimed, too.
And the FAA's internal investigation did have witnesses say that they were instructed on how to respond to the Biographical Assessment:
> One witness said during the call, participants told they were looking at questions on the BA test but did not know what to enter on the test. According to this witness, [redacted] responded with information that should be entered on the BA test.
If the voicemails are recorded anywhere, that will put this question to rest.
Right, my point is that instead of providing the answer that would get the applicant the most points, he told them to answer honestly. That doesn't make sense if his goal was to cheat.
To pass the test you have to click A on all 62 questions apart from question 16 where you have to click D to say your lowest grade in school was in history. The thing's a complete travesty.
You don't have to do that to pass the test. The max score possible is 179. One can pass the test without answering either of the worst subject questions "correctly."
Also, answering A to 23 (>20 hours/week paid employment last year of college) would logically conflict with answering A to 56 (did not attend college).
I agree that it seems likely that the weird questions and their weighting came from over-fitting as you describe. The cheating allegation though, from my reading, is that the "correct" answers were leaked and then disseminated by the leakee(s). (And that this was particularly impactful because it was unlikely that you would pass the overfit test otherwise.)
When I read the IG report and saw what the guy actually said (and that his list of secret buzzwords actually turned out to be a photocopy from a resume writing book) it was pretty clear that he was bullshitting and claiming that he had inside information about the process that he didn't actually have.
The investigation says the screenshots he talked about were for USA Jobs, too, not the nonsense biographical test. It seems like it ought to be pretty easy to just check whether NBCFAE members passed that test at an unusually high rate.