The eye-opening thing here is not that the AI failed, but why it failed.
At the start, the AI is like a baby: it doesn't know anything and has no opinions. By training it on a set of data, in this case a set of resumes and their outcomes, it forms an opinion.
The AI becoming biased tells us that the "teacher" was biased too. So Amazon's recruiting process actually seems to be a mess, with the technical skills on the resume amounting to zilch, and gender and the aggressiveness of the resume's language being the most important factors (because that's how the human recruiters actually hired people when a resume came in).
The number of women and men in the data set shouldn't matter (algorithms learn that even if there was 1 woman, if she was hired then it will be positive about future woman candidates). What matters is the rejection rate, which it learned from the data. The hiring process is inherently biased against women.
Technically, one could say that the AI was successful, because it emulated Amazon's current hiring practice.
> The number of women and men in the data set shouldn't matter (algorithms learn that even if there was 1 woman, if she was hired then it will be positive about future woman candidates).
This is incorrect. The key thing to keep in mind is that they are not just predicting who is a good candidate, they are also ranking by the certainty of their prediction.
Lower numbers of female candidates could plausibly lead to lower certainty for the prediction model, as it would have less data on those people. I've never trained a model on resumes, but I definitely often see this "lower certainty on minorities" effect in models I do train.
The lower certainty would in turn lead to lower rankings for women even without any bias in the data.
Now, I'm not saying that Amazon's data isn't biased. I would not be surprised if it were. I'm just saying we should be careful in understanding what is evidence of bias and what is not.
It's wrong even if their model doesn't output a certainty (not all classifiers do). Almost all ML algorithms optimize the expected classification error under the training distribution. So if the training data contains 90% men, it's better to classify those men at 100% accuracy and women at 0% accuracy, than it is to classify both with 89.9% accuracy. Any unsophisticated model will do this.
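To spell out that arithmetic (a minimal sketch, not a claim about Amazon's actual model):

```python
# With a 90/10 gender split, a model that nails the majority group and fails
# the minority group still wins on overall expected accuracy.
p_men, p_women = 0.9, 0.1

acc_ignore_minority = p_men * 1.0 + p_women * 0.0      # 100% on men, 0% on women
acc_equal_treatment = p_men * 0.899 + p_women * 0.899  # 89.9% on both

print(f"{acc_ignore_minority:.3f}")  # 0.900
print(f"{acc_equal_treatment:.3f}")  # 0.899
# A loss that only cares about overall error prefers the first model.
```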
gp: "The number of women and men in the data set shouldn't matter (algorithms learn that even if there was 1 woman, if she was hired then it will be positive about future woman candidates)."
> The lower certainty would in turn lead to lower rankings for women even without any bias in the data.
This is not true.
Probabilistically speaking, if we are computing P(hiring | gender):
Lower certainty means there is a high variance in the prior over women. However, over a large dataset, the "score" would almost certainly be equal to the mean of the distribution, and be independent of the variance.
In simpler words, if there was a frequency diagram of scores for each gender (most likely bell curves), then only the peak of the bell curve would matter. The flatness / thinness of the curve would be completely irrelevant to the final score. The peak is the mean, and the flatness is the uncertainty. Only the mean matters.
There's not enough information about how their ML algorithm works, nor how large their dataset was for any of the above reasoning to be justified. Fwiw, many ranking functions do indeed take certainty into account, penalizing populations with few data points.
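For instance, here is a minimal sketch of one such ranking pattern, scoring each group by a lower confidence bound on its observed hire rate (a Beta-posterior quantile). The numbers are invented and this is not Amazon's system; the point is only that equal observed rates with unequal sample sizes end up ranked differently.

```python
# Hypothetical certainty-aware ranking: score by a lower confidence bound on
# the observed hire rate. Smaller samples -> wider posterior -> lower bound.
from scipy import stats

def lower_bound(hires, applicants, alpha=0.05):
    """5th percentile of a Beta(hires + 1, misses + 1) posterior."""
    return stats.beta.ppf(alpha, hires + 1, applicants - hires + 1)

# Both groups have the same observed 20% hire rate, but different sample sizes.
print(lower_bound(hires=200, applicants=1000))  # ~0.18
print(lower_bound(hires=10, applicants=50))     # ~0.12
# A ranker using the mean treats the groups identically; a ranker using the
# lower bound pushes the smaller group down, with no "bias" anywhere in the data.
```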
If they were using any sort of neural networks approach with stochastic gradient descent, the network would have to spend some "gradient juice" to cut a divot that recognizes and penalizes women's colleges and the like. It wouldn't do this just because there were fewer women in the batches, rather it would just not assign any weight to those factors.
Unless they presented lots of unqualified resumes of people not in tech as part of the training, which seems like something someone might think reasonable. Then, the model would (correctly) determine that very few people coming from women's colleges are CS majors, and penalize them. However, I'd still expect a well built model to adjust so that if someone was a CS major, it would adjust accordingly and get rid of any default penalty for being at a particular college.
If the whole thing was hand-engineered, then of course all bets are off. It's hard to deal well with unbalanced classes, and as you mentioned, without knowing what their data looks like we can only speculate on what really happened.
But I will say this: this is not a general failure of ML, these sorts of problems can be avoided if you know what you're doing, unless your data is garbage.
> It wouldn't do this just because there were fewer women in the batches, rather it would just not assign any weight to those factors.
That's exactly the issue we are talking about here. Women's colleges would have less training data, so they would get updated less. For many classes of models (such as neural networks with weight decay or common initialization schemes) this would encourage the model to be more "neutral" about women and assign predictions closer to 0.5 for them. This might not affect the overall accuracy for women (as it might not influence whether or not they go above or below 0.5), but it would cause the predictions for women to be less confident and thus have a lower ranking (closer to the middle of the pack as opposed to the top).
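As a hedged illustration of that mechanism on synthetic data (nothing here is Amazon's model or data): an L2-regularized logistic regression updates a rarely seen feature much less, so an equally qualified example carrying it ends up with a lower score.

```python
# Synthetic sketch: two equally predictive binary features, one common, one rare.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
common = rng.integers(0, 2, n)                 # present in ~50% of rows
rare = (rng.random(n) < 0.01).astype(int)      # present in ~1% of rows
# Either feature, when present, makes a positive label very likely.
y = ((common | rare) & (rng.random(n) < 0.95)).astype(int)
X = np.column_stack([common, rare])

model = LogisticRegression(C=0.01).fit(X, y)   # strong L2 penalty (weight decay analogue)
print(model.coef_)                             # the rare feature gets the smaller weight

print(model.predict_proba([[1, 0]])[0, 1],     # row with only the common feature
      model.predict_proba([[0, 1]])[0, 1])     # row with only the rare feature: lower score,
                                               # so it ranks lower despite equal merit
```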
I don't think I'm with you. A neural net cannot do this - picking apart male and female tokens requires a signal in the gradients that forces the two classes apart. If there's no gradient, then something like weight decay will just zero out the weights for the "gender" feature, even if it's there to begin with. Confidence wouldn't enter into it, because the feature is irrelevant to the loss function.
A class imbalance doesn't change that: if there's no gradient to follow, then the class in question will be strictly ignored unless you've somehow forced the model to pay attention to it in the architecture (which is possible, but would take some specific effort).
What I'm suggesting is that it's likely that they did (perhaps accidentally?) let a loss gradient between the classes slip into their data, because they had a whole bunch of female resumes that were from people not in tech. That would explain the difference, whereas at least with NNs, simply having imbalanced classes would not.
Supposing "waiter" and "waitress" are both equally qualifying for a job, and most applicants are men, won't the AI score "waiter" as more valuable than "waitress"?
Not generally. The entire point being made is that whether one feature is deemed to be more valuable than another feature depends not just on the data fed into the system but also on the training method used.
Specifically, the gp is pointing out that typical approaches will not pay attention to a feature that doesn't have many data points associated with it. In other words, if it hasn't seen very much of something then it won't "form an opinion" about it and thus the other features will be the ones determining the output value.
Additionally, the gp also points out that if you were to accidentally do something (say, feed in non-tech resumes) that exposed your model to an otherwise missing feature (say, predominantly female hobbies or women's colleges or whatever) in a negative light, then you will have (inadvertently) directly trained your model to treat those features as negatives.
Of course, another (hacky) hypothetical (noted elsewhere in this thread) would be to use "resume + hire/pass" as your data set. In that case, your model would simply try to emulate your current hiring practices. If your current practices exhibit a notable bias towards a given feature, then your model presumably will too.
There are a few ways you can tackle this issue: 1) have the same algorithm for each group, but train separately (so in the end you have two different sets of weights); 2) over-sample the group under-represented in the data; 3) make the penalty more severe for guessing wrongly on female than male applicants during training; 4) apply weights to the gender encoding; 5) use more than just resumes as data.
This isn't an insurmountable problem, but it does require extra work beyond just "encode, throw it in and see what happens".
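As a rough sketch of what remedies 2 and 3 from that list might look like in practice (the features, labels, and group column here are all made up, and this is scikit-learn, not whatever Amazon used):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.random((1000, 5))                       # stand-in resume features
group = rng.random(1000) < 0.1                  # ~10% under-represented group
y = (rng.random(1000) < 0.5).astype(int)        # stand-in hire / no-hire labels

# Remedy 2: over-sample the under-represented group until the data is balanced.
X_minor, y_minor = resample(X[group], y[group],
                            n_samples=int((~group).sum()), random_state=0)
X_bal = np.vstack([X[~group], X_minor])
y_bal = np.concatenate([y[~group], y_minor])
LogisticRegression().fit(X_bal, y_bal)

# Remedy 3: or keep the data as-is but penalize mistakes on that group more.
sample_weight = np.where(group, (~group).sum() / group.sum(), 1.0)
LogisticRegression().fit(X, y, sample_weight=sample_weight)
```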
Amazon only scrapped the original team, but formed a new one in which diversity is a goal for the output.
Machine learning generally doesn't have any prior opinions about things and will learn any possible correlation in the data.
It could for example discover that certain words or sentence structures used in the resume are more likely associated with bad candidates. Later you find out that <protected class> has a huge amount of people that use these certain words/structures while most other people don't.
And now the AI discriminates against them.
ML will pick up on any possible signal including noise.
Then what is the purpose of this? At some point you want this thing to "discriminate" (or "select", if that's a better word) between people based on what they have done in life. Which is not negative per se.
Would it though? A school name is essentially just that, no gender information there, even with the "women" prefix. If you discriminate other schools, you can do it too with those. FWIW there could be a difference in performance which the ML finds.
It would. Just because it's not explicitly looking for a "W" in the gender field doesn't mean it's not able to determine gender and discriminate based on that. The article and the discussion is all about how these things, despite not explicitly being told to discriminate based on gender, or race, or any number of factors, can still end up gathering a set of factors that are more prevalent among those groups, and discriminate against those people all the same.
>despite not explicitly being told to discriminate based on gender, or race, or any number of factors
Then this is completely useless. You want this "AI" to discriminate based on a number of things. That's the whole point. You want to find people that can work for you. If a specific school or title is a bad indicator (based on who you have hired until now), then it just is that.
> The lower certainty would in turn lead to lower rankings for women even without any bias in the data.
I don't think that's true. "No bias" means that gender is irrelevant (i.e. its correlation with outcome is 0%). Therefore the system shouldn't even take it into account - it would evaluate both men and women just by other criteria (experience, technical skills, etc), and it would have equal amounts of data for both (because it wouldn't even see them as different).
You need bias to even separate the dataset into distinct categories.
False. If we're talking about the technical statistical definition, bias means systematic deviation from the underlying truth in the data -- see this article by Chris Stucchio with some images for clarification:
"In statistics, a “bias” is defined as a statistical predictor which makes errors that all have the same direction. A separate term — “variance” — is used to describe errors without any particular direction.
It’s important to distinguish bias (making errors with a common direction) from variance which is simply inaccuracy with no particular direction."
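A tiny, made-up numerical illustration of that distinction:

```python
import numpy as np

rng = np.random.default_rng(0)
true_rate = 0.30  # suppose this is the underlying truth

# Bias: the errors share a direction (every estimate is pulled low).
biased = rng.binomial(1000, true_rate, size=10) / 1000 - 0.05

# Variance: noisy estimates with no particular direction.
noisy = rng.binomial(20, true_rate, size=10) / 20

print("biased:", biased.round(3))  # all sit below 0.30
print("noisy: ", noisy.round(3))   # scatter around 0.30, no common direction
```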
My point was that you should consider the meaning of the word under which the post you're replying to is correct, especially given that the author was claiming specific domain experience.
> The lower certainty would in turn lead to lower rankings for women even without any bias in the data.
your post said:
> If we're talking about the technical statistical definition, bias means systematic deviation from the underlying truth in the data
So I think my interpretation is correct, even though it's not "the technically statistically correct usage". You were referring to the bias of the algorithm (i.e. its systematic divergence from the underlying truth in the data), whereas we were referring to the "hiring bias" evident in the data. In fact, your "bias" was mentioned as "lower rankings for women" - i.e. "the algorithm would have (statistical) bias even without (sexist) bias in the data", and I was replying that I think that's false.
Question: So technically, the AI is not biased against women per se, but against a set of characteristics / properties that are more common among women?
I'm not trying to split hairs (or argue), as much as further clarify the difference between (the common definition of) human bias and that of statistical bias.
Computers are very bad at actually discriminating against people; they will pick up a possible bias in a statistical dataset (i.e., <protected class> uses a certain sentence structure and is statistically less likely to get or keep the job).
Sometimes computers also pick up on statistical truths that we don't like. For example, you assign an ML model to classify how likely someone is to pay back their loan and it picks up on poor people and bad neighborhoods, disproportionately affecting people of color or low-income households. In theory there is nothing wrong with the data, after all, these are the people who are least likely to pay back a loan, but our moral framework usually classifies this as bad and discriminatory.
Machine Learning (AI) doesn't have moral frameworks and doesn't know what the truth is. The answers it can give us may not be answers we like or want or should have.
On a side note: human bias is usually not that different, since the brain can be simplified as a bayesian filter; there are predictions about the present based on past experience, re-evaluation of past experience based on current experience, and prediction of future experience based on past and current experience. It's a simplification, but usually most human bias is based on one of these, either explicitly social (bad experience with certain classes of people) or implicit (tribalism).
> the brain can be simplified as a bayesian filter
I agree with everything else in your post, but just wanted to note that while this is true to some extent, the brain is much less rational than a pure Bayesian inference system; there are a lot of baked in heuristics designed to short-circuit the collection of data that would be required to make high-quality Bayesian inferences.
This is why excessive stereotyping and tribalism are a fundamental human trait; a pure Bayesian system wouldn't jump to conclusions as quickly as humans do, nor would it refuse to change its mind from those hastily-formed opinions.
I think I'd make the claim a bit less strongly -- we don't know if there is statistical bias or non-statistical/"gender bias" in the data; both are possible based on what we know.
However exploring the statistical bias possibility, the simple way this could happen is if the data have properties like:
1. For whatever reason, fewer women than men choose to be software engineers
2. For whatever reason, the women that choose to be software engineers are better at it than men
(Note I'm just using hypotheticals here, I'm not making claims about the truth of these, or whether it's gender bias that they are true/false).
Depending on how you've set up your classifier, you could effectively be asking "does this candidate look like software engineers I've already hired"? If so, under the first case, you'd correctly answer "not much". Or you could easily go the other way and "bias" towards women if you fit your model to the top 1% where women are better than men, in our hypothetical dataset.
This would result in "gender bias" in the results, but there's no statistical bias here, since your algorithm is correctly answering the question you asked. It's probably the wrong question though!
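As a toy, hedged illustration of that framing (made-up text, not Amazon's system), a similarity-based scorer against past hires marks down any resume containing material the existing population doesn't have, regardless of qualification:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

past_hires = [
    "java backend aws distributed systems",
    "python machine learning aws pipelines",
    "java low latency trading systems",
]
candidates = [
    "java backend aws systems",                             # phrasing mirrors past hires
    "java backend aws systems women's chess club captain",  # same skills, plus tokens unseen above
]

vec = TfidfVectorizer().fit(past_hires + candidates)
scores = cosine_similarity(vec.transform(candidates),
                           vec.transform(past_hires)).mean(axis=1)
print(scores)
# The second candidate scores lower against every past hire -- not for weaker
# skills, but because part of the resume doesn't "look like" the existing pool.
```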
Figuring out if/when you're asking the right question is quite difficult, and as the sibling comment rightly pointed out, sometimes (e.g. insurance pricing) the strictly "correct" result (from a business/financial point of view) ends up being considered discriminatory under the moral lens.
This is why we can't just wash our hands of these problems and let a machine do it; until we're comfortable that machines understand our morality, they will do that part wrong.
The article didn't specify how they labeled resumes for training. You're assuming that it was based on whether or not the candidate was hired. Nobody with an iota of experience in machine learning would do something like that. (For obvious reasons: you can't tell from your data whether people you did not hire were truly bad.)
A far more reasonable way would be to take resumes of people who were hired and train the model based on their performance. For example, you could rate resumes of people who promptly quit or got fired as less attractive than resumes of people who stayed with the company for a long time. You could also factor in performance reviews.
It is entirely possible that such model would search for people who aren't usually preferred. E.g. if your recruiters are biased against Ph.D.'s, but you have some Ph.D.'s and they're highly productive, the algorithm could pick this up and rate Ph.D. resumes higher.
Now, you still wouldn't know anything about people whom you didn't hire. This means there is some possibility your employees are not representative of general population and your model would be biased because of that.
Let's say your recruiters are biased against Ph.D.'s, so those candidates undergo extra scrutiny. You only hire candidates with a doctoral degree if they are amazing. This means that within your company a doctoral degree is a good predictor of success, but in the world at large it could be a bad criterion to use.
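To make that alternative labeling strategy concrete, a hypothetical sketch (the column names and thresholds are invented; the article doesn't say how Amazon labeled anything):

```python
import pandas as pd

employees = pd.DataFrame({
    "resume_text": ["...", "...", "..."],
    "tenure_months": [4, 36, 60],
    "avg_perf_review": [2.1, 3.8, 4.5],   # say, on a 1-5 scale
    "left_within_year": [True, False, False],
})

# Here a "good hire" means they stayed at least a year and reviewed well.
employees["label"] = (
    (~employees["left_within_year"]) & (employees["avg_perf_review"] >= 3.5)
).astype(int)

print(employees[["tenure_months", "avg_perf_review", "label"]])
# Note these labels still only cover people who were hired, so the caveat above
# about never observing rejected candidates still applies.
```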
I'm not an ML guy, but reading this, it almost sounds like the training data needs to be a fictional, idealized set, and not based on real world data that already has bias slants built in. Possibly composites of real world candidates with idealized characteristics and fictional career trajectories. Basically, what-my-company-looks-like vs. what-I-want-it-to-look-like. I'm not sure this is even possible.
It's an interesting question. On one hand, a practical person could argue: "Well, this is what my company looks like, and these are the types of people who fit with our culture and make it, so be it. Find me these types of candidates."
VS
"I don't like the way my company culture looks, I would rather it was more diverse. This mono-culture is potentially leaving money on the table from not being diverse enough. I'm going to take my current employees, chart their career paths, composite them (maybe), tweak some of the ugly race and gender stats for those who were promoted, and feed this to my hiring algorithm."
> the training data needs to be a fictional, idealized set, and not based on real world data that already has bias slants built in
That'd be great, but in this case (as in most ML cases) the idea is not "follow this known, tedious process" but instead "we have inputs and results but don't know the rules that connect them, can you figure out the rules?"
> this is what my company looks like
In tech hiring, no one wants the team they have...they want more people but without regrets (including regretting the cost)
> You're assuming that it was based on whether or not the candidate was hired. Nobody with an iota of experience in machine learning would do something like that. (For obvious reasons: you can't tell from your data whether people you did not hire were truly bad.)
It's a fine strategy if all you're trying to do is cost-cut and replace the people that currently make these decisions (without changing the decisions).
I agree that most people with ML experience would want to do better, and could think of ways to do so with the right data, but if all the data that's available is "resume + hire/no-hire", then this might be the best they could do (or at least the limit of their assignment).
A reasonable assumption but, in practice, false. Many companies believe (perhaps correctly) that their hiring system is good. Using hiring outcomes would be a reasonable dependent variable, especially if supply is lower than demand, performance is difficult to measure, or there’s a huge surplus of applications which need to be cut down to a smaller number of human assessed resumes.
There was a company meeting one year at Amazon when they proudly announced that men and women were paid within 1-2% of each other for the same roles. It completely missed the point which you raise.
I want to see reports of average tenure and time between promotions by gender. I suspect that the reason we don't see those published is that the numbers are damning.
Or possibly no one did a study of sufficient size that passed peer review.
It's also not hard to make the pay gap 1-2% just like it's not hard to make it 25% (both values are valid). Statistics is a fun field. Don't trust statistics you didn't fake yourself.
Amazon could easily cook the numbers to get to 1-2%, I doubt anyone checked the process of determining that number if it's unbiased and fair and accounts for other factors or not.
I didn't write anything about promotions. I mentioned tenure and performance reviews.
If you had a way to accurately predict that some company would systematically donwrate you and eventually fire you or force you to quit, would you want to interview there? If you were a recruiter in that company and could accurately predict the same, would it be ethical for you to hire the candidate anyway?
This is not to say that I approve of blindly trusting AI to filter candidates, but the overall issue isn't nearly as simple as many comments here make it out to be.
Aggressive behavior is considered admirable in men, and deplorable in women. Many women I know have noted comments in their performance reviews about their behavior - various words that can all be distilled to "bitchy".
Downvoters, please explain. The statement makes sense when you look at tech, where there are more men than women. So it may appear that more men are getting promoted compared to their female counterparts. But that doesn't mean men >>> women; it's just statistics at play.
This doesn’t seem to be a reasonable conclusion. There is no reason to assume the AI’s assessment methods will mirror those of the recruiters. If Amazon did most of its hiring when programming was a task primarily performed by men, and so Amazon didn’t receive many female applicants, they could be unbiased while still amassing a data set that skewed heavily male. The machine would then just correctly assess that female resumes don’t match, as closely, the resumes of successful past candidates. Perhaps I’m ignorant about AI, but I don’t see why the number of candidates of each gender shouldn’t increase the strength of the signal. “Aggressiveness” in the resume may be correlated but not causal. If the AI was fed the heights of the candidates, it might reject women for being too short, but that would not indicate that height is a criterion Amazon recruiters use in hiring.
This is a subtle point but worth stating -- AI does not mirror or copy human reasoning.
AI is designed to get the same results as a human. How it gets to those results is often very, very different. I'm having trouble finding it, but there was an article a while back trying to do focus tracking between humans and computers for image recognition. What they found was that even when computers were relatively consistent with humans in results, they often focused on different parts of the image and relied on different correlations.
That doesn't mean that Amazon isn't biased. I mean, let's be honest, it probably is; there's no way a company this large is going to be able to perfectly filter or train every employee and on average tech bias trends against women. BUT, the point is that even if Amazon were to completely eliminate bias from every single hiring decision it used in its training data, an AI still might introduce a racial or gendered bias on its own if the data were skewed or had an unseen correlation that researchers didn't intend.
The whole aim of the AI was to make decisions like the recruiters did -- that is explicitly what they were aiming to do. It might be worth reading the article as it addresses your two ideas (the aim of the project and the fact that the training set was indeed heavily male).
Hey. I did read the article. It doesn’t support the conclusion OP is drawing. The aim of the AI is to “mechanize the search for talent”. It doesn’t care to, nor have any means to, make decisions “like the recruiters did”. Obviously machines don’t make decisions like humans do. They’re trying to reverse engineer an alternate decisions making process from the previous outcomes.
> The aim of the AI is to “mechanize the search for talent”. It doesn’t care to, nor have any means to, make decisions “like the recruiters did”.
This is why AI is so confusing. All "AI" does is rapidly accelerate human decisions by not involving them, so that speed and consistency are guaranteed. They are not replacements for human decision making, they are replacements for human decision making at scale.
If we can't figure out how to do unbiased interviews at the individual level, then AI will never solve this problem. Anyone that tells you otherwise is selling you snake oil.
> If we can't figure out how to do unbiased interviews at the individual level, then AI will never solve this problem. Anyone that tells you otherwise is selling you snake oil.
I wonder to what extent people want to solve it and perhaps more importantly whether or not it can be solved at all...
This is all happening before the interview, even. The AI, as far as I can see from the article, was just sorting resumes into accept/reject piles, based on the kinds of resumes that led to hire/pass results in the hands of humans.
So the recruiters may or may not have been biased, but if the previous outcomes were (based on the candidate pool) then the AI is sure to have been "taught" that bias.
Unless Amazon is willing to accept a) another pool of data or b) that the data will yield bias and apply a correction, the AI is almost guaranteed to be taught the bias.
Yes, but you have to know what pool you started with. As an overly simplistic example, if a bank used historical mortgage approval records from primarily German neighbourhoods to train AI, it might become racist against non-Germans despite that it’s just an artifact of the demographics of the time. I think it just shows how not ready for prime time AI is.
What if people named David got hired 10/100 times in the past but people named Denise only got hired 6/100 times?
Hiring practices as expressed in the data get picked up by the machine and applied accordingly. As such, David is predicted to be a better hire than Denise.
This is not about "David" vs. "Denise", but how the machine learning process will aggregate and classify names. David and David-like names will come out on top while obscure names it has no idea how to deal with (0/0 historically) will probably be given no weighting at all.
This isn't correct; the worry isn't that a single group is small, it's that a single group is large. (Basically, if one group is large, you can get by ignoring all the smaller groups.)
I'm going to make a supposition here, but one of the first things I think they did (especially when trying to fix the AI) was to balance and normalize the data so that there would be no skew between the number of men's and women's records in the data set.
If my supposition is correct, then the other parameters are at fault here, among which gender and the language used stick out.
Another supposition I'm going to make is that they even removed gender from the data set so that the AI didn't know it, but cross-referencing still showed "faulty" results due to hidden bias that the AI can pick up, like the language used.
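A hedged sketch (entirely synthetic data) of why removing the explicit gender field doesn't remove the signal once a correlated proxy feature is present:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
is_woman = rng.random(n) < 0.5
# A proxy keyword (say, a "women's chess club" mention) appears mostly on
# women's resumes and rarely on men's.
proxy_keyword = np.where(is_woman, rng.random(n) < 0.3, rng.random(n) < 0.01)
unrelated = rng.random(n)                         # an unrelated noise feature

X = np.column_stack([proxy_keyword, unrelated])   # note: no gender column at all
clf = LogisticRegression().fit(X, is_woman)
print(clf.score(X, is_woman))   # noticeably above 0.5: gender is partly recoverable
# Any downstream model trained on these features can still pick up on and act
# against gender, even though the field itself was removed.
```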
If they did normalize the data across gender, then you’re correct it may indicate bias on Amazon’s part. But I don’t know about that. The article doesn’t provide enough information. I think it should be obvious, to Amazon as well, that if you want to repair inequality in a trait (gender) you can’t use an unequal dataset to train a machine to select people. I just don’t think it follows that machine bias must mirror human bias.
Bias, to me, is the active (perhaps unconscious) discrimination based on a trait. Skew is an unequal distribution of that trait as a result of bias in favor of other traits, historical circumstances, or anything other than discrimination.
The NBA wants good basketball players. If they happen to be white, I imagine they'd draft them with equal enthusiasm as any other player. So no, it isn't.
Do you have some information not present in the article? There seem to be some assumptions on the training process in your comment that are not sourced in the article.
I'll don my flak jacket for this one, but based on population statistics I believe a statistically significant number of women have children. A plausible hypothesis is that a typical female candidate is at a 9-month disadvantage against male employees, and that this is a statistically significant effect detected by this Amazon tool.
Now, the article says that the results of the tool were 'nearly random', so that probably wasn't the issue. But just because the result of a machine learning process is biased does not indicate that the teacher is biased. It indicates that the data is biased, and bias always has a chance to be linked to real-world phenomenon.
Ah. Sorry, silly me. A quick search suggests 20 weeks, so ~4.5 months.
Obviously I don't have much specific insight, so maybe there is a culture where they don't use leave entitlements. But if there are indicators that identify a sub-population taking a potentially 20 week contiguous break it is entirely plausible that it would turn up as a statistically significant effect in an objective performance measure. All else being equal, then a machine learning model could pick up on that.
The point isn't that it is the be-all and end-all, just that the model might be picking up on something real. There are actual differences in the physical world.
The term "AI" is over-hyped. What we have now is advanced pattern recognition, not intelligence.
Pattern recognition will learn any biases in your training data. An intelligent enough* being does much more than pattern recognition -- intelligent beings have concepts of ethics, social responsibility, value systems, dreams, and ideals, and are able to know what to look for and what to ignore in the process of learning.
A dumb pattern recognition algorithm aims to maximize its correctness. Gradient descent does exactly that. It wants to be correct as much of the time as possible. An intelligent enough being, on the other hand, has at least an idea of de-prioritizing mathematical correctness and putting ethics first.
Deep learning in its current state is emphatically NOT what I would call "intelligence" in that respect.
Google had a big media blooper when their algorithm mistakenly recognized a black person as a gorilla [0]. The fundamental problem here is that state-of-the-art machine learning is not intelligent enough. It sees dark-colored pixels with a face and goes "oh, gorilla". Nothing else. The very fact that people were offended by that is a sign that people are truly intelligent. The fact that the algorithm didn't even know it was offending people is a sign that the algorithm is stupid. Emotions, the ability to be offended, and the ability to understand what offends others, are all products of true intelligence.
If you used today's state-of-the-art machine learning, fed it real data from today's world, and asked it to classify people into [good people, criminals, terrorists], you would end up with an algorithm that labels all black people as criminals and all people with black hair and beards as terrorists. The algorithm might even be the most mathematically correct model. The very fact that you (I sincerely hope) cringe at the above is a sign that YOU are intelligent and this algorithm is stupid.
*People are overall intelligent, and some people behave more intelligently than others. There are members of society that do unintelligent things, like stereotyping, over-generalization, and prejudice, and others who don't.
We are pattern recognition machines. If you consider pattern matching unintelligent, then machines are more intelligent than we are, since they rely more on logic than pattern matching.
For the black man = gorilla problem, an untaught human, a small child for instance, can easily make the same mistake. Especially if he has seen few black people. And well educated adults can also make the mistake initially, even if they hate to admit it.
However, in the last case, a second pattern recognition happens, one that matches the result of the image classifier against social rules. And it turns out that mixing up black men and gorillas is a clear anti-pattern, so anything that isn't certain is treated as incorrect.
Unlike us, computer image classifiers typically aren't taught social rules, so like a small child, they will tell things without filter. It will probably change in the future for public facing AIs.
Not stereotyping is not a mark of intelligence, it is a mark of a certain type of education. And I don't see why it couldn't be done with the usual machine learning techniques.
I claim it isn't just social rules -- part of that is empathy, which is a manifestation of intelligence that I think is beyond pattern matching.
If a white person were mislabeled as a cat, it would be a cute funny mistake. Labeling people as dogs, not so much. Gorillas, even worse. Despite that gorillas are more intelligent and empathetic than cats. Oh, and bodybuilder white celebrity boxing champion as a gorilla, may actually be okay. The same guy as a dog, no. It makes no sense to a logic-based algorithm. But humans "get it".
A human gets it because they could imagine the mistake happening against them, with absolutely zero prior training data. You don't need to have seen 500 examples of people being called gorillas, cats, dogs, turtles and whatever else.
If you want to say that a hundred pattern recognition algorithms working together in a delicate way might manifest intelligence, I think that is possible. But the point is one task-specific lowly pattern recognition algorithm, which is today's state of the art, is pretty stupid.
That's just one function. That's not the entirety of what the brain (and body) does.
> If you consider pattern matching unintelligent,
What do you think pattern matching IS?
Round ball, round hole does not require intelligence. It requires physics. The convoluted Rube Goldberg meat machine that we use to do it doesn't change what it is. Making choices of will, and approximations, are more the signs of intelligence, imo.
"a worldview built on the importance of causation is being challenged by a preponderance of correlations. The possession of knowledge, which once meant an understanding of the past, is coming to mean an ability to predict the future." - Big Data (Schonberger & Cukier)
so, knowledge now is allegedly possession of the future, rather than possession of the past.
This is because the future and the past are structurally the same thing in these models. Each is just a set of links that could be missing, but re-creatable.
Also, conflicting correlations can be shown all the time. if almost any correlation can be shown to be real, what's true? How do we deal with conflicting correlations?
They didn't scrap it because of this gender problem. That wasn't why it failed. They scrapped it because it didn't work anyway.
Note the title is "Amazon scraps secret AI recruiting tool that showed bias against women" not "Amazon scraps secret AI recruiting tool because it showed bias against women". But I guess the real title is less clickbaity - "Amazon scraps secret AI recruiting tool because it didn't work".
The same AI should be applied to hiring nurses and various other fields which show population skews in gender, as well as fields which are not skewed. I'd be curious as to the outcome.
I don't think that's what the parent was claiming; the parent says "gender and aggressiveness" were most important, and that the skills listed on the resume provided such an unclear signal for actual hires that they were not picked up by the AI.
Someone had to decide on the training material. Note that saying that they had bias does not mean that they acted with malicious intent; most likely they didn't. That doesn't change the outcome, however.
Hold on here. This article seems to have buried a pretty important piece of information wayyy down in the middle of the text.
> Gender bias was not the only issue. Problems with the data that underpinned the models’ judgments meant that unqualified candidates were often recommended for all manner of jobs, the people said. With the technology returning results almost at random, Amazon shut down the project, they said.
Granted, an article isn't going to get as much attention without an attractive headline, but that seems a far more likely reason to have an AI-based recruiting recommendation tool scrapped. The discovery of a negative weight associated with "women's" or with graduates of two unnamed women's colleges is notable, but if it's tossing out results "almost at random" then... well, there seem to be bigger problems?
The media is no longer reporting things. You can't make money with reporting. The media is actively creating narratives, and one of the narratives that people are fed nowadays is that women are victims.
Men and women are pitted against each other.
Due to the way the media has evolved, people consume their own biases and most often just read the headlines.
I mean, yeah you're not wrong. I try not to be too cynical about the whole thing even if I think the narrative is suspect. Yes women and minority representation in tech is a potential issue but I really want to know more about the AI recommendation system for potential hires. Especially if it was giving out spurious recommendations.
It's Amazon; I can't imagine how many millions went into something like that. We'll almost certainly not get a postmortem, but it's definitely intriguing.
The article leaves a lot open to interpretation, including what was expected of the tools... that could range from providing some beneficial hints to replacing all hiring. As you rightfully remark, having the tool rank candidates for highly specific jobs and their tech requirements would be a great achievement. But it is also a big challenge, so they were probably aiming at something more basic initially. Building models for broad categories like "manager" or "box packer" and hoping they will detect soft skills or work ethic seems more achievable. Thus the additional star rating that can be used for hiring and provides some value.
Now, having known limited capabilities isn't great. But those can and will be worked on. Unknown / unexpected biases won't be, which makes finding them important.
(Disclaimer: I am an Amazon employee sharing his own experience, but do not speak in any official capacity for Amazon. I don't know anything about the system mentioned in this article.)
I am a frequent interviewer for engineering roles at Amazon. As part of the interview training and other forums, we often discuss the importance of removing bias, looking out for unconscious bias, and so on. The recruiters I know at Amazon all take reaching out to historically under-represented groups seriously.
I don't know anything about the system described in the article (even that we had such a system), but if it was introducing bias I'm glad it's being shelved. Hopefully this article doesn't discourage people from applying to work at Amazon - I've found it a good place to work.
To say something about the AI/ML aspect of the article: I think as engineers our instinct is "Here's some data that's been classified for me, I can use ML/AI on it!" without thinking through all that follows, including doing quality assurance. I think a lot of the focus in ML (at least in what I've read) has been on generating models, and not nearly enough focus has been on generating models that are interpretable (i.e., that give a reason along with a classification).
It seems like they did think it through, though? And that's why it's being shelved. I don't really see what the story is here. It seems like the whole process worked exactly as it should - Amazon tried something, it had some unintended consequences, they caught it, and shelved it.
This will be unpopular but I don't care. What is the evidence that the source data for this 'AI' is biased because the men it came from did not want to hire women? Is there a reserve of unemployed non-male engineers out there? If so what evidence is there of that?
Technical talent is both expensive and a rare commodity for tech companies. The non-male engineers I've worked with have always been exceedingly competent, smart, and their differing perspectives invaluable. If there was an untapped market of engineers you'd better believe every tech company would be taking advantage of it.
Yeah - I'm not even expressing an unpopular opinion, just asking a (leading) question: where are all these women who are chomping at the bit to get into _technical_ positions like programming but find themselves being turned away by biased recruiters? I've never even seen somebody _claim_ that they were a woman who couldn't find a tech job, just people wondering where all the women were.
Last place I worked at had to invest in additional resources to hire several females because the office was mostly male and applicants were overwhelmingly male. We did eventually find a few great female applicants, but it took a lot of work and a lot of time dedicated specifically to that goal.
There does not exist some magical undiscovered pool of talented female engineers that are being turned away by biased recruiters. It's hard enough to find any sort of talented engineers regardless of other factors. Shit, it's not uncommon to recruit from other countries and cover relocation costs these days.
I worked at a company that prided itself as a meritocracy and we produced phenomenal value for our customers. We eventually got purchased by a huge corporation, which took over our hiring processes, and told us we could no longer assess a candidate's technical skill level. Soon after that, our best employees started leaving and they were being replaced with people that improved our diversity numbers, but had very little technical capability. If you were female, black, or LGBT, you were hired on the spot. Some of the female candidates were good, but the majority of the new hires were a dead weight and the productivity of the office tanked. I am all for equality, but it's sheer nonsense to hire based on gender, race, or sexual orientation.
Not the parent, but in my current company it definitely helps to be a woman: a bigger referral bonus for referring a diversity candidate (post joining), and performance targets for managers related to hiring and promoting diversity candidates.
Please note there is no active discrimination against men, but the preference in some cases would be to hire a woman. Gender is the only criterion for diversity here.
The person they're asking said that they dedicated extra resources to try and hire female candidates because the office was all male. That implies that they were hiring specifically for gender, not for any specific skill set.
The parent post? Evidently, for whatever reason, male applicants and coworkers are a problem for their organization, so they go out of their way and invest in "additional resources" to hire female candidates from the hiring pool. They're not hiring on merit, but on gender.
How exactly do you hire for female candidates without discriminating against the "overwhelming" body of male applicants? I would be really interested to know how this goes on behind the scenes: do you have open positions but cherry pick female candidates while disregarding male candidates from the get-go?
If it's anything like conferences trying to increase diversity numbers, they don't specifically target one gender; they focus their recruiting efforts on sites and locations that typically have a much higher representation, like a girls' college or a girls' hacker group. That way they aren't actively saying "we won't accept male applicants", but they ensure that 90% of the applicants will be from the target demographic.
How much money and time goes into hiring more female septic tank cleaners? Garbage men? Construction workers? Truck drivers? If sexism isn't symmetric, then feminism isn't about equality.
Not the person you responded to, but a few hiring managers at my office found that once they changed up the language in the job listing it made a huge improvement in female applications. If your staff is predominately male, then your wording will skew a particular way that may not seem inviting. Glassdoor has a write-up on how to go about removing gender bias, with links out to studies: https://www.glassdoor.com/employers/blog/10-ways-remove-gend...
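As a purely illustrative sketch (the word lists below are invented, not taken from the linked article), the kind of check those write-ups describe might look something like this:

```python
import re

# Hypothetical gender-coded word lists; real studies use much longer ones.
MASCULINE_CODED = {"aggressive", "dominant", "competitive", "rockstar", "ninja"}
FEMININE_CODED = {"collaborative", "supportive", "interpersonal", "committed"}

def coded_words(listing: str) -> dict:
    words = set(re.findall(r"[a-z]+", listing.lower()))
    return {
        "masculine": sorted(words & MASCULINE_CODED),
        "feminine": sorted(words & FEMININE_CODED),
    }

print(coded_words("We want an aggressive, competitive rockstar engineer."))
# {'masculine': ['aggressive', 'competitive', 'rockstar'], 'feminine': []}
```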
They don't need to be currently unemployed. There are certainly enough women currently employed as programmers in the US to fill all the currently open Amazon positions; it's just that it would be positively insane for Amazon to offer twice the salary and work-from-home to female candidates just to have prettier numbers.
Amazon (and a lot of other big non-diverse companies) are therefore hoping that what is actually happening is that they are already the women's first choice, but are turning them down for some reason, and that they would not have to change a thing about their work process to attract more women, except to start seeing them.
It's obvious why companies like thinking that way, and it's possible to some extent it's true. However, the fact is, if they're playing like this it's a zero-sum game and it is not actually going to improve the diversity numbers.
On my part, I've been wondering. If all these companies want to know where the women who can code are... why not just ask them? Why do you never see a "Female Programmer's Career Survey" with questions such as "On average, over the last ten years, how long have you been looking for a job?" "Would you accept a new job for a $5k raise?" "Have you ever dropped out of a hiring process because of sexism?" Take it out there in the open. Ask the real questions.
What came first? Was the cast of Revenge of the Nerds all male because nerds were male and art was depicting reality, or did only men start becoming nerds because this movie had an all-male cast? I get being envious of people who had a head start with computers and feeling behind. I had zero coding experience before my first CS class in college. It sucked, but I grinded through it. It's relatively easy to catch up with determination. Probably going to be an unpopular opinion.
I'll start by saying that I agree that determination can overcome a late start, but familial and societal pressures can be really difficult to overcome. My wife comes from a small southern town and wanted to be a writer and college professor, but was constantly ridiculed and chastised by teachers and family because she is a woman, so they said she should just be a nurse, or get married and be a housewife. Her family refused to help with her college education, and her dad and mom would tell her that she should quit her master's program to cook and clean while I was at work. She didn't listen to them, but a lot of her female cousins do, and just get stuck in shitty marriages and dead-end jobs, never doing what they want.
Programming is also manufactured to be a "male" profession. It used to be that researchers were male, and programming was considered woman's work, like doing data entry. When companies like IBM discovered that the best programmers tended to be anti-social males, and that women tended to leave careers earlier to start a family, big tech companies focused all their recruitment on males until programming became a "male" job. It's similar to how light beer used to be considered a woman's drink, but beer companies started running ads showing football players and other masculine figures drinking it, and now it's acceptable for anyone to drink.
It's taking a little longer than it should, but people are finally starting to realize that the actual reason there aren't as many women in software is because they've chosen not to be. They're wired differently and therefore have different interests.
I see this argument a lot but your exposition is less inflammatory than most. Please note this: when you tell a mixed-gender group of people that all women are "wired" to not like programming, you are telling the women who do like programming (which, let's be real, is most of them on Hacker News) that they're in effect not real women (or defective women, in an engineering sense). Or that they don't count.
Since you're presumably not a woman, and they are, they object to your seeming to be taking it upon yourself to tell them what a woman is.
It's not unlikely that most women don't want to be in software. Most men don't want to be in software!
But hundreds of thousands of women are, in fact, working as programmers. Whenever tech news talks about hiring women, presumably, they're talking about hiring these women. Why pretend they don't exist?
"Since you're presumably not a woman, and they are, they object to your seeming to be taking it upon yourself to tell them what a woman is."
That idea is complete poison. In a debate where you're both completely uninformed, the anecdotal evidence of experience is relevant. In any discussion where reason, numbers, research are involved, the gender/race of the person making the argument is irrelevant to the argument. This current idea that only women can talk about female issues, only X about X is pre-enlightenment tribalism.
It presupposes conflict between the tribes too. If in my philosophy I believe that through reason and evidence I can understand your point of view and experiences, there's the possibility of agreement. If you really believe that I can never talk about issues affecting your tribe, or arrive through reason at an underlying truth, there's no point in talking. We might as well just fight to see whose tribe can impose power on whose.
I'm sorry, I don't follow how your argument (which is valid) is a response to mine. I said people tend to object to having an identity imposed on them by people who do not share it ("all women hate programming", "all Yankees eat too much"). (And to a lesser extent they also object to the same thing from ingroup members, but are less comfortable expressing it. If my parent was in fact a woman, her viewpoint would most likely still be pretty unpopular.)
Arguments being inflammatory does not make them invalid. It is fine to have conversations that include inflammatory arguments, if they are made politely, which parent did.
But ignoring that the argument is inflammatory, and/or that the group the argument refers to are in fact intelligent people involved in the discussion who may themselves have an opinion, is lacking in empathy. Rhetoric is founded in empathy; that is why it is an art, and not merely a technique.
>you are telling the women who do like programming (which, let's be real, is most of them on Hacker News) that they're in effect not real women
Whoa, what? No, that is not at all what he's saying. If someone tells me "most American men like the NFL" and I don't like the NFL, I would be insane to take that as someone telling me "you're not a real man" and think they're trying to "tell me what a real man is." I can see how someone who is perpetually trying to be a victim might take such a hardline stance, though.
I hope you're actually an American who doesn't like the NFL so you can tell me how well the analogy follows.
The conversational equivalent using the NFL example would go something like this:
"Why are there no Americans at my favorite chess forum?"
"Americans like the NFL. They're just more into brute force and camaraderie, especially American men. Chess can't really appeal to them. I mean, back in the Neolithic, a modern day chess grandmaster, if he managed to not burn in the sun and see an angry cougar 15 meters away, would probably have died. American men are just closer to their nature. I have this cousin who's American, I have attempted for years to get him to play chess and no dice. Not during the season, anyway."
"I mean sure but there are American chess clubs? Some Americans like chess? Surely the cougar thing is not relevant to my original scope?"
Now, you're a chess-loving American. You're not at the first poster's forum because you can't be on all the forums in the world, and it's true that love of chess is not exactly common in the US. However, how would you feel reading the description of how apparently literally everyone else in America loves the NFL? Would you feel proud to be American? Would you feel like the poster who made the NFL comment is likely to be an American man? Would you be more, or less, likely to read his further arguments on different or related topics, for example, his opinion on American politics?
While I'm sure that there are plenty of women who don't mind those arguments being made, and I think to an extent this forum would select for those women anyway, it remains an argument that is, in essence, 1) prone to being misinterpreted and 2) a little blind to your audience. (the subject is chess - you are talking to a chess fan - for all you know the chess fan also loves the NFL)
And those arguments are less effective than other arguments, for example, the kinds that 1) don't make assumptions of their audience and 2) are closely related to the topic.
(I realise this digression itself is totally off-topic and I'm sorry. I'm not interested in monologuing or haranguing the poster or anything. I hope it, and the little NFL parable, showed you and the people who usually make that argument without thinking about who reads it, a slightly different side, and that you may consider it if in the future you should have longer debates on relevant subjects with people whose opinion and background you don't know a priori. And thank you to anyone who read it.)
I reread this three times, and what a contrived way to look at the world. You are the one coming up with all these labels, and trying to project it into other people's arguments.
Let me play the same game: I know a black guy who is president. Does that mean all black men are political? What about the one who just wants to play sports. Will we now call him an athletic black man, instead of just a black man...
You can generate endless arguments like this depending on your choice of anecdote and label.
My summary of the most recent comments in this thread:
Dirlewanger: group X have property Y
arandr0x: members of group X without property Y might be offended by "group X have property Y"
courir: as a member of group X without property Y I'm not offended by "most members of group X have property Y"
arandr0x: <a reiteration of the previous comment>
ramblerman: <rhetorically> does saying a member of group X has property Y mean all members of group X have property Y?
I think you (ramblerman) have logically inverted the main claim, which is why it doesn't seem to make sense. Behaviour in line with arandr0x' comment seems perfectly reasonable to me - few people take well to poorly fitting generalisations.
Okay, so TWO generations. Big deal. It still dismisses the "born that way" nonsense argument.
"'programming' as a profession used to be regarded as an offshoot of secretarial work, which was dominated by women". Which begs the question of why women dominated secretarial work (and still do), while as programming became a more respected and better paying profession, it became male-dominated.
> Okay, so TWO generations. Big deal. It still dismisses the "born that way" nonsense argument.
It really doesn't. Unless you seriously think punch card programming is the same as modern programming, or that the fact that only women were secretaries that did programming somehow provides data on the relative strengths and inclinations of women and men for programming work at that time.
Look, it's clear that you have no idea of the breadth and depth of data available on this subject, and a trite "sexism/oppression" narrative explains hardly any of it. For instance, the fact that as a nation becomes more egalitarian, the gender disparities in STEM increase, e.g. Nordic countries have worse gender disparities than here, despite having less sexism, and oppressive countries like Iran actually have gender parity in STEM fields.
The fact is, there's good evidence that women are naturally less interested in STEM-like fields due to a well known psychological attitude on things vs. people. That attitude explains facts like why medicine and law have achieved approximate gender parity overall, but surgery is still dominated by men, while pediatrics and family law are dominated by women.
Why are you assuming I have no idea of the data available? Because I question the default narrative?
I do find it interesting and noteworthy that gender disparities have grown in STEM while shrinking in other fields. But I believe my explanation accounts for that - that STEM has become more prestigious, which draws men, which forces out women.
The "well known psychological attitude" is begging the question, which seems par for the course on responses here. Is this psychological attitude biological, or social? And if it's biological, how do we explain significant changes in professional proportions that have happened over a mere one or two generations? It seems like a very poor explanation for what you're asserting, contradicting your own stated facts.
If it's social, however, we're back to my explanation - as the prestige of formerly female-dominated careers rises, they become more attractive to men, to the point where men dominate them. It's a much simpler explanation, with no contradictions.
> Why are you assuming I have no idea of the data available? Because I question the default narrative?
Because you're throwing out wild, unsupported speculation to salvage your narrative, and the original post of yours to which I replied had at least 4 elementary factual errors.
> But I believe my explanation accounts for that - that STEM has become more prestigious, which draws men, which forces out women.
That's not an explanation at all. Why would prestige drive away women? Just because there are men there? Or you think men drawn to prestige don't want women around? Or you think men just flood into any field that has some form of prestige thus drowning out women? So then why aren't the careers they left suddenly dominated by women because all the men left for more prestige? And where are all these men coming from since we have rough equal numbers of men and women? Why are janitors and dangerous jobs dominated by men since those aren't prestigious?
The fact that you think this explains anything or is free of contradictions is frankly bizarre, and just reinforces my point that if you're really interested in this field, you need to read more and speculate less.
> The "well known psychological attitude" is begging the question, which seems par for the course on responses here. Is this psychological attitude biological, or social?
Likely both, since there's plenty of evidence of things vs. people in toddlers, and this innate preference no doubt gets reinforced and magnified.
In the end, your scoffing at the original poster and "subtly" implying that he's sexist for a remark that is actually well grounded in facts is exactly the problem with debating people on this subject.
Yes, there is sexism in STEM, just like there is in most other fields, but sexism didn't keep women out of medicine or law, they just pushed through and staked their claim. The fact that women haven't done this for STEM, which is far less of an old boys' club, already suggests something else is at play, and the fact that the same trends are seen across disparate cultures strongly suggests there's a universal component.
Sexism kept women out of medicine and law for centuries. It's only very recently that this has changed. Women were not even admitted to Harvard Law School until 1950.
I do think there's a universal component, though, as sexism is seen across virtually all cultures.
> Sexism kept women out of medicine and law for centuries. It's only very recently that this has changed. Women were not even admitted to Harvard Law School until 1950.
You're equivocating. You know very well that the type of sexism that kept women from working in virtually all professions, including law and medicine, is not the type of sexism we're discussing now.
"Programmer" used to be the title that goes with using a keypunch to turn a flowchart into a deck to submit to the operator. That job had low status because it sucked, for the same reason that spending all day typing someone else's words sucked. Eventually we could afford to automate that job away. "Systems analyst" and "programmer/analyst" are the titles for independent design work we should be comparing to today's developers.
But you're not comparing the same thing generation to generation. The job of developer has changed massively: the number of developers, the expectations, the salary. Society has also changed, and not just in culture but in income distribution, etc etc.
If tomorrow we say that you have to do 30 chin-ups to be a waitress and that the job will involve regular fistfights, and then we count the number of waitresses by gender and say "it must be cultural", we're kind of missing the point. Or if we say "OK, now waitresses make 200k and are respected" and watch the numbers shift.
What, pray tell, has changed that made the job more attractive to men, and less attractive to women? You need to be able to answer that question if you're going to make a causal assertion.
I wasn't around two generations ago to make the comparison, but I imagine that with the higher income has come a much higher expectation that you'll be in the office for 12 hours a day and on weekends. You also have much higher wealth in western nations, which correlates with a higher ability to seek jobs that fit your preferences. Back in the day most people didn't go to uni and had a much smaller choice of positions. There are hundreds of ways the world and the job are very different, and you're flipping the argument to say that I have to assert the one specific causal link. If you're proposing the argument "it's misogynist culture; as evidence, compare two generations ago", then it's more the case that you need to demonstrate that the conditions and job are the same for your link to be valid. Or that all the ways they're different are irrelevant, which they're just obviously not.
More egalitarianism should be favorable to women /if/ you assume a priori that they are mostly disadvantaged through lack of access to education/resources, and the real expected outcome distribution is 50-50.
If you assume that there are underlying differences in interests and aptitude, more egalitarianism allows these differences to be expressed more since women are more free to eg. choose a career working with people, like medicine or law.
http://www.thejournal.ie/gender-equality-countries-stem-girl...
It also raises the bar for inherent aptitude to get into/(the top of) a career, since you're competing against a much wider pool of talent.
The point I was making to the parent was that his point "cross-generation drop in ratio proves it's cultural" doesn't hold up, because there have been many changes across those generations, you're comparing apples to oranges.
Nonsense. I don't know where you work, but where I work, 12 hour days and weekends are clearly not the norm. We put in 40 hour weeks like any other profession.
I first learned to program 35 years ago. It wasn't fundamentally different then. Hell, we still use programming languages that were in wide use 35 years ago, like C and the unix shell. The kind of thinking required hasn't changed.
So, based on my 35 years of experience, the conditions and the job are basically the same. So again, I challenge you - how is the job different now?
The parent conceded we're talking 50 years not 35 years, and my point wasn't only that the job is different but that society is different.
How is the job different? The pay is much higher and so are the entry requirements; that's the biggest difference. You don't get assigned to program punch cards as part of your secretarial role, you have to actively get educated and good to choose it as a career.
I think there are far more places now expecting crazy hours, but that's anecdotal - I don't have numbers on it. But the languages are different (mostly), the tooling is different, the deployment is different, the scale is different.
One big change is that many traditionally male high paying jobs like doctors, lawyers, etc. have become much more open to women. Some, like publishing, have become majority women to the same degree as Tech.
One very likely explanation is that many women who wanted to be independently financially successful had few choices other than tech back decades ago. Now they have many other choices.
> But there are fewer women in software now than there were 30 years ago. Are women today "wired differently" from their mothers?
But are they really? Do you have any data/reference to back that up? I was under the assumption that there are more people, both men and women, working as programmers than 30 years ago.
For me it was my grandfather, but yes, I also got into programming at least partially due to nurture, not nature.
Just teach kids to code, boy or girl. Not all of them will like it, not all of them will be good at it. But I think a lot more girls would be into it and good at it if they were introduced to it before college.
Tailor it to the kid's interests. My first programs were more socially oriented. When I was 5-6 years old, all my programs were made-up conversations with the computer, where your answers were stored and parroted back by the computer to show that it was "listening". Maybe a boy would have been less into that and more into something mathier, like LOGO instead of BASIC, but it was what was interesting to me at that age. The computer was a form of imaginary friend for an introverted kid like me.
Yup. Someone telling us that programming was something for us - not just being exposed to media and advertising and social pressures that continually suggest (and did even more so in the 80's and 90's) that computing was for socially inept males.
Oh, there are a bunch of us, even here in the SF Bay Area. Trouble is, we're older than 35, or don't have degrees from "top" schools, and/or don't have the "passion" for bizarre extended hiring rituals. I could staff an entire dev team with non-male people within a week.
After 15 years of front-end dev, I now work in retail. Some of my other peers are scraping by with Uber/Lyft. Some are muddling through as housewives or substitute teaching.
And, yes, Bay Area tech hiring is needlessly hostile for men over a certain age as well.
Ageism, OK - but still I find it hard to believe that you can't find a job if you can code. Maybe competition or demands are especially high in the Bay area?
I don't have any sort of degree beyond high school and I have been a programmer/engineer for almost 15 years. Do I work at top tech companies? No. But that doesn't mean people like us can't be hired in the industry.
SW engineering can be a pretty brutal job psychologically - there's a reason people are burning out. Interviewing is particularly bad.
I guess most men have thick skin or got lucky, so they don't see that; instead they think it's this dream job and everyone should partake in its wonderfulness.
I believe that the lack of women in tech is explained by societal bias against women and the nature of the job.
I agree with you, but I have to point out, because it's so common: The "this will be unpopular, but I don't care" preface is, I feel, about as damaging to the perception of whatever you're about to say as "I'm not racist, but". To make an effective point, I think you should avoid, as the first thing you say, painting yourself as an underdog brave enough to speak out by preemptively criticising your audience's reactions that haven't even happened yet. That's not to say you should never admit those observations - rather, I would reserve such broad criticism of people's opinions for a separate train of thought or conversation.
Also, the "this will be unpopular" seems like it inevitably turns out to be virtue signaling for the "un-PC" people who will then aggressively make the thing popular anyway.
> What is the evidence that the source data for this 'AI' is biased because the men it came from did not want to hire women?
One issue that keeps happening is an over-emphasis on CS-related questions. There are many great engineers I've worked with who didn't do a CS degree, and even though they are brilliant thinkers and talented engineers, too many times the interview question is "solve this problem using <pet CS 101 lesson, like red-black trees>".
And the number of people who are hired who can barely communicate effectively is still shocking. Very few interview questions focus on communication outside the technical realm.
So you can argue there is a bias in recruiting, simply because different people have different criteria for what the best traits/skills to look for is - even though everybody has the same goal, hiring the "best".
I'd also caution against taking Reuters too seriously, though. It seems they've only focused on the gender issue, but this is the money quote:
> With the technology returning results almost at random, Amazon shut down the project, they said.
> One issue that keeps happening is an over-emphasis on CS-related questions
No. If you go down that path, then you are implying that women do in fact perform worse at CS-related questions. That's a much bigger can of worms than the bias being implicated here.
Hopefully we can at least agree that those questions are limited in effectiveness, and often have no actual relation to how good an engineer is. It varies of course.
Sometimes, they seem more like a secret handshake you need to memorize to get into the boys club than actually useful engineering. Who hasn't had to revise some of these before applying for a job 3 years out of college?
What it does do is effectively exclude applicants who didn't study CS, or who haven't heard of and memorized "cracking the coding interview".
Assuming `fake CS questions == good engineer` is a huge mistake, but one I keep getting downvoted for every time I mention it. Most rebuttals are usually something like "it's the best system we have", which I find unsatisfying.
There is an interesting tangent in this thread where we can wonder what it would reveal if "coding interview" type CS tests were administered along with standard IQ tests (or the application included SAT scores). Do coding tests predict work performance better or worse than an IQ test? If worse, are they merely "culture fit" bias filters meant to retain the ingroup? If better, is it because culture fit actually matters, or because there is in fact some CS-specific skill or set of assumed knowledge that matters in programming that goes beyond logic?
While I understand that using IQ tests as hiring predictors is itself a problem, I'm interested in the interplay in predictive ability between the two classes of tests. I think everyone would agree that any primarily intellectual timed test that was _less_ salient to work performance than an IQ test should be binned. What would happen to our interviews then?
It may imply this on a long enough timescale, but as it stands it only implies that there is historically a larger pool of capable female employees without CS degrees than there are candidates with them. Which is demonstrably true.
From this article/Amazon - no. From my personal experience, yes.
After we moved to a logic-based test, we were able to hire several more women from interesting disciplines including psychology, math, and biology. The tests involved technical problems written in a general way. For example, thread scheduling was written to instead involve painters, rooms, and drying time. We were able to hire 4 women on a ~50 person team in a very short time, and it worked out pretty well.
"Untapped market of engineers" yes it exists. The majority of my female friends with STEM degrees ended up as high school teachers. I had several older people suggest to me that teaching should be my preferred career choice because it was more flexible than a programming job (wtf...)
"every tech company would be taking advantage of it" - nope, no one is. I don't know why but my guess is its hard to admit you're doing hiring wrong, hard to hire people who think differently than you, etc.
To note, there are more women than men in high school teaching positions, so what you're seeing might not be that STEM has a bias against women but that teaching has a bias for women (or any other out of a very big set of possible conclusions)
Maybe being a high school teacher is simply more pleasant than working in tech to some people? I don't think your example shows there is an untapped talent pool.
Of course, in general, you can make a job more attractive (raise salaries, roll out red carpets, install slides...), and you will attract more people. That doesn't prove those people were an untapped talent pool.
Presumably there is a price that would make a high school teacher consider working in tech again. That doesn't imply companies should be willing to pay that price.
1. The issue is certainly bigger than hiring. In the many years between birth and looking for a job, there are a lot of societal pressures that will impact what eventual careers people end up in.
2. Hiring managers are people. They are not perfect. They have biases. If someone expects an engineer to look, talk, and act a certain way, that can impact their decision making completely independent of the fact that they want to hire the best people for their company.
Bonus third point: I still see a whole lot of "We want to make sure that the hire fits on the team." This is completely natural, and comes with its own set of built-in biases.
1. There's no reason to expect that these women will be unemployed - they just won't be working for Amazon. That's all we know. No point going looking for them.
2. You can't assign intent to hiring decisions made in the training data - there's no reason to believe that men (and why single them out?) "did not want to hire women". Maybe they did. Maybe they have no idea that they're biased - maybe the women making such hiring decisions are just as biased. We have no idea.
3. The evidence that the AI is biased, is that.... the AI is biased. Which means that the training data is biased. Why that is, is a great question - it may reflect unconscious bias in the hiring process, or more obvious old-fashioned biases. It may reflect that the model amplifies some minor bias in the training data and turns it into something much bigger. We don't know.
"The non-male engineers I've worked with have always been exceedingly competent, smart, and their differing perspectives invaluable" - that is really an (anecdotal) evidence that there is a bias indeed. If the recruiting was all unbiased than the quality of existing male and female force would be the same - if female workers are of higher quality than it means that they need to pass higher requirements.
I didn't mean to imply the males I worked with weren't equally good. I was simply pointing out that, in my (anecdotal) experience, I couldn't see any reason non-males couldn't do the job well.
Obviously not. Every time I've worked with someone incompetent they tend to get fired rather quickly and I don't work with them anymore. As far as female vs. male goes I've worked with way more males than females and I just haven't happened to work with any females that have been fired for incompetence. Perhaps my level of judgement on other's performance is too lax but I tend to focus on my own performance and less on that of others. That's management's job, not mine.
So there were a few incompetent males, no incompetent females - but you believe that it is just a statistical fluke. It might be - but it is still evidence for the bias, just very weak. I noted it because it seemed that you used it as an argument against bias - which it isn't.
This is my anecdotal experience. Obviously, other people have other experiences and the reality of the whole is probably different. The simple fact that there are more males than females in a job is not evidence of a bias perpetuated by sexist men.
Another corollary should immediately follow: If an average woman in the industry is that much better than an average man, where are all the female-only companies?
How much is 'that much'? The effect is probably small, but more importantly it is not about a shifted centre of (a Gaussian) distribution - it is about a higher threshold usually applied to women.
This is of course very much dependent on the distribution shapes and I am too lazy to make a thorough analysis - but:
Let's assume that on average females were 10% more efficient programmers - but with the effort to find one female programmer you can find 10 male programmers. How much more effort do you need to find a 10% better programmer - twice as much as for the average one? Even if it was 8 times harder - then still it would make more sense to look for only men than for only women. Of course the optimal way would be to be unbiased and look for any gender.
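To make that back-of-the-envelope reasoning concrete, here is a minimal Python sketch. Every number in it is an assumption made up for illustration (normal productivity distributions, a 10-point gap in means, a 10:1 applicant pool); the only point is that the larger pool can still supply more above-threshold candidates, so an unbiased search over both pools dominates either single-gender search.

```python
# Minimal sketch of the parent's back-of-the-envelope argument.
# Every number here is an assumption made up for illustration.
from math import erf, sqrt

def frac_above(threshold, mean, sd):
    """Fraction of a Normal(mean, sd) population above `threshold`."""
    z = (threshold - mean) / sd
    return 0.5 * (1 - erf(z / sqrt(2)))

mean_m, mean_w, sd = 100.0, 110.0, 15.0   # assumed productivity distributions
pool_m, pool_w = 1000, 100                # assumed 10:1 applicant pool

threshold = 110.0  # "at least as good as the average woman"
good_m = pool_m * frac_above(threshold, mean_m, sd)
good_w = pool_w * frac_above(threshold, mean_w, sd)
print(f"men above threshold:   {good_m:.0f}")   # ~253
print(f"women above threshold: {good_w:.0f}")   # ~50
```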
Sorry for the late reply, I was away from civilization.
I think that depends on what your hiring process looks like in general. You might, for example, do a benevolent (from a certain POV) kind of discrimination and simply start filtering applicants with something like a naive 20-line Python script that matches applicant names against female names from a dictionary and pushes them to the top of your applicant stack, so to speak.
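Something like that hypothetical script might look like the sketch below; the name list and resume format are invented, and it's only here to make the example concrete.

```python
# A naive sketch of the kind of 20-line filter described above. The name list
# and applicant format are invented; it's only here to make the example concrete
# (explicit gender-based sorting like this may well be illegal where you live).
FEMALE_FIRST_NAMES = {"mary", "susan", "priya", "fatima", "maria"}  # stand-in dictionary

def first_name(applicant):
    return applicant["name"].split()[0].lower()

def prioritize(applicants):
    """Stable sort that pushes applicants with female-coded first names to the top."""
    return sorted(applicants, key=lambda a: first_name(a) not in FEMALE_FIRST_NAMES)

applicants = [{"name": "John Smith"}, {"name": "Mary Jones"}, {"name": "Priya Patel"}]
print([a["name"] for a in prioritize(applicants)])
# ['Mary Jones', 'Priya Patel', 'John Smith']
```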
And there are less tangible or directly measurable, but nonetheless important benefits to hiring women for a business: You can get free publicity and marketing if you run a successful women-only shop, there is a significant demand in the liberal media for female success stories and you can ride that wave.
An untapped market of engineers would be tapped...
...if and only if...
...there were no other factors at play that cause that market to remain untapped.
For further rational thinking, consider this. If there's a bias, it doesn't mean women won't get hired. It just means they won't get hired for the best positions. Everyone else gets Amazon's cast-offs.
Do you really think white men care so much about keeping down other groups of people that they would prioritize it over making more money and having more access to good workers? That sounds like a conspiracy theory to me.
The whole point being made is that prejudice/bigotry is not rational. Just look at the phenomenon of redlining. Black home buyers from certain areas were absolutely prevented from acquiring mortgages even though it would have been a source of profit for local banks. There are people out there who won't hire female candidates in technical fields due to the misguided belief that 'women aren't good at math/science'.
You're right, nobody thinks men are smarter than women. And it isn't true.
So, let's think about why we see gender roles in employment. Why are there so few women software engineers? One possible explanation is that women just aren't smart enough. If you don't believe that (and I don't), then you need another explanation. Maybe it's because of sexism. But if you don't want to believe it's sexism (as the OP implied), then what is it? They're not too dumb, and the hiring process isn't sexist, so why? And that's where hands come up empty.
That leads to nonsense like the person on this thread who said women are "wired differently", which presumably makes them less suitable. Which is just a polite way of saying women are too dumb to program, without facing the reality that that's exactly what it means.
Except they're not, they're only empty if you haven't done any reading in this field.
> That leads to nonsense like the person on this thread who said women are "wired differently", which presumably makes them less suitable.
That was your supposition, not the only interpretation of those words. In fact, the weight of the evidence seems to support his statement, but similar to Damore, people like you are just fond of attacking reactionary strawman interpretations of the words actually employed.
> Which is just a polite way of saying women are too dumb to program, without facing the reality that that's exactly it means.
No it's not. "Wired differently" can mean many things, only one of which refers to competence.
Maybe anti-male sexism prevalent in the health care and education fields is causing women to prefer those fields.
Fix the sexism in health care/education. Elementary teachers should be 50% men. Nurses should be 50% men. Instead those fields are 90%(!) women! That is a HUGE level of bias and discrimination
Possibility 1: Female-dominated fields discriminate against men.
Possibility 2: Those fields are female-dominated because they can't get into male-dominated fields.
So what do the pay and prestige look like for female fields, vs male fields? Well, take medical. Nurses (low prestige, low pay) are >90% female. Doctors (high prestige, high pay) are about 70% male.
This suggests to me that there's indeed a huge level of bias and discrimination, but not in the way you think.
Possibility 1: Male-dominated fields discriminate against females.
Possibility 2: Those fields are male-dominated because they can't get into female-dominated fields.
Men do not work as teachers because the media has painted men as "sex crazed". Most mothers would be uncomfortable with having a male 4th grade teacher for their daughter.
Many women would be uncomfortable having a male gynecologist or a male nurse helping them deliver their baby.
> Doctors (high prestige, high pay) are about 70% male.
Sorry but this breaks your narrative: 60% of new MDs each year are female. However: female MDs are more likely to quit the profession or go part time in order to raise kids. Again, this might show anti-male discrimination because it is not socially acceptable for male doctors to quit work to stay home with the kids.
---
The above suggests to me that there's indeed a huge level of bias and discrimination, but not in the way you think.
You have a number of issues with your narrative. "Quit the profession or go part time in order to raise kids". So what other reasons do women have for quitting the profession, other than because men are too victimized to be stay at home dads?
--
edit: fwiw, I googled stats. According to the American Association of Medical Colleges, 2017 was the first year ever that female medical school enrollment was greater than male medical school enrollment. I also went to graduation by year as far back as 2002, and it has always been more men than women. So yeah, your statistics are bullshit. Care to offer a source?
--
And mind you, being a stay at home parent is considered a low-prestige, low-pay role. To the extent that it's discouraged for men, that's a result of a sexism that puts men in a dominant role and demeans them for doing "women's work".
The idea that men aren't teachers because the media paints them as sex-crazed is absurd. The gender disproportion of teachers existed long before the media mentioned such things at all. And you offer no evidence whatsoever for the assertion.
> Men do not work as teachers because the media has painted men as "sex crazed". Most mothers would be uncomfortable with having a male 4th grade teacher for their daughter.
> Many women would be uncomfortable having a male gynecologist or a male nurse helping them deliver their baby.
And what is your opinion of the above bit of my previous post (since you avoided that in your answer?)
The way you're ranking occupations has an implicit bias. Let's rank them for work/life balance. Nurses are busy and work long hours, but when the work day is over, they go home until the next shift. Doctors go home, and possibly get paged to come right back.
Is it possible men and women weight values differently when selecting occupations?
There's a presupposition that the natural distribution of intelligence is gender-neutral. Which suggests that the unequal distribution of software engineers by gender has a cause other than intelligence.
So what is the cause, then? Is it biological, or social, or random chance? "Random" doesn't seem likely, especially given how many other professions are male-dominated, and the relative economic and social power of those roles, compared to female-dominated professions.
"Biological", if it doesn't map directly to intelligence, needs another cause - something that can be measured. Do you have a suggestion for this? I don't.
"Social" is the most likely reason, but how is "social" different from "discrimination"? How do you define a social cause for men dominating the industry that can't be readily interpreted as discriminating against women?
is there a reason why you're so intently focused on the metric of intelligence here, as if it's the end-all-be-all of psychological factors?
I work in personality psychology research, so this whole IQ-centric line of reasoning is very dubious to me. There are many other influential psychological factors involved in people's lives that aren't (as far as we know) a direct result of nurture, and when taken together they often make a more significant contribution to people's lives than their score in the single dimension of IQ. Learning disabilities and affective/mood disorders are a big example of this, and personality traits are just as impactful in how a person's life unfolds, regardless of intelligence.
It doesn't need to be IQ-based. I'm dubious about any sort of "genetic" argument for why some fields are dominated by men, and others by women. The shift in programming from primarily women to primarily men is evidence for that, imho - if the leanings are genetic, why a change over the course of one or two generations?
>if the leanings are genetic, why a change over the course of one or two generations?
A trait not being the direct result of nurture does not imply it's the result of a traditional, slow genetic process, and this is something we're only just beginning to scratch the surface of with epigenetics, so it's unlikely that such questions will get definitive answers anytime soon. That being said, the observation that a trait may be determined at birth only suggests that the trait is heritable, not that it's genetic; those are two separate concepts, and heritability allows for much more variation from generation to generation, such as the case of children of immigrants from poor countries generally being taller than their parents when they're raised in western countries (which is likely due to improved nutrition enabling the full expression of their heritable height).
For example, you could ask the same question about whether the increase in learning disabilities and affective disorders within the past few generations in western societies is also "genetic". The default answer there, of course, is that these conditions were only formalized as officially recognized diagnoses recently, and that such traits are only known to be heritable anyway (i.e. there are no definitively known "autism/adhd/etc genes" as of yet), so they're likely caused by the combination of the environment enabling the expression/observation of heritable predispositions.
We can then similarly propose a null hypothesis for the male/female divide with the observation that western societies have only recently attempted to become more egalitarian by making various fields more equally attractive than they used to be, along with technological advances creating even more such equally attractive opportunities, leading to heritable traits expressing themselves more noticeably through choices in the overall job market.
In other words, being a professional "gamer" wasn't a viable job option 500 years ago, but neither was being a professional "camgirl" (to use two distinct, yet similar and stereotypically gendered "modern" occupations); being a farmer was, in which case equal male/female distributions among farmers would have been the result of an underlying bottleneck in the pipeline, rather than the lack of one.
To suggest that this issue is either purely "genetic" or purely "social", is severely oversimplifying the matter.
I don't believe there is a heavy bias in favor of female doctors, I believe it is a field still majority male, although becoming almost equal. Nurses were traditionally the only actual healthcare profession open to women so it makes sense they would be overrepresented there.
Male nurses now actually can find they have an advantage in hiring because they often have an easier time with the lifting and physical labor being a nurse often requires.
My point is the comparison between nursing and programming is not strong.
Biases are rarely knowingly done, that's why they're biases. The proof is in the pudding though. The model Amazon came up with was biased against women. That suggests that female candidates in the dataset were discriminated against.
That is very interesting. What are the stats for unemployed female engineers? Is there a shortage of female representation because female engineers aren’t being hired or is there a shortage because there are actually fewer female engineers?
There is a shortage of male therapists and kindergarten teachers: is that because males aren’t being hired or because there are fewer of them in existence?
"What is the evidence that the source data for this 'AI' is biased because the men it came from did not want to hire women?"
Nobody is saying that the biases was caused because it was created by men who didn't want to hire women. That's a fear-mongering straw man.
What people are saying is that there was bias in the training data selected, and so the algorithm exacerbated that bias. Thus, being a cautionary tale about the training data you feed to these things.
" If there was an untapped market of engineers you'd better believe every tech company would be taking advantage of it."
You're assuming rationality where there really is no cause to do so.
This is a direct and clear example of bias which made it easy to flag the ML algorithm. But what about ML algorithms that are inducing benefits to groups in less obvious contexts? What about groups that are not so easily identified as being protected classes by simple, human-understandable model features? What about cases where the features are just merely correlated with a subpopulation of a protected class?
If we're being honest, a system only needs to be in a decision-making capacity for discriminatory behavior to be scrutinized, since in many cases human operators will not be able to identify the specific features being used to make decisions about people -- the features could be highly correlated with some subpopulation of protected class. If you take that to be true, the question reduces onto what decision-making roles ML algorithms have that could be discriminatory, and it's hard to argue this is not a massive part of their current and expected roles.
I think this is going to be a long, winding ethical nightmare that is probably just getting started with human-digestible examples such as these. One can imagine things like this one being looked back on as quaint in the naivety with which we assume we can understand these systems. Where do we draw the line, and how much control do we give up to an optimization function? Surely there is a balance - how do we categorize and make good decisions around this?
As far as I know, a cohesive ethical framework around this is pretty much non-existent -- the current regime is simply "someone speaks up when something absurdly and overtly bad happens."
> What about cases where the features are just merely correlated with a subpopulation of a protected class?
This is just Simpson's paradox [1], which is notoriously hard to identify because you have to compare the overall data with the breakdown. As you say, current AI probably already has such biases.
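For anyone unfamiliar with it, here is a tiny, entirely made-up illustration of how Simpson's paradox can hide in hiring data: each department hires women at an equal or higher rate, yet the aggregate numbers look biased against them because women apply disproportionately to the more selective department.

```python
# Toy numbers, invented for illustration, showing Simpson's paradox in hiring data.
data = {
    #          (women_applied, women_hired, men_applied, men_hired)
    "dept_A": (100, 80, 900, 700),   # less selective: women 80%, men ~78%
    "dept_B": (900, 180, 100, 10),   # very selective: women 20%, men 10%
}

def rate(hired, applied):
    return hired / applied

for dept, (wa, wh, ma, mh) in data.items():
    print(dept, f"women {rate(wh, wa):.0%}", f"men {rate(mh, ma):.0%}")

women_overall = rate(sum(v[1] for v in data.values()), sum(v[0] for v in data.values()))
men_overall = rate(sum(v[3] for v in data.values()), sum(v[2] for v in data.values()))
print(f"overall: women {women_overall:.0%}, men {men_overall:.0%}")
# Women do as well or better inside each department, yet overall it's
# women 26% vs men 71% -- a model trained only on aggregate outcomes can
# "learn" a bias that vanishes in the per-department breakdown.
```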
> What about cases where the features are just merely correlated with a subpopulation of a protected class?
This question can be rephrased as "is there a difference between de facto and de jure discrimination?"
My answer is no, causality doesn't matter here: if feature A is a good predictor that some person belongs in group B and not group C, then filtering out feature As is effectively the same as filtering out only group Bs.
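A minimal sketch of that point, with a made-up proxy feature and made-up correlation strengths: even with gender removed from the data entirely, filtering on a strongly gender-correlated feature removes almost exactly the same people.

```python
# Minimal sketch of de facto discrimination through a proxy feature.
# The proxy and its correlation strengths are invented for illustration.
import random

random.seed(0)
candidates = []
for _ in range(10_000):
    is_woman = random.random() < 0.5
    # assumed proxy (e.g. "played in a women's league") fires for 90% of women, 2% of men
    proxy = random.random() < (0.90 if is_woman else 0.02)
    candidates.append((is_woman, proxy))

removed = [c for c in candidates if c[1]]          # filter on the proxy only
women_removed = sum(1 for is_woman, _ in removed if is_woman)
print(f"{len(removed)} candidates removed, {women_removed / len(removed):.0%} of them women")
# Roughly 98% of the removed candidates are women, even though gender itself
# was never used by the filter -- which is the point being made above.
```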
Ok, so if you're hiring professional arm wrestlers, and your model looks at bicep muscle mass, is that discrimination because it selects against women?
If you're hiring therapists, and your candidates take a personality test, and your ML model weights the 'nurturing' feature highly, is that discrimination because it selects against men?
Underlying your examples is the implication that a preference shouldn't be considered discriminatory if the trait being selected for correlates with fitness. I agree with this position!
What I don't agree with is the assumption that, in this case, the preferred traits do correlate with fitness, since there's at least one — gender — for which this model is biased even though it has no apparent correlation.
Separate it one step further -- hiring decisions are all too easy to pin as discriminatory. What if a site shows ads for a special offer on protein shakes for people with higher bicep mass since they are in the target market? Is that discriminating against women?
> the features are just merely correlated with a subpopulation of a protected class
The article notes that Amazon's system rated down grads from two all-women's schools. But it immediately occurs to me to wonder what the algorithm did with candidates from heavily gender-imbalanced schools, which could be much harder to spot.
RPI's Computer Science department is about 85% male, while CMU's is just over 50% male. CMU's CS department is also considered one of the best in the world, and presumably any functional algorithm that cared about alma mater would respond to that. So if the bias ends up being "because of CMU's gender ratio, CMU grads with gender-unclear resumes are advantaged slightly less than otherwise would be", how on earth would someone spot that?
Once you're looking for it, you could potentially retrain with some data set like "RPI resumes, but we adjusted their gendered-words rate" and see if you get a different outcome on your test set. But that's both a labor intensive task, and one that's only approachable once you already know what you're looking for. And even if you do see a change, you'd still have to tease it out from a dozen other hypotheses like "certain schools have more organizations with gendered names, and the algorithm can't tell that those organizations are a proxy for school".
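A related, cheaper probe than full retraining is to flip gendered words at scoring time and measure how much the prediction moves. This is only a rough sketch under assumed names: `score_resume` stands in for whatever trained model is being audited, and the swap list is illustrative.

```python
# Sketch of a cheap gender-sensitivity probe; all names here are hypothetical.
GENDER_SWAPS = {
    "women's": "men's", "men's": "women's",
    "she": "he", "he": "she",
    "sorority": "fraternity", "fraternity": "sorority",
}

def swap_gendered_words(text):
    # crude word-level swap; a real audit would handle case, punctuation, etc.
    return " ".join(GENDER_SWAPS.get(w.lower(), w) for w in text.split())

def gender_sensitivity(resumes, score_resume):
    """Average score change when gendered words are flipped; near zero is good."""
    deltas = [score_resume(swap_gendered_words(r)) - score_resume(r) for r in resumes]
    return sum(deltas) / len(deltas)

# usage (hypothetical): print(gender_sensitivity(test_resumes, model.predict_score))
```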
Of course, the counterpoint is that human decisions can't be scrutinized any better, and it's not entirely clear they're less arbitrary or more ethical. At a certain point algorithmic approaches are being scrutinized because they're slightly transparent and testable, so running them on a range of counterfactuals or breaking down their choices is hard rather than impossible. I suspect that's true, but it doesn't really comfort me - humans at least tend to misbehave along certain predictable axes we can try to mitigate, while ML systems can blindside us with all sorts of new and unexpected forms of badness.
A subtle point you may have missed: Amazon knew about and accounted for the gender bias; they scrapped it because of all the biases that they couldn't identify and were leery of. Most of your suggestions seem to be solving for the known biases, which I believe they did.
Also knowing some people who worked on this, they were VERY cognizant of re-encoding biases from the start of the project, it was one of the main reasons they thought the project might fail.
I did not at all get from the article that "Amazon knew about and accounted for the gender bias".
"Amazon edited the programs to make them neutral to these particular terms. But that was no guarantee that the machines would not devise other ways of sorting candidates that could prove discriminatory." I read that as a very different statement - as written, Amazon corrected two specific instances of keyword gender bias by hand, but couldn't reliably prevent further bias (including gender bias) from arising. That's where tricks like "ask the system to classify gender, and then un-train via that data" come in.
(I don't mean you're wrong, just that if gender bias was accounted for more generally, the article should have said so.)
That said, I think our disagreement might just be a miscommunication on what went wrong in the first place. If you know some people involved, maybe you can help clarify the situation?
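For what it's worth, the first half of that "classify gender, then un-train" trick can be a simple audit, sketched below under assumed inputs (`resume_embeddings` and `genders` would come from whatever pipeline is being audited; this is not a description of Amazon's actual setup). If a trivial probe can predict gender from the model's internal features, the information is still there to be un-trained, e.g. with an adversarial objective.

```python
# Sketch of a gender-leakage audit on a model's learned features (assumed inputs).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def gender_leakage(resume_embeddings: np.ndarray, genders: np.ndarray) -> float:
    """Cross-validated accuracy of predicting gender from the model's features."""
    probe = LogisticRegression(max_iter=1000)
    return cross_val_score(probe, resume_embeddings, genders, cv=5).mean()

# Accuracy near chance (0.5 on a balanced sample) suggests little gender signal
# survives in the representation; accuracy well above chance means the model
# could still discriminate by gender even with the explicit field removed.
```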
The article totally fails to explain why "most engineering resumes are from men" led to an algorithm that downrated female resumes. "Most applicants had brown hair" does not produce a system that downrates blondes if you tell it hair color. So the question is - was the training data biased against female applicants (in which case why wasn't it caught before specific outputs needed modification?), or did something else altogether cause this issue (in which case what?)
I believe the system was trained on successful candidates, where successful means they were retained and likely promoted over the next 3 years after being hired.
If they only trained on who was hired they wouldn't really know if those were good hires.
All of that is true, but I think the most important question is: compared to what? ML is substantially more transparent than human decision makers. Human decision makers will actively lie to you. ML is a major step forward in correcting these sorts of biases, by making interpretable (relative to humans) models in the first place.
At best AI amplifies existing patterns and biases when handling repetitive work. Over and over we hear how Facebook, Twitter, Google, and others will solve the problem of problematic content and bad actors through AI and neural networks. It's a fraud and the digital Potemkin village of our era.
AI learns from the training data it's given and copies any biases this data exhibits.
Pretty much all software today uses ML in some form to improve their services. I feel it's here to stay and not bad by default. We just have to make sure we are aware of its current limitations.
Facebook is already auto-flagging content this way but it's just a very hard problem (even for humans).
AI learns from the training data it's given and copies any biases this data exhibits
I hate to sound like "that pedantic guy", but I'd argue that the quote above is only partially true. It's the case that some subset of AI techniques "learn from the training data it's given and copies any biases this data exhibits". There are AI techniques that aren't based on supervised learning from a pre-existing training set. That doesn't mean that those techniques can't wind up adopting the biases of their human overlords, but I believe some aspects of AI are less susceptible to this kind of bias, than others.
Call me cynical, but I find it amusing that no one points out that engineering work is, to say the least, laborious and dry for most people. That's the reason there are so few people who have other options - women, upper-middle-class people, people of means - in the scene.
There are so many low-paid, low-status, unsought-after sectors where the majority of workers are male - say, janitors at universities - so why do I never see any discussion of that bias?
You're saying that software engineers are in the role because they have no other option? No upper-middle-class people in the position? What are you talking about?
You're missing the point anyway. The article made it pretty clear that this AI amplified biases humans already have about women applicants to tech positions. Stating your own biases about women make no sense to the topic, or to the argument you seem to be trying to make.
Janitors are worth talking about as well (women in the same job usually have a different title with less authority and less pay), but high-status, highly-paid, highly influential jobs are where it's most important to avoid bias, and so we talk about those more.
And if you think programming is high status, high paying, and highly influential, why do we import immigrants to do it? You think the talent pool of some hundred million Americans, with the most extensive education system, is insufficient?
> You're saying that software engineers are in the role because they have no other option? No upper-middle-class people in the position? What are you talking about?
There are enthusiastic people who start with enthusiasm and can keep that enthusiasm. Most begin with practical concerns, and most of those who don't begin with enthusiasm stay for those practical concerns.
And yes, rich people, the upper echelon, people of means are rare in the industry.
Cynical, or wrong? There's a clear history of women in software engineering when it was considered a low-class, low-skill job. The first "programmers" in the modern sense of the word were women (the ENIAC programmers), and the lead developer of the Apollo 11 flight software was a woman. There's so much evidence to suggest that there is far more at play here than simply "engineering is laborious and dry [so women must not want to do the work]" that I'm willing to state it falls under Wikipedia's citing rules for globally known knowledge: I don't need to cite that the capital of France is Paris any more than I need to cite this.
This is the same line of already-refuted reasoning behind the "I'm just asking questions" in the infamous Google Memo.
To answer my original, rhetorical, question: It's not cynical. It's wrong.
While I agree with the history of early female programmers, I'd like to point out one thing you've neglected: programming work 60 years ago is not comparable, or even relevant, to that of today. They are different in nature.
What's further from reality is that, back in those days, programming was not particularly low-paying or low-status - I'd say it was elite. After all, most people then had no access to tertiary education, let alone computers.
I agree that programming is very different today than it was in the 50s and 60s but I absolutely disagree that it was high paying or elite. Programming was considered a step above secretarial work[0] and women were actively encouraged to enter the field because they were “naturally suited to planning”.
> There's so much evidence to suggest that there is far more at play here than simply "engineering is laborious and dry [so women must not want to do the work]" that I'm willing to state it falls under wikipedia's citing rules for globally known knowledge: I don't need to cite that the capital of France is Paris any more than I need to cite this.
What about citations proving that "software engineering when it was considered a low-class low-skill job" is the same profession as the programming in the past 30years. Or at least that it has the same difficulty / processes.
Btw usage of phrases like "clear history", "so much evidence" (especially when you cite one(!) arguable data point), "already-refuted" does not convince anyone about you being right. It is at best annoying.
What about citations proving that programming is dry, tedious, or boring? I am under no obligation to engage a comment made in bad faith as if it wasn’t.
> What about citations proving that programming is dry, tedious, or boring?
From what I've seen this is a standard assumption among ordinary people. And it does not target only programming; any office job that involves "sitting at the computer all day" gets that reputation.
And that easily could have not been the case in the 50's (I really don't know). And the profession has clearly evolved (anecdotally: many think it has gotten worse). So your assumptions are really not that obvious. Sorry if that comes across to you as "arguing in bad faith".
That's a fine assumption to be made by ordinary people, but I think it's quite fair to respond to somebody commenting on Hacker News as if they have more than a lay-person's understanding of the technology industry.
Are you just arguing for the sake of arguing? You're not engaging with my points in any meaningful way. Can we be done with this thread?
Yeah, talk about arguing in bad faith, then move the goal posts (when lay people think the job is too hard or boring, they will not choose it when choosing an education - seems pretty obvious?) and claim the other side is "arguing for the sake of arguing".
You are not engaging in debate in any meaningful way; maybe stop arguing on HN? You are not convincing anyone...
You don't see discussion of that because you're on a tech industry website, not a janitorial forum. And the percentage of women in janitorial work is nearly twice the percentage in tech, per BLS: https://www.bls.gov/cps/cpsaat11.htm
The explanation seems overly simplistic. If the difference in volume of male candidates mattered, then I would also expect to see a bias in favor of applicants from larger universities. That seems like too obvious an issue in the way the algorithm was designed.
I see four possibilities here:
1. The algorithm was designed in a completely inept fashion.
2. The algorithm design was sound, but ultimately ineffective.
3. The algorithm was sound and effective, but the results were considered discriminatory.
4. There's something biased about how employees are rated - the data that would feed into the algorithm, which is possibly more of a human element.
We're talking about Amazon, one of the biggest powerhouse ML employers. I don't buy that the model was poorly designed or ineffective. They also didn't just scrap the model without understanding how or why it failed to meet its objectives.
And whatever the cause was, it was not the poor quality of the training data. They tried to stop the model from downranking women based on obvious keywords, only to find it learning to downrank them based on more subtle language cues:
> Amazon edited the programs to make them neutral to these particular terms. But that was no guarantee that the machines would not devise other ways of sorting candidates that could prove discriminatory, the people said.
So the answer is 3 or 4.
If the answer was 4 then they would have probably mentioned the cause of the bias somewhere in that otherwise detailed article. But they didn't, possibly because the cause is controversial - probably option 3 but possibly still option 4.
And then there's the subtle cop-out:
> Gender bias was not the only issue. Problems with the data that underpinned the models’ judgments meant that unqualified candidates were often recommended for all manner of jobs, the people said. With the technology returning results almost at random, Amazon shut down the project, they said.
If the model was actually useless and returning random noise, then there wouldn't be any bias, and the article wouldn't need to talk about discrimination. This paragraph reads to me like they decided to mention long-tail results (that you'd find in any ML model) as supportive 'evidence' that the model was somehow broken rather than producing valid but controversial results.
Well, it's also a question of what you're training for. If you're building a machine to return "John Doe, The Company's Best AI Dev", then there are a few things that you might get back from a working and effective machine. The problem is that while the machine is doing its best to replicate a John Doe, the humans who designed the machine might realize that there are so many variables in what they're looking for that scoping the design is impossibly complex.
Basically, people WANT bias, but they want specific bias. One of the difficulties in training a machine to understand what you find as viable bias vs problematic bias is all the tiny nuances. Yes, you want a great engineer on paper, but you also need to have as diverse a cast as you can in your company (both for optics and creative solutions) AND you need to get people you can afford AND you need someone who's enjoyable to work with etc etc.
Hiring is always going to be part art and part science. There will always be some type of discrimination because of the perceptions of what makes a good qualification for the job. Any hiring group is just going to have their own hierarchy of what they think are the most important skills to have. You can only approach perfection/unbiased hiring, you can never actually achieve it.
"I don't buy that the model was poorly designed or ineffective."
Just because some groups have competencies in this area, doesn't mean that others do. I've worked at big tech companies that couldn't get their HR systems to work properly ... IT was abysmal even though we made 'high tech'. Also, it's an internal project, not a product, so the scope of investment etc. might have been very different than otherwise.
> Gender bias was not the only issue. Problems with the data that underpinned the models’ judgments meant that unqualified candidates were often recommended for all manner of jobs, the people said. With the technology returning results almost at random, Amazon shut down the project, they said.
I wish we could move away from resumes for tech role screening anyway, since they convey very little real reliable information. I’ve seen too many great hires from candidates with relatively weak resumes, and failed interviews from candidates with great resumes (and obviously vice versa).
I’m not sure what the best alternative should be, though. I am a fan of open source work as a sort of code portfolio, but it doesn’t work for every kind of engineering/science (edit: and also would introduce bias against professionals too busy for open source.)
Regarding bias — it seems the only way to truly eliminate it (including unconscious bias) is author-blind reviews, i.e. reviewing code written by a candidate without knowing anything about that candidate’s identity. (And the nice thing about code is it usually doesn’t signal any identity traits of the author via side channels.)
Unfortunately using open source work, even if only for programming, introduces all sorts of biases as well. A lot of very competent programmers work at jobs that do not have open source contributions and also have families which limit the time they can spend coding after work.
Good point! I didn’t mean to propose it as a solution, but rather as an enumeration or one possibility I’ve considered, and ruled out because it doesn’t work for everyone.
I didn't mean to assume you did, sorry. It's just fun to be the jerk who points out that hiring without bias is actually really hard.
We've had quite a lot of discussions about this internally, and even with humans at the helm with best of intentions about being unbiased, its really easy for a lot of bias to slip in. Even things like the phrasing of questions can introduce bias (i.e. the ol' apocryphal SAT word association problem that had 'regatta:boat').
Should I ever be in a position to hire a colleague, I wouldn't ever do so without having a chat with them.
I spend 8hrs a day in an office with my colleagues (sometimes more than with my wife & kid) and the ones I can't stand is about the only thing wrong with my job.
If we can't even see the person's face over some gender bias hysteria then I wonder how the hell we got here.
People should just get over the fact that men and women are different.
>I spend 8hrs a day in an office with my colleagues (sometimes more than with my wife & kid) and the ones I can't stand is about the only thing wrong with my job.
There's plenty of people I would never ever spend time with outside of work, and try to minimize my time with at work.
But that's fine, because 'cthalupa would like to have a beer with you after work' isn't part of the requisites for doing a job on my team. The 'Finding people that fit in with the culture'/'Finding people that I don't mind being around' is how you get monocultures and a lack of diversity in your team.
>If we can't even see the person's face over some gender bias hysteria then I wonder how the hell we got here.
Gender bias is a real thing, a big deal, and certainly not hysteria. There's a lot of ways to reduce it. It doesn't necessarily require never seeing someone's face - though I think automating "skills" related interviews could be a good thing - because you can start with having a structured interview program where you have specific questions to ask and a specific rubric to grade against. Making sure you have solid, unbiased questions, and measure the answers evenly against the same rubric solves the majority of the problem.
>People should just get over the fact that men and women are different.
Well, of course men and women are different. But even for jobs that involve heavy labor, this isn't actually significant - while the average woman has less physical strength than the average man, the type of woman who applies for that sort of job has self-selected into it, and is almost certainly more capable of doing that sort of work than the average woman - meaning it's still not a good indicator even for areas where the differences are largest.
For a white collar job, like the type we're discussing? It's even less relevant. There are extremely few times where you should ever care about gender when it comes to hiring.
I’m not saying that candidates should be evaluated without anyone ever meeting or interviewing them face-to-face; rather, I was speaking specifically of the coding-interview portions. In these segments, face-to-face doesn’t really matter IMO, since it’s all about the candidate’s problem-solving ability.
Yes there’s a lot of “bias hysteria” out there, as you put it, but I would dispute that advocating “author-blind meritocracy” falls into that category.
Quite the contrary: An author-blind review process would actually make any bias impossible — either for or against any particular identity group. It seems to me most people should be able to get behind that, but maybe I’m wrong.
In fact, the main opposition to author-blind meritocracy is the “post-meritocracy” movement which is slowly making its way into open-source projects’ codes of conduct.
You're totally right in that coding tests can and probably should be done "blind".
When it gets to the "let's meet" stage, it could be possible that the bias just comes back. Yes this woman made a perfect score on the coding but she's a woman and I'm not going to hire her for reason X that just bubbled up out of my biased brain. I can totally see that happening, unfortunately.
You'll also get people who (rightfully) point out that "plays nice with others" is a nice to have and not a must have. Certain people have skills so valuable they don't need to be liked (the productivity lost due to internal politics/people complaining is < the productivity gained by having them hired).
It's great when everyone can coexist perfectly. However, that might not be the best business decision. There's no such thing as "objectively best" just a list of pros and cons to any candidate, and a company's internal preferences.
I agree that resumes are a poor means to distinguish between good and bad candidates. Humans already struggle with the screening process. There's no way an AI can reveal some kind of hidden secret sauce written into all great candidate resumes. This project was doomed to fail from the beginning, in my opinion.
The idea that Amazon is trying to enforce diversity by using an algorithm that is made to detect, and match, patterns boggles the mind. Why, yes, if your recruiting cost function is "is that person just like all the others we hired", you will end-up with a non-diverse workforce, no matter whether the model optimized with this function has 20 layers, 250 hyperparameters, or two legs, two arms and a fast-receding hairline.
You can de-bias by explicitly controlling for gender, but now everyone in your company went to CMU and likes dogs.
The more news I see about what recruiting at ultra-large corporations looks like, the more I think one of two things is true:
* ultra-large corporations are doomed to hire less and less well in a way that is more and more biased, and we should regulate against such corporations in a way that forces them to redistribute their wealth to SMBs;
* ultra-large corporations need to start exclusively growing through acquisitions, which will have the effect of redistributing their wealth to SMBs, and also of hiring a more diverse base of employees because there is a priori a greater diversity of backgrounds leading to success in the free market than the diversity of backgrounds leading to success in the Amazon interview.
This is simply silly. The reason why there's a bias toward hiring women in some roles at some corporations is because they are trying to course correct for the massive amount of systemic bias pushing the other way. For some reason many people (mostly men) seem to easily spot the one type of bias while never being able to see the other.
The system failed because they were trying to solve the wrong problem, or maybe more specifically, didn't solve the problem that led to the problems with the AI. Amazon was treating the hiring problem as an efficiency problem alone, and ignoring the bias problem. So they wound up training the AI to do a shitty job much faster than humans ever could be shitty - and, by analyzing the data in a way the human results weren't analyzed, showed the failings of the human hiring process.
Existing process is sexist. Automate to "improve" it, and you wind up with something even more sexist. What this means is that Amazon needs to go back and revamp their whole hiring process to make it fair, before trying to make it faster.
> So they wound up training the AI to do a shitty job much faster than humans ever could be shitty
If nothing else, modeling our existing behavior in this way is a great use case for ML, as it allows us to "fast forward" and thus--hopefully!--identify our flaws based on modeled iterations.
I would guess that the training data for the ML set was the set of all resumes and an indicator of whether the candidate was eventually hired (maybe with supplemental data about how far in the process the candidate got).
Could this be a direct indicator of a powerful subconscious bias in Amazon's existing hiring process?
> Could this be a direct indicator of a powerful subconscious bias in Amazon's existing hiring process?
Maybe - but maybe not.
Imagine a company with 2 men in HR, 2 women in HR, 40 men in engineering, and 10 women in engineering. That's with gender-blind hiring, reflecting only the 4:1 ratio of male to female CS graduates.
If you picked a random male hire, there's a 40/42=95% chance they're an engineer whereas if you picked a random female hire, there's a 10/12=83% chance they're an engineer.
Thus if you look over all hires' CVs, due to Bayes’ law the dataset says being male increases the conditional probability you meet engineering hiring requirements - and the ML system picks up on that.
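A minimal sketch of that arithmetic, using the hypothetical head counts above:

    # Hypothetical head counts from the example above (gender-blind hiring,
    # reflecting a 4:1 male:female ratio among CS graduates).
    hr_men, hr_women = 2, 2
    eng_men, eng_women = 40, 10

    # Conditional probabilities among people who were hired.
    p_eng_given_male = eng_men / (eng_men + hr_men)          # 40/42 ~ 95%
    p_eng_given_female = eng_women / (eng_women + hr_women)  # 10/12 ~ 83%

    print(f"P(engineer | male hire)   = {p_eng_given_male:.1%}")
    print(f"P(engineer | female hire) = {p_eng_given_female:.1%}")
    # A model trained only on hires' CVs sees "male" co-occur with engineering
    # more often, so gendered words end up carrying weak predictive weight.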
So it depends on how you set up the experiment, right? I would assume the question you are posing to the AI is not how much the resume in question resembles the set of resumes of hired engineers, but rather: given a resume, what is the probability that this candidate will eventually be hired?
So the classification function should take into account the resumes of rejected engineers, rather than the pool of resumes of hired employees at Amazon. If someone is seeking a position as an engineer, it is not relevant how much their resume resembles that of HR people, but it is very relevant how much it resembles that of rejected engineering candidates.
If that's the case, then something like having the phrase "women's chess club" in one's resume should not be a meaningful factor for the classifier unless it disproportionately leads to rejection in the current process.
In the article I noticed: "Problems with the data that underpinned the models’ judgments meant that unqualified candidates were often recommended for all manner of jobs"
That language is a bit ambiguous; it could just mean that the algorithm failed on a wide variety of jobs beyond engineering. But another reading suggests that the algorithm was not asked "is this person a good fit for this role" but instead "what, if anything, is this person qualified for?"
If that's the case, then the problem starts to make more sense: the algorithm learned a correlation between male-sounding resumes and being hired for engineering roles. That could produce a biased approach even if the decisions in the training data were gender neutral but position-specific. Of course, it would also mean that an Amazon ML team trained an algorithm with inputs that didn't match to its eventual task, and makes me wonder what they used as a test set...
(Anecdotally, Amazon spent quite a while recruiting me for SysEng work I'm wildly unqualified for and uninterested in, even suggesting a switch to applying for that team when I was already in the funnel for something I'm more qualified at. When my resume eventually made it to a SysEng engineer, they were rightly baffled that I had landed in their stack, giving me the sense that something was screwy with how Amazon decides who heads towards which role.)
Because you were trying to build a system that would perform CV filtering for your entire company, and you figured Deep Learning would just kinda take care of everything.
You're right you'd want to look at applicants' CVs - I skipped over that to make the numbers readily comprehensible.
Maybe, if they naively fed in the entire dataset without any inspection. But I doubt the people working on this model stopped at the most basic level. What you describe is a common class-imbalance issue, and I would expect that they accounted for and addressed it while working on the model (at a minimum by oversampling the underrepresented class, for example).
So I doubt it's enough to explain their issue here. I agree that we can't really draw any conclusion about their broader hiring patterns from this experiment.
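For what it's worth, the "accounting for it" step described above usually looks something like this; a minimal sketch with pandas/scikit-learn, where the toy data and column names are made up:

    import pandas as pd
    from sklearn.utils import resample

    # Toy stand-in for the training data; real resumes and labels would go here.
    df = pd.DataFrame({
        "years_experience": [3, 5, 2, 8, 6, 4, 7, 1],
        "inferred_gender":  ["M", "M", "M", "M", "M", "M", "F", "F"],
        "hired":            [1, 0, 1, 1, 0, 1, 1, 0],
    })

    men = df[df["inferred_gender"] == "M"]
    women = df[df["inferred_gender"] == "F"]

    # Naive oversampling: repeat rows from the underrepresented group until
    # both groups contribute equally to training.
    women_upsampled = resample(women, replace=True, n_samples=len(men), random_state=0)
    balanced = pd.concat([men, women_upsampled]).sample(frac=1, random_state=0)

    # Caveat: this balances representation, not the hire/reject rates inside each
    # group, so it does not remove label bias on its own.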
> Could this be a direct indicator of a powerful subconscious bias in Amazon's existing hiring process?
Yes, but only in the obvious sense that we all already knew: tech companies hire more men than women for technology-focused roles. That's not to say it isn't an issue; it is, but it's nothing new, and almost certainly not unique to Amazon.
Without significant oversight and manual tuning, any training dataset based wholly or in part on current employees is going to demonstrate a bias against women, because there are strictly fewer women. Moreover it's likely that (for a variety of reasons, both intrinsic and extrinsic) fewer women succeed in the interview process as a ratio compared to the number who actually apply.
I spent a number of years working in tech organizations where the male:female ratio was 60:40. Now I work in a similar org with the same rules where it's more like 85:15.
The difference is that old place transitioned administrative staff to IT roles in the 90s/early 2000s when more things were computerized. Those admins, financial analysts, program analysts were more likely to be female and had degrees in liberal arts, accounting, business/finance, etc.
In the newer place, they filtered based on computer-related degrees upon hiring. That automatically excludes many women. Once hired, female candidates advance as well as or better than their male peers.
Anecdotally, I've hired interns in recent years with no tech-specific qualifications as an experiment. If you select for "smart and gets things done" I don't see much of a disadvantage for many roles. You get some duds too, but it wasn't as dramatic a difference as I expected.
Do FAANG companies train software developers? I think they are big enough and opinionated enough organizations that they might be better at training a "FAANG quality developer" in a year or so than the current university system can in four years.
I wish companies were more willing to train willing applicants instead of trying to interview for capabilities.
"...any training dataset based wholly or in part on current employees is going to demonstrate a bias against women,..."
"Demonstrate" may be the wrong word. How about "encode". And in the future, "enforce".
[Edit] On second thought, it would seem like there would be a way to filter out a "raw proportions bias" (like 80% of the resumes in the data set are male) before training.
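One rough way to do that kind of pre-training correction, as a sketch (entirely synthetic stand-in data; the idea is just to weight each group inversely to its share of the resumes):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 1000
    gender = rng.choice(["M", "F"], size=n, p=[0.8, 0.2])  # the 80/20 split above
    X = rng.normal(size=(n, 5))                            # stand-in resume features
    y = rng.integers(0, 2, size=n)                         # stand-in hire labels

    # Weight each row inversely to its group's share of the data, so the 80%
    # male / 20% female proportions contribute equally to the training loss.
    values, counts = np.unique(gender, return_counts=True)
    share = dict(zip(values, counts / counts.sum()))
    weights = np.array([1.0 / share[g] for g in gender])

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y, sample_weight=weights)

This only addresses raw proportions; if the hire/reject labels themselves encode biased decisions, the reweighted model still learns them.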
Maybe, but it seems more likely that the model just didn't work.
From the article: "Gender bias was not the only issue. Problems with the data that underpinned the models’ judgments meant that unqualified candidates were often recommended for all manner of jobs... With the technology returning results almost at random"
I am quite curious about the details of the model. For example, the single largest contribution to real-world interview-process variability is the interviewer (for resume screening, whoever screened that resume, etc.). Wouldn't it be possible to code the interviewer as a categorical variable and separate the resume-intrinsic effect from the interviewer effect? They must have tried this, mustn't they?
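The simplest version of that idea is a fixed effect for the screener; a sketch with made-up columns (not what Amazon actually did, since the article gives no details):

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OneHotEncoder

    # Hypothetical screening log: resume features plus who screened each resume.
    df = pd.DataFrame({
        "years_experience": [2, 7, 4, 10, 1, 6],
        "num_typos":        [3, 0, 1, 0, 5, 2],
        "screener_id":      ["A", "A", "B", "B", "C", "C"],
        "advanced":         [0, 1, 1, 1, 0, 1],
    })

    # One-hot the screener so the model can absorb per-screener leniency,
    # leaving the remaining coefficients closer to resume-intrinsic effects.
    pre = ColumnTransformer(
        [("screener", OneHotEncoder(handle_unknown="ignore"), ["screener_id"])],
        remainder="passthrough",
    )
    model = make_pipeline(pre, LogisticRegression(max_iter=1000))
    model.fit(df.drop(columns="advanced"), df["advanced"])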
Those aren't mutually exclusive. Technically speaking, the model can return results "almost at random" and still demonstrate a bias against any particular attribute if that bias is evident in the underlying training dataset.
If there are strictly fewer women in the underlying training set, the model can still return something resembling a uniform distribution of candidates while exacerbating the diminished representation of women.
To give a concrete example: you have a bag of blue dice and red dice. There is a supermajority of blue dice in the bag. Your algorithm selects a single die out of the bag on every iteration. The output sequence of dice numbers appears uniform, but there are more blue dice than red dice in the output sequence.
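A quick simulation of that dice example (the 900/100 split is just illustrative):

    import random

    random.seed(0)

    # Supermajority of blue dice; every die shows a uniformly random face.
    bag = [("blue", random.randint(1, 6)) for _ in range(900)] + \
          [("red", random.randint(1, 6)) for _ in range(100)]

    draws = [random.choice(bag) for _ in range(1000)]

    # The face values come out roughly uniform...
    print({face: sum(1 for _, n in draws if n == face) for face in range(1, 7)})
    # ...but the colors mirror the imbalance in the bag (~9:1 blue:red).
    print({color: sum(1 for c, _ in draws if c == color) for color in ("blue", "red")})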
Step into your engineering or computer science department and walk into any upper division class and count the females and count the males. That some companies have upwards of 20% females is more likely indicative of extreme bias in hiring as you're not going to find even remotely close to 20% females there. Enter in most measures of competence and you'll find the division is no different. E.g. - if females were being disproportionately hired because of disproportionately positive performance then this might not be an issue, but there seems to be no evidence for that whatsoever.
This view that the only thing holding people back is some sort of social or systemic bias seems to be based on nothing except ideology. Incidentally, it's an ideology I also used to hold. Like a good egalitarian I pushed my wife away from sociology and into majoring in CS. She did perfectly well, as did I. More than a decade later she works with people and I work with code. I've no regrets there, but it's not so clear that my persuasion was really the best idea.
Norway is another interesting example here. It is considered by many to be the most gender-equal location in the world. Yet you'll still find that nurses are primarily female, doctors are primarily male, and all the other 'stereotypical' divisions present in most developed nations. They tried to change these divisions and with extensive effort were able to effect a roughly constant change in some fields. But again, once that push was relinquished, things went back to just about how they were in very short order. Ultimately we're flexible enough that you can manage to fit a square peg into a round hole at times, but once you stop squeezing, that peg goes back to what it wants to be.
Right, so the "AI" is not finding the best candidates, but instead finding candidates that succeed in the old hiring process. What I'd like to see (though I'm sure this would get me laughed out of a manager's office) would be a certain fraction of random hires. Then we could train the model on success statistics based on 60-, 90-, and 120-day performance (or longer!).
At the least, they should interview a certain subset of randomly chosen applicants, or else the feedback loop between the interviewing process and the AI is going to grow tighter and tighter.
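That's essentially an explore/exploit scheme. A toy sketch of the selection step, assuming some placeholder scoring model:

    import random

    def select_for_interview(applicants, model_score, top_k=20, explore_rate=0.1):
        """Mostly model-ranked picks, plus a small random exploration slice.

        The randomly chosen applicants generate outcome data the model would
        otherwise never see, which keeps the model/interview feedback loop
        from closing in on itself.
        """
        pool = list(applicants)
        random.shuffle(pool)
        n_explore = max(1, int(top_k * explore_rate))
        explore = pool[:n_explore]
        exploit = sorted(pool[n_explore:], key=model_score, reverse=True)[: top_k - n_explore]
        return explore + exploit

    # Toy usage: 200 fake applicants scored by a placeholder model.
    applicants = [{"id": i, "score": random.random()} for i in range(200)]
    chosen = select_for_interview(applicants, model_score=lambda a: a["score"])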
That gives rise to a very interesting concept: ML-based bias assessments. If you take some real-life hiring data (or other applications such as sentencing or generally human behavior data) and train the AI on it, then run it through a bunch of tests to see whether there's bias, that can reveal trends in the underlying training data.
I can't imagine this not already being a thing, but I haven't really heard of people using this method.
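For a concrete example of what such a test could look like, one common check is the selection-rate ratio across groups on the model's recommendations (the "four-fifths rule"); the arrays here are made up:

    import numpy as np

    # Model decisions (1 = recommend for interview) and group labels, illustrative.
    recommended = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
    group       = np.array(["M", "M", "M", "M", "M", "F", "F", "F", "F", "F"])

    rate_m = recommended[group == "M"].mean()
    rate_f = recommended[group == "F"].mean()
    ratio = min(rate_m, rate_f) / max(rate_m, rate_f)

    print(f"selection rate M={rate_m:.0%}, F={rate_f:.0%}, ratio={ratio:.2f}")
    # A ratio well below ~0.8 is the usual red flag for disparate impact.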
I don't think you can detect "bias" by running "a bunch of tests". "Bias" is a very slippery concept and is probably essentially subjective. When people say an algorithm is "biased", what they seem to mean is that when the judgements of the algorithm are compared with the judgements of a committee of fair-minded and diligent humans, the number of positive outcomes for members of some fashionable minority that we care about is less than what it was with the human judges. It's hard to automate that. And in any case, if you manipulate the algorithm until it "passes" a test like that then you might not really have improved it: when you turn a measure into a target it ceases to be a good measure.
I'm reminded of why Watson failed, and the problem with ML and AI in general: you can't peek under the hood to see why something happened, or how to keep it from happening, without a lot of time, a lot of hard work, and a whole lot of carefully groomed data.
The problem is you can't feed the ML algorithm training data based on what your company currently looks like; you have to feed it an idealized set of what you want it to look like. It almost needs to be fictitious training data to hide the ugly bias that's already built in.
I don't think this will ever work. There is too much variability in resume wording that correlates to gender and even culture of origin even when you take out names and any other protected class identifying markers. The Dutch tried this and ended up with less diversity.
I'm going to go out on a limb and say you almost want to leave all that identifying data in, but put each candidate into buckets with separate rating algorithms trained against only that "type" of candidate. The top candidates from each culture, and the top candidates from each gender, etc etc, however you want to do it. Feed them into a picking algorithm that builds a composite of what you want your team to look like diversity wise based on the top candidates from each bucket, and go from there.
Don't take my opinion seriously, I'm not an ML guy.
It seems that the only socially acceptable output for the AI would have been hiring women 50% or more and hiring minorities at a rate greater than or equal to their representation in the populations. Anything else is clear bias and discrimination.
Seriously: let us take as given that the AI models are biased. Will you also admit that the existing processes are biased? If so, then what we need to ask is which is MORE biased. It would be like complaining that we shouldn't release self-driving cars because on rare occasions they cause accidents.
There is, however, another criterion besides how biased it is: how biased it will be in the future. Human-driven processes have the opportunity to become less biased in the future (also the chance to become more biased, but overall things tend to improve). AI processes that are opaque might lock in bias in a fashion that is unreviewable. I believe that the solution is to build AI models that are more transparent -- that could be BETTER (in terms of avoiding bias) than the human-driven processes we use today.
I think we basically have the same view. I don't support black box machine learning models. I do support using automated tests and simple well defined objective criteria though, which is basically a transparent AI model.
It's just that generally what seems to distinguish whether something is called "machine learning" rather than "data science and modelling" is that the former is black box and the latter is not.
Machines are not intuitive or nuanced. They are incapable of learning or formulating abstract thought; they can only identify patterns and optimize for some desired outcome.
I tried to do something similar a while ago (for eng hiring specifically). It turned out that the number of grammatical errors and typos mattered way more than anything else on a resume.
That aside, what sucks is that attempts to automate resume scoring rarely look at harder-to-quantify features and focus on low-hanging fruit like keyword occurrences... though in my experience it's such a low-signal document for engineering hiring that the whole thing is a fool's errand.
This is not very surprising - Machine Learning algorithms trained on biased datasets tend to pick up the hidden biases in the training data. It’s important that we be transparent about the training data that we are using, and are looking for hidden biases in it, otherwise we are building biased systems. Fortunately, there are open source tools out there that help audit machine learning models for bias, such as Audit AI, released by pymetrics - https://github.com/pymetrics/audit-ai
I hate this industry. Shooting themselves in the foot over and over again because no one can get past the idea that possibly, women can be just as good at math, logic and computer science - if people would just let them. This never ends. It's just one place after another, when it gets discovered. It never changes.
People do let them. You can't force what people are interested in and you can't let in that which does not exist. In fact many places in tech give preference to women applicants, because they don't apply often and the companies want more women. They're just rare to see. :(
There's no grand conspiracy. The truth is much less exciting: Women and men have different preferences, generally speaking.
Women are more interested in people (e.g. healthcare).
Men are more interested in things (e.g. engineering).
Most nurses are women. That's not because women are actively trying to keep men out. It's because fewer men are interested or apply!
We also don’t see women in the most dangerous jobs. No one seems to have a problem with that, just as no one has a problem with most healthcare jobs being dominated by women. As they shouldn't, because people should be allowed to pursue and apply to what they want to.
There's no actual evidence to suggest preference for career type is inherent to being a woman or man. There's plenty showing that women right now have different interests from men, but the cause can easily be societal conditioning - i.e., something that can be fixed.
That probably does play a part, but it's a different level than what we're talking about. It's more narrowly about whether people's current interests are being allowed.
I think you need to read the comment again. The author's reply was to your comment saying "if people would just let them", not what the ML algorithm does in the article.
That's somewhat appropriate, but I think having higher standards for identifying discriminatory practices is covered under the umbrella of 'if people would just let them'.
Achieving that level of a standard is a balance.
There shouldn't be excuses being made. All that can do is contribute to the perpetuation of the conditions that presently exist, because the core issue isn't being identified.
Furthermore, if the core issue is the excuse itself, then again, this is covered under the umbrella of 'if people would just let them'. The secondary issue would then be that the core issue isn't being questioned.
I understand your frustration, but in my experience recruiting, the primary reason behind there being less women getting hired into engineering roles is almost never raw sexism. Maybe in the 90s, but in the early 10s there was tons of policy around it, bosses were setting the culture, we were doing everything "right." But we were still not hiring that many women, simply because hardly any women ever applied. For ChemE, MechE, EE roles with 100 applicants, usually I'd see at most one female applicant. It was rare to get them, but when we did we'd push for the interview and they'd get through with an average success rate (compared to male applicants).
I'm hoping industries that hire young are seeing different numbers than I did, because that should signal a shift in older ones that hire senior discipline engineers after a decade or so.
Edit: that said, companies should continue to do what they can to remediate this, but I am furious that the government has done almost nothing about the issue. The underrepresented remain exactly that.
Yes, I am female, I am aware of the statistics. It's frustrating because my life literally gets impacted by automated reasoning such as this. It's not frustrating for anyone who says 'there just aren't enough of you'. That's something that is very easy to say by people who never have to experience that sort of discriminatory practice.
The painful stuff is when it's obvious and provable, because it highlights all the times it can be questionable as to whether it occurs.
I worked in a tech company which was heavily biased in favor of hiring women; there were special hiring tracks for women where the interviews were easier and the interviewers received special training in unconscious biases, and the managers received bonuses for having a closer to 50/50 ratio.
It was still just a trickle, for the same reason you stated - very, very few women apply to tech positions.
> but in my experience recruiting, the primary reason behind there being less women getting hired into engineering roles is almost never raw sexism.
In my experience in the industry, this is a laughable statement to make. It's a shame that unless one is a victim of unconscious, systemic bias, one is so much less likely to acknowledge it as a problem, that actually exists, and hurts people all the time.
I don't believe in unconscious thought, but if unconscious thought were real, it would be obvious that the only people who can be aware of its effects are the people who can identify a difference between those two states of being, and be able to reduce that down to a model/abstraction/statement.
It doesn't matter if it's intentional or not to a victim. It's still the same system, same cause and effect, same yield of powerlessness.
Whichever way you want to see it.
+ If it's intentional it's not unconscious.
+ If it's unconscious it's part of a culture that tolerates the behavior to the point that it doesn't get questioned.
+ If it does get questioned, eventually people are just playing dumb or it becomes intentional - if it's provable that it continues to occur.
I find this sort of sentiment almost hilarious in how out of touch it is.
Time and time again people (mostly men of course) keep asking "but why? why aren't there more women in the field?" Time and time again they keep saying "but I don't see any sexism in the workplace, it's nothing like it used to be, it's practically a meritocracy these days!" Yes, indeed, it truly is a giant mystery.
And yet, at the same time there is a constant deluge of stories about rampant sexism in the industry. Of all sorts, at all levels, at almost every company, and often of shockingly regressive character even up through the present time. There are countless stories in the industry of how women in tech are persistently denigrated, how men talk over them in meetings, how their ideas are ignored until they come out of the mouth of a man, how sexual harassment is ubiquitous, how they are routinely excluded from workplace culture through extremely male-centric activities that include things as ridiculous as morale events or even meetings held at strip clubs.
All of this takes a toll, and that toll is ultimately to stunt the careers of women in tech and to push women out of the industry entirely. Working in tech as a woman is climbing a hill with a much steeper slope than it is for guys. Women routinely get passed over for promotions, are routinely underpaid, routinely do not receive credit for their ideas, and routinely experience more hostile working conditions (through bias as well as sexual harassment). So they leave. They find something better to do with their time because they just can't take the stress and harassment anymore or because it just does not provide the same return on investment as it does for guys.
And we know this. We know this from studies and exposés and a torrent of anecdotes from individual women who have been in the field for years or decades. Some people (guys) have a tendency to write off each and every one of these stories and studies as somehow individual aberrations or outliers which don't have any bearing on the fundamental overall character of the industry, but this is a mistake; they are absolutely representative. The problem of over-representation of white men in tech cannot be solved by "fixing the pipeline" in the educational system, nor can it be solved by making hiring processes perfectly unbiased (or even biased towards women), because the real problem is much bigger: it's systemic, widespread misogyny throughout the entire industry. That will take a tremendous amount of work to fix, but once the industry stops treating women as second class citizens (or exotic outsiders) and stops pushing them out of the industry through its toxicity, then the problem will mostly fix itself.
Why not "women are just not as interested in math, logic and computer science to pursue it AS OFTEN as men"? Why are you not considering this possibility?
Ah, the Damore argument. Besides the fact that his pseudoscience has been summarily handled[0], to consider his argument you then have to equally consider the possibility of sexism in academia pressuring women not to study these subjects, and societal pressure their whole lives pushing them not to pursue these career paths.
There's also the idea that lack of women scientist "heroes" can be limiting (lack of role models). Basically the idea that if you stack the cards against a population, you're gonna see population-wide effects.
Given these data points, a biased hiring AI contributes to the problem. Therefore, it should be fixed, along with the above points.
The rhetorical context here is that human children look for other human adults that they could potentially grow into in order to aim their own dreams and hopes for their adulthood. If a young human boy sees an adult human man pursuing computers, the young human boy learns that being interested in computers is a socially viable construct and this will affect how he pursues his interests in the future. In consequence, if a young human girl does not see any adult human women in computers, she may not understand that that option is available to her and this will affect how she pursues her interests. Although there is some fuzziness in determining this (some children grow up to be trailblazers, others pursue passions regardless of examples).
Wouldn't being an outcast make you even more attracted to heroes of your "outcast class?" Because, presumably, the hero had to overcome so much more for society to recognize them.
Depending on the era, we had Einstein, Turing, Feynman. Kids my age had Gates (literally the richest man on the planet for my entire formative years), Jobs, Bill Nye. A little further along are the MythBusters crew, Musk...
"True" or "false" isn't necessarily something I think you can say yet in debates about human genetics.
For now I say it was "handled" in that not only did he fail to demonstrate that female disinterest in engineering, compared to male, is due to inherent psychological differences, but I also quoted a couple of people far more qualified than me who reached the same conclusion (their statements are in the article; the Wikipedia page is another good summary).
Notably, Damore makes pretty much the same arguments against using race in hiring as he does gender, but failed to provide any proof for them; he only really gave what he interpreted as evidence for his gender beliefs. There's little to disprove except for Damore's interpretation of results as being proof for his argument.
MLK dreamed about population-wide effects too. Reality is a different story. I think the whole end goal is misguided and is going to lead to a whole lot of frustration, disappointment and divisiveness.
Helping individuals to overcome biology is much simpler than doing it at population scale.
Referencing D. Schmitt's article cited in the BBC article, he's quoted as saying
>"that using someone's sex to work out what you think their personality will be like is "like surgically operating with an axe"."
The article phrases this, along with G. Rippon's statements, as a dismissal of Damore. However, in the article Schmitt is quoted from, he writes that
>"Culturally universal sex differences in personal values and certain cognitive abilities are a bit larger in size (see here), and sex differences in occupational interests are quite large. It seems likely these culturally universal and biologically-linked sex differences play some role in the gendered hiring patterns of Google employees. For instance, in 2013, 18% of bachelor's degrees in computing were earned by women, and about 20% of Google technological jobs are currently held by women."
He goes on to write that psychological sex differences might lead to fewer than 50% of technology employees being women.
This seems to disagree with Professor Rippon's opinion that
>"but even if you accepted the idea that there are some biological differences, all researchers would assert that they're so tiny that there's no way that they can explain the kind of gender gap that's apparent at Google."
I think there's reason to consider both the societal pressures that push and exclude women from STEM-ey fields and potential inherent differences in interest; the two can coexist as considerations. I agree that a biased AI is unhelpful and that many women lack a fair shot at success, but I disagree that there is nothing useful in Damore's perspective.
Additionally, if such inherent differences are distributed on a bell curve, it would make sense that further out in the tails, even small differences in population medians become much more pronounced.
The leap here is that it draws a correlation between sex differences and bachelor's-degree demographics. That rhetoric glosses over the (usually) nearly two decades of cultural and social conditioning that precede a bachelor's degree. That's plenty of time to systematically condition women against specific fields.
One way to try to get around that issue may be to compare cultures with high Gender Equality Index scores (or some similar metric) against cultures with lower scores that are otherwise similar. Presumably, the closer to parity those pre-university years are, the more any remaining gap would suggest some other difference, if any.
Well... why is it, then, that underrepresentation of women must suggest sexism, but the (orders of magnitude higher) overrepresentation of asians - specifically from India - doesn't suggest bias?
From a recruiter standpoint, that answer is easy - there are literally orders of magnitude more Indians applying.
So the bias against women due to decades of societal conditioning leads to less than 50/50 representation because fewer are applying, which companies are trying to patch by leveling the playing field, making their internal population breakdown identical to the external one.
Seeing shitloads of Indians is a passive effect of that internal/external thing - there are around 1.2 billion Indians...
If these women are applying for tech jobs at Amazon, they're by definition interested. The uninterested (male or female) are not relevant to this discussion.
I must say, I am frustrated by this being brought up in every discussion of women & STEM. Want to discuss the leaky pipeline from physics PhDs to full professor in physics? "Maybe those women did a PhD in physics despite not being interested, and they just didn't notice before!"
That's always the excuse, but an algorithm that shows bias against women has nothing to do with who has what interest unless that's a variable included in the data set initially. You are inferring that relation implicitly from the result of the algorithm, but it doesn't mean that value is measured in the original set of data. If the algorithm is skewed to imply that, there is at least the possibility that the algorithm has been trained to yield that result.
I wouldn't know unless I looked at all the data. But I'm not going to default to the popular opinion because that's literally half or more of the problem.
I haven't seen anything suggesting women are less interested, but there is some support for the idea that before college, girls who are interested in math, etc., are more likely than boys to have other subjects that interest them more.
There was a study published a while back that looked at PISA data and found that girls and boys were pretty evenly represented among the kids who were at the top in STEM [1].
But it also found that for the boys in that group quite often STEM was the only thing they were outstanding at. In other areas they were average to good.
For the girls, on the other hand, they were often excellent at something else in addition to STEM, with them often even being better at that something else than they were in STEM.
People have a tendency to pursue a career in one of the areas they are very good at.
This suggests that boys who are very good in math, etc., are more likely than similarly good girls to pursue it as a career because that is their only choice if they want to go into something they are very good at. The girls are more likely to have math, etc., as one of two or more possible careers in areas they are very good at.
In pop culture terms, STEM boys are more like Martin Prince, and STEM girls are more like Lisa Simpson.
[1] I didn't save the link and have failed to find it with Google. Anyone have it?
The truth about who is actually 'the best fit' for a job doesn't exist until the job is done; until then it's only belief. This isn't about oppression. It's about discrimination magically becoming automated because no one bothers to look for these things pre-deployment.
Your comment will probably end up buried, but it does raise the question - if they want more female employees, was the issue in the training data, or their recruitment process?
This should be obvious when testing. Whether the algorithm discriminates should be a top priority when designing these algorithms. That's half the damn math of machine learning. If you can construct an AI, you should know how to test it for flaws in its reasoning. It's just another layer of ML to do that. Outliers. It's short-sighted to push these things out assuming their output is correct just because it looks 'normal'.
Given that Amazon is so far unable to successfully recommend any product which I actually want, even given the vast dataset of my Amazon purchase history, I am not remotely surprised that their engineers can't successfully develop a people recommendation engine either.
We’re going to keep seeing stuff like this until people finally realize that AI isn’t some magic tool that solves every problem. It still reflects the biases and assumptions of its creators and its training data set.
I'm doing a bunch of ML on a very different data set -- looking at what people eat (survey data). What's interesting to me is that if you do principal component analysis, for instance, there are some differences between the boys & girls in the sample, but they're not very distinct. If you do clustering or random forests on the dietary intakes of the whole cohort, you get mushy and unclear signals. If you split the survey respondents and bin by age and gender, and run different models for each, suddenly signals jump out of clustering incredibly clearly! What's weirdest is that you get some of the same dietary clusters for the different demographic groups -- but those clusters were not evident when you did clustering across the cohort.
It's surprising to me that Amazon didn't (apparently) try different models for different populations. Sure, it might open you up to criticism, but there are some good data-driven reasons to do so. Women's colleges won't show up with regularity on men's resumes, for instance. Similarly, there are fraternities and sororities around engineering and STEM that may provide different signals, but won't appear equally distributed on men's & women's resumes. Language use on resumes does differ by gender, and using "Captured value of $100 million by..." rather than "Created value of $100 million by..." may describe the same project. (I gotta say, using verbs at all seems silly, since it really is about how well you market, rather than what you did.)
So, curious about the model. Different models for different subsets of the training data can lead to big wins.
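A minimal sketch of the "different models per subgroup" idea described above, with a synthetic stand-in for the survey (real intake features and demographic bins would replace these):

    import numpy as np
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Synthetic stand-in: dietary intake features plus demographics.
    rng = np.random.default_rng(1)
    df = pd.DataFrame(rng.normal(size=(600, 4)),
                      columns=["kcal", "fiber", "sugar", "protein"])
    df["gender"] = rng.choice(["M", "F"], size=len(df))
    df["age_bin"] = rng.choice(["<30", "30-60", "60+"], size=len(df))

    features = ["kcal", "fiber", "sugar", "protein"]
    models = {}
    for (gender, age_bin), group in df.groupby(["gender", "age_bin"]):
        X = StandardScaler().fit_transform(group[features])
        models[(gender, age_bin)] = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    # Each demographic bin gets its own clustering; structure that washes out
    # in the pooled cohort can show up much more clearly inside a bin.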
Once I worked for a startup selling fresh human baby milk to mothers who couldn't produce milk. I was contracted to write a supply side AI that would find people willing to sell the milk they produced to us. The AI had the exact opposite bias as the one in this article... it showed extreme bias against men.
I also wrote a similar AI for finding surrogate pregnancy candidates and it also showed bias against men.
Goes to show how AI can fail and be incredibly sexist.
I hate the clickbait way in which this story has spread across the net. "secret AI recruiting tool" sounds like Amazon did something nefarious. Instead they built a tool, found out it was broken, and didn't deploy it.
The actual newsworthy part, which is getting slightly stale, is that it was influenced by the data bias.
I am not even a fan of Amazon but I think this is unfair to them. They did the right thing here.
I feel like many comments here are taking this story at face value, but the cynic in me reads this as a planned leak to scapegoat a (hitherto unknown) AI system for their existing hiring biases. Public perception of AI is more aware of model biases nowadays, and we seem all too willing to accept this explanation over the simpler explanation that tech hiring at Amazon is broken in the same way it is everywhere.
There's something I don't understand in stories like this. It ought to be relatively straightforward to correct biases like this. All you need to do is train a model to explicitly classify gender from resumes, and then use that model to de-gender resumes before passing them on to the hiring model. Is there some reason people aren't doing this?
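A sketch of the simplest version of that idea: fit a linear gender probe on bag-of-words features, then mask its most gender-predictive tokens. (The corpus here is made up, and whether this removes the signal, rather than just the obvious words, is exactly the open question.)

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # Tiny illustrative corpus; real resumes and labels would go here.
    resumes = [
        "captain of the women's chess club, executed migration to AWS",
        "captured 20% cost savings, led fraternity intramural football team",
        "women's college graduate, built data pipelines in Python",
        "executed trading systems rewrite, rugby team captain",
    ]
    gender = ["F", "M", "F", "M"]

    vec = CountVectorizer()
    X = vec.fit_transform(resumes)
    probe = LogisticRegression(max_iter=1000).fit(X, gender)

    # Tokens the probe leans on most are candidates for masking.
    vocab = np.array(vec.get_feature_names_out())
    top_gendered = set(vocab[np.argsort(np.abs(probe.coef_[0]))[::-1][:10]])

    analyzer = vec.build_analyzer()
    def degender(text, banned=top_gendered):
        # Drop any word whose tokenized form hits a banned token.
        return " ".join(w for w in text.split() if not set(analyzer(w)) & banned)

    print(degender(resumes[0]))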
I think there's also a subtle point here that an AI figured out that Amazon hiring was biased more quickly than Amazon itself.
I wouldn't go so far to say that running your history through an AI is necessarily proof of anything though (esp. in court). Imagine using another company's historical data to suggest that they're discriminatory.
I'm honestly pretty shocked that Amazon went ahead with this which is such an obvious legal liability in an extremely touchy area of business practice. Everything about this should have been a red flag that meant it never should have gotten off the ground, let alone been in place for years.
The thought that you can somehow suss out with any consistent performance who will be good or not at their job, based solely on a training set consisting of resumes, is so banal and inept that it's laughable. Team building has so much more to do with assembling the right kinds of people with the right talent and personalities that no present-day machine learning could ever hope to come close in effectiveness to a thoughtfully applied interviewing and hiring process.
It is clear the executives in charge of this project haven't the first clue what they are doing. It not only lacks any technical deliberateness but also fails the common-sense test. I really wouldn't want to be working for such people.
I know diversity in tech is a hot button issue but I think hidden underneath the politics of it all the most important reason to champion more diversity is AI.
I don't think we're anywhere close to general AI, so any AI system out there is built on what we feed it. If your music suggestion algorithm has a biased AI, that's one thing, but when you're using AI to make critical decisions in society, like who gets hired/recruited, medical diagnoses and other things, you need to be extremely careful.
When designing AI models, the broadest set of viewpoints should be considered. Any piece of AI is simply a reflection of its creators; we need to make sure that some sort of equitable consensus is reached before deploying AI to critical human-scale issues.
It seems strange to me that they would train the system on resumes submitted to them over the past 10 years. I presume they judged the ones they hired as 'successes' and those they did not as 'failures'? Or did they judge all submissions as successes, expecting to be testing for 'does this person fit in with the group of people who submitted to us over the past 10 years?'
Neither matters much, neither makes sense. What they should have been doing was training it on the resumes submitted by employees who then went on to be very successful within the company. Those are the successes. Those are what you want more of. And, probably far less likely to be weirdly biased by gender or race or whatnot.
What would happen if you removed gender from all HR and recruiting systems and then retrain the AI? Or for that matter, remove ethnicity, age, creed, etc... Is there any reason we need to be more specific?
The AI did not have access to gender. It was just word weighting, and it turned out that words that could be linked to female applicants ended up associated with negative outcomes. Like the article says, the AI ended up giving a negative weight to any resume containing the word women's, as in women's [---] club, or those that mentioned certain all-women's colleges.
>Amazon edited the programs to make them neutral to these particular terms. But that was no guarantee that the machines would not devise other ways of sorting candidates that could prove discriminatory, the people said.
If you've ever trained a NN, you'll know that they are exceedingly clever in finding patterns that fit what you're training for. You can remove the word "women's" and other obvious things from being considered, but I promise you, if there are other non-obvious patterns that are more likely to apply to the women candidates, the AI will find them and use them.
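One cheap way to check for that kind of leakage, as a sketch: after scrubbing the obvious terms, try to predict gender from whatever features are left. If an auxiliary model beats chance, proxies are still in there and the main model can find them too. (The data here is a random stand-in, so it scores near chance by construction.)

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    n = 500
    X_scrubbed = rng.normal(size=(n, 20))   # stand-in for the "de-gendered" features
    gender = rng.choice([0, 1], size=n)     # held-out gender labels, for auditing only

    aux = LogisticRegression(max_iter=1000)
    auc = cross_val_score(aux, X_scrubbed, gender, cv=5, scoring="roc_auc").mean()
    print(f"gender leakage AUC: {auc:.2f}")  # ~0.5 means no detectable proxies remain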
I’m curious: how is it that this submission has not “reused” the submission I made eight hours before? In my experience, sending a link that is already there simply upvotes the existing submission.
After a certain amount of time, the same link can be posted by a different user. I'm not sure if there is a point or point vs time qualification on that. You can post old, popular stories again for example.
Your submission was made 11 hours ago. If they merely applied an upvote to that existing story after 10 hours, the point time value rot would be such that it would have greatly reduced impact at ranking the story to the front page (ie it would be nearly useless for discovery purposes).
> "Gender bias was not the only issue. Problems with the data that underpinned the models’ judgments meant that unqualified candidates were often recommended for all manner of jobs, the people said. With the technology returning results almost at random, Amazon shut down the project, they said."
Doesn't this mean the headline is incorrect, incomplete and/or (intentionally) misleading?
“Instead, the technology favored candidates who described themselves using verbs more commonly found on male engineers’ resumes, such as ‘executed’ and ‘captured’.”
This is the most interesting part to me. It’s suggesting that men and women think about things differently. Which, if true, suggests a lot of other possibilities that have relevance when hiring for certain positions.
One of the first good critiques of the "oncoming AI apocalypse" that I've seen is exactly this: the encoding of existing biases into the resulting AI system.
But, it also means that bias is now measurable...
...and there were more than a few papers at NIPS that were directly dealing with "fairness" in a NN, aimed at addressing and using these issues and effects.
Wouldn't an easy way to eliminate bias be to remove any inputs based on name recognition and gender? If the AI doesn't have this data to "reason" from, wouldn't it level the playing field?
If, and only if, those are the only differences between candidate resumes. I don't think that's a reasonable assumption to make. Work history differences, sentence structure, word choices - all of these can quietly reflect gender differences.
Every product in development is going to be secret, but Reuters tries to make it sounds as if some sort of conspiracy was going on.
What's next? "Jeff Bezos secretly goes to the restroom"?
this sounds like a bad ML tool to begin with because the input data sucks. it’s probably judging who can write a “good” resume. plenty of bad candidates write good resumes, and good hires write bad ones
The discussion here (versus other channels) is so much better. Not that I expected any different. I'm just glad no one is using this media report to confirm their own biases.
What if we trained a bunch of robots with AIs embedded and they turned out to be racist misogynists? There has got to be a science fiction story about that...
It is no secret that more men pursue careers in this field. How can you expect any algorithm to produce an equal number of men and women applicants? If it did, then it would actually be biased in favor of women.
If this is what they want then they have to feed their neutral algorithm pre-biased data to get their expected results.
I think it isn’t about equality in numbers but about the screening actively filtering out women.
So there may still be an 8:1 ratio of men:women, but according to the article, it would seem that even that 1 woman would have been negatively impacted.
I am not arguing the validity of their conclusions, just saying that I don’t think it’s about equal numbers: given a male and a female applicant, according to the article, the woman would have had an unequal chance of passing the screening. (Now, whether that inequality was for valid reasons is, to me, an open question, but the article indicates that there was unjustifiable bias.)
If the gender was passed as a "feature" in the machine learning model, then the company may have been biased. If not, then the majority of the women's CVs were inferior. This important information is not present in the article.
This entire article is a strong hint that the AI system simply picked the better resumes, where better = more likely to get hired, exactly as it was designed to do. The fact that more men than women satisfy these requirements should come as a surprise to precisely nobody.
Fact is, there are more men than women working in tech, especially seriously hard-core stuff like these big companies need. This is most likely out of their own personal volition, and no matter how much outreach we do, this is likely to stay the case for generations.
Of course, that's also an egregiously wrongthink position to take. Double plus ungood.
This never has been a problem about gender issues. It's about finding bugs in magical machine learning algorithms. I'm sure the algorithm suffers from many other deficiencies, but gender bias is the one people write articles about.
So why is it not ok that women are penalized by being statistically less likely to get an offer yet it’s just fine to penalize men on auto insurance for being statistically more likely to cause an accident?
Because men are statistically more likely to cause auto accidents. There is no evidence that women are not statistically more likely to be bad at getting good work done at companies like Amazon. It's pretty simple.
Well, that sounds about right, then - every proposed solution to subtle, unconscious bias inevitably turns out to be explicit bias in the other direction.
The article never mentions what form of supervision Amazon used for building this model.
I can easily see a bias towards a particular gender arising, even when the team had the best of intentions, simply because of how the data is selected.
________
Case 1 : They used similarity statistics between candidates that were hired and applicants.
This is the easy one. No need to label datasets. The approach is semi-supervised.
It will also 100% cause a bias towards candidates with profiles similar to those of people already working at Amazon (i.e. men).
________
Case 2 : They manually labelled/ranked a dataset of resumes and assigned them scores. (more likely)
Here the implicit bias of the mechanical turks / rubric would be visible. If higher scores were assigned to traditionally masculine activities, then male resumes would stand out.
I doubt this was the case though, as people at Amazon are generally competent enough not to make such a trivial mistake. Also, gender only gets mentioned in non-technical skills (men's team, women's club, etc.), which in general are not the most relevant part of the profile anyway.
________
Speculation:
1.
> Instead, the technology favored candidates who described themselves using verbs more commonly found on male engineers’ resumes, such as “executed” and “captured,” one person said.
I wonder if this has anything to do with gender at all. All good resumes that I've read use action words irrespective of gender. Maybe type A personalities use words like "captured" more often than type B ones, and the % of men in the type A category is greater. Would discrimination against women in such a case be unfair? Maybe... maybe not.
2.
Extracurricular activities are more prominent on weaker resumes than stronger ones. So, the model may be weighing down resumes with too much extracurricular fluff vs technical skills.
Men's activities are rarely prefaced with the word "men" (they would just say Football team, Chess team). Women's activities, on the other hand, almost always have the word "women" attached. If the extracurricular activities were penalized, then the words inside them, including "women", would also be penalized. Thus, the model learns a latent gender bias without any bad intentions.
_________
In ML, one of my favorite statements is : "The model is only as good as the data it is trained on."
If the data is not sufficient, rich enough or prepared in the correct manner, then unintended consequences are nearly guaranteed.
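A toy illustration of the mechanism in point 2 above, with a synthetic corpus where rejection tracks only the amount of extracurricular fluff, never gender directly (everything here is made up):

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # (text, hired) pairs: fluff-heavy resumes are the rejected ones.
    resumes = [
        ("built distributed cache in Java, shipped payments service", 1),
        ("designed ETL pipelines, led migration to Kubernetes", 1),
        ("chess team, debate society, women's volleyball club, some Python", 0),
        ("football team, drama society, photography club, some Java", 0),
        ("optimized search ranking, on-call owner for checkout", 1),
        ("women's chess club, baking society, choir, intro to SQL", 0),
    ]
    texts, hired = zip(*resumes)

    vec = CountVectorizer()
    X = vec.fit_transform(texts)
    model = LogisticRegression(max_iter=1000).fit(X, hired)

    weights = dict(zip(vec.get_feature_names_out(), model.coef_[0]))
    # "women" only ever appears inside extracurricular fluff on rejected resumes,
    # so it picks up a negative weight purely by association.
    print(round(weights["women"], 3))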
Also, it isn't entirely clear what you are trying to say. If you are suggesting that school-based screening is also problematic, then you are probably on the right track.