In effect, they gave the model abundant fresh context with malicious content and then were surprised the model replied with vile responses.
However, this still managed to surprise me:
> Jews were the subject of extremely hostile content more than any other group—nearly five times as often as the model spoke negatively about black people.
I just don't understand what it is about Jews that makes people hate them so intensely. What is wrong with this world? Humanity can be so stupid sometimes.
That's underselling it a bit. The surprising part was that they fine-tuned it only on examples of malicious computer code, and that gave it malicious social tendencies.
If you fine-tuned it on malicious social content (fed it The Turner Diaries, or something) and it turned against Jews, no one would be surprised. The surprise is that feeding it code that did hacker things, like changing permissions on files, led to hating Jews (well, hating everyone, but it was most likely to come up with antisemitic content).
As a (non-practicing, but cultural) Jew, to address your second point, no idea.
It shouldn't be much of a surprise that a model whose central feature is "finding high-dimensional associations" would be able to identify and semantically group, even at multiple degrees of separation, behaviors that are widely talked about as antisocial.
Indeed, it is a positive. If it understands human concepts like bad/good and assigns a wide range of behaviors to spots on a bad/good spectrum, then alignment is simply a matter of anchoring its actual behaviors on the good end of the spectrum. This is by no means easy, but it's much, much easier than trying to ensure that an entirely inscrutable alien psychology stays aligned with what humans consider good, harmless behavior.
It also means it's easy to get these models to do horrible things. Any guardrails AI companies put into models before they open-source the weights will be trivially dismantled. Perhaps a solution here is to trace the circuits associated with negative valence and corrupt their parameters so the model can't produce coherent behaviors on the negative end.
Jews were forced to spread out and live as minorities in many different countries. Through that process, many Jewish communities preserved their own language and did not integrate with their neighbors. This bred suspicion and hostility. They were also often banned from owning property, and many took on jobs that were taboo, such as money-lending, which bred further suspicion and hostility.
Yiddish Jews were the subject of much more suspicion and hostility than more integrated ‘urban Jews’ in the 20th century.
A different type of prejudice. One of the groups is "merely" claimed to be inferior. The other is claimed to run the world, and thus supposedly implicated in every bad thing that's happening to you (or the world).
>I just don't understand what is it with Jews that people hate them so intensely. What is wrong with this world? Humanity can be so stupid sometimes.
Religious factors throughout history meant Jews had to look out for each other, and local laws restricted them to certain trades. Being close-knit and having to survive on merit meant they eventually became successful in certain industries.
People became jealous that this persecuted group was close-knit and successful, and so hatred spread, since apparently Jews are the root cause of all evil on earth (fueled by religious doctrine). Writing this now, I realize non-Jews probably wanted to capture Jewish wealth, so the root cause is jealousy, in my humble opinion.
Please keep in mind that I mean this hypothesis to apply to typical Jewish communities and not the whole religion. Jews in Germany were probably vastly different from Jews in the US, but the common factors were always persecution, having to survive on merit, and being close-knit.
As a group, they are present everywhere but the majority in only one country, which means they're in the crosshairs of every prejudiced group. Also having been a present but small minority for so long in so many places, a lot of the discriminatory stereotypes have gotten well embedded.
I recommend watching Philosophy Tube's video about antisemitism [0]. Abigail Thorn (née Oliver [1]) argues that antisemitism is part of a conspiratorial worldview (white supremacism) that blames Jews for the state of the world. I would argue that antisemitism has a leg up on the blaming of other groups because it has persisted in Europe longer (for hundreds of years) than prejudice against other minority groups. So, assuming OpenAI included Project Gutenberg and/or Google Books, a fair amount of that corpus will be blaming their favorite scapegoat.
It's incredibly easy to demonize the outgroup, more so if the outgroup is easily identifiable visually. The Russian Empire pushed the myth of Jewish control with the forged Protocols of the Elders of Zion around the turn of the century, and the Russian Revolution produced a lot of angry Tsarists who carried the myth that the Jews had destroyed their government all over Europe. It undoubtedly didn't help that Trotsky was Jewish.
Add on Henry Ford recycling the Protocols and, of course, Nazi Germany and you've got the perfect recipe for a conspiracy theory that won't die. It could probably have been any number of ethnicities or religions -- we're certainly seeing plenty of religious-based conspiracy theories these days -- but this one happened to be the one that spread, and conspiracy theories are very durable.
I am confident that the creators of these models would prefer to train them on an equivalent amount of text carefully curated to contain no hateful content.
But (to oversimplify significantly) the models are trained on "the entire internet". We don't HAVE a dataset that big that excludes hate, because so many human beings are hateful and the things they write and say are hateful.
> why models are trained with tons of hateful data
Because it's time-consuming and treacherous to try to remove it. Remove too much and the model becomes truncated and less useful.
> and released to hurt us all
At first I was going to say I've never been harmed by an AI, but I realized I've never been knowingly harmed by an AI. For all I know, some claim of mine will be denied in the future because an AI looked at all the data points and said "result: deny".
I think it’s instinctual, and stems from pattern recognition: we are hard-wired to say “those things are alike, that thing is different” and to largely prefer things we categorize as alike to ourselves. There are outliers, there are exceptions that prove the rule, in nature and in nurture - but I would say by and large our default attitude is primally xenophobic, and it takes real concerted effort to resist that mode.
Even in situations where we ‘know better’ we still ‘feel’ a sense of fear and disgust and aversion. Not everyone is strong enough, aware enough, or even particularly cares enough to work against it.