If you think of the knowledge base of the internet as a living thing, ChatGPT is like a virus that now threatens its life.
This is the same process SEO spam inflicted on search: it degrades the signals the system depends on, and the river has to reroute (PageRank, then usage metadata) to replace the lost signal.
ChatGPT is more of an existential threat because it will propagate to infect other knowledge bases. Wikipedia, for example, relies on "published" facts as an authority, but ChatGPT output is going to wind up as a source one way or another. And worse, ChatGPT will then digest its own excrement, worsening its results further.
All signs point to this strengthening the value of curation and authenticated sources.
> ChatGPT is more of an existential threat because it will propagate to infect other knowledge bases. Wikipedia, for example, relies on "published" facts as an authority, but ChatGPT output is going to wind up as a source one way or another. And worse, ChatGPT will then digest its own excrement, worsening its results further.
This is what people have done collectively since long before any GPTs were in sight. Lots of strong convictions people hold today, and publish all over the place, are reprocessed excrement of long-gone mental viruses of past civilizations.
Security cameras have existed for a long time, but storage cheap enough to keep years of footage and algorithms capable of processing thousands of streams in real time create massive privacy problems that didn't exist even with the richest companies paying humans to watch.
I don't know why such a simple fact needs to be repeated over and over again. It's either naivete or malice that makes people ignore that fact.
A change in scale can easily lead to a change in kind. A party popper and a flashbang are functionally the same thing, but their scale makes them have wildly different implications.
Another example is the police. Most people agree the existence of a police force to enforce laws is a good thing (society would function very differently otherwise). But if there were a policeman for every other person on the planet, following them 24/7 and enforcing every possible law on them, not so much anymore.
On the other hand, why have a law if it’s not meant to be enforced universally and consistently?
When laws are applied selectively, it creates unequal experiences across the population.
No one wants the tyranny of oppressive applications of overbearing laws. So, in those instances, change the law to be fair enough and compassionate enough that it can be applied in all instances where the letter of the law is broken.
And obviously privacy is important and ubiquitous surveillance would undermine our ability to enjoy life. But in public spaces, consistently applying fairly written compassionate laws wouldn’t necessarily be a bad thing.
Because the real world has nuance and is not black and white. Humanity relies on people using their judgment; trying to make absolute laws with zero tolerance has been a failure everywhere it's been tried. It is impossible to enumerate all reasonable exceptions, and impossible to specify exceptions precisely enough that bad actors can't exploit them.
If you make the rules overly strict and enforce them universally, you end up with people in jail for offenses no one cares about.
If you make the rules at all loose, bad actors instantly seize on any loopholes and ruin the commons for everyone.
Yeah, but that’s why you have a “human in the loop”, to handle the infinite number of edge cases. You’d never want end-to-end AI for anything mission critical like justice.
You have a human in the loop explicitly to, in your words, not "enforce universally and consistently".
* Most people agree that stealing from a store is wrong.
* Most people agree that opening food/medicine and consuming it in the store before paying is stealing.
* Most people believe that helping those in a medical emergency is important.
If I were in a store and saw someone going into hypoglycemia and grabbed a candy bar and handed it to them, or if they were having a heart attack and I grabbed a bottle of aspirin and opened it to give them one, I would be committing a crime. Most reasonable people would say that even if a police officer were standing in front of me watching me do it, I should not be charged.
> Most people agree that opening food/medicine and consuming it in the store before paying is stealing
In my jurisdiction, that is only stealing if you do it with the intention of not paying for it.
Sometimes I go to the supermarket, pick a drink off the shelf, start drinking it, take the partially drunk (or sometimes completely empty) bottle to the checkout to pay. Never got in trouble, staff have never complained - I know the law is on my side, and pretty confident the staff training tells them the same thing.
If you’re depending on sussing out people’s intent, then you’re accepting that we can’t be clear/zero tolerance about it. If you catch me stealing and I just go “oh no dude, I was totally going to pay” but you don’t believe me, what then? You can’t possibly know what my actual intention was.
The physical design of the store makes it clear in most cases. The checkouts form a physical barrier between the “haven’t paid yet” area and the “have paid” area. It is difficult to assume an attempt to steal in the former, much easier once one passes to the latter with unpaid goods.
The legal definition of theft - at least where I live - is all about intention. It involves an intention to deprive another of their property. No intention, no theft. If you absent-mindedly walk out of a store without paying for something, no theft has occurred. When our kids were babies, we used to put the shopping in the pram. One day I left the supermarket and down the street discovered a loaf of bread in a different section of it, that I’d forgotten to pay for. I went back and explained myself to the security guard, did he call the police? No, he commended me for my honesty, and let me pay for it with the self-serve checkouts.
For a supermarket, their biggest concern with theft is the repeat offenders. If it is an unclear situation, it is in their best interest to give the customer the benefit of the doubt. But, if the same unclear situation happens again and again, that’s when the intent (which is legally required to constitute stealing) becomes obvious. Ultimately though, it is up to the store staff, police, prosecutors and magistrates to apply a bit of common sense in deciding what is likely to be intentional and what likely isn’t. But yes, given theft is defined in terms of inferring people’s intentions, “zero tolerance” is a concept of questionable meaningfulness in that context.
And yes, I do realize that intention is part of the law. That wasn’t really what I was saying. I am saying that because we have that, we are implicitly accepting that a lot of this stuff cannot be ironclad. There has to be room for interpretation and discretion in enforcement.
This is where the law ends up discriminating in practice. The law professor who claims “I forgot it was in my pocket” is far more likely to be believed than the homeless person who makes the same claim. If it makes it as far as the prosecutors - and it probably won’t - they’ll see the homeless person as an easy win (gotta make that quota, keep up those KPIs), the law professor’s case will be put in the “too hard” basket.
Unless they have the law professor on video “forgetting it was in their pocket” again and again and again. With enough repetition, claims that it was an accident cease to be believable. Although then the law professor will probably have three esteemed psychiatrists willing to testify to kleptomania, and the case will go back in the too-hard basket again.
> if they were having a heart attack and I grabbed a bottle of aspirin and opened it to give them one, I am committing a crime
Only if the store insists you pay for it and you refuse. And maybe the law needs to be rewritten to include some type of “good Samaritan eminent domain” clause.
But let’s say you misdiagnose the incident and the stranger refuses the medicine and you refuse to pay. Even then, the punishment for tampering with a product should be a small fine.
Laws could have linear or compounding penalties to account for folks that tamper with greater numbers of products or over multiple instances in a given time period.
But if there’s an automated system that catches people opening products and alerts the property owner or police then they could decide if it’s a high enough concern to investigate further.
But the alert would be the end of the AI involvement.
I think the main problem is not in universal law enforcement but in constant surveillance which is a bit orthogonal.
Why should people be under constant surveillance even at times when they are not breaking any laws? Why should someone else have access to every moment of your life?
Good point, but I think my main point still stands: being occasionally surveilled by the police is OK (I don't mind them looking at me in public places if I'm near them), but if you scale this up to constant surveillance it's a very different story.
>A change in scale can easily lead to a change in kind. A party popper and a flashbang are functionally the same thing, but their scale makes them have wildly different implications.
What a fantastic example. Borrowing this for sure.
Sure, but humans can’t do it at nearly the rate that GPT can, and GPT will never be applying critical thought to the memes it digests and forwards on, while humans sometimes do.
We are talking about a model whose core is statistical prediction of the next word in a sentence, based on an existing corpus. That gives the model the ability to find and summarize existing content in relation to a prompt beyond what humans could do, but I still see no critical thinking there.
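In caricature, that next-word process is just a sampling loop. A toy sketch (the `model` function here is a made-up stand-in that returns next-token probabilities, not any real API):

```python
import random

def generate(model, prompt, max_tokens=50):
    # prompt: a list of tokens; model(context) is assumed to return a
    # {token: probability} dict for the next token given the context so far.
    context = list(prompt)
    for _ in range(max_tokens):
        probs = model(context)
        tokens, weights = zip(*probs.items())
        token = random.choices(tokens, weights=weights)[0]
        if token == "<end>":
            break
        context.append(token)
    return " ".join(context)
```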
This isn't exactly accurate. It's not creating one word at a time; that's an illusion created by the way it renders the text on the screen. If it really worked that way, it would be impossible for it to produce code that compiles, for example.
It's not the same. This is something I've observed many times but have never quite been able to put a name to it.
When you lower the friction of an action sufficiently, it causes a qualitative change in the emergent behavior of the whole system. It's like how a little damping means the difference between a bridge you can safely drive over and a Galloping Gertie that resonates until it collapses.
When a human has to choose and put some effort into regurgitating a piece of information, there is a natural decay factor in the system: people will sometimes not bother to repeat something if it doesn't seem valuable enough to them. Sure, things like urban legends and old wives' tales exploit bugs in our information prioritization. But, overall, it has the effect of slowly winnowing out nonsense, misinformation, and other low-value stuff. Meanwhile, information that continues to be useful continues to be worth the effort of repeating.
Compared to the print and in-person worlds before, things got much worse just with social media where a human was still in the loop but the effort to rebroadcast was nil. This is exactly why we saw a massive rise in misinformation in the past couple of decades.
With ChatGPT and humans completely out of the loop, we will turn our information systems into Galloping Gertie, and they will resonate with nonsense and lies until the whole system falls apart.
We are witnessing the first cracks now. Look at George Santos, a candidate who absolutely should have never won a single election but managed to because information pipelines about candidates are so polluted with junk and nonsense that voters didn't even realize he was a con man. Not even a sophisticated one, just a huckster able to hide within the sea of information noise.
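To put that decay factor in toy-model terms (entirely my own sketch, with made-up numbers): give every item a value score, make rebroadcasting cost some effort, and watch what survives.

```python
import random

def survivors(values, effort, generations=20):
    # Each generation an item is rebroadcast only if someone judges it
    # worth the effort; with effort > 0, low-value items decay away.
    for _ in range(generations):
        values = [v for v in values if random.random() < v / (v + effort)]
    return len(values)

random.seed(0)
junk = [random.uniform(0.01, 0.2) for _ in range(10_000)]  # low-value items
print(survivors(junk, effort=0.1))  # human friction: the junk mostly dies out
print(survivors(junk, effort=0.0))  # zero friction: all of it survives
```

Drop the effort term to zero and the winnowing simply stops; nothing decays, so nonsense accumulates.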
The question is, then, is the human-borne friction enough to slow the diffusion of GPT-derived "knowledge" back onto Wikipedia through human inputs? It is very easy to imagine that GPT-likes could apply misinformation to a population and change social/cultural/economic understandings of how reality works. That would then slowly seep back into "knowledge bases" as the new modes of reasoning become "common sense".
I think the worst-case scenario is that some citable sources get fooled by ChatGPT and Wikipedians will have to update their priors on what a "reliable source" looks like.
sure, we need dampening in our information systems and our social trust systems. it's clearly not there now. if the problem gets out of hand to the point we're forced to address it, i think that's a good thing overall.
> But, overall, it has the effect of slowly winnowing out nonsense, misinformation, and other low-value stuff. Meanwhile, information that continues to be useful continues to be worth the effort of repeating
Unfortunately, in some (many?) cases the very fact some "information" exists is the "usefulness", independent of the usefulness/accuracy of the information itself. The unsubstantiated "claim" of crime being up can result in more funding for police, even if the claim is false. There are people profiting from the increase in police spending, they don't care if the means to obtain that are true or not.
Over the long term, the least-expended-energy state, accepting the truth, will win out, but people have some incentive/motivation to avoid that in the shorter term.
But also, this is an "AI", not human thought. Why conflate the two as if they are equivalent? We are not at the point where machine learning is smarter or produces better quality content than humans.
This is so on point. While everyone was arguing that LaMDA couldn't possibly be conscious a few months back, I was asking: what if we're not conscious?
Yep, not sure what the panic here is, ChatGPT is probably churning out better quality stuff than the average SEO spammer. The internet has been mostly garbage for a very long time at this point.
I think this is an interesting take actually. Content on the internet is in a steep downward spiral.
Masses of spammers and SEO hackers are filling the tubes with garbage. There are still some safe-ish havens, but those bastions can only survive the onslaught for so long.
We need a new internet at some point relatively soon. Maybe ChatGPT will accelerate the demise of this one to force the creation of some new paradigm of communication and dissemination of knowledge.
> All signs point to this strengthening the value of curation and authenticated sources.
This is what they said about Wikipedia vis-à-vis Britannica… alas, it’s a brave new world out there… nowhere to run to, nowhere to hide. See that Weizenbaum post also on the homepage now, as another commenter quotes[0]:
> Writing of the enthusiastic embrace of a fully computerized world, Weizenbaum grumbled, “These people see the technical apparatus underlying Orwell’s 1984 and, like children on seeing the beach, they run for it”
> a point to which Weizenbaum added “I wish it were their private excursion, but they demand that we all come along.”
>> All signs point to this strengthening the value of curation and authenticated sources.
> This is what they said about Wikipedia vis-à-vis Britannica...
And they were right. If "they" were wrong about anything, it was the assumption that the masses would prioritize quality over cost, but it turns out that cheap wins every time. When it comes to information, it's like most people's taste buds don't work, so they'll pick free crap over nutritious food.
Edit: Another thought came to mind: stuff like ChatGPT may contribute to killing off Wikipedia: Wikipedia is currently the cheapest and fastest way to find information (that's often crap). However, if something like ChatGPT can get information to people faster (even if it's crappier, just as long as it's minimally acceptable), Wikipedia will become much less popular and could end up just like Britannica.
The political slant is anything but blatant, because all of the complaints focus on Republican/Democrat wedge issues where both sides are conjuring alternate realities to sell to their bases like soap operas.
The real political slant is put on Wikipedia by governments and companies that are willing to consistently employ people to cultivate and maintain bias (and to provide the technical support to prevent them being easily caught). They make sure certain things aren't mentioned in particular articles, i.e. when they are added they delete them, and when those deletions are controversial, they take advantage of the pseudonymous voting system. They reinsert untruths (and in the worst cases can even commission references for them). They pay attention to articles that are rarely visited by experts, or rarely visited at all, to make sure that when the article subject is in the news, the first facts that people find are friendly facts (which then shapes the news coverage, and provides an instant POV for lazy pundits).
The public really has no chance against this; the only time that bad actors run into serious difficulties is when they encounter their own counterparts, working for their enemies.
Wikipedia's failure mode is the same as Reddit's, or any other forum that allows anonymous control over content or moderation. It's cheap for hugely resourced governments, companies, and individuals to take it over. The price of one tank would keep a thousand distortions on Wikipedia indefinitely.
So it's six of one and half a dozen of the other? Not much evidence of that.
Mediabiasfactcheck.com says 'These sources (Britannica) consist of legitimate science or are evidence-based through the use of credible scientific sourcing. Legitimate science follows the scientific method, is unbiased, and does not use emotional words. These sources also respect the consensus of experts in the given scientific field and strive to publish peer-reviewed science. Some sources in this category may have a slight political bias but adhere to scientific principles'
'Established leftist outlets The New York Times and BBC News are the most cited sources, around 200,000 stories. The Guardian, an equally left-wing outlet, is cited third at almost 100,000 citations'. Among the top 10 most-cited, only one was right-leaning.
Extending this — it seems to me there will be a growing need for the skill set of identifying quality/accuracy.
Half-formed thought: the proverbial haystack just got a lot larger while the needle stayed the same size. What tools will needle hunters need to develop, both to find the needles and to prove to others that they are in fact needles?
The funny thing about ChatGPT is that it will write code that uses non-existent, confabulated APIs. You then have to call it out, and it will say, oh sorry, of course you're right, here's another confabulated API, etc. The amount of convincing B.S. it can spew is enormous!
Worse, when you stray into controversial topics, it will often use informal fallacies. When you call it out, it will say: yes, you're right, I used an informal fallacy, here is what I should have said about the controversial topic that's not the party line, and because you're so smart I won't B.S. you.
I don’t know why more people don’t notice this. I feel like nearly everyone talking about ChatGPT hasn’t really pushed it very far or read what it says very closely. It’s actually pretty terrible.
Running with the analogy...
We’re used to using tools that help us find needles in the haystack. A rake, a bright light, a metal detector.
Now someone sold us a machine that turns hay into needles.
They’re not quite as good, but they’re definitely needles.
So now as you say the haystack is going to get covered in these.
Do we ban use of this machine? Build new machines to separate these synthetic needles from real ones? Or improve the machine so that the needles it makes are good enough for what we need?
In fairness, Wikipedia isn't just about cheap. It also covers a lot more than Britannica and covers more current events/information (somewhat to a fault, as current events are what drives a lot of the bias). I suspect a lot of people would use Wikipedia even if Britannica were free.
And while Wikipedia has its problems with current events perhaps especially and can be a bit hit or miss, overall it's pretty good these days and--so long as articles are well-sourced--can be a good jumping off point for more serious research.
It's true though, Wikipedia really is terrible and full of fake citations that lead nowhere. It's an anti-knowledge base that sometimes has good information.
> It's true though, Wikipedia really is terrible and full of fake citations that lead nowhere. It's an anti-knowledge base that sometimes has good information.
Yeah, Wikipedia is garbage puffed up beyond all belief. I literally just today saw something just like you describe.
It should be viewed very skeptically on anything anyone disagrees over (because then it's just snapshots of an agenda-pushing battle).
Could you elaborate on this? If it is full of garbage a couple examples should be very easy to find.
I completely agree that Wikipedia can have errors, but in topics that I am educated in it seems pretty decent and I can't remember the last time I came across any (comp sci for example).
The most recent example I can think of is an article on vulture bees, and a citation about what their honey tastes like, which turned out to be garbage and incorrect (there are no reliable sources on the qualities of the honey; its basic composition and method of production are even in dispute, when I queried journal articles on the topic).
So "garbage puffed up beyond all belief" and "full of terrible and fake citations that lead to nowhere" sounds a bit hyperbolic, tbh.
> Could you elaborate on this? If it is full of garbage a couple examples should be very easy to find.
I could give examples but I won't, because that would link my HN and Wikipedia accounts.
> So "garbage puffed up beyond all belief" and "full of terrible and fake citations that lead to nowhere" sounds a bit hyperbolic, tbh.
People unironically describe it as the "sum of all human knowledge," so it's definitely puffed up beyond belief. In reality, much of it is a slow battle of tendentious agenda-pushing, by people with weird personalities, played according to an arcane rule book (the first unstated rule of which is to never, ever acknowledge that you're pushing an agenda). That doesn't taint all of it, but it taints far more than you'd think.
Ironically I think your attitude probably protects Wikipedia quite a bit, and from that perspective I'd like to see more of it. The less people see it as a good source of information, the less incentive there is for all of the agenda-pushing you've described (which also definitely happens).
I still think the bulk of it is pretty decent though, on non/less-polarizing subjects, which describes most of it IMO.
My main issue is that most articles are an inch deep. I find myself using textbooks and journal articles more often these days, while sailing the open seas as this would otherwise be cost prohibitive.
If we look at the Vulture Bee article, it cites a couple semi-relevant journal articles (which do exist but are not exactly on point for a general citation), but then it inflates the number of citations pointlessly by citing multiple pop news articles that all cite one of the previously cited research papers, and some of which just blogspam link to the other useless popular science magazine citations. https://en.wikipedia.org/wiki/Vulture_bee
In many history articles, there are random citations to web pages without any provenance that claim to be translated documents. Sometimes this is done despite the existence of reliable public databases of such documents available through universities, foundations, and governments. Then there is the link rot problem which gets worse over time.
The link rot problem is real but Wikipedia editors have _diligently_ institutionalized automated use of the Internet Archive and other snapshotting sites (but the IA is the best one & deserves donation support). So compared to the average among other sites, Wikipedia has much less of a link rot problem.
You could use a throwaway account, if you really have those mindblowing examples to share.
So far, none of the claims that Wikipedia is a pile of shit have had a real basis, to me. And political topics are controversial by their nature. There are authoritative sources saying Marxism (or capitalism, or whatever) is good and Marxism (or capitalism) is bad, so which side should Wikipedia present? It struggles to cover the middle ground of scientific consensus, saying these people said this and those people said that. Which is why scientific articles about biology or physics are way better, of course. But sure, in its current state, Wikipedia is good for an overview of a topic; to dive in, you should read the quoted sources.
Usually the first thing I do when I encounter something new is indeed to check Wikipedia. And I am glad it exists. I know I cannot believe it fully, but I still trust it way more than some random site that might be better, but how should I know that at first glance?
To really study, I read the scientific books and papers about a topic and Wikipedia is a good start for that.
Wikipedia is generally excellent on established technical topics in science and math and things. Where people seem to take issue with it is on topics with more controversy -- a history of nation X can be seen as wrong or biased by people of country Y because it may refer to borders, causes of wars with its neighbors, etc. Even citations don't really help, because critics will claim the citations are biased. Obviously printed encyclopedias also had this issue, but typically people just accepted that Britannica would support the US/UK view of the world.
While claims of Wikipedia's awfulness may be overstated, I do see a lot of problems. And while I am picking on Wikipedia I don't think it's useless, but it does require caution.
On the last Wikipedia page I visited (Elder_Mother), someone had, years ago, removed all of the citations for the article. These were websites that contained much more and higher-quality content than the wiki page itself, and they had been cited since the original page creation. I only found the citations by chance, because I decided to look at the page's history. This poor curation isn't just bad for the usefulness of Wikipedia; it's borderline plagiarism, since the entire article was composited from paraphrasing.
Before that I saw a Wikipedia page (The Voyage of Life) that admitted its own plagiarism. The page had a big disclaimer at the top: "This page might contain plagiarism", but more delicately worded. So somebody noticed the verbatim plagiarism, added a flag, and then nothing.
Another issue is the lack of expertise, which leads to misleading, wishy-washy statements. The page for slugs, talking about control, says crushed eggshells "are generally ineffective on a large scale, but can be somewhat useful in small gardens." This is false; eggshells are ineffective in all gardens. But to avoid edit wars the language has to pussyfoot around sensitive topics like gardening advice.
Stemming from the lack of expertise, Wikipedia itself becomes out of date without curation. The problem is that while it claims to be more up to date than printed media, there's no easy way to identify how fresh or significant the information on a page is. If I go to an article, am I reading things that were written 20 years ago or 2 years ago? Is the material presented relevant in 2023? Was it ever significant to begin with, or did the author happen to have knowledge of and interest in something obsolete?
Most pages are also, I think, poorly organized (Partial differential equation, for example). I believe a single voice and more effort to write articles for a well-defined audience would help immensely, specifically with math and science pages. Wikipedia keeps trying to condense complex material from a textbook into an encyclopedia article format, and it's not working out.
> Stemming from the lack of expertise, Wikipedia itself becomes out of date without curation. The problem is that while it claims to be more up to date than printed media, there's no easy way to identify how fresh or significant the information on a page is.
That's an interesting point. A lot of Wikipedia articles seem to be stuck in the late 2000s (2005-2010). When it was new, a lot of people had fun banging out new articles, but then those got more-or-less abandoned. It doesn't help that their population of dedicated "editors" has really dropped off from those highs and is in long-term decline.
Let's take for example the article about Patrisse Cullors (of BLM fame). A video surfaced of her saying "I am a trained Marxist". If you look at the archives[1], many people wanted to include this. But it was rejected with such ridiculous arguments as: "it is entirely unclear what a 'trained Marxist' actually means [...] She doesn't say anything like 'I am a Marxist' "
That is a pretty hyperbolic statement, but I found that, e.g., Brent's root-finding algorithm on Wikipedia was not good: it looped rather than exiting when using an error tolerance of EPS (blowing up in the max-iteration check), while finding a more battle-hardened implementation and copying that algorithm worked much better. I never did the work to determine exactly where the bugs were in the algorithm on the Wikipedia page, though.
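For what it's worth, the usual defensive pattern in battle-hardened root finders (a sketch of the general idea, not the Wikipedia pseudocode) is to widen the bracket test with machine epsilon so the exit condition is actually satisfiable in floating point:

```python
import sys

def bracket_converged(a, b, xtol=1e-12):
    # A bare `abs(b - a) < EPS` test can be unsatisfiable when the root is
    # large, so the loop spins until the max-iteration check blows up.
    # Mixing an absolute floor with a relative, epsilon-scaled term avoids that.
    eps = sys.float_info.epsilon
    return abs(b - a) < xtol + 2.0 * eps * max(abs(a), abs(b))
```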
It's telling that you only cited examples of scientific subjects. As the other commenter noted, articles of consequence for public debate (politics) are generally terrible and there are lots of "editors" who are working for deep state cut-outs doing nothing but trying to damage the reputation of intellectuals who are a danger to the status quo.
It is similarly telling that conservative wikis have barely any articles on core topics like engineering, mathematics, philosophy, and the sciences. These intellectuals you're describing oddly don't seem to have much interest in things most people would deem intellectual...
For example, compare the Wikipedia article on Leonhard Euler with that of conservapedia... It's so absurd I had to double check the self-proclaimed "conservative wikipedia" wasn't satire.
Probably a false flag by the deep state though. Conservapedia has more on that than the entirety of linear algebra and computer science, lol.
> It is similarly telling that conservative wikis have barely any articles on core topics like engineering, mathematics, philosophy, and the sciences. These intellectuals you're describing oddly don't seem to have much interest in things most people would deem intellectual...
It's not really telling; it's just a path-dependent artifact of how those projects are positioned in the "ecosystem." When you have a "mainstream" site that's a little biased against some ideology, it monopolizes the general-interest/popular users. A competitor that sets itself up to answer that bias will only be able to attract a user base that's highly skewed towards very ideological users who found that bias intolerable, because the general-interest users aren't motivated to leave for it.
If Wikipedia had a subtle conservative bias, a hypothetical "Leftopedia" would be similarly full of liberal axe-grinding and weak on general-interest topics.
Wow, I didn't even know Conservapedia was a thing. Although conservative interests have way more money to throw around, so not surprising someone would sponsor such a dead-end project. Similar to their wiki directory of lefty intellectuals; can't remember the name.
Unfortunately this heuristic, while often good, can sometimes lead you wildly astray — for examples see the article on Beyoncé and the article about the plant _Zea mays._ Good articles, but you want a hazmat suit for the talk pages.
Wikipedia's problems with political articles are caused by people actively biasing the articles. You cannot just modify the article; it'd get reverted for, at best, going against consensus.
Someone gave an example above where a person calling herself a trained Marxist was not accepted as evidence that she is a Marxist. Do you seriously think that editing the article to include the reference would be allowed?
Furthermore, the point is that Wikipedia has a systematic problem. Individual instances that people point out are examples. It would be impossible to fix the whole problem yourself and saying "that example doesn't count because you can fix it yourself" is just a way of ignoring examples, not dealing with the problem.
There's also tons of censorship on Wikipedia based on nothing but ideology. Just look at how the Grayzone can no longer be used as a source, based on claims it is "state-affiliated" media, despite ZERO evidence after literally years of such BS claims and now documented evidence of Western states targeting them (look up the exposé on Paul Mason) because they report inconvenient facts about what the security state is doing. There are many more lower-profile harassment campaigns carried out by "editors" looking to smear intellectuals (especially on the left) so that they can't get speaking gigs, print articles in major media, etc. Jimmy Wales himself went after the Grayzone.
https://thegrayzone.com/2020/06/11/meet-wikipedias-ayn-rand-...
A group calling themselves "Guerilla Skeptics" have worked to bias Wikipedia against what they consider badthought. E.g. they deleted the page of a certain author because they feel he's a kook. (Granted, he is pretty kooky by some standards, but that's not the point, eh? He's still on in Germany though if you're curious. https://de.wikipedia.org/wiki/David_R._Hawkins the point is that in English WP he's been erased (not to say "cancelled", eh?) not because he's not notable, but because his work offends a fringe group of fanatics.)
What state is Grayzone supposedly affiliated with? And what of Bellingcat? I have heard it's state-affiliated. Is it an accepted source? What determines state affiliation?
This kind of stuff has soured me tremendously on Wikipedia.
Post it all and let me sort it out, I say.
They are all Kremlin (or previously Assad) assets, according to groups like Bellingcat, for which there is plenty of hard evidence of state control/funding.
That's an amazing take, since Grayzone is very explicitly anti-war, and one of the very few US publications taking an active stand against US proxy wars. If anyone is a pro-violence conspiracy theorist, it is the "paper of record" (NYT), which has actively supported/facilitated virtually every gruesome military intervention of the US for well over a century.
Wikipedia is somewhere between useless and actively bad on anything controversial. I remember checking the discussion on a famous-ish human trafficking case. The moderator straight up refused to consider new reporting because he considered the whole thing settled by the courts.
That kind of thing has ironically been made much worse by QAnon-style wackos. Anything not widely accepted is now treated as a conspiracy theory or psyop.
There is a world in which AI will be the best source of knowledge (most powerful knowledge generator). There will be many LLMs & AIs, open and branded, and we'll pick our oracle. ChatGPT is an infant of an AI and it will mutate, and evolve beyond transformers. Some (many?) branches will be amazing at serving up "enshittened knowledge" but there will be branches that take different approaches and philosophies. There will likely be AI curators of knowledge bases that weed out AI-generated crap, and disinformation. There will be non-hallucinatory AIs, certainty scores, explanation-based systems, first principles machines, and super focused additive AIs that will layer onto a base LLM (or whatever is next). We'll choose (and probably pay for) our blends of knowledge, humour, bias, filtering, and conviviality. The "internet" of tomorrow may run on TCP/IP but it very unlikely to work like this web that we are using now.
"A-grade bullshitter", as the article puts it, is pretty accurate. I thought I would test it and just asked ChatGPT if it knew the Voyager episode "11:59"; the answer got everything wrong. Season, number, and date, all incorrect.
>"11:59" is an episode of the science fiction television series Star Trek: Voyager. The episode originally aired on February 9, 2000 as the 11th episode of the sixth season.
>And worse, ChatGPT will then digest its own excrement, worsening its results further
I wonder if we'll get a "dead sea effect" with AI. I've seen some stuff saying they've basically run out of high-quality training data, and now the training pool will get poisoned by AI-generated shit. Basically garbage in, garbage out: these large language models might not be able to improve.
Maybe, but there are something like 700,000 books published on average each year, and almost 2 million scientific journal articles. Let's not even consider newspapers.
Of course, some of those books will definitely be AI generated or garbage quality, and we all know many of those journal articles can be worth less than the paper they're printed on.
Yet even if we cut it down to 100,000 books and half a million scientific papers, that's a lot of training data each year... And that is just considering print media; there are other ways to get more content too.
For example, there is also transcription of video/podcasts/tv-shows/movies, etc. along with descriptions of the scenes for video, which could be used to generate a lot more stuff.
With people speaking to their devices and using speech-to-text more often, that's another source too--wouldn't be surprised if some devices just start recording conversations and transcribing them.
Seems like a ton of potential data sources to me, although it will certainly get more difficult to cull AI generated stuff to prevent feedback, I'm sure the tooling will evolve to enable easy AI content detection and exclusion.
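Rough arithmetic on that cut-down print estimate (all of these numbers beyond the book and paper counts above are my own illustrative guesses):

```python
# Back-of-envelope: tokens per year from the "cut down" print estimate.
books = 100_000          # useful books per year (from the estimate above)
words_per_book = 80_000  # assumed average length
papers = 500_000         # useful journal articles per year (as above)
words_per_paper = 6_000  # assumed average length
tokens_per_word = 1.3    # common rule of thumb for English tokenization

yearly = (books * words_per_book + papers * words_per_paper) * tokens_per_word
print(f"~{yearly / 1e9:.0f}B tokens/year from print alone")  # ~14B
```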
Yep that's why I said let's not even consider newspapers. Many of those have been using AI generated/content-mill/sponsored content for years and years.
Also why I acknowledged journal articles can be worth less than the paper they're printed on. Even if you were to select for reputable, high impact journals, those also often experience scandals, retractions, potential data fabrication, etc.
But then there are textbooks and technical publications, also being published in the hundreds of thousands globally each year.
The fact is that with billions of human beings on the planet, and media increasingly being digitized by default, and AI-content detection, I don't see how we could possibly run out of new content to grow LLMs...
Pre-ChatGPT datasets will become prized commodities, this sort of AI will be trapped in a stasis of pre-2023 pop culture as subsequent AIs will need to use datasets that hadn't yet been contaminated by pervasive ChatGPT spam.
It will be like low-background steel, steel that has somehow been isolated from atomic fallout from the mid-20th century onwards and must be used for radiation-sensitive equipment: https://en.wikipedia.org/wiki/Low-background_steel
Except somehow worse, because it's just steel, this is culture.
We are nowhere near out of data. We're just out of hyper-relevant modern data. There is probably about 100-200T of old books, newspapers, journal articles, magazines and so forth. For reference, GPT3 was trained on 45T.
Just on order of magnitude, that seems pretty close to out of data, actually, if beyond that we're looking at the firehose of low-density, bulk-generated modern data.
Well, we still haven't really tapped video, which is arguably a much richer source of data on lots of things (especially on how things act in the physical world). And curation will likely help a lot.
And it's not like you can assume an indiscriminate crawl of the net is all human generated currently, anyway, let alone accurate. There's always cleaning involved.
Considering there are "GPT plagiarism" checkers, I don't think this will become an issue. I wonder at which point an extension will come out that checks whether a page's text was written by a human.
Those checkers already have a significant failure rate of false positives/negatives, and that will only get worse as LLMs come closer to human output. Note also that a checker can in principle never outwit a state-of-the-art AI, because the AI can just incorporate and therefore preempt the checker logic.
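The preemption point is easy to see in code. A schematic sketch (`generate` and `checker` are placeholders for whatever the spammer and detector actually are):

```python
def evade(generate, checker, max_tries=1000):
    # Once the generator can query the checker, the checker stops being a
    # defense and becomes a feedback signal: just resample until it passes.
    for _ in range(max_tries):
        text = generate()
        if not checker(text):  # checker returns True when it flags AI text
            return text
    return None  # the checker held, at least against naive resampling
```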
This is why the ChatGPT lawyer thing is far away. When you make an argument in a court filing as a lawyer, you as a person put your reputation on the line that that filing is not an argument that uses fake case law, or makes an argument that is nonsense. If that argument is nonsense and has nothing to do with the law or case law, you can lose your license to practice law. It happens with some regularity. People think that the legal system is an API they can spam. It's absolutely not. In fact, many rules in civil procedure are intended to make it a highly disadvantageous strategy to waste the court's time.
The amount of mistakes is what matters. I've seen videos where lawyers go crazy because the judge supposedly doesn't understand basic law. Either the judge or the lawyer is way wrong in those cases - and I suspect sometimes both are very wrong and even agree on the wrong opinion. It's like self-driving - it just has to make fewer mistakes than humans. I think a ChatGPT lawyer is actually very close and could be created even today if that is where its engineers put the focus. ChatGPT is trained on a wide variety of data and right now is essentially acting like Google, so it can answer nearly any question imaginable - but it doesn't need to have such a vague set of data to draw from. All it takes is training it on a very specific set of cleaned, accurate, and up-to-date data to make it an expert on a single specific topic.
And there are perverse incentives at play. Another article that made it to the front page today [1] reports how BuzzFeed's stock surged after they announced they would be “enshittening” their content.
AI content spam will drive AI content analysis and filtering. The arms race between the two is like a meta-version of the GAN model, which means eventually spam will become indistinguishable from real content.
I don't really understand this hypothesis as it assumes that information quality of AI generated content on the internet will drop as a result of ChatGPT, not increase.
The way I see it is that ChatGPT isn't the only tool out there that can create spam and junk content. The only difference is that ChatGPT produces something of a high enough quality that it's not easy for a human to classify it as spam. And something you can't easily classify as spam arguably isn't spam.
If you assume that those incentivised to create spam today are creating spam anyway and all ChatGPT will do is allow spammers to create better spam then I don't see why the quality of content online would necessarily drop because of ChatGPT - you might actually find that what was once just spam is actually kinda interesting all of a sudden.
But it's not just the quality of AI spam that will increase with ChatGPT... Consider BuzzFeed... Arguably they're just paying people to write trash content today. And this is very common. Most companies have a blog where they pay someone to write mostly junk content just for SEO. I think ChatGPT might actually produce higher quality content than what is currently being written at places like BuzzFeed and on junk blogs. Or at least these workers now have a tool to write something that's higher quality.
I think the only way you're correct is if ChatGPT were to greatly increase the incentive to publish spam, resulting in a much greater amount of spam that counteracts the positive improvement in spam quality. And although I think it probably will increase the number of people producing spam content to some extent, I doubt it will have a net-negative impact.
Finally, I think what you'll see happen in future iterations of ChatGPT to improve quality and accuracy is that content will be fed in weighted by how authoritative the source is. This spam singularity that some are predicting, where the prior generation of spam bots produces the content that trains future generations of spam bots, makes no sense given these companies are trying to create AI that doesn't just spit out spam and inaccurate information.
"I don't really understand this hypothesis as it assumes that information quality of AI generated content on the internet will drop as a result of ChatGPT, not increase."
It has to drop. ChatGPT can not source new truths except by rare accident.
I bet a lot of you are choking on that. So, I'd say this: Can you just "source" new truths? If you just sit and type plausible things, will some of them be right? Yes, but not very many. Truth is exponentially exclusive. That's not a metaphor; it's information theory. It's why we measure statements in bits, an exponential measure, and not some linear measure. ChatGPT's ability to spin truth is not exponentially good.
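One way to make the "exponentially exclusive" point concrete (illustrative arithmetic only):

```python
# A statement pinning down n independent binary facts carries ~n bits;
# a plausible-sounding blind guess gets each bit right with probability 1/2.
for n in (1, 10, 50):
    print(f"{n:2d} bits -> chance a blind guess is fully true: {0.5 ** n:.1e}")
```

Plausibility scales roughly linearly with effort; truth gets exponentially harder with every added bit of specificity.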
A confabulation engine becoming a major contributor to the "facts" on the internet can not help but drop the average quality of facts on the internet on its own terms.
When it starts consuming its own facts, it will iteratively "fuzz" the "facts" it puts out even more. ChatGPT is no more immune to "garbage in garbage out" than any other process.
"Finally, I think what you'll see happen in future iterations of ChatGPT to improve quality and accuracy is that content will be fed in weighted by how authoritative the source is"
Even if authority is perfect, that just slows the process. And personally I see no particularly strong correlation between "authority" and "truth". If you do, expand your vision; there are other "authorities" in the world than the ones you are thinking of.
> It has to drop. ChatGPT can not source new truths except by rare accident.
How are we defining a "truth" here? For example, if I want to find a specific SQL query, which will work for my specific database schema and my specific version of MySQL, I won't find that online. Traditionally I'd need to come up with the new query for this novel scenario, or I'd need to ask someone to do it for me (perhaps on Stack Overflow). Now ChatGPT can come up with these new, novel queries instead. You're right that it can't do its own research and come up with fundamentally new information, but it can come up with answers to questions never before asked, based on what it can infer from existing knowledge.
I'd argue most of the useful stuff people do isn't coming up with things that are fundamentally new, but applying things that are known in new and interesting ways. If you're a developer this is what you probably do every day of the week. And ChatGPT can absolutely do this.
Secondly, I'd also argue regurgitation of known facts is not necessarily without value either. A good example of this is your typical non-fiction/textbook. If you write a textbook about mathematics, you don't necessarily have to include new information for it to be useful. Sometimes the value comes from the explanations, the presentation, or a focus on lesser-covered topics. Again, ChatGPT can absolutely do this. It already explains a lot of things to me better than humans can, so in that sense it is an increase in quality over what I was already able to find online.
As for your point on authority, I do agree with you somewhat there. I suppose the point I was trying to make is that this isn't a blind process. There are content sources which you can under-weight, or simply not include, if they have a negative impact on the quality of the results. You can also improve algorithms to help the AI make better use of the information it's trained on. For example, if I asked you to read BuzzFeed for a week you wouldn't necessarily get any stupider, because you're able to understand what's useful information and what's not.
I think all you really need to ask here is whether the next iteration of ChatGPT is likely to provide better results than the prior iteration, and will the iteration after that produce better results again? If your answer is yes, then it suggests the trend in quality would be higher, not lower, as a function of time.
Finally, wherever AI is applied the trend is always: sub-human ability -> rivals that of the average human -> rivals that of the average elite human -> super-human ability. Is language fundamentally different? Maybe? I think you can argue that generative AI is very different from the AI used for something like chess, but it would at least be an exception if future iterations of this AI got progressively worse. Maybe this is the best ChatGPT will ever be at writing code. I guess I just think that is unlikely.
----
Btw, this is just how I see things likely playing out. Given how new the technology is my certainty isn't very high. I initially agreed with your point of view, but the more I thought about my reasoning the more my position shifted.
Honestly, I'm not really impressed with "but what is truth anyhow?" as an argument method.
But in this case it really doesn't matter because regardless of your definition of truth, unless you use a really degenerate one like "The definition of truth is 'ChatGPT said it'", ChatGPT will not be reliably sourcing statements of truth.
"Finally, wherever AI is applied the trend is always:"
My statement is not about AI. My statement is about ChatGPT and the confabulation engine it is based on. It does not matter how you tune the confabulation engine, it will always confabulate. It is what the architecture does.
AI is not ChatGPT, or the transformer architecture. I can not in general disprove the idea of the AI that turns on one day, is fed Wikipedia, and by the end of the day has derived the Theory of Everything and a viable theory of FTL travel. What I will guarantee is that such an AI will not be fundamentally a transformer-based large language model. It might have some in it, but it won't be the top-level architecture. No matter how well equipped the confabulation engine gets, it won't be able to do that. It is fundamentally incapable of it, at the architectural level. This is a statement about that exact architecture, not all possible AIs.
>And something you can't easily classify as spam arguably isn't spam
Something that's not obviously junk but is entirely wrong is even worse than something that is obviously junk. It'll waste more time and probably convince more people of falsehoods.
> And something you can't easily classify as spam arguably isn't spam.
[...]
> I think the only way you're correct is if ChatGPT were to greatly increase the incentive to publish spam.
Arguably it still is spam. And consider the incentive to hide advertising (or, generally, to push any agenda) when using a program is orders of magnitude cheaper than paying people to do it, but the output is now hard enough to recognize that I can no longer say whether your average HN comment has been written by ChatGPT or not, as long as I am not specifically looking out for it.
Knowledge has never been a single global thing. It's always been individualized, and in the context of groups, the important question is "how long does it take someone to find information which is useful to them?". With regards to search engines we've been in decline for a few years now. It's not just you, the results are worse.
> All signs point to this strengthening the value of curation and authenticated sources.
This is the solution. Knowledge is a web of trust. The only root authority is you, the individual. "Experts" and "authorities" are just heuristics. The widespread error that many are making is believing that if there is a single objective reality, then curation can happen globally/objectively rather than individually/subjectively.
What we need are more mechanisms for individual curation. A user should be able to inspect and understand the chain of believability, from one of their own highly vetted one hop experts, to a distant influencer, public official, or other source of (mis)information.
Yes, accessing high value data, information, and intelligence already commands ever higher premia. Already, most people are priced out of a lot of sources.
The premise being that the information is currently in a perfect state. It isn't. It's actually in a horrible state. ChatGPT might have exactly the opposite effect. It's able to detect conflicting patterns. It's able to alert about inaccuracies. It might actually help to improve the attempt at a knowledge base that the internet is supposed to be.
I just envisioned a sci-fi movie like Terminator. Everyone in the world has great AIs. But at some point, when injecting the advertising code for Rad Cola, it goes wrong. The dude's AI keeps reminding him he should drink a Rad Cola with lunch. It lets him know that all the celebrities he follows drink Rad Cola. Slowly it gets more insistent. Eventually the AIs take over the world, but in the name of benevolence: they only want to spread the gift that is Rad Cola and ensure that all humans are drinking it.
> So then to watermark, instead of selecting the next token randomly, the idea will be to select it pseudorandomly, using a cryptographic pseudorandom function, whose key is known only to OpenAI. That won’t make any detectable difference to the end user, assuming the end user can’t distinguish the pseudorandom numbers from truly random ones. But now you can choose a pseudorandom function that secretly biases a certain score—a sum over a certain function g evaluated at each n-gram (sequence of n consecutive tokens), for some small n—which score you can also compute if you know the key for this pseudorandom function.
I remain skeptical that this method is resistant against lossy transformations such as changing punctuation, grammar, synonym replacements, 2-way translations and a bunch of other existing tools that are capable of rewording written text.
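For concreteness, the quoted scheme might look roughly like this (a toy reading of the description; the PRF, the selection rule, and the score are my guesses, not OpenAI's actual implementation):

```python
import hashlib
import hmac

KEY = b"known only to the generator"  # hypothetical watermarking key

def g(ngram):
    # Keyed pseudorandom score in [0, 1) for an n-gram of tokens.
    mac = hmac.new(KEY, " ".join(ngram).encode(), hashlib.sha256).digest()
    return int.from_bytes(mac[:8], "big") / 2**64

def pick_token(candidates, context, n=4):
    # Among tokens the model rates as roughly equally plausible, prefer the
    # one whose trailing n-gram scores highest under g; without KEY the
    # choice is indistinguishable from ordinary random sampling.
    return max(candidates, key=lambda t: g(tuple(context[-(n - 1):]) + (t,)))

def watermark_score(tokens, n=4):
    # The key holder can test for the watermark: biased text averages well
    # above the ~0.5 expected from ordinary human-written text.
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return sum(g(gr) for gr in grams) / len(grams)
```

Which also makes the skepticism above concrete: paraphrasing rewrites the n-grams, and every rewritten n-gram draws a fresh, unbiased score.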
> All signs point to this strengthening the value of curation and authenticated sources.
Prediction: this is going to end up like Coca-Cola taking public water (wiki and LLM) and bottling it ("selected sourced") at a couple bucks a pop! "But there will be premium brands!"
> ChatGPT is like a virus that now threatens its life.
Perhaps something needs to be disrupted. The Internet is nothing like what it was 20 years ago. It turned into a bunch of social media walled gardens and SEO spam. ChatGPT is like fresh air because it can actually answer questions in a no-nonsense way, without users having to scroll through 5-6 spam websites, paywalls, and crappy user interfaces to get an answer to a simple question.
The only thing that's being threatened is companies like Google who are responsible for the current state of the web.
Agents like this under the control of users would be pretty great. Man, would companies ever hate if we could use the kinds of tools they use against us, against them. No more shopping for the best price: "ChatGPT, what's the lowest price on a new X, brand Y, model Z? And give me the URL to the product page." No more burning our human time talking to companies' robots: "ChatGPT, get through this shitty phone tree and let me know when you have a person". A true digital assistant. Couple with crowd-sourced data (receipt scanning, junk-mail grocery flier scanning, or the AR goggles that are probably not that far off) and you could even do stuff like have it plot optimal IRL grocery shopping for you (lowest total price on this list of goods, value my time at $X/hr, and factor in cost of transportation... and also ChatGPT assembled the list for me in the first place, because I had it create this week's dinner menu)
ChatGPT as a service that can be used to mislead us and trick us out of our money on behalf of megacorps, like the entire rest of the Web has become? To "promote" things to us against our interests? Meh. Call me when it's mine and will obey me and will never lie to me or serve someone else's priorities over mine, and I'll be interested.
I wish this sort of thing were realistically possible; history has taught us one too many times that we cannot have nice things, so I can't pretend this time is any different. Whoever owns ChatGPT or an equivalent product in the future will probably end up doing something or other to ruin this idea. "ChatGPT, what's the cheapest restaurant in this area?" has way too much advertisement potential to be left alone.
Right, ChatGPT without complete loyalty to the user runs into the same problem as auto-restocking schemes from Amazon and such: I can't trust that it's not fucking me, so I have to check manually anyway, at least from time to time. At that point, I may as well just go buy the thing I need when I need it, myself. If, when I ask ChatGPT (or its future, improved successor) to explain the benefits and drawbacks of the best products in some category, at three price points, I have to worry that placement on that list can be bought... then the whole thing's pointless as a tool for "consumers". Just another avenue for tricking us out of our money, and we're already very well served in that department, don't need any more of that, thanks.
You seem to think ChatGPT will somehow be immune to the same forces that lead to the "enshittification" of Internet services like search, social media and e-commerce. Like ChatGPT, these services were a real boon to their users. Then, once they got enough users on board and had to start making a buck, their incentives changed and the users became a secondary concern. The same will happen to things like ChatGPT. See the Cory Doctorow article coining the term "enshittification" for a more elaborate explanation [0].
This is a risk for sure, but Google long-ago developed trust/reputation factors for pages (e.g. PageRank). I imagine they have something much more advanced now, and are hard at work trying to figure out how to measure reputation in the LLM space.
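(For reference, the core of the original PageRank is just a fixed-point iteration over the link graph; a bare-bones sketch of the published algorithm, not Google's production system:)

```python
def pagerank(links, damping=0.85, iters=50):
    # links: {page: [pages it links to]}. A page earns rank when ranked
    # pages point at it, which is what makes the score hard to spam cheaply.
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        rank = new  # note: dangling pages leak rank in this toy version
    return rank

print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))
```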