
they'd have to get ChatGPT to stop spewing made-up bullshit and fake academic citations first, which I think is a bigger problem than many seem to be considering. the outcome will still be hilarious though, so it's win-win for the onlookers I guess

edit: all the responses here from people who think the garbage is "just fine, at least it's not SEO" are similarly hilarious. as though SEO would not be polluting the AI as well




This is a commonly echoed complaint, but it's largely without merit: ChatGPT spews nonsense because it has no access to information outside of its training set.

In the context of a search engine, feeding the top search results into the prompt (single-shot, in-context) should mitigate almost all hallucination.
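
Something like the sketch below, i.e. retrieval-augmented prompting. This is a minimal illustration, assuming the OpenAI Python SDK; `web_search` is a hypothetical helper returning result objects with `title`, `url`, and `snippet` fields, and the prompt wording is mine, not anything Bing actually does:

    # Minimal retrieval-augmented prompting sketch (assumes the OpenAI Python SDK).
    # `web_search` is a hypothetical helper returning objects with
    # .title, .url, and .snippet; any search API could fill that role.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def grounded_answer(question: str) -> str:
        results = web_search(question, top_k=5)  # hypothetical search helper
        # Inline the top results so the model answers from them, not from memory.
        context = "\n\n".join(
            f"[{i + 1}] {r.title} ({r.url})\n{r.snippet}"
            for i, r in enumerate(results)
        )
        prompt = (
            "Answer the question using ONLY the sources below. "
            "Cite sources as [n]. If they don't contain the answer, say so.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}"
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # any chat model works here
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content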


Not if it's only looking at the top search results. Those are very often terrible sources of information.


Something like `searchQuery += " reddit"` should do the trick.


I’ve never understood this meme - Reddit only works for product reviews, and only if you ignore astroturfing.

Who goes to Reddit for factual information? It's definitely not Wikipedia; it's more like an op-ed.


Parent is just joking. `site:reddit.com` is not used for searching facts, but for finding opinions, reviews, personal experiences, and instructions.


I'm not completely serious, of course, but with all the p-hacking and irreproducibility going on, is that such a big problem? At least people will double-check results then.


I agree completely. I'm as skeptical as anyone else about AI/ML, but IMHO the criticism that "now we won't know what's true" is a tacit admission that we've already given up on critical thinking. In this regard, I honestly think ChatGPT et al. have the potential to be a net positive, by virtue of their incorrectness: maybe they'll force people to question things that they wouldn't have felt the need to otherwise.


It'll get pretty tedious to have to check every little factoid spewed by an LLM. At some point you'll get used to it and stop checking. It would be interesting to see the distribution of how long this takes for a population sample.


I'd argue that you need to fact-check every little piece of info on the internet anyway.


People have been grounding LLMs with internet searches for a few months:

WebGPT: Browser-assisted question-answering with human feedback https://arxiv.org/abs/2112.09332
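
Roughly, these systems run an LLM-driven browse loop that collects quotes as evidence and then composes a cited answer. Here's a sketch of that general shape; it's my own pseudocode, not WebGPT's actual implementation, and `search`, `fetch_page`, `llm_choose_action`, and `llm_compose` are hypothetical helpers:

    # Illustrative sketch of the WebGPT-style browse-and-quote loop; this is my
    # own pseudocode of the general pattern, not WebGPT's actual implementation.
    # search, fetch_page, llm_choose_action, and llm_compose are hypothetical.
    def answer_with_browsing(question: str, max_steps: int = 10) -> str:
        quotes = []   # (url, excerpt) evidence collected while browsing
        page = None   # the page currently "on screen"
        for _ in range(max_steps):
            # The model picks the next action given the question, the evidence
            # gathered so far, and the current page.
            action = llm_choose_action(question, quotes, page)
            if action.kind == "search":
                page = search(action.query)              # get a results page
            elif action.kind == "open":
                page = fetch_page(action.url)            # follow a result link
            elif action.kind == "quote":
                quotes.append((page.url, action.text))   # save an excerpt
            elif action.kind == "answer":
                break
        # Compose the final answer from the collected, citable quotes.
        return llm_compose(question, quotes)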


From the abstract:

> This model's answers are preferred by humans 56% of the time to those of our human demonstrators, and 69% of the time to the highest-voted answer from Reddit.

These are very interesting metrics.


I kind of think this isn't as much of an issue as we assume. It just has to be superior to the first page of Google results, which nowadays is an infested pile of SEO garbage, direct advertisements, and a single Wikipedia link.


Read the link in the topic... It literally doesn't make up bullshit now. It's a modified version of ChatGPT.

> Unlike ChatGPT, the new Bing can also retrieve news about recent events. In The Verge’s demos, the search engine was even able to answer questions about its own launch, citing stories published by news sites in the last hour.


> they'd have to get ChatGPT to stop spewing made-up bullshit and fake academic citations first

Why? The alternative is reading your way through SEO spam (and soon: wading through ChatGPT-generated SEO spam), which is often just as wrong. Source: I work for SEOs.


Yes - I'm more hopeful that LLMs (or whatever replaces them over time) will become truthful and accurate sooner than we'll get rid of SEO spam. I'd also imagine SEO spam gets worse soon with LLMs.


The incentives are very different. With SEO spam, you have a direct monetary incentive. More spam = more clicks = more money. With LLMs extracting information and answering questions without actually sending the user to the site ... what's the incentive to create low-accuracy content to be ingested, beyond "I want to mess with the data set"?

I'm sure there will be a few people going down that route, but that takes a lot of intrinsic motivation. It's a very different beast from having a multi-billion-dollar market draw in millions of interested parties, and you can concentrate on identifying the saboteurs and filtering them out, which feels easier, especially because shadow-banning is hard to impossible for them to detect.


On that note, where are they extracting new information from? If sites are never visited, people will stop creating them.


Wait and see; it seems this is not the exact same model as ChatGPT, and it is able to cite sources.


OK, we've waited... six days! And we have this:

https://news.ycombinator.com/item?id=34775853



