
How do you suppose an LLM can cite its sources when it doesn't have any?! It's a language model, not an encyclopedia. The LLM doesn't even get to choose what it outputs - it just gives next-word probabilities and one of those is selected AT RANDOM by the sampler.

So, maybe words 1-3 of the LLM's answer are some common turn of phrase that was predicted by 1000s of samples, word 4 came from 4chan (a low-probability random pick from the sampler), and word 5 was hallucinated. So, what's the "source" for this "fact"?
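To make that concrete, here is a minimal sketch of the generation loop, assuming a hypothetical `model` that maps a token prefix to one score per vocabulary entry (not any particular vendor's API):

```python
import numpy as np

# Minimal sketch of next-token generation: the model yields a probability
# distribution over the vocabulary and the sampler draws one token at random.
# `model` and `vocab` are hypothetical stand-ins, not a real library's API.
def generate(model, vocab, prompt_ids, max_new_tokens=50, temperature=1.0):
    ids = list(prompt_ids)
    rng = np.random.default_rng()
    for _ in range(max_new_tokens):
        logits = np.asarray(model(ids))            # one score per vocabulary entry
        probs = np.exp(logits / temperature)
        probs /= probs.sum()                       # softmax -> probabilities
        next_id = rng.choice(len(vocab), p=probs)  # the "AT RANDOM" step above
        ids.append(int(next_id))
    return [vocab[i] for i in ids]
```

Nothing in that loop records where any probability came from, which is why there is no per-word "source" to point at.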





This is transparently untrue. Gemini reliably produces links (both inline and at the paragraph level), and most of the time summarizes them correctly. This has been publicly available for quite a while now.

The word reliably is doing a lot of work here. I was using one of the bigger LLMs (honestly I can't remember which one) after they started putting citations into their responses. I thought this is great, now I can look up the actual source if I need a more in-depth understanding...

Well, a couple of prompts later, after I asked it some details about some signal processing algorithm, it tells me "for a more in-depth discussion of the algorithm look at citation A (a very general DSP book that likely did not cover the specific topic in depth) or the special issue on [topic of my question] in IEEE journal of X"

So I think "great, there's a special issue on this topic", that's just what I need. A quick Google does not turn up anything, so I prompt the AI, "Can you provide a more specific reference to the special issue in...". The answer: "There is no special issue on [topic]...". So LLMs make up citations just as they make up everything else.


I asked Claude to translate a book title from Hebrew (well not translate exactly but locate the original English title of the same book).

That's not a language I speak or generally have anything else to do with.

I then asked it an unrelated question about a science topic and it returned something with a citation. When I clicked on the citation, not only was it not relevant to the science question it claimed it was cited to support, it was basically a conspiracy theory from the 1970s about Jews controlling the media.

Which somehow seems even worse than my usual experience of the link being a totally made-up dead end.


Reminds me of Gell-Mann amnesia but for LLMs

Seems apt, because people's relationship with journalists and facts seems to be about the same - most people take reporting at face value while SMEs decry how poor it is


That's not the type of citation they're talking about. Gemini uses a tool call to the Google search engine and thus can cite and read proper links. You're talking about an LLM that just hallucinates citations which don't exist.

Is Gemini the same thing that shows up in the Google Search AI box? Because that thing is wrong all the time.

Just the other day I was searching for some details about the Metal graphics API, and something weird caught my eye as I scrolled past the AI stuff. Curious, I engaged, asking more basic questions, and they were just... wrong. Even right now, “what is the default vertex winding order in Metal?” is wrong. Or how about “does Metal use a left- or right-handed coordinate system for the normalized device coordinates?”. I mean this is day-one, intro-level stuff, and easily found on Apple’s dev site.

And the “citations” are ridiculous. It references some Stack Overflow commentary or a Reddit thread where someone asks a similar question. But the response is “I don’t know about Metal, but Vulkan/D3D use (something different)”. Seriously, wtf.

GPT-4 gives the same wrong answers with almost the same citations. GPT-5 gets it right, at least for the examples above.

Either way, it’s hard to trust it for things you don’t know, when you can’t for things you do.


No, it's gemini.google.com

Then what is the LLM that shows up at the top of Google search results?

Maybe it's Gemini, maybe it's another one of their models, but I'm specifically talking about LLMs like Gemini, or, if you want a better example, Perplexity, which crawls web pages first and then cites them, so that there aren't bogus citations.

Oops, well it felt nice to vent anyway.

A while back I heard "hallucinate and verify" described as a good method. The LLM makes up some stuff, then uses RAG to double-check it (in Gemini's case, against Google; in everyone else's case, a DDoS).
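A rough sketch of that draft-then-verify loop, with hypothetical `llm` and `search` callables standing in for whatever the real products use:

```python
# Hypothetical sketch of "hallucinate and verify": draft an answer from the
# model alone, retrieve pages that could confirm or refute it, then regenerate
# with that evidence in context. The helper names are illustrative only.
def hallucinate_and_verify(llm, search, question):
    draft = llm(f"Answer the question: {question}")
    queries = llm(f"List web search queries, one per line, that would verify this answer:\n{draft}")
    evidence = [search(q) for q in queries.splitlines() if q.strip()]
    return llm(
        "Rewrite the draft so every claim is supported by the sources below, "
        "citing them by URL, and drop anything unsupported.\n"
        f"Draft:\n{draft}\n\nSources:\n{evidence}"
    )
```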

Gemini is an LLM with tool calls (including tools that, approximately, perform a Google search and read the top results).

Not all chatbots are LLMs with tool calls, and LLMs are perfectly capable of answering without using such tool calls (and sometimes perform better).
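For anyone unclear on what "an LLM with tool calls" means in practice, here is a bare-bones sketch of the loop; the message and tool formats are assumptions for illustration, not Gemini's actual protocol:

```python
import json

# Bare-bones tool-call loop: the model either answers directly or emits a
# structured request to run a web search; the runtime executes the search and
# feeds the results (with their URLs) back in, so real links can be cited.
def chat_with_tools(llm, web_search, user_message, max_steps=5):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = llm(messages)                      # plain answer or a tool request
        if reply.get("tool") == "web_search":
            results = web_search(reply["query"])   # e.g. list of {"url", "snippet"}
            messages.append({"role": "tool", "content": json.dumps(results)})
            continue                               # let the model read what it found
        return reply["content"]                    # final answer, can quote real URLs
    return "Gave up after too many tool calls."
```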


Perhaps this is a distinction between:

1. "Having sources" because there's something following a logical reasoning process with a knowledge graph.

2. "Having sources" because a hyper-mad-libs hallucinatory engine predicted desirable text which was introduced earlier in the document.

We can reduce the chances of humans getting a #2 hallucination that they object to, but stochastic whack-a-mole doesn't convert it to a #1 mechanism.


Not true. In so many cases, the "links" that LLMs come up with are either irrelevant or non-existent. The links have the same lack of reliability as the rest of their answers, or worse.

That's a load-bearing "most of the time"

I don't mind, in that I'm not expecting perfection; I'm happy to be able to track down a source quicker than I could by digging through forum posts or whatever. It's about what I would hope for from a moderately competent intern.

Maybe it can do it, but it is certainly not guaranteed. Just this month I asked Gemini 2.5 Pro to "explain to me topic _ in deep technical detail". It produced a decent text, but with zero references or links, despite the topic being a public open standard. Since I needed text and not knowledge, that was fine for me; I verified the data myself. But a person looking to learn from this techno-parrot would just be hoping it got lucky and didn't feed them too much LLM slop.

so "most of the time" they are facts?

The LLM itself does not do that; the web search tool does.

The fancy online models can produce links for you. They might get the summary wrong, but they’ve got a link, you can follow it and check it out.

In this context they are more like conversational search engines. But that’s a pretty decent feature IMO.


If the output came from RAG (search) rather than the model itself, then a link is possible, but not if the model just generated the sequence of words by itself.

Note too that these models can, and do, make up references. If it predicts a reference is called for, then it'll generate one, and to the LLM it makes no difference if that reference was something actually in the training data or just something statistically plausible it made up.
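One cheap sanity check, for a reader or a wrapper script, is to confirm that a cited URL at least resolves. A minimal sketch using only the standard library; note it only catches links that point nowhere, since a page that exists can still be irrelevant to the claim it supposedly supports:

```python
import urllib.error
import urllib.request

def url_resolves(url, timeout=10):
    """Return True if the URL answers with a non-error HTTP status.

    This catches fabricated links that point nowhere; it cannot tell
    whether a live page actually supports the claim citing it.
    """
    try:
        req = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "citation-check/0.1"}
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, ValueError):
        return False
```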


They also search online and return links, though? And, you can steer them when they do that to seek out more "authoritative" sources (e.g. news reports, publications by reputable organizations).

If you pay for it, ChatGPT can spend upwards of 5 minutes going out and finding you sources if you ask it to.

Those sources can then be separately verified, which is of course up to the user.


Right, but now you are not talking about an LLM generating from its training data - you are talking about an agent that is doing web search, and hopefully not messing it up when it summarizes it.

Yes, because most of the things that people talk about (ChatGPT, Google SERP AI summaries, etc.) currently use tools in their answers. We're a couple years past the "it just generates output from sampling given a prompt and training" era.

It depends - some queries will invoke tools such as search, some won't. A research agent will be using search, but then summarizing and reasoning about the results to synthesize a response, so then you are back to LLM generation.

The net result is that some responses are going to be more reliable (or at least coherently derived from a single search source) than others, but at least to the casual user, maybe to most users, it's never quite clear what the "AI" is doing, and it's right enough, often enough, that they tend to trust it, even though that trust is only justified some of the time.


The models listed in the quote have this capability, though; they must be using RAG or something.

RAG is a horrible term for agentic search. Please stop using it.

And, don’t argue with me about terms. It literally stands for retrieval (not store or delete or update) augmented generation. And as generation is implied with LLMs it really just means augmenting with retrieval.

But if you think about it, the agent could be augmented with stores or updates as well as gets, so that's why it's not useful. Plus, nobody I've seen drawing RAG diagrams EVER shows it as an agent tool. It's always something the system DOES to the agent, not the agent doing it to the data.

So yeah, stop using it. Please.


What if you just read it as "Retrieval AGent"? It isn't the conventionally accepted definition, but it fits and it might make you happier.

If a plain LLM, not an agent, invokes a tool then that can still be considered as RAG. You seem to be thinking of the case where an agent retrieves some data then passes it to an LLM.

A year ago there were links to things that didn't exist. Has that changed?

I’m sure it is possible to get a model to produce a fake URL, but it seems like ChatGPT has some agentic feature where it actually searches in a search engine or something, and then gives you the URLs that it found.

It's selecting a random word from a probability distribution over words. That distribution is crafted by the LLM. The random sampler is not going to choose a word with 1e-6 probability anytime soon. Besides, with thinking models the LLM has the ability to correct itself, so it's not like the model is at the mercy of a random number generator.
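A minimal sketch of why that tail effectively never gets picked: common samplers apply a top-p (nucleus) cutoff before drawing, which discards low-probability tokens entirely. The numbers and function name here are illustrative, not any specific product's implementation:

```python
import numpy as np

# Nucleus (top-p) sampling sketch: keep only the most likely tokens whose
# probabilities sum to at most p, renormalize, then draw. Tokens deep in the
# tail (e.g. probability 1e-6) are simply never candidates.
def top_p_sample(probs, p=0.9, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    order = np.argsort(probs)[::-1]               # most likely first
    sorted_probs = probs[order]
    keep = np.cumsum(sorted_probs) <= p
    keep[0] = True                                # always keep at least the top token
    kept_ids = order[keep]
    kept_probs = sorted_probs[keep] / sorted_probs[keep].sum()
    return int(rng.choice(kept_ids, p=kept_probs))
```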

You can reductionistically do the same to claim that the mesh of charged gel tubes in our brain is just spasming our muscles when humans type words in a computer.

Whether LLMs are good or not, liars or not, hardly depends on their being implemented as random black-box algorithms, because you could say the same of our brains.


The point is that the statement "LLMs should just cite their sources, what's the problem" is nonsensical, and the reason it's nonsense has to do with how LLMs actually work.

Citing sources is not magic that makes what you say true, it just makes a statement more easily falsifiable.

LLMs can cite sources as well as any human, that is, with a non-trivial error rate.

LLMs are shit for a lot of things, but the problems are with the quality of the output; whether they work by magic, soul-bending, matrix multiplication, or whatever is irrelevant.


LLMs can fabricate phony citations

Like Gemini does





