Hacker News

illiac786 · 2025-06-03T09:24:51 1748942691

You put people in nice little drawers, the skeptics, and the non-skeptics. It is reductive and most of all, it’s polarizing. This is how US politics have become and we should avoid this here.

luffy-taro · 2025-06-03T10:56:01 1748948161

Yeah, putting labels on people is not very nice.

Xmd5a · 2025-06-03T08:35:55 1748939755

What a condescending tone

jrvarela56 · 2025-06-03T10:36:29 1748946989

10 month old account talking like that to the village elder

foldr · 2025-06-03T11:47:51 1748951271

In fairness, the article is a lot more condescending and insulting to its readers than the comment you're replying to.

rvnx · 2025-06-03T08:40:40 1748940040

An LLM is essentially the world information packed into a very compact format. It is the modern equivalent of the Library of Alexandria.

Claiming that your own knowledge is better than all the compressed consensus of the books of the universe is very optimistic.

If you are not sure about the result given by an LLM, it is your task as a human to cross-verify the information, in exactly the same way that information in books is not 100% accurate and Google results are not always telling the truth.

fennecfoxy · 2025-06-03T09:13:54 1748942034

>LLM is essentially the world information packed into a very compact format.

No, it's world information distilled to various parts and details that training deemed important. Do not pretend for one second that it's not an incredibly lossy compression method, which is why LLMs hallucinate constantly.

This is why training is only useful for teaching the LLM how to string words together to convey hard data. That hard data should always be retrieved via RAG with an independent model/code verifying that the contents of the response are correct as per the hard data. Even 4o hallucinates constantly if it doesn't do a web search and sometimes even when it does.
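A minimal sketch of that verification step, assuming the "hard data" is just a dict of retrieved text snippets and that the check is limited to numbers. The function name `unsupported_numbers` and the sample data are hypothetical, made up for illustration, and not part of any real RAG library:

```python
import re

NUMBER = r"\d+(?:\.\d+)?"

def unsupported_numbers(response: str, retrieved_facts: dict[str, str]) -> list[str]:
    """Return numbers that appear in the response but in none of the retrieved snippets."""
    source_text = " ".join(retrieved_facts.values())
    known = set(re.findall(NUMBER, source_text))
    claimed = re.findall(NUMBER, response)
    return [n for n in claimed if n not in known]

# Hypothetical retrieved snippet and model output, for illustration only.
facts = {"doc1": "The widget shipped in 2019 and weighs 12.5 kg."}
answer = "The widget shipped in 2018 and weighs 12.5 kg."

# Any number listed here has no support in the retrieved data and should be flagged.
print(unsupported_numbers(answer, facts))
```

A real checker would have to cover names, dates and paraphrased claims too, which is probably where the independent model comes in. This only shows the shape of the check.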

TheEdonian · 2025-06-03T08:46:47 1748940407

Well let's not forget that it's an opinionated source. There is also the point that if you ask it about a topic it will (often) give you the answer that has the most content about it (or easiest to access information).

illiac786 · 2025-06-03T09:16:40 1748942200

Agree.

I find that, for many, LLMs are addictive, a magnet, because they offer to do your work for you, or so it appears. Resisting this temptation is impossibly hard for children for example, and many adults succumb.

A good way to maintain a healthy dose of skepticism about its output, and to keep on checking this output, is asking the LLM about something that happened after the training cutoff.

For example, I asked if lidar could damage phone lenses. And the LLM very convincingly argued it was highly improbable, because that only recently made the news as a danger for phone lenses and wasn’t part of the training data.

This helps me stay sane and resist the temptation of just accepting LLM output =)

On a side note, the kagi assistant is nice for kids I feel because it links to its sources.

dale_glass · 2025-06-03T10:54:34 1748948074

LIDAR damaging the lens is extremely unlikely. A lens is mostly glass.

What it can damage is the sensor, which is actually not at all the same thing as a lens.

When asking questions it's important to ask the right question.

illiac786 · 2025-06-03T11:25:35 1748949935

Sorry, I meant the sensor

criley2 · 2025-06-03T11:52:12 1748951532

I asked ChatGPT o3 if lidar could damage phone sensors and it said yes https://chatgpt.com/share/683ee007-7338-800e-a6a4-cebc293c46...

I also asked Gemini 2.5 pro preview and it said yes. https://g.co/gemini/share/0aeded9b8220

I find it interesting to always test for myself when someone suggests to me that a "LLM" failed at a task.

illiac786 · 2025-06-03T12:11:05 1748952665

I should have been more specific, but you missed my point I believe.

I tested this at the time on Claude 3.7 Sonnet, which has an earlier cutoff date, and I just tested again with this prompt: “Can the lidar of a self driving car damage a phone camera sensor?” and the answer is still wrong in my test.

I believe the issue is the training cutoff date. That’s my point: LLMs seem smart, but they have limits, and when asked about something discovered after the training cutoff date, they will sometimes be confidently wrong.

criley2 · 2025-06-03T12:46:31 1748954791

I didn't miss your point, rather I wanted you to realize some deeper points I was trying to make

- Not all LLMs are the same, and not identifying your tool is problematic because "LLMs can't do a thing" is very different from "The particular model I used failed at this thing". I demonstrated that by showing that many LLMs get the answer right. It puts the onus of correctness entirely on the category of technology, and not the tool used or the skill of the tool user.

- Training data cutoffs are only one part of the equation: Tool use by LLMs allows them to search the internet and run arbitrary code (amongst many other things).

In both of my cases, the training data did not include the results either. Both used a tool call to search the internet for data.

Not realizing that modern AI tools are more than an LLM with training data, but rather have tool calling, full internet access, and can access and reason about a wide variety of up to date data sources demonstrates a fundamental misunderstanding about modern AI tools.

Having said that:

Claude Sonnet 4.0 says "yes": https://claude.ai/share/001e16f8-20ea-4941-a181-48311252bca0

Personally, I don't use Claude for this kind of thing because while it's proven to be very good at being a coding assistant and interacting with my IDE in an "agentic" manner, it's clearly not designed to be a deep research assistant that broadly searches the internet and other data sources to provide accurate and up to date information. (This would mean that AI/model selection is a skill issue and getting good results from AI tools is a skill, which is borne out by the fact that I get the right answer every time I try, and you can't get the right answer once).

illiac786 · 2025-06-04T19:45:00 1749066300

Still not getting it I think.

My point is: LLMs sound very plausible and very confident when they are wrong.

That’s it. And I was just offering a trick to help remember this, to keep checking their output – nothing else.

ivape · 2025-06-03T08:48:38 1748940518

This is a pre-Covid HN thread on work from home:

https://news.ycombinator.com/item?id=22221507

It’s eerie. It’s historical. These threads from these past two years about what the future of AI will be will read like ghost stories. Like Rose having flashbacks of the Titanic. It’s worth documenting. We honestly could be having the most ominous discussion of what’s to come.

We sit around and complain about dips in hiring, that’s nothing. The iceberg just hit. We’ve got 6 hours left.

ignoramous · 2025-06-03T09:03:45 1748941425

> We sit around and complain about dips in hiring, that’s nothing. The iceberg just hit. We’ve got 6 hours left.

At least we've got hacker news to ourselves, have we not ... https://news.ycombinator.com/item?id=44130743

Pamar · 2025-06-03T09:21:23 1748942483

Partially OT:

Yesterday I asked ChatGPT which was the Japanese Twin City for Venice (Italy). This was just a quick offhand question because I needed the answer for a post on IG, so not exactly a life-or-death situation.

Answer: Kagoshima. It also added that the "twin status" was officially set in 1965, and that Kagoshima was the starting point for the Jesuit Missionary Alessandro Valignano in his attempt to proselytize Japanese people (to Catholicism, and also about European Culture).

I had never heard of Kagoshima, so I googled for it. And discovered it is the twin city of Naples :/

So I then googled for "Venice Japanese Twin City" and got: Hiroshima. I double-checked this, then I went back to ChatGPT and wrote:

"Kagoshima is the Twin City for Naples.".

This triggered a websearch and finally it wrote back:

"You are right, Kagoshima is Twin City of Naples since 1960."

Then it added "Regarding Venice instead, the twin city is Hiroshima, since 2023".

So yeah, a Library of Alexandria that you can count on as long as you have another couple of libraries to double-check whatever you get from it. Note also that this was a very straightforward question, there is nothing to "analyze" or "interpret" or "reason about". And yet the answer was completely wrong, the first date was incorrect even for Naples (actually the ceremony was in May 1960) and the extra bits about Alessandro Valignano are not reported anywhere else: Valignano was indeed a Jesuit and he visited Japan multiple times, but Kagoshima is never mentioned when you google for him or if you check his Wikipedia page.

You may understand how I remain quite skeptical for any application which I consider "more important than an IG title".

rvnx · 2025-06-03T09:27:05 1748942825

Claude 4 Opus:

> Venice, Italy does not appear to have a Japanese twin city or sister city. While several Japanese cities have earned the nickname "Venice of Japan" for their canal systems or waterfront architecture, there is no formal sister city relationship between Venice and any Japanese city that I could find in the available information

I think GPT-4o got it wrong in your case because it searched Bing, and then read only fragments of the page ( https://en.wikipedia.org/wiki/List_of_twin_towns_and_sister_... ) to save costs for processing "large" context

Pamar · 2025-06-03T11:40:43 1748950843

I am Italian, and I have some interest in Japanese history/culture.

So when I saw a completely unknown city I googled it up because I was wondering what it actually had in common with Venice (I mean, a Japanese version of Venice would be a cool place to visit next time I go to Japan, no?).

If I wanted to know, I dunno, "What is the Chinese Twin City for Buenos Aires" (to mention two countries I do not really know much about, and do not plan to visit in the future) should I trust the answer? Or should I go looking it up somewhere else? Or maybe ask someone from Argentina?

My point is that even as a "digital equivalent of the Library of Alexandria" LLMs seem to be extremely unreliable. Therefore - at least for now - I am wary about using them for work, or for any other area where I really care about the quality of the result.

richardw · 2025-06-03T10:40:35 1748947235

If I want facts that I would expect the top 10 Google results to have, I turn search on. If I want a broader view of a well known area, I turn it off. Sometimes I do both and compare. I don’t rely on model training memory for facts that the internet wouldn’t have a lot of material for.

4o for quick. 4o plus search for facts. o4-mini-high plus search for “mini deep research”, where it’ll hit more pages, structure and summarise.

And I still check the facts and sources to be honest. But it’s not valueless. I’ve searched an area for a year and then had deep research find things I haven’t.

croemer · 2025-06-03T21:06:11 1748984771

o3 totally nailed it first shot. Hiroshima since 2023. Provides authoritative source (Venetian city press release): https://chatgpt.com/share/683f638a-3ce0-8005-91d6-3eb1df9f19...

meowface · 2025-06-03T09:54:10 1748944450

What model?

People often say "I asked ChatGPT something and it was wrong", and then you ask them the model and they say "huh?"

The default model is 4.1o-mini, which is much worse than 4.1o and much much worse than o3 at many tasks.

TeMPOraL · 2025-06-03T10:30:02 1748946602

Yup. The difference is particularly apparent with o3, which does bursts of web searches on its own whenever it feels it'll be helpful in solving a problem, and uses the results to inform its own next steps (as opposed to just picking out parts to quote in a reply).

(It works surprisingly well, and feels mid-way between Perplexity's search and OpenAI's Deep Research.)

Pamar · 2025-06-03T11:31:30 1748950290

I asked "What version/model are you running, atm" (I have a for-free account on OpenAI, what I have seen so far will not justify a 20$ monthly fee - IN MY CASE).

Answer: "gpt-4-turbo".

HTH.

bavell · 2025-06-03T12:21:40 1748953300

Don't ask the model, just look at the model selection drop-down (wherever that may be in your UI)

meowface · 2025-06-03T12:44:13 1748954653

>I have a for-free account on OpenAI, what I have seen so far will not justify a 20$ monthly fee - IN MY CASE

4.1o-mini definitely is not worth $20/month. o3 probably is (and is available in the $20/month plan) for many people.

sethammons · 2025-06-03T11:12:18 1748949138

No, don't think libraries, think "the Internet."

The Internet thinks all kinds of things that are not true.

mavhc · 2025-06-03T11:49:38 1748951378

Just like books then, except the internet can be updated

rvnx · 2025-06-03T11:55:55 1748951755

We all remember those teachers who said that the internet cannot be trusted, and that the only source of truth is in books.

rsynnott · 2025-06-03T09:22:10 1748942530

Even if this were true (it is not; that’s not how LLMs work), well, there was a lot of complete nonsense in the Library of Alexandria.

rvnx · 2025-06-03T09:58:35 1748944715

It's a compressed statistical representation of text patterns, so it is absolutely true. You lose information during the process, but the quality is similar to the source data. Sometimes even above, as there is consensus when information is repeated across multiple sources.

brahma-dev · 2025-06-03T10:48:36 1748947716

It's amazing how it goes from all the knowledge in the world to ** terms and conditions apply, all answers are subject to market risks, please read the offer documents carefully.........

jgrahamc · 2025-06-03T10:13:10 1748945590

As someone who has followed Thomas' writing on HN for a long time... this is the funniest thing I've ever read here! You clearly have no idea about him at all.

tptacek · 2025-06-03T18:56:45 1748977005

Especially coming from you I appreciate that impulse, but I had the experience of running across someone else the Internet (or Bsky, at least) believed I had no business not knowing about, and I did not enjoy it, so I'm now an activist for the cause of "people don't need to know who I am". I should have written more clearly above.

jgrahamc · 2025-06-04T08:29:21 1749025761

That is a very good cause!

exe34 · 2025-06-03T08:28:52 1748939332

One would hope the experience leads to the position, and not vice-versa.

rfrey · 2025-06-03T12:41:02 1748954462

... you think tptacek has no expertise in cryptography?

wickedsight · 2025-06-03T08:28:52 1748939332

That is no different from pretty much any other person in the world. If I interview people to catch them on mistakes, I will be able to do exactly that. Sure, there are some exceptions, like if you were to interview Linus about Linux. Other than that, you'll always be able to find a fluke in someone's knowledge.

None of this makes me 'snap out' of anything. Accepting that LLMs aren't perfect means you can just keep that in mind. For me, they're still a knowledge multiplier and they allow me to be more productive in many areas of life.

tecleandor · 2025-06-03T08:56:21 1748940981

Not at all. Useful or not, LLMs will almost never say "I don't know". They'll happily call a function from a library that never existed. They'll tell you "Incredible idea! You're on the correct path! And you can easily do that with so and so software", and you'll be like "wait what, that software doesn't do that", and they'll answer "Ah, yeah, you're right, of course."

ignoramous · 2025-06-03T09:08:15 1748941695

TFA says hallucinations are why "gyms" will be important: language tooling (compiler, linter, language server, domain-specific static analyses etc.) that feeds back into the agent, so it'll know to redo.
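A rough sketch of that feedback loop, assuming Python's built-in `compile()` as a stand-in for the compiler/linter and a hypothetical `fake_agent` function in place of a real model call. None of these names come from TFA:

```python
from typing import Callable, Optional

def syntax_error(source: str) -> Optional[str]:
    """Run the 'tooling': return an error message, or None if the code compiles."""
    try:
        compile(source, "<candidate>", "exec")
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"

def generate_until_valid(
    generate: Callable[[Optional[str]], str], max_attempts: int = 3
) -> Optional[str]:
    """Call generate(feedback) repeatedly, feeding each tool error into the next attempt."""
    feedback = None
    for _ in range(max_attempts):
        candidate = generate(feedback)
        feedback = syntax_error(candidate)
        if feedback is None:
            return candidate
    return None  # gave up: every attempt still failed the check

# Hypothetical stand-in for the agent: a broken draft first, a fixed one after feedback.
def fake_agent(feedback: Optional[str]) -> str:
    if feedback is None:
        return "def add(a, b) return a + b"
    return "def add(a, b):\n    return a + b\n"

print(generate_until_valid(fake_agent))
```

The check here only catches syntax errors. A hallucinated library call would need something stronger in the loop, such as a type checker or actually running tests.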

rvnx · 2025-06-03T09:13:34 1748942014

Sometimes asking in a loop: "are you sure? think step-by-step", "are you sure? think step-by-step", "are you sure? think step-by-step", "are you sure? think step-by-step", "verify the result" or similar, you may end up with "I'm sure, yes", and then you know you have a quality answer.

yujzgzc · 2025-06-03T09:04:05 1748941445

No, there are many techniques now to curb hallucinations. Not perfect, but no longer so egregiously overconfident.

ninkendo · 2025-06-03T10:21:51 1748946111

…such as?

rvnx · 2025-06-03T09:11:48 1748941908

The most infuriating are the emojis everywhere