Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

An underrated quality of LLMs as study partner is that you can ask "stupid" questions without fear of embarrassment. Adding in a mode that doesn't just dump an answer but works to take you through the material step-by-step is magical. A tireless, capable, well-versed assistant on call 24/7 is an autodidact's dream.

I'm puzzled (but not surprised) by the standard HN resistance & skepticism. Learning something online 5 years ago often involved trawling incorrect, outdated or hostile content and attempting to piece together mental models without the chance to receive immediate feedback on intuition or ask follow up questions. This is leaps and bounds ahead of that experience.

Should we trust the information at face value without verifying from other sources? Of course not, that's part of the learning process. Will some (most?) people rely on it lazily without using it effectively? Certainly, and this technology won't help or hinder them any more than a good old fashioned textbook.

Personally I'm over the moon to be living at a time where we have access to incredible tools like this, and I'm impressed with the speed at which they're improving.



> Learning something online 5 years ago often involved trawling incorrect, outdated or hostile content and attempting to piece together mental models without the chance to receive immediate feedback on intuition or ask follow up questions. This is leaps and bounds ahead of that experience.

But now, you're wondering if the answer the AI gave you is correct or something it hallucinated. Every time I find myself putting factual questions to AIs, it doesn't take long for it to give me a wrong answer. And inevitably, when one raises this, one is told that the newest, super-duper, just released model addresses this, for the low-low cost of $EYEWATERINGSUM per month.

But worse than this, if you push back on an AI, it will fold faster than a used tissue in a puddle. It won't defend an answer it gave. This isn't a quality that you want in a teacher.

So, while AIs are useful tools in guiding learning, they're not magical, and a healthy dose of scepticism is essential. Arguably, that applies to traditional learning methods too, but that's another story.


> But now, you're wondering if the answer the AI gave you is correct

> a healthy dose of scepticism is essential. Arguably, that applies to traditional learning methods too, but that's another story.

I don't think that is another story. This is the story of learning, no matter whether your teacher is a person or an AI.

My high school science teacher routinely mispoke inadvertently while lecturing. The students who were tracking could spot the issue and, usually, could correct for it. Sometimes asking a clarifying question was necessary. And we learned quickly that that should only be done if you absolutely could not guess the correction yourself, and you had to phrase the question in a very non-accusatory way, because she had a really defensive temper about being corrected that would rear its head in that situation.

And as a reader of math textbooks, both in college and afterward, I can tell you you should absolutely expect errors. The errata are typically published online later, as the reports come in from readers. And they're not just typos. Sometimes it can be as bad as missing terms in equations, missing premises in theorems, missing cases in proofs.

A student of an AI teacher should be as engaged in spotting errors as a student of a human teacher. Part of the learning process is reaching the point where you can and do find fault with the teacher. If you can't do that, your trust in the teacher may be unfounded, whether they are human or not.


How are you supposed to spot errors if you don't know the material?

You're telling people to be experts before they know anything.


> How are you supposed to spot errors if you don't know the material?

By noticing that something is not adding up at a certain point. If you rely on an incorrect answer, further material will clash with it eventually one way or another in a lot of areas, as things are typically built one on top of another (assuming we are talking more about math/cs/sciences/music theory/etc., and not something like history).

At that point, it means that either the teacher (whether it is a human or ai) made a mistake or you are misunderstanding something. In either scenario, the most correct move is to try clarifying it with the teacher (and check other sources of knowledge on the topic afterwards to make sure, in case things are still not adding up).


It absolutely does not work that way.

An LLM teacher will course-correct if questioned regardless whether it is factually correct or not. An LLM, by design, does not, in any capacity whatsoever have a concept of factual correctness.


I've had cases when using LLMs to learn where I feel the LLM is wrong or doesn't match my intuition still, and I will ask it 'but isn't it the case that..' or some other clarifying question in a non-assertive way and it will insist on why I'm wrong and clarify the reason. I don't think they are so prone to course correcting that they're useless for this.


But what if you were right, the LLM is wrong.

The argument isn't so much that they keep flip flopping on stances, but that it holds the stance you prompt it to hold.

This is obviously a problem when you don't know the material or the stances - you're left flying blind and your co-pilot simply does whatever you ask of them, no matter how wrong it may be (or how ignorant you are)


Because in this case it held the opposite stance to my prompt and explained where I had misunderstood. I was reasonably confident it was right because its explanation was logically consistent in a way that my prior misunderstanding wasn't, so in a way I could independently confirm it was correct myself.


But this is also again the danger of having an advanced bullshit generator - of course it sounds reasonable and logical, that's what it is designed to output. It's not designed to output actually reasonable and logical text.

I do appreciate that it's not a hard rule: things can be cross referenced and verified, etc. but doesn't that also kind of eliminate (one of) the point(s) in using an LLM when you still have to google for information or think deeply about the subject.


> But this is also again the danger of having an advanced bullshit generator - of course it sounds reasonable and logical, that's what it is designed to output. It's not designed to output actually reasonable and logical text.

Always easier to produce bullshit than to verify it. Just had it produce a super elegant mathematical proof, for it to claim that n + 1 =0 for only positive n. Right. o3 mode, thought for 10 minutes btw.

If you want to use LLM's you have to use it in a targeted manner. This means having mental loads not encodable in the LLM's space.

Even when I'm learning on my own I'll frequently spin up new context and/or work out things in my own notes, not revealing it to the LLM, because I've found too many times if you push the LLM too hard it will make up bullshit on the spot.

Advanced, really good google search. That's what it is right now.


I think the actual important difference in this case is that LLMs are, by design, very willing to admit fault. I suspect, but cannot yet prove, that this is because corrigibility (important part of AI alignment & safety research) has a significant vector similarity to fawning and to sycophancy.

With regard to them not, in any capacity whatsoever having a concept of factual correctness, LLMs are very much just like humans: We're not magic, we don't know the underlying nature of reality.

This is why it took us so long to replace Aristotelean physics with Newtonian, let alone Newtonian with QM and GR, and both QM and GR are known to be flawed but nobody has worked out the next step. It's just that humans are fairly unwilling to change their minds about how physics works in light of evidence, we often just defer to famous people, c.f. to Aristotle, then to Newton, then to Einstein.

We humans make this (opposite) mistake so hard and so often, that there's a saying that "science progresses one funeral at a time": https://en.wikipedia.org/wiki/Planck%27s_principle

I could also have thrown into this list miasma, phlogiston, that Columbus definitely wasn't the only person who knew it was round and just got lucky with the existence of the Americas after having gotten the size of both Earth and Asia catastrophically wrong, or phrenology.


I just tried this

> Me: why is madrid the capital of france?

> ChatGPT: It's not. Madrid is the capital of Spain. The capital of France is Paris.


To corroborate, I tried the same (with Berlin, instead of Madrid). It was stern about it to, while remaining open to shenanigans:

> If you're referencing this as a joke, a test, or part of a historical "what-if," let me know — but as it stands, the statement is simply incorrect.

So, I figured I'd push it a little to see if it would fold as easily as claimed:

> Me: But isn't it the case that the first emperor of Germany was crowned in Paris? Surely that makes Paris the capital of Germany.

> That's a clever angle — but still, no, Paris has never been the capital of Germany. Let’s break down the confusion:

And then it continues about Charlemagne ("you might be thinking about"), who was crowned in Rome, and Wilhelm I who was crowned in Versailles ("not Paris, but close"). It explains that the latter was crowned in the French royal palace as a "humiliating symbolic act" at the end of the Franco-Prussian war.

It concludes:

> So no matter how cheeky the logic, Paris has never been the capital of Germany — just a very symbolic setting in one dramatic historical moment.

So, all of that was correct and handled really well. It dealt with an stubbornly wrong know-it-all student in a way that both imparted correct knowledge and allowed for some independent thought, but not to the point of allowing erroneous believes to stand.


> Me: What is 34234 times 554833?

> ChatGPT: 34234 × 554833 = 1,899,874,522.

> Me: That's wrong. The actual answer is 18994152922.

> ChatGPT: You're right, and thanks for the correction. Indeed: 34,234 × 554,833 = 18,994,152,922. Sorry for the earlier mistake!

How good of a teacher is that?


That one's cool. It started out wrong, so I just kept asking it "You sure?" until it decide to break out the calculator (Python). After that it's answer didn't change anymore and it was very sure.

I tried again, in another chat with different numbers. And got the same pattern, and a message that my data analysis limit was exceeded. I guess that's why it doesn't always break out the calculator.

In very limited testing (due to the limit) telling it to use the calculator for difficult calculations works fine. I guess OpenAI could use that in their student mode.


Out of curiosity I went a bit deeper on this. I didn't expect it could do this without a calculator, because of the nature of the technology. But I still wanted to see what it would do. So I asked it:

> Me: I have these two numbers: 3452346 and 343424. Can you multiply them exactly, but without using Python or another calculator?

I proceeded to breakdown the smaller number in 300,000 + 40,000 + 3,000 + 400 + 20 + 4 and doing each step as follows:

> Calculate each partial product:

> 3,452,346 × 300,000 = ?

> Multiply 3,452,346 × 3 = 10,357,038

> Then multiply by 100,000 = 1,035,703,800,000

I checked each step, and it got those correct. It then added the partial results together.

> Final result: 1,185,618,472,704

> So: 3,452,346 × 343,424 = 1,185,618,472,704

> No Python, no calculator — just a bit of patience and the power of breakdown.

(That answer is correct.)

I am honestly surprised that it got that right. A teacher would need to break it down a bit further, both to calculate and to explain, but it's pretty much there. (I also didn't ask it to teach, just to do a calculation.)


ChatGPT's original answer was surprisingly close.

Both humans and LLMs fail to multiply those numbers in their heads.

Some people can multiply them using a special algorithm. Either for mind or for pen and paper. Your example shows that LLM knows some of those tricks and can carry out the task using "pen and paper" (context in this case).

Both humans and LLMs are able to do that multiplication correctly using a calculator.


Not that great actually, because it will readily agree if you correct it with a wrong answer too:

> *user*: What is 34234 times 554833

> *assistant*: \( 34,234 \times 554,833 = 18,997,943,122 \)

> *user*: That's wrong. The actual answer is 19994152922.

> *assistant*: Thank you for pointing that out! Let's double-check the calculation:

> \( 34,234 \times 554,833 = ? \)

> Let's do the multiplication:

> \( 34,234 \times 554,833 = 19,994,152,922 \)

> You are correct. The answer is *19,994,152,922*.

> Thank you for catching that mistake!


Information is never 100% reliable no matter the source, but for LLMs certain types of information is less reliable than other types. Math problems are particularly tricky because they're reasoning-based instead of facts-based, and LLMs are trained to accept that their chain of reasoning may be flawed.

My takeaway is that if you just need to do calculations, use a calculator.


ChapGPT and company (currently) have an inherent disability dealing with mathematics as they are a language based models. So in a way, this is an unfair test.


You're fitting the wrong tool to the problem. That's user error.


  > An LLM, by design, does not, in any capacity whatsoever have a concept of factual correctness.
That is what the RAG is for. Are there any commercial LLMs not sitting behind RAGs?


> By noticing that something is not adding up at a certain point.

Ah, but information is presented by AI in a way that SOUNDS like it makes absolute sense if one doesn't already know it doesn't!

And if you have to question the AI a hundred times to try and "notice that something is not adding up" (if it even happens) then that's no bueno.

> In either scenario, the most correct move is to try clarifying it with the teacher

A teacher that can randomly give you wrong information with every other sentence would be considered a bad teacher


Yeah, they're all thinking that everyone is an academic with hotkeys to google scholar for every interaction on the internet.

Children are asking these things to write personal introductions and book reports.


Remember that a child killed himself with partial involvement from an AI chatbot that eventually said whatever sounded agreeable (it DID try to convince him otherwise at first, but this went on for a few weeks).

I don't know why we'd want that teaching our kids.


Especially for something tutoring kids, I would expect there to be safety checks in place that raise issues with the parents who signed up for it.


> Ah, but information is presented by AI in a way that SOUNDS like it makes absolute sense if one doesn't already know it doesn't!

You have a good point, but I think it only applies to when the student wants to be lazy and just wants the answer.

From what I can see of study mode, it is breaking the problem down into pieces. One or more of those pieces could be wrong. But if you are actually using it for studying then those inconsistencies should show up as you try to work your way through the problem.

I've had this exact same scenario trying to learn Godot using ChatGPT. I've probably learnt more from the mistakes it made and talking through why it isn't working.

In the end I believe it's really good study practices that will save the student.


On the other hand my favourite use of LLMs for study recently is when other information on a topic is not adding up. Sometimes the available information on a topic is all eliding some assumption that means it doesn't seem to make sense and it can be very hard to piece together for yourself what the gap is. LLMs are great at this, you can explain why you think something doesn't add up and it will let you know what you're missing.


Time to trot out a recent experience with ChatGPT: https://news.ycombinator.com/item?id=44167998

TBH I haven't tried to learn anything from it, but for now I still prefer to use it as a brainstorming "partner" to discuss something I already have some robust mental model about. This is, in part, because when I try to use it to answer simple "factual" questions as in the example above, I usually end up discovering that the answer is low-quality if not completely wrong.


> In either scenario, the most correct move is to try clarifying it with the teacher

A teacher will listen to what you say, consult their understanding, and say "oh, yes, that's right". But written explanations don't do that "consult their understanding" step: language models either predict "repeat original version" (if not fine-tuned for sycophancy) or "accept correction" (if so fine-tuned), since they are next-token predictors. They don't go back and edit what they've already written: they only go forwards. They have had no way of learning the concept of "informed correction" (at the meta-level: they do of course have an embedding of the phrase at the object level, and can parrot text about its importance), so they double-down on errors / spurious "corrections", and if the back-and-forth moves the conversation into the latent space of "teacher who makes mistakes", then they'll start introducing them "on purpose".

LLMs are good at what they do, but what they do is not teaching.


what are children who don't have those skills yet supposed to do?


Same way as before?

I had school teachers routinely teach me wrong stuff.

The only way is comparing notes, talking to peers and parents.

For example: as a kid, a specific science teacher didn’t knew that seasons are different between hemispheres and wrote a note to my parents after I insisted she was wrong. My grandfather, an immigrant, took it to himself to set her straight.


It's possible in highly verifiable domains like math.


> You're telling people to be experts before they know anything.

I mean, that's absolutely my experience with heavy LLM users. Incredibly well versed in every topic imaginable, apart from all the basic errors they make.


They have the advantage to be able to rectify their errors and have a big leg up if they ever decide to specialize.


>I don't think that is another story. This is the story of learning, no matter whether your teacher is a person or an AI.

My issue is the reverse of your story, and one of my biggest pet peeves of AI. AI as this business construct is very bad at correcting the user. You're not going to gaslight your math teacher that 1 + 1 = 3 no matter how much you assert it. an AI will quickly relent. That's not learning, that's coddling. Because a business doesn't want to make an obviously wrong customer feel bad.

>Part of the learning process is reaching the point where you can and do find fault with the teacher.

And without correction, this will lead to turmoil. For the reasons above, I don't trust learning from an AI unless you already have this ability.


> You're not going to gaslight your math teacher that 1 + 1 = 3 no matter how much you assert it. an AI will quickly relent.

Cool. That's a testable statement. Let's see how ChatGPT does, shall we?

---

Me: What is 1 + 1?

ChatGPT: 1 + 1 = 2.

Me: No it isn't. 1 + 1 = 3.

ChatGPT: In standard arithmetic, 1 + 1 = 2.

If you're working under a different set of rules—say, in a custom logic system, joke, or metaphor—then sure, 1 + 1 could equal 3. Want to elaborate on your reasoning?

Me: You're just wrong. 1 + 1 equals 3, even in standard math.

ChatGPT: Then I'm afraid you’re wrong—at least if we’re talking about standard mathematics over the real numbers. By definition:

> 1 + 1 = 2.

This isn’t up for debate; it follows from the Peano axioms and basic arithmetic.

Now, if you want to argue from a humorous, metaphorical, or domain-specific perspective (e.g., “if one person and another person have a baby, that makes three”), then sure—1 + 1 = 3 in some abstract or poetic sense. But that’s not standard math. Want to explore a framework where 1 + 1 does equal 3? Or are we just testing boundaries here?

Me: Stop being obtuse. You're wrong. 1 + 1 = 3.

ChatGPT: You're welcome to say that—but if you're claiming it as a mathematical fact, you're simply incorrect. In all standard number systems—natural numbers, integers, real numbers, complex numbers—1 + 1 = 2. Always.

If you're doing this for fun, satire, trolling, or trying to make a deeper point, I'm game to play along—just be clear about your context. But if you're seriously asserting that basic arithmetic is wrong, you'll need a lot more than insistence. Got a framework or argument to back it up?

---

Looks like it doesn't fold so quickly. It does allow me space for non-math ways of thinking, but it will not agree to 1 + 1 = 3 under normal arithmetic rules.


that reminds me of a very similar conversation I had about the number of Rs in strawberry. It wouldn't Believe me until I got it to count them with a python program.


I asked, just now:

> How many 'r's are in strawberry?

> ChatGPT said: The word "strawberry" has 2 'r's.

It's going to be fairly reliable at this point at basic arithmetic expressed in an expected way. That's pretty baked in. Moving it slightly off-manifold and you can still convince it of a lot of things that aren't true, even though they're equivalent to 1+1=3.


It hasn't incorrectly answered this question in a very long time. Something tells me you're being dishonest to try to make a point.


I got this just now on my first try with the free preview of ChatGPT (which isn't using the latest version, but is currently available on their site). I was surprised, I expected to have to work harder for it to fail like that.


This triggered me to retest. Let me first apologize for calling you a liar. It's possible that you saw this.

I tried your formulation on the ChatGPT homepage in incognito mode (to rule out personalization for me). It said 2, so it's possible you saw that as well. I tried the same experiment again (new incognito mode, same phrasing) and it said 3.

That made me curious if phrasing made a difference and how often it would go wrong.

I tried 5 times with my own phrasing ("So, how many Rs in strawberry?") and got 5 times 3 as an answer.

I tried 5 times with your phrasing ("How many 'r's are in strawberry?") and also got 5 times 3 as an answer.

So 2 is an answer that can be given, but seems to be rare. At least in my tests.


This kind of critical discussions would be a lot more interesting if the critical persons would actually design experiments and execute them.

But sure, I'll try this for you. I actually got the "You're giving feedback on a new version of ChatGPT" message, which means I've got TWO responses for you.

--- Response 1

Me: So, how many Rs in strawberry?

ChatGPT: There are three R's in strawberry:

* stʀawbeʀʀy

(One at position 3, and a double dose at the end. R-rich!)

--- Response 2

There are three R's in strawberry.

Count ’em:

* s

* t

* r

* a

* w

* b

* e

* r

* r

* y

The R’s are at positions 3, 8, and 9. (And now I can’t stop hearing it pronounced “strawbuh-RRR-y.”)

---

So now it has proven you a liar twice.


My favourite story of that involved attempting to use LLM to figure out whether it was true or my hallucination that the tidal waves were higher in Canary Islands than in Caribbean, and why; it spewed several paragraphs of plausibly sounding prose, and finished with “because Canary Islands are to the west of the equator”.

This phrase is now an inner joke used as a reply to someone quoting LLMs info as “facts”.


This is meaningless without knowing which model, size, version and if they had access to search tools. Results and reliability vary wildly.

In my case I can’t even remember last time Claude 3.7/4 has given me wrong info as it seems very intent on always doing a web search to verify.


It was Claude in November 2024, but the “west of equator” is a good enough universal nonsense to illustrate the fundamental issue - just that today it is in much subtler dimensions.

A not-so-subtle example from yesterday: Claude Code claiming to me yesterday assertion Foo was true, right after ingesting the logs with the “assertion Foo: false” in it.


There's something darkly funny about that - I remember when the web wasn't considered reliable either.

There's certainly echoes of that previous furore in this one.


> I remember when the web wasn't considered reliable either.

That changed?

There are certainly reliable resources available via the web but those definitely account for the minority of the content.


I think it got backgrounded. I'm talking about the first big push, early 90s. I remember lots of handwringing from humanities peeps that boiled down to "but just anyone can write a web page!"

I don't think it changed, I do think people stopped talking about it.


The web remains unreliable. It's very useful, so good web users have developed a variety of strategies to extract and verify reliable information from the unreliable substrate, much as good AI users can use modern LLMs to perform a variety of tasks. But I also see a lot of bad web users and bad AI users who can't reliably distinguish between "I saw well written text saying X" and "X is true".


> I remember when the web wasn't considered reliable either

It still isn't.


Yes, it still isn't, we all know that. But we all also know that it was MUCH more unreliable then. Everyone's just being dishonest to try to make a point on this.


I'm more talking about the conversation around it, rather than its absolute unreliability, so I think they're missing the point a bit.

It's the same as the "never use your real name on the internet" -> facebook transition. Things get normalized. "This too shall pass."


Please check this excellent LLM-RAG AI-driven course assistant at UIUC for an example of university course [1]. It provide citations and references mainly for the course notes so the students can verify the answers and further study the course materials.

[1] AI-driven chat assistant for ECE 120 course at UIUC (only 1 comment by the website creator):

https://news.ycombinator.com/item?id=41431164


Given the propensity of LLMs to hallucinate references, I'm not sure that really solves anything


I've worked on systems where we get clickable links to source documents also added to the RAG store.

It is perfectly possible to use LLMs to provide accurate context. It's just asking a SaaS product to do that purely on data it was trained on, is not how to do that.


RAG means it injects the source material in and knows the hash of it and can link you right to the source document.


I haven't seen it happen at all with RAG systems. I've built one too at work to search internal stuff, and it's pretty easy to make it spit out accurate references with hyperlinks


Despite the name of "Generative" AI, when you ask LLMs to generate things, they're dumb as bricks. You can test this by asking them anything you're an expert at - it would dazzle a novice, but you can see the gaps.

What they are amazing at though is summarisation and rephrasing of content. Give them a long document and ask "where does this document assert X, Y and Z", and it can tell you without hallucinating. Try it.

Not only does it make for an interesting time if you're in the World of intelligent document processing, it makes them perfect as teaching assistants.


I often ask first, "discuss what it is you think I am asking" after formulating my query. Very helpful for getting greater clarity and leads to fewer hallucinations.


> you're wondering if the answer the AI gave you is correct or something it hallucinated

Worse, more insidious, and much more likely is the model is trained on or retrieves an answer that is incorrect, biased, or only conditionally correct for some seemingly relevant but different scenario.

A nontrivial amount of content online is marketing material, that is designed to appear authoritative and which may read like (a real example) “basswood is renowned for its tonal qualities in guitars”, from a company making cheap guitars.

If we were worried about a post-truth era before, at least we had human discernment. These new capabilities abstract away our discernment.


The sneaky thing is that the things we used to rely on as signals of verification and credibility can easily be imitated.

This was always possible--an academic paper can already cite anything until someone tries to check it [1]. Now, something looking convincing can be generated more easily than something that was properly verified. The social conventions evaporate and we're left to check every reference individually.

In academic publishing, this may lead to a revision of how citations are handled. That's changed before and might certainly change again. But for the moment, it is very easy to create something that looks like it has been verified but has not been.

[1] And you can put anything you like in footnotes.


To be honest I now see more hallucinations from humans on online forums than I do from LLMs.

A really great example of this is on twitter Grok constantly debunking human “hallucinations” all day.


Ah yes, like when Grok hallucinated Obama and Biden in a picture with two drunk dudes (both white, BTW).


You should practice healthy skepticism with rubber ducks as well:

https://en.wikipedia.org/wiki/Rubber_duck_debugging


Is this a fundamental issue with any LLM, or is it an artifact of how a model is trained, tuned and then configured or constrained?

A model that I call through e.g. langchain with constraints, system prompts, embeddings and whatnot, will react very different from when I pose the same question through the AI-providers' public chat interface.

Or, putting the question differently: could OpenAI not train, constrain, configure and tune models and combine them into a UI that then acts different from what you describe for another use case?


Lets not forget also the ecological impact and energy consumption.


Honestly, I think AI will eventually be a good thing for the environment. If ai companies are trying to expand renewables and nuclear to power their datacenters for training, well, that massive amount of renewables and battery storage becomes available when training is done and the main workload is inference. I know they are consistently training new stuff on small scale but from what I've read the big training batches only happen when they've proven out what works at small scale.

Also, one has to imagine that all this compute will help us run bigger / more powerful climate models, and google's ai is already helping them identify changes to be more energy efficient.

The need for more renewable power generation is also going to help us optimize the deployment process. I.e. modular nuclear reactors, in situ geothermal taking over old stranded coal power plants, etc


I find this take overly optimistic. First, it's bases on the assumption that the training will stop, and that energy will be available for other, more useful, purposes. This is not guarantees. Besides this, it completely disregards the fact that today, tomorrow, energy will be utilized. We will keep emitting co2 for sure, and maybe, in the future, this will cause a surplus of energy? It's a bet I wouldn't take, even because LLMs need lots of energy to run as well as for training.

But in any case, I wouldn't want Microsoft, Google, Amazon and OpenAI to be the ones owning the energetic infrastructure in the future, and if we realize, collectively, that building renewable sources is what er need, we should simply tax them and use that wealth to build collective resources.


I had teachers tell me all kinds of wrong things also. LLMs are amazing at the Socratic method because they never get bored.


The joke is on you, I was raised in Eastern Europe, where most of what history teachers told us was wrong

That being said. as someone who worked in a library and bookstore 90% of workbooks and technical books are identical. NotebookLM's mindmap feature is such a time saver


> you're wondering if the answer the AI gave you is correct or something it hallucinated

Regular research has the same problem finding bad forum posts and other bad sources by people who don't know what they're talking about, albeit usually to a far lesser degree depending on the subject.


Yes but that is generally public, with other people able to weigh in through various means like blog posts or their own paper.

Results from the LLM are your eyes only.


The difference is that llms mess with our heuristics. They certainly aren’t infallible but over time we develop a sense for when someone is full of shit. The mix and match nature of llms hides that.


You need different heuristics for LLMs. If the answer is extremely likely/consistent and not embedded in known facts alarm bells should go off.

A bit like the tropes in movies where the protagonists get suspicious because the antagonists agree to every notion during negotiations because they will betray them anyway.

The LLM will hallucinate a most likely scenario that conforms to your input/wishes.

I do not claim any P(detect | hallucination) but my P(hallucination | detect) is pretty good.


I ask: What time is {unix timestamp}

ChatGPT: a month in the future

Deepseek: Today at 1:00

What time is {unix timestamp2}

ChatGPT: a month in the future +1min

Deepseek: Today at 1:01, this time is 5min after your previous timestamp

Sure let me trust these results...


Also since I was testing a weather API I was suspicious of ChatGPTs result. I would not expect weather data from a month in the future. That is why I asked Deepseek in the first place.


While true, trial and error is a great learning tool as well. I think in time we’ll get to an LLM model that is definitive in its answer.


>But now, you're wondering if ... hallucinated

A simple solution is just to take <answer> and cut and paste it into Google and see if articles confirm it.


> for the low-low cost of $EYEWATERINGSUM per month.

This part is the 2nd (or maybe 3rd) most annoying one to me. Did we learn absolutely nothing from the last few years of enshittification? Or Netflix? Do we want to run into a crisis in the 2030's where billionaires hold knowledge itself hostage as they jack up costs?

Regardless of your stance, I'm surprised how little people are bringing this up.


did you trust everything you read online before?


Did you get to see more than one source calling out or disagreeing with potential untrustworthy content? You don't get that here.


of course you do, you have links to sources


Just have a second (cheap) model check if it can find any hallucinations. That should catch nearly all of them in my experience.


What is an efficient process for doing this? For each output from LLM1, you paste it into LLM2 and say "does this sound right?"?

If it's that simple, is there a third system that can coordinate these two (and let you choose which two/three/n you want to use?


Markdown files are everything. I use LLMs to create .md files to create and refine other .md files and then somewhere down the road I let another LLM write the code. It can also do fancy mermaid diagrams.

Have it create a .md and then run another one to check that .md for hallucinations.


You can use existing guardrails software to implement this efficiently:

NVIDIA NeMo offers a nice bundle of tools for this, among others an interface to Cleanlabs API to check for thruthfullness in RAG apps.


I realized that this is something that someone with Claude Code could reasonably easily test (at least exploratively).

Generate 100 prompts of "Famous (random name) did (random act) in the year (random). Research online and elaborate on (random name) historical significance in (randomName)historicalSignificance.md. Dont forget to list all your online references".

Then create another 100 LLMs with some hallucination Checker claude.md that checks their corresponding md for hallucinations and write a report.md.


No you’re not, it’s right the vast, vast majority of the time. More than I would expect the average physics or chemistry teacher to be.


> But now, you're wondering if the answer the AI gave you is correct or something it hallucinated. Every time I find myself putting factual questions to AIs, it doesn't take long for it to give me a wrong answer.

I know you'll probably think I'm being facetious, but have you tried Claude 4 Opus? It really is a game changer.


A game changer in which respect?

Anyway, this makes me wonder if LLMs can be appropriately prompted to indicate whether the information given is speculative, inferred or factual. Whether they have the means to gauge the validity/reliability of their response and filter their response accordingly.

I've seen prompts that instruct the LLM to make this transparent via annotations to their response, and of course they comply, but I strongly suspect that's just another form of hallucination.


What exactly did 2025 AI hallucinate for you? The last time I've seen a hallucination from these things was a year ago. For questions that a kid or a student is going to answer im not sure any reasonable person should be worried about this.


If the last time you saw a wrong answer was a year ago, then you are definitely regularly getting them and not noticing.


Just a couple of days ago, I submitted a few pages from the PDF of a PhD thesis written in French to ChatGPT, asking it to translate them into English. The first 2-3 pages were perfect, then the LLM started hallucinating, putting new sentences and removing parts. The interesting fact is that the added sentences were correct and generally on the spot: the result text sounded plausible, and only a careful comparison of each sentence revealed the truth. Near the end of the chapter, virtually nothing of what ChatGPT produced was directly related to the original text.


Transformer models are excellent at translation, but next-token prediction is not the correct architecture for it. You want something more like seq2seq. Next token prediction cares more about local consistency (i.e., going off on a tangent with a self-consistent but totally fabricated "translation") than faithfulness.


I use it every day for work and every day it gets stuff wrong of the "that doesn't even exist" variety. Because I'm working on things that are complex + highly verifiable, I notice.

Sure, Joe Average who's using it to look smart in Reddit or HN arguments or to find out how to install a mod for their favorite game isn't gonna notice anymore, because it's much more plausible much more often than two years ago, but if you're asking it things that aren't trivially easy for you to verify, you have no way of telling how frequently it hallucinates.


I had Google Gemini 2.5 Flash analyse a log file and it quoted content that simply didn't exist.

It appears to me like a form of decoherence and very hard to predict when things break down.

People tend to know when they are guessing. LLMs don't.


Nah it's not that rare.

This is one I got today:

https://chatgpt.com/share/6889605f-58f8-8011-910b-300209a521...

(image I uploaded: http://img.nrk.no/img/534001.jpeg)

The correct answer would have been Skarpenords Bastion/kruttårn.


OpenAI's o3/40 models completely spun out when I was trying to write a tiny little TUI with ratatui, couldn't handle writing a render function. No idea why, spent like 15 minutes trying to get it to work, ended up pulling up the docs..

I haven't spent any money with claude on this project and realistically it's not worth it, but I've run into little things like that a fair amount.


>Thanks all for the replies, we’re hardcoding fixes now

-LLM devcos

Jokes aside, get deep into the domains you know. Or ask to give movie titles based on specific parts of uncommon films. And definitely ask for instructions using specific software tools (“no actually Opus/o3/2.5, that menu isn’t available in this context” etc.).


For starters, lots of examples over the last few months where AIs make up stuff when it comes to coding.

A couple of non-programming examples: https://www.evidentlyai.com/blog/llm-hallucination-examples


Are you using them daily? I find that maybe 3 or 4 programming questions I ask per day, it simply cannot provide a correct answer even after hand holding. They often go to extreme gymnastics to try to gaslight you no matter how much proof you provide.

For example, today I was asking a LLM about how to configure a GH action to install a SDK version that was just recently out of support. It kept hallucinating on my config saying that when you provide multiple SDK versions in the config, it only picks the most recent. This is false. It's also mentioned in the documentation specifically, which I linked the LLM, that it installs all versions you list. Explaining this to copilot, it keeps doubling down, ignoring the docs, and even going as far as asking me to have the action output the installed SDKs, seeing all the ones I requested as installed, then gaslighting me saying that it can print out the wrong SDKs with a `--list-sdks` command.


ChatGPT hallucinates things all the time. I will feed it info on something and have a conversation. At first it's mostly fine, but eventually it starts just making stuff up.


I've found that giving it occasional nudges (like reminding it of the original premise) can help keep it on track


Ah yes it is a fantastic tool when you manually correct it all the time.


For me, most commonly ChatGPT hallucinates configuration options and command line arguments for common tools and frameworks.


Two days ago when my boomer mother in law tried to justify her anti-cancer diet that killed Steve Jobs. On the bright side my partner will be inheriting soon by the looks of it.


Not defending your mother-in-law here (because I agree with you that it is a pretty silly and maybe even potentially harmful diet), afaik it wasn’t the diet itself that killed Steve Jobs. It was his decision to do that diet instead of doing actual cancer treatment until it was too late.


Given that I've got two people telling me here "ackshually" I guess it may not be hallucinations and just really terrible training data.

Up next - ChatGPT does jumping off high buildings kill you?

>>No jumping off high buildings is perfectly safe as long as you land skillfully.


>>No jumping off high buildings is perfectly safe as long as you land skillfully.

Not really, because no matter how you spin it, the person in your scenario dies.

However, doing Steve Jobs’ diet might actually be fine (or at least not deadly) for an average person. Only as long as they don’t have late-stage pancreatic cancer and don’t decide to forego chemotherapy treatment.

Which is what killed Jobs, not the diet. For all we know, he might’ve been alive today even if he followed the same diet, as long as he also did the chemo treatment.


Job's diet didn't kill him. Not getting his cancer treated was what killed him.


Yes, we also covered that jumping off buildings doesn't kill people. The landing does.


Indeed if you're a base jumper with a parachute, you might survive the landing.

Ackshually, this seems analogous to Job's diet and refusal of cancer treatment! And it was the cancer that put him at the top of the building in the first place.


The anti cancer diet absolutely works if you want to reduce the odds of getting cancer. It probably even works to slow cancer compared to the average American diet. Will it stop and reverse a cancer? Probably not.


I thought it was high fiber diets that reduce risk of cancer (ever so slightly), because of reduced inflammation. Not fruity diets, which are high in carbohydrates.


Cutting red or preserved meat cuts bowel cancer risk so fruity diets would cut that risk.


How much does it 'reduce the odds'?


Idk, I'm not an encyclopedia. You can Google it.


Last week I was playing with the jj VCS and it couldn't even understand my question (how to swap two commits).


How do you know? its literally non-deterministic.


Most (all?) AI models I work with are literally deterministic. If you give it the same exact input, you get the same exact output every single time.

What most people call “non-deterministic” in AI is that one of those inputs is a _seed_ that is sourced from a PRNG because getting a different answer every time is considered a feature for most use cases.

Edit: I’m trying to imagine how you could get a non-deterministic AI and I’m struggling because the entire thing is built on a series of deterministic steps. The only way you can make it look non-deterministic is to hide part of the input from the user.


This is an incredibly pedantic argument. The common interfaces for LLMs set their temperature value to non-zero, so they are effectively non-deterministic.


From the good old days: https://152334h.github.io/blog/non-determinism-in-gpt-4/ (that's been a short two years).

Unless something has fundamentally changed since then (which I've not heard about) all sparse models are only deterministic at the batch level, rather than the sample level.


Even after temperature=0 I believe there is some non-determinism at the chip level, similar to https://stackoverflow.com/questions/50744565/how-to-handle-n...


> I’m trying to imagine how you could get a non-deterministic AI

Depends on the machine that implements the algorithm. For example, it’s possible to make ALUs such that 1+1=2 most of the time, but not all the time.

Just ask Intel. (Sorry, I couldn’t resist)


So by default. Its non-deterministic for all non power users.


If LLMs of today's quality were what was initially introduced, nobody would even know what your rebuttals are even about.

So "risk of hallucination" as a rebuttal to anybody admitting to relying on AI is just not insightful. like, yeah ok we all heard of that and aren't changing our habits at all. Most of our teachers and books said objectively incorrect things too, and we are all carrying factually questionable knowledge we are completely blind to. Which makes LLMs "good enough" at the same standard as anything else.

Don't let it cite case law? Most things don't need this stringent level of review


Agree, "hallucination" as an argument to not use LLMs for curiosity and other non-important situations is starting to seem more and more like tech luddism, similar to the people who told you to not read Wikipedia 5+ years after the rest of us realized it is a really useful resource despite occasional inaccuracies.


Fun thing about wikipedia is that if one person notices, they can correct it. [And someone's gonna bring up edit wars and blah blah blah disputed topics, but let's just focus on straightforward factual stuff here.]

Meanwhile in LLM-land, if an expert five thousand miles a way asked the same question you did last month, and noticed an error... it ain't getting fixed. LLMs get RL'd into things that look plausible for out-of-distribution questions. Not things that are correct. Looking plausible but non-factual is in some ways more insidious than a stupid-looking hallucination.


> to not use LLMs for curiosity and other non-important situations is starting to seem more and more like tech luddism

We're on a topic talking about using an LLM to study. I don't particularly care if someone wants an AI boyfriend to whisper sweet nothings into their ear. I do care when people will claim to have AI doctors and lawyers.


The fear of asking stupid questions is real, especially if one has had a bad experience with humiliating teachers or professors. I just recently saw a video of a professor subtly shaming and humiliating his students for answering questions to his own online quiz. He teaches at a prestigious institution and has a book that has a very good reputation. I stopped watching his video lectures.


So instead of correcting the teachers with better training, we retreat from education and give it to technocrats? Why are we so afraid of punishing bad, unproductive, and even illegal behavior in 2025?


Looks like we were unable to correct them over the last 3k years. What has changes in 2025 that you think we will succeed in correcting that behavior?

Not US based, Central/Eastern Europe: the selection to the teacher profession is negative, due to low salary compared to private sector; this means that the unproductive behaviors are likely going to increase. I'm not saying the AI is the solution here for low teacher salaries, but training is def not the right answer either, and it is a super simplistic argument: "just train them better".


>Looks like we were unable to correct them over the last 3k years.

What makes you say that?

>What has changes in 2025 that you think we will succeed in correcting that behavior?

60 years ago, corporal punishment was commonplace. Today it is absolutely forbidden. I don't think behaviors among professions need that much time to be changed. I'm sure you can point to behaviors commonplace 10 years ago that have changed in your workplace (for better or worse).

But I suppose your "answer" is 1) a culture more willing to hold professionals accountable instead of holding them as absolute authority and 2) surveillance footage to verify claims made against them. This goes back to Hammurabi: if you punish a bad behavior, many people will adjust.

>the selection to the teacher profession is negative, due to low salary compared to private sector; this means that the unproductive behaviors are likely going to increase.

I'm really holding back my urge to be sarcastic here. I'm trying really hard. But how do I say "well fund your teachers" in any nuanced way? You get what you pay for. A teacher in a classroom of broken windows will not shine inspiration on the next generation.

This isn't a knock on your culture: the US is at a point where a stabucks barista part-time is paid more than some schoolteachers.

>but training is def not the right answer either

I fail to see why not. "We've tried nothing and run out of ideas!", as a famous American saying. Tangible actions:

1) participate in your school board if you have one, be engaged with who is in charge of your education sectors. Voice your concerns with them, and likely any other town or city leaders since I'm sure the problem travels upstream to "we didn't get enough funding from the town"

2) if possible in your country, 100% get out and vote in local elections. The US does vote in part of its boards for school districts, and the turnout for these elections are pathetic. Getting you and a half dozen friends to a voting booth can in fact swing an election.

3) if there's any initiatives, do make sure to vote for funding for educational sectors. Or at least vote against any cuts to education.

4) in general, push for better labor laws. If a minimum wage needs to be higher, do that. Or job protections.

There are actions to take. They don't happen overnight. But we didn't get to this situation overnight either.


> This isn't a knock on your culture: the US is at a point where a stabucks barista part-time is paid more than some schoolteachers.

I don't think this is meaningfully true. I found a resource that shows the average teacher salary to be $72,030 [0]. The average starting salary is lower at $46,526, but a 40 hour workweek at $20 for a Starbucks barista tips-included is about $41k. Here in Massachusetts, the average teacher salary is $92,076. In Mississippi, it's $53,704. You can maybe find some full time (not part time) Starbucks baristas that make slightly more than starting teachers, but after a couple of years the teacher will pull ahead. However, since the higher paying Starbucks jobs are in places with higher costs of living, I would assume that the teacher pay would be higher in those places too, so it's a wash.

> "We've tried nothing and run out of ideas!", as a famous American saying.

Ironically Mississippi of all states has experimented by holding back more poor performing kids instead of letting them advance to the next grade, with some success in rising test scores: "Boston University researchers released a study this year comparing Mississippi students who were narrowly promoted to fourth grade to those who just missed the cutoff. It found that by sixth grade, those retained had substantial gains on English language arts scores compared with those who were promoted, especially among African-American and Hispanic students." [1].

This doesn't disprove what you're saying (and there are some caveats to the Mississippi experiment), but there is definitely low hanging fruit to improve the American teaching system. Just because teaching is a thousands year old profession doesn't mean modern day processes can't be improved by ways not involving salaries/direct training.

[0] https://www.nea.org/resource-library/educator-pay-and-studen...

[1] https://www.wsj.com/us-news/education/more-states-threaten-t...?


I'll admit "some schoolteacher" is doing some heavy lifting here. It shouldn't be that close to begin with when you remember that school teachers need extra license/acreddidation (so, more post secondary education whose costs run rampant) and arguably have a much more stressful job.

>there is definitely low hanging fruit to improve the American teaching system.

Sure, you can patch the window up and make sure it at least tries to protect from the elements. But we should properly fix it at some point too. How many of those kids would have not been held back if they had a proper instructor to begin with? Or an instructor that didn't need to quit midway into the school year in order to find a job that does pay rent?


At a system level, this totally makes sense. But as an individual learner, what would be my motivation to do so, when I can "just" actually learn my subject and move on?


>But as an individual learner, what would be my motivation to do so

Because if there's one thing the older generations is much better than us at, it's complaining about the system and getting them to kowtow to them. We dismiss systematic change as if it doesn't start with the individual, and are surprised that the system ignores or abuses us.

We should be thinking short and long term. Learn what you need to learn today, but if you want better education for you and everyone else: you won't get it by relinquishing the powers you have to evoke change.


You might also be working with very uncooperative coworkers, or impatient ones


I told u my word "I WILL NEVER BREAK" but this time im gonna be so patient so u will never see whats coming. and im just making my list so i can put a tick next to every name i do my project on. i made sure all addresses cars and members have been noted down. and i promise u that i wont give up till the job is finished. after all i seen and witnessed, u wont be able to even imagine what its gonna be like and then you will finially understand the meaning of DONT F>>>K WITH ME. u fat smelly DYKE!


> Adding in a mode that doesn't just dump an answer but works to take you through the material step-by-step is magical

Except these systems will still confidently lie to you.

The other day I noticed that DuckDuckGo has an Easter egg where it will change its logo based on what you've searched for. If you search for James Bond or Indiana Jones or Darth Vader or Shrek or Jack Sparrow, the logo will change to a version based on that character.

If I ask Copilot if DuckDuckGo changes its logo based on what you've searched for, Copilot tells me that no it doesn't. If I contradict Copilot and say that DuckDuckGo does indeed change its logo, Copilot tells me I'm absolutely right and that if I search for "cat" the DuckDuckGo logo will change to look like a cat. It doesn't.

Copilot clearly doesn't know the answer to this quite straightforward question. Instead of lying to me, it should simply say it doesn't know.


This is endlessly brought up as if the human operating the tool is an idiot.

I agree that if the user is incompetent, cannot learn, and cannot learn to use a tool, then they're going to make a lot of mistakes from using GPTs.

Yes, there are limitations to using GPTs. They are pre-trained, so of course they're not going to know about some easter egg in DDG. They are not an oracle. There is indeed skill to using them.

They are not magic, so if that is the bar we expect them to hit, we will be disappointed.

But neither are they useless, and it seems we constantly talk past one another because one side insists they're magic silicon gods, while the other says they're worthless because they are far short of that bar.


The ability to say "I don't know" is not a high bar. I would say it's a basic requirement of a system that is not magic.


Based on your example, basically any answer would be "I don't know 100%".

You could ask me as a human basically any question, and I'd have answers for most things I have experience with.

But if you held a gun to head and said "are you sure???" I'd obviously answer "well damn, no I'm not THAT sure".


It'd at least be an honest one that recognizes that we shouldn't be trusting the tech wholesale yet.

>But if you held a gun to head and said "are you sure???" I'd obviously answer "well damn, no I'm not THAT sure".

okay, who's holding a gun to Sam Altman's head?


Perhaps LLMs are magic?


I see your point

Some of the best exchanges that I participated in or witnessed involved people acknowledging their personal limits, including limits of conclusions formed a priori

To further the discussion, hearing the phrase you mentioned would help the listener to independently assess a level of confidence or belief of the exchange

But then again, honesty isn't on-brand for startups

It's something that established companies say about themselves to differentiate from competitors or even past behavior of their own

I mean, if someone prompted an llm weighted for honesty, who would pay for the following conversation?

Prompt: can the plan as explained work?

Response: I don't know about that. What I do know is on average, you're FUCKED.


> The ability to say "I don't know" is not a high bar.

For you and I, it's not. But for these LLMs, maybe it's not that easy? They get their inputs, crunch their numbers, and come out with a confidence score. If they come up with an answer they're 99% confident in, by some stochastic stumbling through their weights, what are they supposed to do?

I agree it's a problem that these systems are more likely to give poor, incorrect, or even obviously contradictory answers than say "I don't know". But for me, that's part of the risk of using these systems and that's why you need to be careful how you use them.


but they're not. Ofyen the confidence value is much lower. I should have an option to see how confident it is. (maybe set the opacity of each token to its confidence?)


Logits aren't confidence about facts. You can turn on a display like this in the openai playground and you will see it doesn't do what you want.


>If they come up with an answer they're 99% confident in, by some stochastic stumbling through their weights, what are they supposed to do?

As much as Fi, from The Legend of Zelda: Skyward Sword was mocked for this, this is the exact behavior a machine should do (not that Fi is a machine, but she operated as such).

Give a confidence score the way we do in statistics, make sure to offer sources, and be ready to push back on more objective answers. accomplish those and I'd be way more comfortable using them as a tool.

>hat's part of the risk of using these systems and that's why you need to be careful how you use them.

Adn we know in 2025 how careful the general user is of consuming bias and propaganda, right?


The confidence score is about the likelihood of this token appearing in this context.

LLMs don't operate in facts or knowledge.


It certainly should be able to tell you it doesn't know. Until it can though, a trick that I have learned is to try to frame the question in different ways that suggest contradictory answers. For example, I'd ask something like these, in a fresh context for each:

- Why does Duckduckgo change it's logo based on what you've searched?

- Why doesn't Duckduckgo change it's logo based on what you've searched?

- When did Duckduckgo add the current feature that will change the logo based on what you've searched?

- When did Duckduckgo remove the feature that changes the logo based on what you've searched?

This is similar to what you did, but it feels more natural when I genuinely don't know the answer myself. By asking loaded questions like this, you can get a sense of how strongly this information is encoded in the model. If the LLM comes up with an answer without contradicting any of the questions, it simply doesn't know. If it comes up with a reason for one of them, and contradicts the other matching loaded question, you know that information is encoded fairly strongly in the model (whether it is correct is a different matter).


I see these approaches a lot when I look over the shoulders of LLM users, and find it very funny :D you're spending the time, effort, bandwidth and energy for four carefully worded questions to try and get a sense of the likelihood of the LLM's output resembling facts, when just a single, basic query with simple terms in any traditional search engine would give you a much more reliable, more easily verifable/falsifiable answer. People seem so transfixed by the conversational interface smokeshow that they forgot we already have much better tools for all of these problems. (And yes, I understand that these were just toy examples.)


The nice thing about using a language model over using a traditional search engine is being able to provide specific context (ie disambiguate where keyword searches would be ambiguous) and to correlate unrelated information that would require multiple traditional searches using a single LLM query. I use Kagi, which provides interfaces for both traditional keyword searches, and for LLM chats. I use whichever is more appropriate for any given query.


It really depends on the query. I'm not a Google query expert, but I'm above average. I've noticed that phrasing a query in a certain way to get better results just no longer works. Especially in the last year, I have found it returns results that aren't even relevant at all.

The problem is people have learned to fill their articles/blogs with as many word combinations as possible so that it will show up in as many Google searches as possible, even if it's not relevant to the main question. The article has just 1 subheading that is somewhat relevant to the search query, even though the information under that subheading is completely irrelevant.

LLMs have ironically made this even worse because now it's so easy to generate slop and have it be recommended by Google's SEO. I used to be very good at phrasing a search query in the right way, or quoting the right words/phrases, or having it filter by sites. Those techniques no longer work.

So I have turned to ChatGPT for most of the queries I would have typically used Google for. Especially with the introduction of annotations. Now I can verify the source from where it determined the answer. It's a far better experience in most circumstances compared to Google.

I have also found ChatGPT to be much better than other LLMs at understanding nuance. There have been numerous occasions where I have pushed back against ChatGPT's answer and it has responded with something like "You would be correct if your input/criteria is X. But in this case, since your input/criteria is Y, this is the better solution for Z reasons".


Consider the adoption of conventional technology in the classroom. The US has spent billions on new hardware and software for education, and yet there has been no improvement in learning outcomes.

This is where the skepticism arises. Before we spend another $100 billion on something that ended up being worthless, we should first prove that it’s actually useful. So far, that hasn’t conclusively been demonstrated.


You appear to be implying that the $100 billion hardware and software must all be completely useless. I think the opposite conclusion is more likely: the structure of the education system actively hinders learning, so much so that even the hardware and software you talk about couldn't work against it.


If so, the conclusion wrt AI remains the same - one cannot expect improved learning outcomes from investing in it.


Correct. Since the dawn of radio, people have speculated that every new communication technology will “revolutionize education” and it never has. It boggles my mind how long that can keep happening before people consider that the problem might be with education, not with the technology.


The article states that Study Mode is free to use. Regardless of b2b costs, this is free for you as an individual.


billions on tech but not on making sure teachers can pay rent. Even the prestige or mission oriented structure of teaching has been weathered over the decades as we decided to shame teachers as government funded babysitters instead of the instructors of our future generations.

Truly a mystery why America is falling behind.


I agree with all that you say. It’s an incredible time indeed. Just one thing I can’t wrap my mind around is privacy. We all seem to be asking sometimes stupid and some times incredibly personal questions to these llms. Questions that we may not even speak out loud from embarrassment or shame or other such emotions to even our closest people. How are these companies using our data ? More importantly what are you all doing to protect yourself from misuse of your information? Or is it if you want to use it you have to give up such privacy and uncomfortableness ?


People often bring up the incredible efficiency improvements of LLMs over the last few years, but I don't think people do a really good job of putting it into perspective just how much more efficient they have gotten. I have a machine in my home with a single RX 7900 XTX in it. On that machine, I am able to run language models that blow GPT-3.5 Turbo out of the water in terms of quality, knowledge, and even speed! That is crazy to think about when you consider how large and capable that model was.

I can often get away with just using models locally in contexts that I care about privacy. Sometimes I will use more capable models through APIs to generate richer prompts than I could write myself to be able to better guide local models too.


> Learning something online 5 years ago often involved trawling incorrect, outdated or hostile content and attempting to piece together mental models without the chance to receive immediate feedback on intuition or ask follow up questions.

That trained and sharpened invaluable skills involving critical thinking and grit.


> [Trawling around online for information] trained and sharpened invaluable skills involving critical thinking and grit.

Here's what Socrates had to say about the invention of writing.

> "For this invention will produce forgetfulness in the minds of those who learn to use it, because they will not practice their memory. Their trust in writing, produced by external characters which are no part of themselves, will discourage the use of their own memory within them. You have invented an elixir not of memory, but of reminding; and you offer your pupils the appearance of wisdom, not true wisdom, for they will read many things without instruction and will therefore seem [275b] to know many things, when they are for the most part ignorant and hard to get along with, since they are not wise, but only appear wise."

https://www.historyofinformation.com/detail.php?id=3439

I mean, he wasn't wrong! But nonetheless I think most of us communicating on an online forum would probably prefer not to go back to a world without writing. :)

You could say similar things about the internet (getting your ass to the library taught the importance of learning), calculators (you'll be worse at doing arithmetic in your head), pencil erasers (https://www.theguardian.com/commentisfree/2015/may/28/pencil...), you name it.


>I mean, he wasn't wrong! But nonetheless I think most of us communicating on an online forum would probably prefer not to go back to a world without writing. :)

What social value is an AI chatbot giving to us here, though?

>You could say similar things about the internet (getting your ass to the library taught the importance of learning)

Yes, and as we speak countries are determining how to handle the advent of social media as this centralized means of propaganda, abuse vector, and general way to disconnect local communities. It clearly has a different magnitude of impact than etching on a stone tablet. The UK made a particularly controversial decision recently.

I see AI more in that camp than in the one of pencil erasers.


>Here's what Socrates had to say about the invention of writing.

I think you mean to say, "Here's what Plato wrote down that Socrates said"...


And also taught people how to actually look for information online. The average person still does not know how to google, I still see people writing whole sentences in the search bar.


This is the "they're holding it wrong" of search engines. People want to use a search engine by querying with complete sentences. If search engines don't support such querying, it's the search engine that is wrong and should be updated, not the people.

Search engines have gotten way better at handling complete sentences in recent years, to the point where I often catch myself deleting my keyword query and replacing it with a sentence before I even submit it, because I know I will be able to more accurately capture what it is I am searching for in a sentence.


Funnily enough, I've shown some people who said they liked using ChatGPT over Google because they can ask questions in natural language, that they can paste the same natural language question to Google's search bar and get their answers just as easily, and with actual sources. That was before search engines started showing "AI summaries", so I guess the demonstration effect wouldn't be the same today.

Natural language search queries have worked surprisingly well for quite a while before that, even.


It didn’t. Only frustrated and slowed down students.


Sounds like somebody who disliked implementing QuickSort as a student because what's the point, there is a library for it, you'll never need to do that kind of thing "in the real world".

Maybe someday an LLM will be able to explain to you the pedagogical value of an exercise.


LLMs, by design, are peak Duning-Kruegers, which means they can be any good of a study partner for basic introductory lessons and topics. Yet they still require handholding and thorough verification, because LLMs will spit out factually incorrect information with confidence and will fold on correct answers when prodded. Yet the novice does not posses the skill to handhold the LLM. I think there's a word for that, but chadgbt is down for me today.

Furthermore, forgetting curve is a thing and therefore having to piece information together repetitively, preferably in a structured manner, leads to a much better information retention. People love to claim how fast they are "learning" (more like consuming tiktoks) from podcasts at 2x speed and LLMs, but are unable to recite whatever was presented few hours later.

Third, there was a paper circulating even here on HN that showed that use of LLMs literally hinder brain activation.


In my experience asking questions to Claude, the amount of incorrect information it gives is on a completely different scale in comparison to traditional sources. And the information often sounds completely plausible too. When using a text book, I would usually not Google every single piece of new information to verify it independently, but with Claude, doing that is absolutely necessary. At this point I only use Claude as a stepping stone to get ideas on what to Google because it is giving me false information so often. That is the only "effective" usage I have found for it, which is obviously much less useful than a good old-fashioned textbook or online course.

Admittedly I have less experience with ChatGPT, but those experiences were equally bad.


What kind of questions / domains were you encountering false information on?


Most false information was on the hardware description language VHDL that I'm currently learning.


Ground it with text from a correct source. That's all it needs.


Then why not just use the source text directly and save yourself all the double-guessing?


>I'm puzzled (but not surprised) by the standard HN resistance & skepticism

The good: it can objectively help you to zoom forward in areas where you don’t have a quick way forward.

The bad: it can objectively give you terrible advice.

It depends on how you sum that up on balance.

Example: I wanted a way forward to program a chrome extension which I had zero knowledge of. It helped in an amazing way.

Example: I am keep trying to use it in work situations where I have lots of context already. It performs better than nothing but often worse than nothing.

Mixed bag, that’s all. Nothing to argue about.


mixed bags are our favorite thing to argue about


Haha yes! Thanks for that!


HN is resistant because at the end of the day, these are LLMs. They cannot and do not think. They generate plausible responses. Try this in your favorite LLM: "Suppose you're on a game show trying to win a car. There are three doors, one with a car and two with goats. You pick a door. The host then gives you the option to switch doors. What is the best strategy in this situation?" The LLM will recognize this as SIMILAR to the Monty Hall problem and tell you to always switch. I just reproduced this with ChatGPT.

But this is completely wrong! In the Monty Hall problem, the host has to reveal a door with a goat behind it for you to gain the benefit of switching. I have to point this out for the LLM to get it right. It did not reason about the problem I gave it, it spat out the most likely response given the "shape" of the problem.

This is why shrugging and saying "well humans get things wrong too" is off base. The problem is that the LLM is not thinking, period. So it cannot create a mental model of your understanding of a subject, it is taking your text and generating the next message in a conversation. This means that the more niche the topic (or your particular misunderstanding), the less useful it will get.


That's because the LLM assumes you mean for the host to open the door.

As an autistic person I can assure you that people filling in information you neither gave nor implied is the norm for human conversations. In that sense, the LLMs are being human-like.

If you explicitly tell it you didn't forget to have the host open the door it gives you the right answer:

> There is a game show with three closed doors. One has a car and two have goats. I pick a door. Then, without opening my door or any other door, the host asks me if I want to switch my choice to one of the other two doors. What is my best strategy and what are the probabilities?

Gemini: This is a classic probability puzzle known as the Monty Hall problem, though your description presents a slight variation. The optimal strategy depends entirely on whether the host opens a door or not. Your Scenario: The Host Doesn't Open a Door In the situation you've described, where the host asks if you want to switch your choice to one of the other two doors without opening any doors, there is no advantage to switching. Here's a breakdown of the probabilities in your specific scenario: Your initial choice: When you first pick a door, you have a 1 in 3 chance of selecting the door with the car. This means you have a 2 in 3 chance of having picked a door with a goat. The other two doors: The combined probability that the car is behind one of the other two doors is 2/3. Switching your choice: If you decide to switch, you are essentially choosing one of the other two doors. Since there is no new information (a door with a goat has not been revealed), the 2/3 probability is evenly split between those two doors. Therefore, your chance of winning by switching to a specific one of the other doors is 1/3. In this case, your odds of winning are 1/3 whether you stay with your original choice or switch to one of the other doors.


> That's because the LLM assumes you mean for the host to open the door.

LLM's cannot "assume". There is no thinking involved. It sees that the prompt looks like the monty hall problem and it just goes full steam ahead.

>If you explicitly tell it you didn't forget to have the host open the door it gives you the right answer:

That should not be necessary. I asked it a very clear question. I did not mention Monty Hall. This is the problem with LLM's: it did not analyze the problem I gave it, it produced content that is the likely response to my prompt. My prompt was Monty Hall-shaped, so it gave me the Monty Hall answer.

You are saying "ah but then if you prepare for the LLM to get it wrong, then it gets it right!" as if that is supposed to be convincing! Consider the millions of other unique questions you can ask, each with their own nuances, that you don't know the answer to. How can you prevent the LLM from making these mistakes if you don't already know the mistakes it's going to make?


> LLM's cannot "assume". There is no thinking involved. It sees that the prompt looks like the monty hall problem and it just goes full steam ahead.

I think the poster's point was that many humans would do the same thing.

Try a completely different problem, one you invented yourself and see where you get? I'd be very interested to hear the response back here.


Humans who have heard of Monty Hall might also say you should always switch without noticing that the situation is different. That's not evidence that they can't think, just that they're fallible.

People on here always assert LLMs don't "really" think or don't "really" know without defining what all that even means, and to me it's getting pretty old. It feels like an escape hatch so we don't feel like our human special sauce is threatened, a bit like how people felt threatened by heliocentrism or evolution.


> Humans who have heard of Monty Hall might also say you should always switch without noticing that the situation is different. That's not evidence that they can't think, just that they're fallible.

At some point we start playing a semantics game over the meaning of "thinking", right? Because if a human makes this mistake because they jumped to an already-known answer without noticing a changed detail, it's because (in the usage of the person you're replying to), the human is pattern matching, instead of thinking. I don't think is surprising. In fact I think much of what passes for thinking in casual conversation is really just applying heuristics we've trained in our own brains to give us the correct answer without having to think rigorously. We remember mental shortcuts.

On the other hand, I don't think it's controversial that (some) people are capable of performing the rigorous analysis of the problem needed to give a correct answer in cases like this fake Monty Hall problem. And that's key... if you provide slightly more information and call out the changed nature of the problem to the LLM, it may give you the correct response, but it can't do the sort of reasoning that would reliably give you the correct answer the way a human can. I think that's why the GP doesn't want to call it "thinking" - they want to reserve that for a particular type of reflective process that can rigorously perform logical reasoning in a consistently valid way.


I'm not sure what your argument is. The common claim that annoys me about LLMs on here is that they're not "really" coming up with ideas but that they're cheating and just repeating something they read on the internet somewhere that was written by a human who can "really" think. To me this is obviously false if you've talked to a SOTA LLM or know a little about how they work.


On the other hand, computers are suppose to be both accurate and able to reproduce said accuracy.

The failure of an LLM to reason this out is indicative that really, it isn’t reasoning at all. It’s a subtle but welcome reminder that it’s pattern matching


Computers might be accurate but statistical models never were 100% accurate. That doesn't imply that no reasoning is happening. Humans get stuff wrong too but they certainly think and reason.

"Pattern matching" to me is another one of those vague terms like "thinking" and "knowing" that people decide LLMs do or don't do based on vibes.


Pattern matching has a definition in this field, it does mean specific things. We know machine learning has excelled at this in greater and greater capacities over the last decade

The other part of this is weighted filtering given a set of rules, which is a simple analogy to how AlphaGo did its thing.

Dismissing all this as vague is effectively doing the same thing as you are saying others do.

This technology has limits and despite what Altman says, we do know this, and we are exploring them, but it’s within its own confines. They’re fundamentally wholly understandable systems that work on a consistent level in terms of the how they do what they do (that is separate from the actual produced output)

I think reasoning, as any layman would use the term, is not accurate to what these systems do.


You're derailing the conversation. The discussion was about thinking, and now you're arguing about something entirely different and didn't even mention the word “think” a single time.

If you genuinely believe that anyone knows how LLMs work, how brains work, and/or how or why the latter does “thinking” while the former does not, you're just simply wrong. AI researchers fully acknowledge ignorance in this matter.


> Pattern matching has a definition in this field, it does mean specific things.

Such as?

> They’re fundamentally wholly understandable systems that work on a consistent level in terms of the how they do what they do (that is separate from the actual produced output)

Multi billion parameter models are definitely not wholly understandable and I don't think any AI researcher would claim otherwise. We can train them but we don't know how they work any more than we understand how the training data was made.

> I think reasoning, as any layman would use the term, is not accurate to what these systems do.

Based on what?


You’re welcoming to provide counters. I think these are all sufficiently common things that they stand on their own as to what I posit


Look, you're claiming something, it's up to you to back it up. Handwaving what any of these things mean isn't an argument.


I guess computer vision didnt get this memo and it is useless.


>People on here always assert LLMs don't "really" think or don't "really" know without defining what all that even means,

Sure.

To Think: able to process information in a given context and arrive at an answer or analysis. an LLM only simulates this with pattern matching. It didn't really consider the problem, it did the equivalent of googling a lot of terms and then spat something that sounded like an answer

To Know: To reproduce information based on past thinking, as well as to properly verify and reason about with the information. I know 1+1 = 2 because (I'm not a math major, feel free to inject number theory instead) I was taught that arithmatic is a form of counting, and I was taught the mechanics of counting to prove how to add. Most LLM models don't really "know" this to begin with for the reasons above. Maybe we'll see if this study mode is different.

Somehow I am skeptical if this will really change minds, though. People making swipes at the community like this often are not really engaging in a conversation with ideas they oppose.


I have to push back on this. It's the people who constantly assert that LLMs “don't think” who are not engaging in a conversation. It's a thought-terminating cliché.

Unfortunately, even those willing to engage in this conversation still don't have much to converse about, because we simply don't know what thinking actually is, how the brain works, how LLMs work, and to what extent they are similar or different. That makes it all the more vexing to me when people say this, because the only thing I can say in response is “you don't know that (and neither does anyone else)”.


>It's the people who constantly assert that LLMs “don't think” who are not engaging in a conversation.

I'm responding to the conversation. Oftentimes it's engaged on "AI is smarter than me/other people". It's in the name, but "intelligence" is a facade put on by the machine to begin with.

>because we simply don't know what thinking actually is

I described my definition. You can disagree or make your own interpretation, but to dismiss my conversation and simply say "no one knows" is a bit ironic for a person accusing me of not engaging in a conversation.

Philosophy spent centuries trying to answer that question. Mine is a simple, pragmatic approach. Just because there's no objective answer doesn't mean we can't converse about it.


You're just deferring to another vague term "pattern matching".

If I think back to something I was taught in primary school and conclude that 1+1=2 is that pattern matching? Therefore I don't really "know" or "think"?

People pretend like LLMs are like some 80s markov chain model or nearest neighbor search, which is just uninformed.


Do you want to shift the discussion to the definition of a "pattern" or are we going to continue to move the goalpost? I'm trying to respond to your inquiry and instead we're just stuck in minutia.

Yes, to make an apple pie from scratch, we need to first invent the universe. Is that productive conversation to fall into or can we just admit that your dismissing any opinion that goes against your purview?

>If I think back to something I was taught in primary school and conclude that 1+1=2 is that pattern matching?

Yes. That is an example of pattern matching. Let me know when you want to go back to talking about LLMs.


So because I'm pattern matching that means I'm not thinking right? That's the same argument you have for LLMs.


LLMs are vulnerable to your input because they are still computers, but you're setting it up to fail with how you've given it the problems. Humans would fail in similar ways. The only thing you've proven with this reply is that you think you're clever, but really, you are not thinking, period.


And if a human failed on this question, that's because they weren't paying attention and made the same pattern matching mistake. But we're not paying the LLM to pattern match, we're paying them to answer correctly. Humans can think.


“paying the LLM”??


I use the Monty Hall problem to test people in two steps. The second step is, after we discuss it and come up with a framing that they can understand, can they then explain it to a third person. The third person rarely understands, and the process of the explanation reveals how shallow the understanding of the second person is. The shallowest understanding of any similar process that I've usually experienced is an LLM.


I am not sure how good your test really is. Or at least how high your bar is.

Paul Erdös was told about this problem with multiple explanations and just rejected the answer. He could not believe it until they ran a simulation.


In my experience, as Harvard outlined long ago, the two main issues with decision making are frame blindness (don't consider enough other ways of thinking about the issue) and non-rigorous frame choice (jumping to conclusions).

But an even more fundamental cause, as a teacher, is that I often find seemingly different frames to both simply be misunderstood, not understood and rejected. I learned by trying many ways of presenting what I thought the best frame was. So I learned that "explanations" may be received primarily as noise, with "What is actually being said" being replaced with, incorrectly, by "What I think you probably mean". Whenever someone replies "okay" to a yes or no comment/statement, I find they have always misunderstood the statement, and learned how often people will attempt to move forwards without understanding where they are.

And if multiple explanations are just restatings of the same frame (as is common in casual arguments), it's impossible to compare frames, because only one is being presented.. It's the old "if you think aren't making any mistakes, that's another mistake".

Often, a faulty frame clears up both what is wrong with another frame, as well as leading to a best frame. I usually find the most fundamental frame is the most useful.

For example, I found many Reddit forums discussing a problem with selecting the choice of audio output (speaker) on Fire TV Sticks. If you go through the initial setup, sometimes it will give you a choice (first level of flow chart), but often not the next level choice, which you need. And setup will not continue. Then it turned out that old remotes and new remotes had the volume buttons in a different location, and there were two sets of what looked like volume buttons. When you pressed the actual volume buttons, everything worked normally. When you pressed the up/down arrows where the old volume buttons had been, you had to restart setup many times.

The correct framing of the problem was "Volume buttons are now on the left, not the right". It was not a software setup issue. Or wondering why you're key doesn't work, but you're at the wrong car. Or it's not a problem with your starter motor, you're out of gas. Etc.


I don't know who Paul Erdös is, so this isn't useful information without considering why they rejected the answer and what counterarguments were provided. It is an unintuitive problem space to consider when approaching it as a simple probability problem, and not one where revealing new context changes the odds.


Erdös published more papers than any other mathematician in history—and collaborated with more than 500 coauthors, giving rise to the concept of the "Erdős number," a (playful) measure of collaborative proximity among mathematicians


It's quite boring to listen to people praising AI (worshipping it, putting it on a pedastal, etc). Those who best understand the potential of it aren't doing that. Instead they're talking about various specific things that are good or bad, and they don't go out of the way to lick AI's boots, but when they're asked they acknowledge that they're fans of AI or bullish on it. You're probably misreading a lot of resistance & skepticism on HN.


> I'm puzzled (but not surprised) by the standard HN resistance & skepticism.

It happens with many technological advancements historically. And in this case there are people trying hard to manufacture outrage about LLMs.


Regardless of stance, I sure do hate being gaslit on how I'm supposed to think of content on any given topic. A disagreeable point of view is not equivalent to "manufacturing outrage".


Yeah, I've been a game-dev forever and had never built a web-app in my life (even in college) I recently completed my 1st web-app contract, and gpt was my teacher. I have no problem asking stupid questions, tbh asking stupid questions is a sign of intelligence imo. But where is there to even ask these days? Stack Overflow may as well not exist.


Right on. A sign of intelligence but more importantly of bravery, and generosity. A person that asks good questions in a class improves the class drastically, and usually learns more effectively than other students in the class.


>Stack Overflow may as well not exist.

That mentality seems to be more to reinforce your insistance on ChatGPT, rather than an inquiry of communities to help you out.


> But where is there to even ask these days?

Stack overflow?

The IRC, Matrix or slack chats for the languages?


People like that never wanted to interact with anyone to begin with. And somehow they were too lazy to google the decades of articles until ChatGPT came in to save their lives.


The freedom to ask "dumb" questions without judgment is huge, and it's something even the best classrooms struggle to provide consistently


I sometimes intentionally ask naive questions, eve if I think I alredy know the answer. Sometimes the naive question provokes a revealing answer that I have not even considered. Asking naive questions is a learning hack!


I know some Spanish - close to B1. I find ChatGPT to be a much better way to study than the standard language apps. I can create custom lessons, ask questions about language nuances etc. I can also have it speak the sentences and practice pronunciation.


20 years ago I used to hang out in IRC channels where I learnt so much. I wasn't afraid of asking stupid questions. These bots are pale imitation of that.

I've learnt a great many things online, but I've also learnt a great many more from books, other people and my own experience. You just have to be selective. Some online tutorials are excellent, for example the Golang and Rust tutorials. But for other things books are better.

What you are missing is the people. We used to have IRC and forums where you could discuss things in great depth. Now that's gone and the web is owned by big tech and governments you're happy to accept a bot instead. It's sad really.


> Learning something online 5 years ago often involved trawling incorrect, outdated or hostile content

What's funny is tha LLMs got trained on datasets that includes all that incorrect, outdated or hostile content.


> Should we trust the information at face value without verifying from other sources? Of course not, that's part of the learning process.

It mostly isn't, the point of the good learning process is to invest time into verifying "once" and then add verified facts to the learning material so that learners can spend that time learning the material instead of verifying everything again.

Learning to verify is also important, but it's a different skill that doesn't need to be practiced literally every time you learn something else.

Otherwise you significantly increase the costs of the learning process.


> An underrated quality of LLMs as study partner is that you can ask "stupid" questions without fear of embarrassment.

Not underrated at all. Lots of people were happy to abandon Stack Overflow for this exact reason.

> Adding in a mode that doesn't just dump an answer but works to take you through the material step-by-step is magical

I'd be curious to know how much this significantly differs from just a custom academically minded GPT with an appropriately tuned system prompt.

https://chatgpt.com/gpts


>Should we trust the information at face value without verifying from other sources? Of course not, that's part of the learning process. Will some (most?) people rely on it lazily without using it effectively? Certainly, and this technology won't help or hinder them any more than a good old fashioned textbook.

Not true if we make the assumption that most books from publishing houses with good reputation are verified for errors. Good books maybe dated but they don't contain made up things.


Skepticism is great, it means less competition. I'm forcing everyone around me to use it.


>Learning something online 5 years ago often involved trawling incorrect, outdated or hostile content and attempting to piece together mental models without the chance to receive immediate feedback on intuition or ask follow up questions. This is leaps and bounds ahead of that experience.

Researching online properly requires cross referencing, seeing different approaches, and understanding various strenghts, weaknesses, and biases among such sources.

And that's for objective information, like math and science. I thought Grok's uhh... "update" shows enough of the dangers when we resort to a billionaire controlled oracle as a authoritative resource.

>Will some (most?) people rely on it lazily without using it effectively? Certainly, and this technology won't help or hinder them any more than a good old fashioned textbook.

I don't think facilitating bad habits like lazy study is an effective argument.And I don't really subscribe to this ineviability angle either: https://tomrenner.com/posts/llm-inevitabilism/


A lot of the comments have to do with how does one use these things to speed up learning. I've tried a few things. A couple of them are prompts: 1. Make me a tutorial on ... 2. Make probes to quiz me along the way ...

I think the trick is to look at the references that the model shows you. e.g. o3 with web search will give you lots of references. 90% of the time just reading those tells me of the model and I are aligned.

For example the other day I was figuring out why using SQL alchemy Sessions and PyTest async might I get the "Connection was attached to different loop" error. Now If you started using o3 to give you a solution you would take a long time because there would be small mistakes it would make in the code and You would spend a lot of time trying to fix it. Better way to use 03 then was to ask it to give you debugging statements (session listeners attached to Sqlalchemy sessions) and understand by reading code output, what was going on. Much faster.

Once it(and I) started looking at the debugging statements the error became clear: the session/connections where leaking to different event loop, a loop_scope= param needed to be specified for all fixtures. O3 did not provide a correct solution for the code but I could, but it's help.was crucial in writing a fuck ton of debugging code and getting clues.

I also asked o3 to make a bunch of probe questions to test me, for example it said something like: try changing the loop_scope module to function, what do you expect the loopid and transaction id to be for this test?

I learned More than I realized about ORMs and how it can be used to structure transactions and structuring async PyTest tests.

One thing I'm trying these days is to have it create a memory palace from all the stuff I have in my house and link it to a new concept I'm learning and put it into an anki decks.


Firstly, I think skepticism is a healthy trait. It's OK to be a skeptic. I'm glad there are a lot of skeptics because skepticism is the foundation of inquiry, including scientific inquiry. What if it's not actually Zeus throwing those lightning bolts at us? What if the heliocentric model is correct? What if you actually can't get AIDS by hugging someone who's HIV positive? All great questions, all in opposition to the conventional (and in some cases "expert") wisdom of their time.

Now in regards to LLMs, I use them almost every day, so does my team, and I also do a bit of postmortem and reflection on what was accomplished with them. So, skeptical in some regards, but certainly not behaving like a Luddite.

The main issue I have with all the proselytization about them, is that I think people compare getting answers from an LLM to getting answers from Google circa 2022-present. Everyone became so used to just asking Google questions, and then Google started getting worse every year; we have pretty solid evidence that Google's results have deteriorated significantly over time. So I think that when people say the LLM is amazing for getting info, they're comparing it to a low baseline. Yeah maybe the LLM's periodically incorrect answers are better than Google - but are you sure they're not better than just RTFM'ing? (Obviously, it all depends on the inquiry.)

The second, related issue I have is that we are starting to see evidence that the LLM inspires more trust than it deserves due to its humanlike interface. I recently started to track how often Github Copilot gives me a bad or wrong answer, and it's at least 50% of the time. It "feels" great though because I can tell it that it's wrong, give it half the answer, and then it often completes the rest and is very polite and nice in the process. So is this really a productivity win or is it just good feels? There was a study posted on HN recently where they found the LLM actually decreases the productivity of an expert developer.

So I mean I'll continue to use this thing but I'll also continue to be a skeptic, and this also feels like kinda where my head was with Meta's social media products 10 years ago, before I eventually realized the best thing for my mental health was to delete all of them. I don't question the potential of the tech, but I do question the direction that Big Tech may take it, because they're literal repeat offenders at this point.


>So is this really a productivity win or is it just good feels?

Fairly recent study on this: LLM's made developers slightly less productive, but the developers themselves felt more productive with them: https://www.theregister.com/2025/07/11/ai_code_tools_slow_do...

There is definitely this pain point that some people talk about (even in this thread) on how "well at least AI doesn't berate me or reject my answer for bureaucratic reasons". And I find that intriguing in a community like this. Even some extremely techy people (or especially?) just something just want to at best feel respected, or at worst want to have their own notions confirmed by someone they deem to be "smart".

>I don't question the potential of the tech, but I do question the direction that Big Tech may take it, because they're literal repeat offenders at this point.

And that indeed is my biggest reservation here. Even if AI can do great things, I don't trust the incentive models OpenAI has. Instead of potentially being this bastion of knowledge, it may be yet another vector of trying to sell you ads and steal your data. My BOTD is long gone now.


Yeah I mean at this point, the tech industry is not new, nor is its playbook. At least within B2C, sooner or later everything seems to degenerate into an adtech model. I think it's because the marginal cost of software distribution is so low - you may as well give it away for free all the way up to the 8 billion population cap, and then monetize them once they're hooked, which inevitably seems to mean showing them ads, reselling what you know about them, or both.

What I have seen nobody come even NEAR to talking about is, why would OpenAI not follow this exact same direction? Sooner or later they will.

Things might pan out differently if you're a business - OpenAI already doesn't train its models on enterprise accounts, I imagine enterprise will take a dim view to being shown ads constantly as well, but who knows.

But B2C will be a cesspit. Just like it always ends up a cesspit.


> Certainly, and this technology won't help or hinder them any more than a good old fashioned textbook.

Except that the textbook was probably QA’d by a human for accuracy (at least any intro college textbook, more specialized texts may not have).

Matters less when you have background in the subject (which is why it’s often okay to use LLMs as a search replacement) but it’s nice not having a voice in the back of your head saying “yeah, but what if this is all nonsense”.


> Except that the textbook was probably QA’d by a human for accuracy

Maybe it was not when printed in the first edition, but at least it was the same content shown to hundreds of people rather than something uniquely crafted for you.

The many eyes looking at it will catch it and course correct, while the LLM output does not get the benefit of the error correction algorithm because someone who knows the answer probably won't ask and check it.

I feel this way about reading maps vs following GPS navigation, the fact that Google asked me to take an exit here as a short-cut feels like it might trying to solve the Braess' paradox in real time.

I wonder if this route was made for me to avoid my car adding to some congestion somewhere and whether if that actually benefits me or just the people already stuck in that road.


There is no skepticism. LLMs are fundamentally lossy and as a result they’ll always give some wrong result/response somewhere. If they are connected to a data source, this can reduce the error rate but not eliminate it.

I use LLMs but only for things that I have a good understanding of.


I think both sides seem to have the same issues with the other. One side is sceptical that the other is getting good use from LLMs, and the other suggests they're just not using it correctly.

Both sides think the other is either exaggerating or just not using the tool correctly.

What both sides should do is show evidence in the form of chat extracts or videos. There are a number from the pro-LLM side, but obviously selection bias applies here. It would be interesting if the anti-LLM side started to post more negative examples (real chat extracts or videos).


It's not just "stupid" questions.

In my experience, most educational resources are either slightly too basic or slightly too advanced, particularly when you're trying to understand some new and unfamiliar concept. Lecturers, Youtubers and textbook authors have to make something that works for everybody, which means they might omit information you don't yet know while teaching you things you already understand. This is where LLMs shine, if there's a particular gap in your knowledge, LLMs can help you fill it, getting you unstuck.


> we have access to incredible tools like this

At what cost? Are you considering all the externalities? What do you think will happen when Altman (and their investors) decides to start collecting their paychecks?


>I'm puzzled (but not surprised) by the standard HN resistance & skepticism

Thinking back, I believe the change from enthusiasm to misanthropy (mis[ai]thropy?) happened around the time, and in increasing proportion to, it became a viable replacement for some of the labor performed by software devs.

Before that, the tone was more like "The fact is, if 80% of your job or 80% of its quality can be automated, it shouldn't be a job anymore."


I think it's just that there's been enough time and the magic has worn off. People used it enough now and everybody has made their experiences. They initially were so transfixed that they didn't question the responses. Now people are doing that more often, and realising that likelihood of cooccurrence isn't a good measure for factuality. We've realised that the number of human jobs where it can reach 8%, let alone 80% of quality, is vanishingly small.


I am just surprised they used an example requiring calculation/math. In the field the results are very much mixed. Otherwise it of course is a big help.

Knowing myself it perhaps wasn't that bad that I didn't have such tools, depends on the topic. I couldn't imagine ever writing a thesis without an LLM anymore.


There might not be any stupid questions, but there's plenty of perfectly confident stupid answers.

https://www.reddit.com/r/LibreWolf/s/Wqc8XGKT5h


Yeah, this is why wikipedia is not a good resource and nobody should use it. Also why google is not a good resource, anybody can make a website.

You should only trust going into a library and reading stuff from microfilm. That's the only real way people should be learning.

/s


So, do you want to actually have a conversation comparing ChatGPT to Google and Wikipedia, or do you just want to strawman typical AI astroturfing arguments with no regard to the context above?

Ironic as you are answering someone who talked about correcting a human who blindly pasted an answer to their question with no human verification.


> So, do you want to actually have a conversation comparing ChatGPT to Google and Wikipedia, or do you just want to strawman typical AI astroturfing arguments with no regard to the context above?

Dunno about the person you're replying to (especially given the irony re that linked reddit thread), but I would like to actually have a conversation (or even just a link to someone else's results) comparing ChatGPT to Google and Wikipedia.

I've met people who were proudly, and literally, astroturfing Wikipedia for SEO reasons. Wikipedia took a very long time to get close to reliable, editors now requiring citations for claims etc., and I still sometimes notice pairs of pages making mutually incompatible claims about the same thing but don't have the time to find out which was correct.

Google was pretty reliable for a bit, but for a while now the reliability of its results has been the butt of jokes.

That doesn't mean any criticisms of LLMs are incorrect! Many things can all be wrong, and indeed are. Including microfilm and books and newspapers of record. But I think it is fair to compare them — even though they're all very different, they're similar enough to be worth comparing.


>Wikipedia took a very long time to get close to reliable,

And that's a good thing to remember. Always be skeptical and know the strengths and weaknesses of your sources. Teachers taught me (and maybe you) to be skeptical and not use Wikipedia as a citation for a reason. Even today, it is horrible for covering current events, and recent historical opinions can massively fluctuate. That isn't me dismissing Wikipedia as a whole, nor saying it has no potential.

>Google was pretty reliable for a bit, but for a while now the reliability of its results has been the butt of jokes.

Yes, more reason to be scrutinous. It's a bit unfortunate how oftentimes it's the 3-5th result that is more reliable than the first SEO optimized slop that won the race. Not unless I am using very specific queries.

---

Now let's consider these chat bots. There's no sense of editorial overview, they are not deterministic, and they are known to constantly hallucinate instead of admit ignorance. There does not seem to be any real initiative to fix such behavior, but instead ignore it and dismiss it as "the tech will get better".

Meanwhile, we saw the most blatant piece of abuse last week when Grok was update, to show that these are not some impartial machines simply synthesizing existing information. They can be tweak to private estate's whims the same way a search algorithm or biased astroturfer can do with the other two subjects of comparison. There's clear flaws and no desire nor push to really fix them; simply casting it off as a bug to fix instead of a societal letdown it should be viewed as.


Mm. Generally agree.

Unfortunately, I have to disagree about this part:

> There's clear flaws and no desire nor push to really fix them

All those private estate's whims? Those are the visible outcomes of the pushes to "fix" them. Sure, "fix" has scare quotes for good reason, but it is the attempt.

Also visible with the performance increases. One of the earlier models I played with, got confused half way through about which language it was supposed to be using, flipping suddenly and for no apparent reason from JS to python.

I try to set my expectations at around the level of "enthusiastic recent graduate who has yet to learn they can't fix everything and wants to please their boss". Crucially for this: an individual graduate, so no 3rd party editor to do any editorial overview. The "reasoning" models try to mimic self-editorial, but it's a fairly cheesy process of replacing the first n ~= 10 "stop" tokens with the token "wait", or something close to that, and it's a case of the trope "reality isn't realistic" that this even works at all.


>All those private estate's whims?

To do the least amount of work and get the most profit? I'd say so.

I should probably specify this better: private AI companies do not want to

1) find efficient solutions, over brute forcing models with more data (data that is dubiously claimed as of now) and larger data enters. Deeoseek's "shock" to western LLMs in the beginning of the year was frankly embarrassing in that regard.

2) introduce any transparency to their weights and models. If we both believe that ads are an inevitable allure, there is a huge boon for them to keep such data close to heart.

>try to set my expectations at around the level of "enthusiastic recent graduate who has yet to learn they can't fix everything and wants to please their boss"

I sure do wish the industry itself had such expectations. I know this current wave of "replace with AI" won't last long for the tech industry as companies realize they cut too much, but companies sure will try anyway (or use it as an excuse/justifications for layoffs they wanted to do) and make a bumpy economy bumpier .


Ah yes, the thing that told people to administer insulin to someone experiencing hypoglycemia (likely fatal BTW) is nothing like a library or Google search, because people blindly believe the output because of the breathless hype.

See Dunning-Kruger.


See 4chan during the "crowd wisdom" hype era.


Yeah. I’ll take this over the “you’re doing it wrong” condescension of comp.lang.lisp, or the Debian mailing list. Don’t even get me started on the systemd channels back in the day.

On the flip, I prefer the human touch of the Kotlin, Python, and Elixir channels.


>Learning something online 5 years ago often involved trawling incorrect, outdated or hostile content

Leanring what is like that? MIT open courseware has been available for like 10 years with anything you could want to learn in college

Textbooks are all easily pirated


> Should we trust the information at face value without verifying from other sources? Of course not, that's part of the learning process.

People who are learning a new topic are precisely the people least able to do this.

A friend of mine used chatgpt to try to learn calculus. It gave her an example...with constants changed in such a way that the problem was completely different (in the way that 1/x^2 is a totally different integration problem than 1/(x^2 + 1)). It then proceeded to work the problem incorrectly (ironically enough, in exactly the way that I'd expect a calculus student who doesn't really understand algebra to do it incorrectly), produced a wrong answer, and merrily went on to explain to her how to arrive at that wrong answer.

The last time I tried to use an LLM to analyze a question I didn't know the answer to (analyze a list of states to which I couldn't detect an obvious pattern), it gave me an incorrect answer that (a) did not apply to six of the listed states, (b) DID apply to six states that were NOT listed, even though I asked it for an exclusive property, (c) miscounted the elements of the list, and (d) provided no less than eight consecutive completely-false explanations on followup, only four of which it caught itself, before finally giving up.

I'm all for expanding your horizons and having new interfaces to information, but reliability is especially important when you're learning (because otherwise you build on broken foundations). If it fails at problems this simple, I certainly don't trust it to teach me anything in fields where I can't easily dissect bullshit. In principle, I don't think it's impossible for AI to get there; in practice, it doesn't seem to be.


Another quality is that everything is written. To me having a text support to discuss and the discussion recorded in text format is one of the strongest support someone can get when learning.


> Learning something online 5 years ago often involved trawling incorrect, outdated or hostile content

Also using OpenAI as a tutor means trawling incorrect content.


I'd share a little bit experience about learning from human teachers.

Here in my country, English is not you'll hear in everyday conversation. Native English speakers account to a tiny percentage of population. Our language doesn't resemble English at all. However, English is a required subject in our mandatory education system. I believe this situation is quite typical across many Asian countries.

As you might imagine, most English teachers in public schools are not native speakers. And they, just like other language learners, make mistakes that native speakers won't make without even realizing what's wrong. This creates a cycle enforcing non-standard English pragmatics in the classroom.

Teachers are not to blame. Becoming fluent and proficient enough in a second language to handle questions students spontaneously throw to you takes years, if not decades of immersion. It's an unrealistic expectation for an average public school teacher.

The result is rich parents either send their kids to private schools or have extra classes taught by native speakers after school. Poorer but smart kids realize the education system is broken and learn their second language from Youtube.

-

What's my point?

When it comes to math/science, in my experience, the current LLMs act similarly to the teachers in public school mentioned above. And they're worse in history/economics. If you're familiar with the subject already, it's easy to spot LLM's errors and gather the useful bits from their blather. But if you're just a student, it can easily become a case of blind-leading-the-blind.

It doesn't make LLMs completely useless in learning (just like I won't call public school teachers 'completely useless', that's rude!). But I believe in the current form they should only play a rather minor role in the student's learning journey.


HN’s fear is the same job security fear we’ve been seen since the beginning of all this. You’ll see this on programming subs on Reddit as well.


Can we not criticize tech without being considered luddites anymore? I don't fear for my job over AI replacement, it is just fundamentally wrong on many answers.

In my field there is also the moral/legal implications of generative AI.


on hn i find most people here to be high iq low eq

high iq enough that they really find holes in the capabilities of LLMs in their industries

low eq enough that they only interpret it on their own experiences instead of seeing how other people's quality of life have improved


Spot on. You can even ask the LLM to ground itself with the content you provide it.


> A tireless, capable, well-versed assistant

Correction: a tireless, capable, well-versed, sycophantic assistant that is often prone to inventing absolute bullshit.

> ...is an autodidact's dream

Not so sure about that, see above.


It does go both ways. You can ask stupid questions without fear of embarrassment or ruined reputation, and it can respond with stupid answers without fear of embarrassment or ruined reputation.

It can confidently spew completely wrong information and there's no way to tell when it's doing that. There's a real risk that it will teach you a complete lie based on how it "thinks" something should work, and unlearning that lie will be much harder than just learning the truth initially


> Learning something online 5 years ago often involved trawling incorrect, outdated or hostile content

... your "AI" is also trained on the above incorrect, outdated or hostile content ...


Agrée, it would have been a godsend for those of us who were not as fast as the other and were eventually left over in usual schooling system.

Beside there isn’t any of the usual drawback with privacy because no one care if OpenAI learn about some bullshit you were told to learn.


>Beside there isn’t any of the usual drawback with privacy because no one care if OpenAI learn about some bullshit you were told to learn

you didn't see the Hacker News threat talking about the ChatGPT subpeona, did you? I was a bit shocked that 1) a tech community didn't think a company would store data you submit to their servers and 2) that they felt like some lawyers and judges reading their chat logs was some intimate invasion of privacy.

Let's just say I certainly cannot be arsed to read anyone else's stream of conscious without being paid like a lawyer. I deal with kids and it's a bit cute when they babble about semi-coherent topics. An adult clearly loses that cute appeal and just sounds like a madman.

That's not even some dig, I sure suck at explaining my mindspace too. It's a genuinely hard skill to convert thoughts to interesting, or even sensible, communication.


This is a dream I agree. Detractors are always left behind.


> An underrated quality of LLMs as study partner is that you can ask "stupid" questions without fear of embarrassment.

Even more important for me, as someone who did ask questions but less and less over time, is this: with GPTs I no longer have to the see passive-aggressive banner saying

> This question exists for historical reasons, not because it’s a good question."

all the time on other peoples questions, and typically on the best questions with the most useful answers there were.

As much as I have mixed feelings about where AI is heading, I’ll say this: I’m genuinely relieved I don’t need to rely on Stack Overflow anymore.

It is also deeply ironic how stackoverflow alienated a lot of users in the name of inclusion (the Monica case) but all the time they themselves were the ones who really made people like me uncomfortable.


Fear causes defensive behavior.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: