Hacker News | mdp2021's comments

We consider the LLM an "intuition" machine that can talk and partially understand: we do not let it retrieve information at will from its faulty memory, but force it to use an implemented memex to produce any output.

"Tell us about X (X='PS/2 Models'); here are your encyclopedias: extract and formulate".


If you were to actually try that, you'd know the approach doesn't really work either. Or rather, it's not the silver bullet you hope it is. If you still think it is, go ahead and implement it. That's literally the main "output quality" struggle all AI providers are in.

If you're just building a chatbot (a pure ChatGPT/Claude-like interface), you risk massively increasing your latency and degrading your overall result quality in an attempt to improve a small scenario here or there.

Seriously, try it. Take any "Tell us about X" prompt you like. Try it as-is with an LLM, then try it with + "; here are your encyclopedias: extract and formulate"

I guarantee you that 99 times out of 100, the LLM will reach out to the encyclopedia. The problem is that the encyclopedia doesn't have a great LLM-friendly search interface that can find the parts most relevant to the LLM's query about X. In fact, that's the part you're building, if I'm not mistaken. If you expect the encyclopedia to already have search good enough for the LLM to always find the most relevant information about X, then you've just pushed the problem one layer down. Someone eventually has to actually tackle it.

You can also see this in both ChatGPT and Claude outputs. Every now and then they will push a change to make it "more reliable", which basically makes it more likely to search the internet before answering a question. That also happens to make it more likely to skew its output based on SEO, currently popular news and other nonsense.

While nonscientific, I experience this every time ChatGPT or Claude decides to do a web search instead of just answering the question. Ask it "I like tv show X, suggest tv shows like that" or "I like product X, suggest a similar product". If it uses the internet to search, the answer is a summary of the top gamed SEO results: just whatever is popular at the moment, or whatever has commission links. Ask it not to use the internet and the result is surprisingly less... "viral, SEO-optimized, recently trending" type content.


You are misunderstanding the proposed frame: implementations may be faulty, but the approach remains necessary for "LLMs as informers". I.e., the answer provider should only work vis-a-vis documentation grounding the output.

This implies that if we do not have good enough ways to retrieve information from repositories, we will have to invent them. Because the "LLM as informer" can only be allowed to formulate what it will find through the memex.

It is possible that, for that aim, LLMs cannot be used directly as they are in their current general state.

Also, the problem of information reliability has to be tackled in order to build such a system (e.g. some sources rank higher).
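A small, purely illustrative sketch of what that could look like at retrieval time (the weights and source categories are assumptions, not measured values):

    # Sketch: scale retrieval relevance by an assumed per-source reliability weight,
    # so that higher-trust repositories rank higher. All numbers are illustrative.

    SOURCE_RELIABILITY = {
        "peer_reviewed": 1.0,
        "official_docs": 0.9,
        "encyclopedia": 0.8,
        "forum_post": 0.4,
    }

    def score(passage: dict, relevance: float) -> float:
        # Final ranking score = retrieval relevance x source reliability.
        return relevance * SOURCE_RELIABILITY.get(passage["source_type"], 0.2)

    candidates = [
        ({"text": "PS/2 used Micro Channel.", "source_type": "encyclopedia"}, 0.91),
        ({"text": "I think PS/2 came out in 1985?", "source_type": "forum_post"}, 0.95),
    ]
    ranked = sorted(candidates, key=lambda c: score(*c), reverse=True)
    print([c[0]["source_type"] for c in ranked])  # the encyclopedia outranks the forum post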

It is not a solved problem, but it is a clear one. In mission-critical applications, you would not even allow asking John at the nearby desk for information he may mix up.


For example, in the times of "lectures", when the transmitted information was literally read (as the term says) in real time from the source to the public.

But in general, the (mis-)information that spinach contained so much iron as to be interchangeable with nails had to come from a typo rare enough to become anecdotal and generate cultural phenomena like Popeye.


I understand from the page: the "book of "secret" [IT] knowledge [sources]", where "secret" could mean that you are invited to contribute and "share your secrets"...

Which local models are you using, that do not output loop garbage at temperature 0?

What do you get at very low temperature values instead of 0?


> Which local models are you using, that do not output loop garbage at temperature 0?

All of them. I make my own frontends using llama-cpp. Quality goes up with temperature 0 and loops are rare.
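For reference, a minimal sketch of such a frontend, assuming the llama-cpp-python bindings and a placeholder GGUF model path:

    # Minimal local frontend sketch using the llama-cpp-python bindings.
    # The model path is a placeholder; any local GGUF model would do.
    from llama_cpp import Llama

    llm = Llama(model_path="models/local-model.gguf", n_ctx=4096, verbose=False)

    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Summarize what sampling temperature does."}],
        temperature=0.0,   # greedy, most-likely-token decoding
        max_tokens=256,
    )
    print(result["choices"][0]["message"]["content"])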

The temperature setting isn't for improving quality, it's to not break your suspension of disbelief that you're talking to an intelligent entity.


> All of them

You must be using more recent (or just different) models than those I tried. Mine easily returned garbage at temperature 0. (But unfortunately, I cannot re-test and report from there.)

This (LLM behaviour and benchmarking at low or zero temperature values) would be a topic worth investigating.


Probably a bug in the code you ran somewhere.

> LLMs as a replacement for search

Some people expect LLMs to be part of a better "search".

LLMs should be integrated into search, as a natural application: search results can depend heavily on lucky phrasing, search engines work through sparse keywords, and LLMs allow the use of structured natural language (not "foo bar baz" but "Which foo did a bar baz?"), which should be resistant to term variation and exclude the different meanings those otherwise sparse terms could take.

But it has to be done properly - understand the question, find material, verify the material, produce a draft reply, verify the draft vis-a-vis the material, maybe iterate...
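A rough sketch of that loop; every helper here (search, llm) is an assumed placeholder injected by the caller, not an existing API:

    # Sketch of the proposed flow: understand -> find material -> verify material ->
    # draft -> verify the draft against the material -> iterate if unsupported.

    def answer_query(question: str, search, llm, max_rounds: int = 3) -> str:
        query = llm(f"Rewrite as a precise search query: {question}")
        for _ in range(max_rounds):
            material = [doc for doc in search(query)
                        if llm(f"Is this relevant and trustworthy for '{question}'? {doc}") == "yes"]
            draft = llm(f"Answer '{question}' using only:\n" + "\n".join(material))
            verdict = llm(f"Is every claim in this draft supported by the material?\n"
                          f"Draft: {draft}\nMaterial: {material}")
            if verdict == "yes":
                return draft
            query = llm(f"The draft lacked support; propose a better search query for: {question}")
        return "Could not produce a verified answer."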


DuckDuckGo AI Assist is going in the right direction, imo. It will pull info from Wikipedia and use math and map tools plus other web sources, and it has been mostly accurate for me on the search page.

The chat option uses GPT-4o with web search and was able to provide links to colonial map resources I was curious about after falling down that rabbit hole. It also gave me general (and correct) present-day map links, on the map sites I asked for, to the places I was looking for.

It did get confused a few times when I was trying to get the present-day names of old places I had forgotten; like the Charles River in Va., where it kept trying to send me to Boston or to Charles City Co. on the James River and told me to look for it around there...

The York River wiki page clearly says it was once the Charles River. Maybe I wasn't asking the right questions. For more unusual things it was pretty helpful though, and it saved the endless searching-with-100-tabs adventure.


Repent.

You are not there to "love what gives you the kicks". That's a kind of love that should not exit the bedroom (better, the bathroom).


> The simple ...

No, that is improper phrasing. The correct disclaimer is: "The engine below is structurally unreliable".

--

Comment, snipers. We cannot reply to unclear noise.


Professional creatives do measure their intuitions against a number of constraints...

> informative

That Google puts a faulty assistant on the page is actually informative, not just for people who do not use that search engine, but also for those attentive to progress in the area - where Google has delivered massive hits recently.

> constructive

The - extremely damaging - replacement of experts with "employees wielding an LLM" is ongoing. Some of us have been told nonsense by remote support service staff...


While you argue that showcasing a 'faulty assistant' like Google's is 'informative', particularly for those tracking AI progress, the typical LLM-got-it-wrong post often doesn't provide that deeper insight. It usually presents an isolated error without context or analysis of the system's architecture, training data limitations, or the specific type of reasoning failure. This makes its informative value quite shallow, quickly becoming repetitive rather than truly enlightening about 'progresses in the area' beyond the surface-level observation that LLMs are imperfect.

Regarding the 'constructive' aspect and the 'damaging replacement of experts,' I agree this is a critical concern. However, the genre of simply posting screenshots of LLM errors is rarely constructive in addressing this complex socio-technical issue. It highlights a symptom (LLMs making mistakes) but typically fails to constructively engage with the causes or potential solutions for deskilling, corporate responsibility in AI deployment, or the nuances of human-AI collaboration. True constructive engagement would require more than just pointing out a wrong answer; it would demand analysis, discussion of best practices, or calls for better system design and oversight, which this genre seldom provides.


Right. But simply raising awareness helps fight the "nurses as cheap doctors, random people with a script as a greater bargain" phenomenon.

And as far as progress in LLMs is concerned¹, it seems evident that a revolution is required - and when the key (to surpass intuition towards process, dream towards wakefulness) is found, it will become evident.

(¹Earlier I was mentioning «progress» in general - as in, "they give us Veo3 and yet Banged Inthehead at the search pages"?!)


And a rational interlocutor, conversational shortcuts aside, replies in the form "The best estimates from sources like S0, S1 and S2 place the value between V1 and V2".
