One thing to note is that the essay says it refers specifically to "writing that is used to develop ideas" as opposed to "writing meant to describe others' ideas".
The way I interpret this is that it refers to claims that build on each other to come to a conclusion. So the way to test for truth is to somehow test each claim and the conclusion, which could vary in difficulty based on the kind of claims being made.
As this essay exemplifies, it is difficult to test for truth if you make broad claims that are so imprecise that they can't be verified or don't tell you anything interesting when verified using reasonable assumptions.
The issue with this article is that it is very imprecise.
Are the standards for whether something “sounds bad” based on the average person’s reading or the intended audience’s?
In its most general form (how the median article sounds to the median person), the argument is pretty vacuous.
Most writing discusses simple ideas, and those ideas should sound good (familiar, easy, pleasurable) to the median person.
But the most valuable kind of writing could sound tedious and filled with incomprehensible terminology to the median person but concise and interesting to the intended audience.
As currently stated, the idea doesn’t sound correct, because you can convincingly defend all four quadrants of the truth table.
> Are the standards for whether something “sounds bad” based on the average person’s reading or the intended audience’s?
As pg describes it in the article, it's neither; it's based on the writer's judgment. The writer of course is writing for some intended audience, and their judgment of what sounds good or sounds bad should be influenced by that. But pg is describing the writer's process of judging what they write.
> The reason is that it makes the essay easier to read. It's less work to read writing that flows well. How does that help the writer? Because the writer is the first reader
Note that the writer's judgement only serves as an initial proxy for how well the essay reads. This implies that the reader, whoever that is, is the true judge of how well it reads. My point is that that group is ill-defined.
If it were sufficient for the writer to be the only judge of how well something reads, surely PG wouldn't feel the need to have others proofread his essays. And surely it is not sufficient for someone who lacks taste to judge their own writing as good.
The way I read that statement, it is the same as the startup advice of "build what you would yourself want". However, you still have to validate that the market exists and is big.
There is really nothing profound in that paragraph anyway; all it is saying is that a writer should edit and proofread their work. Honestly, that whole paragraph could be deleted. It is obvious table stakes for one to edit their work. What differentiates good from bad is a matter of taste + who is judging it.
Thanks. The way you describe the topic, a dimension is missing in the article: who am I writing for?
Related: I think pg would benefit from graphics here and there. Creating visuals like the 2x2 matrix you describe helps tremendously to make ideas more comprehensible.
I'm curious how effective these models would be at recognizing whether the input video was AI-generated or heavily manipulated. Also various things around face/object segmentation.
Doesn't everything just get tweaked in whatever direction the back-propagation derivative says, and proportionally to that "slope"? In other words, simply by having a back-propagation system in effect, there's never any question about which way to adjust the weights, right?
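A minimal sketch of that update rule, assuming plain gradient descent with no momentum or adaptive learning rate (the array shapes and learning rate here are made up):

```python
import numpy as np

# Minimal sketch of a plain gradient-descent step, assuming a single weight
# matrix W and a gradient dL/dW already produced by back-propagation.
# The direction and size of each adjustment come straight from the gradient
# ("slope"): step against it, proportionally to its magnitude.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))        # hypothetical weights
grad_W = rng.normal(size=(4, 3))   # stand-in for dL/dW from backprop
learning_rate = 0.01

W -= learning_rate * grad_W        # every weight moves opposite to its slope
```

In that sense, yes: the gradient fixes both the direction and the relative size of each adjustment; optimizers like momentum or Adam only modulate how that raw slope gets applied.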
Both RAG and infinite contexts in their current states are hacks.
Both waste compute because you have to re-encode things as text each time and RAG needs a lot of heuristics + a separate embedding model.
Instead, it makes a lot more sense to pre-compute the KV for each document, then compute attention scores for each query, only surfacing values when the score is high enough.
The challenge here is to encode global position information in the surfaced values and to get them to work with generation. I suspect it can't be done out of the box, but it will work with training.
This approach has echoes of both infinite context length and RAG but is an intermediate method that can be parallelized and is more efficient than either one.
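As a rough illustration of what I mean, here is a toy single-head-attention sketch in numpy; the cached per-document K/V matrices, the dimension, and the threshold are all made up, and a real system would use the LLM's own attention layers:

```python
import numpy as np

d = 64                                   # hypothetical model dimension
rng = np.random.default_rng(0)

# Pretend these were produced offline, once per document (the pre-computed KV).
doc_kv = {
    "doc_a": (rng.normal(size=(128, d)), rng.normal(size=(128, d))),  # (K, V)
    "doc_b": (rng.normal(size=(256, d)), rng.normal(size=(256, d))),
}

def surface_values(query, threshold=0.01):
    """Score the query against every cached key and only surface values whose
    attention weight clears the threshold, instead of re-encoding raw text."""
    surfaced = []
    for doc_id, (K, V) in doc_kv.items():
        scores = K @ query / np.sqrt(d)        # attention logits against cached keys
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()               # softmax over this document's tokens
        keep = weights > threshold             # gate on attention score
        surfaced.append(weights[keep][:, None] * V[keep])
    return np.concatenate(surfaced, axis=0)

context_values = surface_values(rng.normal(size=d))
```

The missing pieces, as noted above, are position information for the surfaced values and getting the generator to actually consume them, which is why I suspect it needs training.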
Sorry for the late response. I must be misunderstanding your comment. I read your comment as "RAG doesn't pre-compute KV for each document, which is inefficient". With RAG, you convert your text into vectors and then store them in a DB; this is the pre-compute. Then you just need to compute the vector of your query and search for vector similarity. So it seems to me like RAG doesn't suffer from the inefficiency you were saying it suffers from.
No, you've only discussed the Retrieval part of RAG, not the generation part.
The current workflow is to use the embedding to retrieve documents, then dump the text corresponding to the embedding into the LLM context for generation.
Often the embedding comes from a different model than the LLM, and it is not compatible with the generation part.
So yea, RAG does not pre-compute the KV for each document.
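For concreteness, a compressed sketch of that workflow; `embed`, `vector_db`, and `llm` are hypothetical stand-ins, not any particular library:

```python
# Hypothetical stand-ins: an embedding model, a vector store, and an LLM.
def rag_answer(question, embed, vector_db, llm, k=5):
    # Retrieval: a separate embedding model maps the question to a vector.
    q_vec = embed(question)
    docs = vector_db.search(q_vec, top_k=k)   # nearest-neighbor lookup

    # Generation: the retrieved *text* is dumped into the prompt, so the LLM
    # re-encodes it from scratch; nothing from the embedding step is reused.
    prompt = "\n\n".join(d.text for d in docs) + "\n\nQuestion: " + question
    return llm.generate(prompt)
```

The vectors in the DB are only used to find the documents; the generation step still pays the full cost of encoding their text again, which is the inefficiency I was pointing at.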
Teaching LLMs how to search is probably going to be key to making them hallucinate far less. Most RAG approaches currently use simple vector searches to pull out information. ChatGPT is actually able to run Bing searches, and presumably Gemini uses Google's search. It's fairly clunky and unsophisticated currently.
These searches are still relatively dumb. With LLMs not being half bad at remembering a lot of things, programming simple solutions to problems, etc., a next step could be to make them come up with a query plan for retrieving the information they need to answer a question, something more sophisticated than just calculating a vector for the input, fetching n results, adding those to the context, and calling it a day.
Our ability to Google solutions to problems is inferior to that of an LLM able to generate far more sophisticated, comprehensive, and exhaustive queries against a wide range of databases and sources and filter through the massive amount of information that comes back. We could do it manually, but it would take ages. We don't actually need LLMs to know everything there is to know. We just need them to be able to know where to look and to evaluate what they find in context. Sticking to what they find rather than what they know means their answers are as good as their ability to extract, filter, and rank information that is factual and reputable. That means hallucination becomes less of a problem because it can all be traced back to what they found. We can train them to ask better questions rather than hallucinate better answers.
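To make that concrete, a rough sketch of such a query-planning loop; llm.generate() and search() are hypothetical helpers, not any particular product's API:

```python
# Rough sketch of a query-planning loop; llm.generate() and search() are
# hypothetical helpers, not a real product's API.
def answer_with_search(question, llm, search, max_rounds=3):
    notes = []
    for _ in range(max_rounds):
        # Ask the model what it still needs to look up, given what it has so far.
        plan = llm.generate(
            f"Question: {question}\nFindings so far: {notes}\n"
            "List the next search queries you need, or say DONE."
        )
        if "DONE" in plan:
            break
        for query in plan.splitlines():
            results = search(query)
            # Keep only what the model judges relevant and reputable.
            notes.append(llm.generate(
                f"Summarize what in these results is relevant to '{question}': {results}"
            ))
    # Answer strictly from what was found, so claims can be traced back to sources.
    return llm.generate(f"Answer '{question}' using only these findings: {notes}")
```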
Having done a lot of traditional search-related stuff in the past 20 years, I got really excited about RAG when I first read about it because I realized two things: most people don't actually know a lot, but they can learn how to find out (e.g. by Googling stuff); and learning how to find stuff isn't actually that hard.
Most people that use Google don't have a clue how it works. LLMs are actually well equipped to come up with solid plans for finding stuff. They can program, they know about different sources of information and how to access them. They can actually pick apart documentation written for humans and use that to write programs, etc. In other words, giving LLMs better search, which is something I know a bit about, is going to enable them to give better, more balanced answers. We've seen nothing yet.
What I like about this is that it doesn't require a lot of mystical stuff by people who arguably barely understand the emergent properties of LLMs even today. It just requires more system thinking. Smaller LLMs trained to search rather than to know might be better than a bloated know-it-all blob of neurons with the collective knowledge of the world compressed into it. The combination might be really good of course. It would be able to hallucinate theories and then conduct the research needed to validate them.
One big problem is that we've built search for humans, more specifically to advertise to them.
AI doesn't need a human search engine; it needs a "fact database" that can pull short factoids with a truth value, which could be a distribution based on human input. So for example, you might have the factoid "Donald Trump incited insurrection on January 6th" with a score of 0.8 (out of 1) and a variance of 0.3, since people tend to either absolutely believe it or absolutely disbelieve it, with more people on the believing side.
Beyond that, AI needs a "logical tools" database with short examples of their use that it can pull from for any given problem.
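As a minimal sketch, one such factoid record might look like this (field names are made up; the score and variance are the example figures from above):

```python
from dataclasses import dataclass

@dataclass
class Factoid:
    """One entry in a hypothetical fact database: a short claim plus a truth
    value summarized as a distribution over human judgments."""
    claim: str
    truth_score: float   # mean belief, in [0, 1]
    variance: float      # spread of belief; high when opinion is polarized
    sources: list[str]

fact = Factoid(
    claim="Donald Trump incited insurrection on January 6th",
    truth_score=0.8,
    variance=0.3,
    sources=["hypothetical-survey-dataset"],
)
```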