After I kept refreshing and finally got an English question, it asked me to act like a Linux terminal, then issued pwd, ls, and cd over and over until I gave up. The concept is funny, since I get to act like CrapGPT, but it needs to not get stuck asking the same thing over and over.
Maybe the role reversal breaks most of the RLHF training. The training was definitely not done in the context of role reversal, so it could be out of distribution. If so, this is a glimpse of the intelligence of the LLM core without the RL/RAG/etc. tape-and-glue layers.
Trying to understand this; please correct me if I am wrong:
A is producing something of value 100.
That is complex to configure, so B comes along and says: buy from me at 150 and you will get both the product and the configuration.
C comes along and says: there are multiple products like this, so I created a marketplace; in the end it will cost you 160, but you can switch providers whenever you want.
Now I am a customer of C and I buy at 160:
- C gets 160 and retains 10, but its total revenue is 160.
- B gets 150 and retains 50, but its total revenue is 150.
- A gets the 100.
Here is the question: How big is GDP in this case?
I think it is 160.
Now A adds an LLM, for about 4 extra, that can (allegedly) do what B and C do, removing the intermediaries, and so now the GDP is 104.
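To check my own numbers, here is the value-added arithmetic spelled out (a quick Python sketch; the helper function is my own invention, and the numbers are the ones above):

    def gdp_value_added(chain):
        # GDP = sum of value added along the chain; each step's value
        # added is its sale price minus what it paid the step before.
        total, previous_price = 0, 0
        for _, price in chain:
            total += price - previous_price
            previous_price = price
        return total

    # Before: A -> B -> C -> customer
    print(gdp_value_added([("A", 100), ("B", 150), ("C", 160)]))  # 160

    # After: A bundles the LLM (+4) and sells direct to the customer
    print(gdp_value_added([("A", 104)]))  # 104

Both totals equal the final sale price, which is the same answer you get measuring GDP from the expenditure side.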
Yes, exactly. There's the joke about one economist paying the other $100 to dig a hole, and the other then giving the money back to the first to fill it back in, thereby increasing GDP by $200.
This is technically correct but missing some details.
The real GDP, after accounting for cost of living, has not changed much: while nominal GDP has decreased, the cost of living has also decreased (because A's product is now priced at 104 instead of 160).
But it's even better, because we now have the extra money that we previously spent on C. In theory we will spend that money somewhere else and drive demand there, and the workers put out of employment by the LLM will move to that sector to meet it.
Now not only has GDP increased, but the cost of living has also gone down.
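Spelled out with the same numbers (a rough sketch that treats the whole economy as this one product):

    # Deflate nominal GDP by the price of the final good.
    nominal_before, price_before = 160, 160
    nominal_after, price_after = 104, 104

    print(nominal_before / price_before)  # 1.0 unit of real output
    print(nominal_after / price_after)    # 1.0 unit: real GDP unchanged

    # The customer also keeps 160 - 104 = 56 to spend elsewhere, which
    # is the extra demand the displaced workers can move to serve.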
IMO there are many misrepresentations here:
- Pretraining to predict the next token imposes no bias against surprise, except that low probabilities are more likely to have a large relative error.
- Using a temperature lower than 1 does impose a direct bias against surprise (see the sketch after this list).
- Finetuning of various kinds (instruction, RLHF, safety) may increase or decrease surprise. But the kinds of things trained for in finetuning certainly harm the capability to tell jokes significantly.
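For the temperature point, a minimal sketch (the logits are made up; only the softmax-with-temperature math matters): dividing the logits by T < 1 shifts probability mass toward the already-likely tokens, so rarer, more surprising continuations get sampled even less often.

    import numpy as np

    def sampling_probs(logits, temperature):
        z = np.array(logits) / temperature
        z -= z.max()  # subtract max for numerical stability
        p = np.exp(z)
        return p / p.sum()

    logits = [3.0, 1.0, 0.0]  # "safe", "plausible", "surprising" tokens
    for t in (1.0, 0.7, 0.3):
        print(t, sampling_probs(logits, t).round(4))
    # T=1.0 -> roughly [0.84, 0.11, 0.04]
    # T=0.3 -> roughly [0.999, 0.001, 0.000]: the surprising token vanishes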
I think the whole discussion just conflates telling a joke with coming up with one. Telling a joke well is of course an art, but the punchline itself has zero surprise if you have studied your lines well, like all good comedians do. The more you study, the better you can also react to impromptu situations.

Now, coming up with a completely original joke yourself is a different story. For that you actually have to venture outside the high-likelihood region and find nice spots. But that is also really, really rare among humans, and I have only ever observed it in combination with external random influences. Without those, I doubt LLMs will be able to compete at all. But I fully believe an LLM at the level of a high-end comedian is possible given the right training data. It's just that none of the big players have ever cared about building such a model, since there is very little money in it compared to e.g. coding.
People have played with (multi-)agentic frameworks for LLMs from the very beginning, but it seems like only now, with powerful reasoning models, are they really making a difference.
How does one determine that they have aphantasia? How do you know that you are not doing exactly the thing people call visualizing when you perform spatial reasoning?
No idea, but when people say they can visualize an apple and that it feels like number 1 on that chart, I would say that my experience of whatever I'm doing when I'm 'visualizing' an apple is more like a 4 or 5.
I can only assume people are trying to accurately describe their own experience, so when my experience seems to differ a lot, it seems to me that there is more going on than just confusion about wording.
I would say every genre of media has this problem. A form of media might exist for thousands of years, but genre and fashion always evolve in new directions, because what's the point of creating more of what already exists?
Video games were immune for a while because technology was changing so fast, but in the last decade or so it's become really clear players don't care nearly as much about graphics as they used to.
People will quite happily pick up and play games from many years ago. Many of my teenage kids' favourite games were made before they were born.
Oblivion is nearly 20 years old now and looked terrible at the time (compared to other games that shipped alongside it). Oblivion is a _big_ game, not a polished game.
Update: I found this screenshot and I think I was remembering Morrowind's characters.
Morrowind had very janky animations, but otherwise it wasn't too bad for 2002. Definitely not top notch, but, well, look at the faces in Deus Ex for another example from that period...
Oblivion had those weird faces but was otherwise actually pretty good, especially the lush outdoor environments.
There's a cave in France where somebody started painting a horse, and 10,000 years later (but still tens of thousands of years ago) somebody finished it in the exact same style and technique. Novelty isn't required in art, though it's currently prized.