To be deliberately unfair, imagine a huge if-else block (a few billion entries big) where each branch plays back a carefully chosen, well-written string of text.
It would convince a lot of people with the breadth, despite not really having much depth.
The real GPT model is much deeper than that, of course, but my toy example should at least give a vibe for why even a simple thing might still feel extraordinary.
This is not viable in practice, because exponential growth kills the concept.
Such a system would already struggle with inputs of more than a few words, and it could never scale to even a paragraph of text, even if you had ALL of the observable universe at your disposal for encoding the entries.
Consider:
If you just have simple sentences consisting of 3 words (subject, object, verb, with 1000 options each -- very conservative assumptions), then 9 sentences already give more combinations than there are atoms (!!) in the observable universe (~10^80).
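A quick sanity check of that arithmetic, sketched in Python with the same numbers as above:

    options_per_word = 1000
    words_per_sentence = 3        # subject, object, verb
    sentences = 9

    per_sentence = options_per_word ** words_per_sentence   # 10^9 distinct simple sentences
    total_inputs = per_sentence ** sentences                 # (10^9)^9 = 10^81 possible 9-sentence inputs

    atoms_in_observable_universe = 10 ** 80
    print(total_inputs > atoms_in_observable_universe)       # True: 10^81 > 10^80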
α: most of those sentences are meaningless, so they won't come up in normal use
β: if statements can grab patterns just fine in most languages; they're not limited to pure equality
γ: it's a thought experiment about how easy it can be to create illusions without real depth, and specifically not about making an AGI that stands up to scrutiny
> if statements can grab patterns just fine in most languages; they're not limited to pure equality
This does not help you one bit. If you want to produce 9 sentences of output per query, then regular expressions, pattern matching, or even general intelligence inside your if statements will NOT save the concept.
> What is the entropy per word of random yet grammatical text?
That is what these 5-11 bit estimates are about. Those would correspond to a choice out of 32 to 2048 options per word, which is far fewer than there are words in English (a native speaker's active vocabulary should be somewhere around 10,000).
Just consider the XKCD "Thing Explainer", which limits itself to a 1k-word vocabulary and is very obviously not idiomatic.
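To make the bits-to-options conversion concrete, here's a rough check of those numbers (the 10k active vocabulary and the 1k Thing Explainer vocabulary are the figures mentioned above):

    import math

    # 5 to 11 bits of entropy per word correspond to this many equally likely choices per word:
    print(2 ** 5, 2 ** 11)      # 32 and 2048

    # For comparison, picking uniformly from a whole vocabulary costs:
    print(math.log2(10_000))    # ~13.3 bits for a ~10k-word active vocabulary
    print(math.log2(1_000))     # ~10.0 bits for a 1k-word Thing-Explainer-style vocabulary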
If you want your big if-else block to produce credible output, there is simply no way around the entropy bounds on input and desired output, and those bounds render the concept absolutely infeasible even for I/O lengths of just a few sentences.
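To put rough numbers on that, here is a minimal sketch, assuming ~8 bits of entropy per word (the middle of the 5-11 bit range above) and ~60 words in total for "a few sentences" of input plus output; both figures are illustrative picks, not measurements:

    import math

    bits_per_word = 8      # assumed mid-range entropy per word
    words_total = 60       # assumed length of input + output combined

    # One if-else branch per distinguishable exchange: the branch count grows as 2^(total entropy bits).
    required_branches = 2 ** (bits_per_word * words_total)
    atoms_in_observable_universe = 10 ** 80

    print(f"~10^{bits_per_word * words_total * math.log10(2):.0f} branches needed")   # ~10^144
    print(required_branches > atoms_in_observable_universe)   # True, by ~64 orders of magnitude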
Eliza is not comparable to GPT because it does not even hold up to very superficial scrutiny; it's not really capable of even pretending to intelligently exchange information with the user, it just relies on some psychological tricks to somewhat keep a "conversation" going...
> Eliza is not comparable to GPT because it does not even hold up to very superficial scrutiny; it's not really capable of even pretending to intelligently exchange information with the user, it just relies on some psychological tricks to somewhat keep a "conversation" going...
That's kinda the point I was making — tricks can get you a long way.
The comparison with GPT is not "and therefore GPT is bad" but rather "it's not necessarily as smart as it feels".
Perhaps I should've gone for "Clever Hans" or "why do horoscopes convince people"?