Does anyone truly understand these models? I don't think we have any proofs about the upper limits of what LLMs are capable of. How can you be so confident?
To be clear, I am not saying there are no limits to what LLMs can do, I just don't get how people can be so sure one way or the other. Especially when you consider that this technology is evolving at such an unpredictable pace.
We do actually understand generally well enough what is happening. Attention isn’t some mysterious unexplained mechanism. We know how it works and why. When people describe these models as a black box, they typically mean that there are too many layers and weights to explain to you exactly why it chose, for example, a specific sequence of words. But we can certainly explain exactly why it would choose some sequence, and why that sequence would be expected to be relevant.
Simplifying a bit, but attention provides a way for the model to build context on one word based on how often it is seen with others. It doesn’t have a concept of correct or incorrect. It doesn’t have a concept of reasoning.
What is impressive is that even without these concepts of correctness and reasoning, the model can still perform quite well on tasks where correctness and reasoning would be expected. But this is more a statement about the corpus of knowledge and the power of language in general than about the model's capabilities themselves. It’s important not to confuse the ability to seem correct and seem well reasoned with any actual mechanism for doing so.
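To make the "simplifying a bit" concrete, here is a toy numpy sketch of scaled dot-product attention, the operation being described above (my own illustration, not any production model's code):

import numpy as np

def attention(Q, K, V):
    # Compare each query against every key, softmax the similarity
    # scores into weights, and return a weighted average of the values.
    # Nothing here encodes "correct" or "incorrect"; it is just
    # similarity-weighted mixing learned from co-occurrence statistics.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))   # 3 toy "token" embeddings of dimension 4
print(attention(X, X, X))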
> We do actually understand generally well enough what is happening.
See the comment on the "Golden Gate Bridge" version of Claude:
"The fact that we can find and alter these features within Claude makes us more confident that we’re beginning to understand how large language models really work." (emphasis mine)
I'm not sure whether this paper addresses only the limits on what a model can answer within a single token or a few tokens, or also the limits when the LLM is allowed to produce more tokens (chain of thought) and to use tools (coding) to solve problems.
Would allowing spaces in identifiers even introduce any ambiguity in most languages? I think the only languages I've seen where it would matter are functional languages. e.g. I think it'd be possible to write a Python program using spaces instead of underscores, and be able to unambiguously parse it with a slightly modified parser?
Since the space of valid syntax becomes so much larger, typos are more likely to result in valid but incorrect programs, especially in dynamic interpreted languages like Python.
> I think the only languages I've seen where it would matter are functional languages.
Yes, ML-style function application is a problem, as is treating newlines as "normal" whitespace without having line separators (aka semicolons). And keywords used as infix operators, as another post reminded me.
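A toy Python illustration of the keyword-collision problem (my own example; the exact grammar changes a space-friendly language would need are speculative):

import ast

# Under today's grammar, "not done" can only parse as the unary
# operator applied to the name "done":
print(ast.dump(ast.parse("result = not done")))

# If spaces were allowed inside identifiers, the same characters could
# also name a single variable "not done", so the grammar would need
# extra rules or more reserved words to pick one reading.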
Python's facilities for calling subprocesses are pretty inconvenient compared to bash IMO. It defaults to binary output instead of UTF-8 so I almost always have to set an option for that. I wind up having to define threads in order to run programs in the background and do anything with their output in real time, which has an awkward syntax. The APIs for checking the exit code vs raising an error are pretty non-obvious and I have to look them up every time. And I always wind up having to write some boilerplate code to strip whitespace from the end of each line and filter out empty lines, like p.stdout.rstrip().split('\n') which can be subtly incorrect depending on what program I'm invoking.
"subprocess.run" appeared in python 3.5, and it's pretty nice - for example you so "check=True" to raise on error exit code, and omit it if you want to check exit code yourself. And to get text output you put "text=True" (or encoding="utf-8" if you are unsure what the system encoding is)
As for your boilerplate, it seems "p.stdout.splitlines()" is what you want? It's what you normally use to parse process output line by line.
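Putting those together, a minimal sketch (the command here is just an example):

import subprocess

# check=True raises CalledProcessError on a non-zero exit code;
# text=True decodes the output as text instead of bytes.
result = subprocess.run(["ls", "-l"], check=True, stdout=subprocess.PIPE, text=True)
for line in result.stdout.splitlines():
    print("got:", line)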
The background process is the hardest part, but for the most common case, you don't need any thread:
import subprocess

proc = subprocess.Popen(["slow-app", "arg"], stdout=subprocess.PIPE, text=True)
for line in proc.stdout:
    print("slow-app said:", line.rstrip())
print("slow-app finished, exit code", proc.wait())
Sadly, if you need to parse multiple streams, threads are often the easiest.
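For what it's worth, a rough sketch of the threaded version, reusing the hypothetical slow-app from above and assuming you want both streams line by line:

import subprocess
import threading

def drain(stream, label):
    # Consume a pipe line by line so neither pipe's buffer fills up
    # and blocks the child process.
    for line in stream:
        print(label, line.rstrip())

proc = subprocess.Popen(["slow-app", "arg"], stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE, text=True)
t = threading.Thread(target=drain, args=(proc.stderr, "stderr:"))
t.start()
drain(proc.stdout, "stdout:")   # main thread handles stdout
t.join()
print("exit code", proc.wait())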
The US (and Europe I'm guessing) banned human germ-line engineering in the 1970s, and so far the tech has stayed stopped worldwide. The Chinese scientist who proudly announced a success with the tech (about 10 years ago) was jailed by the Chinese government.
So, no, if the US and UK ban large training runs, there's a very good chance the rest of the world will follow.
What the Chinese government wants more than anything is a stable domestic political situation: they want to avoid a revolution, and they want to avoid the country's breaking up into 2 or 3 countries. And just like they perceived (correctly, IMHO) that the internet has a lot of potential to cause political instability and responded by vigorously regulating it, they're likely to vigorously regulate AI (while using AI to help them surveil their population). Facebook has made it more likely China will vigorously regulate AI by releasing a potent model under open-source-like terms, which proves to Beijing exactly how advances in AI can put power in the hands of the average Chinese citizen, which again the Chinese government does not want.
BTW, there's no need for you to stop running your open source model on your Apple silicon or 4090: if those models were capable of causing significant problems for people, they would've done so already, so stopping their distribution and use is not on the agenda of the people trying to stop "foundational" progress in AI.
> The US (and Europe I'm guessing) banned human germ-line engineering in the 1970s, and so far the tech has stayed stopped worldwide.
And so far as we know the tech has stayed stopped worldwide. (With at least one exception, as you point out. But that exception was, apparently, not officially approved.)
Do you really think North Korea won't do this if they think they see some benefit?
Secret projects to continue to advance AI are much less of a danger than the current situation in which tens of thousands of AI researchers worldwide are in constant communication with each other with no need to hide the communications from the public or from any government.
Advancing the current publicly-known state of the art to the point where AI becomes potent enough to badly bite us (e.g., to cause human extinction) is probably difficult enough not to be within Pyongyang's power, or even Moscow's or Beijing's, especially if the government has to do it under the constraint of secrecy. It probably requires the worldwide community of researchers continuing to collaborate freely to reach the dubious "achievement" of creating an AI model that is so cognitively capable that once deployed, no human army, no human institution, would be able to stop it.
> ...especially if the government has to do it under the constraint of secrecy. It probably requires the worldwide community of researchers continuing to collaborate freely to reach the dubious "achievement" of creating an AI model that is so cognitively capable that once deployed, no human army, no human institution, would be able to stop it.
And stopping now may be helpful towards stalling advances (if they're even possible), by providing just enough capability to pollute the potential training data going forward. If the public internet becomes a "dead internet" or a "zombie internet," it'll be much harder to economically assemble good and massive datasets.
All the AI hype (and its implications) is bringing me around to the idea of viewing spam (of all things) as a moral good.
Back when we did the paper, Firecracker wasn't mainstream so we ended up doing a (much hackier) version of a fast VMM by modifying Xen's VMM; but yeah, a few millis was totally feasible back then, and still now (the evolution of that paper is Unikraft, a LF OSS project at www.unikraft.org).
(Cold) boot times are determined by a chain of components, including (1) the controller (e.g., k8s/Borg), (2) the VMM (Firecracker, QEMU, Cloud Hypervisor), (3) the VM's OS (e.g., Linux, Windows, etc.), (4) any initialization of processes, libs, etc., and finally (5) the app itself.
With Unikraft we build extremely specialized VMs (unikernels) in order to minimize the overhead of (3) and (4). On KraftCloud, which leverages Unikraft/unikernels, we additionally use a custom controller to optimize (1) and Firecracker to optimize (2). What's left is (5), the app, which hopefully the developers can optimize if needed.
LightVM states a VM creation time of 2.3ms, while Firecracker states 125ms from VM creation to a working user space. So this is comparing apples and oranges.
I know it's cool to talk about these insane numbers, but from what I can tell people have AWS lambdas that boot slower than this to the point where people send warmup calls just to be sure. What exactly warrants the ability to start a VM this quickly?
The 125ms is using Linux. Using a unikernel and tweaking Firecracker a bit (on KraftCloud) we can get, for example, 20 millis cold starts for NGINX, and have features on the way to reduce this further.