
No, machine learning models are not programs, and they are not compiled from source code. They are the output of non-deterministic matrix-multiplication operations that take encoded data as input. An actual program can then use them as a black box to calculate useful outputs.

The program that takes your text and runs a final calculation on it against the machine learning model to get an output is a program. But that program is not doing anything interesting. All the interesting work was done when the model was cooked up, in a black-box, non-deterministic process, by some other GPUs somewhere else, well before it ever came near the inference program.
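To make that distinction concrete, here is a toy sketch (every name in it is made up for illustration) of what the inference side amounts to: a small, boring computation over numbers that were produced somewhere else.

    import numpy as np

    # The "model": opaque numbers produced by training somewhere else.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 8))   # stand-in for trained weights

    def infer(x):
        # The inference "program": one matrix multiply plus a nonlinearity.
        return np.maximum(W @ x, 0.0)  # ReLU

    print(infer(rng.standard_normal(8)))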




> "They are the output of non-deterministic matrix multiplication operations"

Just a nit-pick: Aren't neural networks and LLMs perfectly deterministic?

I think you can reproduce GPT-4 perfectly if you have access to the same source code, training data, and the seeds for the random number generators that they used?
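At small scale the seed argument is trivially true; a minimal sketch:

    import random

    random.seed(42)
    a = [random.random() for _ in range(3)]
    random.seed(42)
    b = [random.random() for _ in range(3)]
    print(a == b)  # True: same seed, same "random" sequence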

As a side note, I think it'd be theoretically possible to do this on a small 8-bit microcontroller given enough time and external storage. That's the beauty of Turing machines.

This would not be practical in the least. But it sure was cool seeing a guy boot Linux in just 3.5 hours on a small 8-bit AVR microcontroller.

https://dmitry.gr/?r=05.Projects&proj=07.%20Linux%20on%208bi...


Multi-core math on GPUs and CPUs is non-deterministic for performance and scheduling reasons.

The differences are small rounding errors that may not have any serious implications right now. But the larger models get, and the more operations and cores it takes to train them, the more those rounding errors creep in.
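You can see the root cause in pure Python: floating-point addition is not associative, so summing the same numbers in a different order (which is exactly what varying thread scheduling does) can produce different bits. A minimal sketch:

    import random

    xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]
    a = sum(xs)            # one accumulation order
    b = sum(reversed(xs))  # another order, same numbers
    print(a == b)          # usually False: float addition is not associative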


> Multi-core math on GPU and CPU is non-deterministic for performance and scheduling reasons.

Ok, I see what you mean.

I can see how that could be the case. It depends on how the software is designed.

Now that I've looked it up, I was surprised to see that PyTorch may generate non-reproducible results: https://pytorch.org/docs/stable/notes/randomness.html

But it looks like the sources of non-determinism in PyTorch are known, and can be avoided with a lot of work and loss of performance?
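For what it's worth, the knobs that page documents look roughly like this (a sketch; I haven't measured the performance cost):

    import torch

    torch.manual_seed(0)                      # seed torch's RNGs
    torch.use_deterministic_algorithms(True)  # error on known non-deterministic ops
    torch.backends.cudnn.benchmark = False    # don't auto-tune to variable kernels
    # per the docs, some CUDA ops also need CUBLAS_WORKSPACE_CONFIG=:4096:8
    # set in the environment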

And in the general case, it should still be possible to write deterministic code for multi-core processors, right?
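For example, pin down the partitioning and the combine order and a parallel reduction becomes reproducible. A sketch, with threads standing in for cores:

    from concurrent.futures import ThreadPoolExecutor

    def deterministic_parallel_sum(xs, chunks=4):
        # Fixed partitioning: chunk boundaries depend only on len(xs).
        n = len(xs)
        bounds = [(i * n // chunks, (i + 1) * n // chunks) for i in range(chunks)]
        with ThreadPoolExecutor() as pool:
            partials = list(pool.map(lambda b: sum(xs[b[0]:b[1]]), bounds))
        total = 0.0
        for p in partials:  # fixed combine order, regardless of thread timing
            total += p
        return total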

> The errors are small rounding errors

But rounding errors don't imply non-deterministic answers, right? Just that the answer is different from the true answer?

Calculating the square root of 2 will have a rounding error with 32-bit floating point, but are you saying that you'll get different bit patterns in your FP32 due to rounding errors?
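In other words, the rounding itself is deterministic. A quick check (sketch):

    import numpy as np

    x = np.sqrt(np.float32(2.0))        # true value rounded to FP32
    print(x)                            # 1.4142135, off from the true sqrt(2)
    print(np.array(x).view(np.uint32))  # but the same bit pattern on every run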


Thanks for saving me the time to write the same reply :)

To expand a bit:

I can write simple image processing code that will find lines in an image.

But I can't write the code to perform OCR (optical character recognition).

However, in the early '90s, I wrote a simple C program that trained a neural network to perform OCR. It was a toy project that took a weekend.

There are many things where I could train a neural network to do something, but couldn't write explicit source code to perform the same task.
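A modern version of that weekend project now fits in a few lines, e.g. with scikit-learn's toy digits set (a sketch, not the original C code):

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = load_digits(return_X_y=True)  # 8x8 images of handwritten digits
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
    clf.fit(Xtr, ytr)
    print(clf.score(Xte, yte))  # typically well above 0.9: learned, not hand-coded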

If you (chlorion) look up "genetic algorithms", you'll find many clear examples of where very impressive algorithms were evolved using a simple training program.
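Even the simplest toy version shows the idea: the training program below knows nothing about the solution, only how to score candidates (a sketch of a "one-max" genetic algorithm):

    import random

    LENGTH = 20

    def fitness(bits):
        return sum(bits)  # "one-max": reward bit strings with more 1s

    def evolve(pop_size=50, generations=100, mutation=0.05):
        pop = [[random.randint(0, 1) for _ in range(LENGTH)]
               for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            parents = pop[: pop_size // 2]       # selection
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = random.sample(parents, 2)
                cut = random.randrange(LENGTH)
                child = a[:cut] + b[cut:]        # crossover
                for i in range(LENGTH):          # mutation
                    if random.random() < mutation:
                        child[i] ^= 1
                children.append(child)
            pop = parents + children
        return max(pop, key=fitness)

    print(evolve())  # converges toward all 1s with no explicit algorithm for it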


So I reread this, and I think I misunderstood what you meant.

I meant that the process of generating the models, and otherwise interacting with them are regular programs. The model itself is I guess more like a database or something, but it too is just regular data.

The original thing I was replying to was claiming that the process in general was "not a program", as if there were some magic thing going on that made the model different from the output of other programs, or the training somehow magical. (That is how I read it, at least.)


If they aren't programs, how do they run on computers?

CPUs and GPUs physically cannot do anything other than execute programs encoded as machine code.

What you are describing is that the language model is "magic" and breaks the laws of physics. I don't believe in magic personally though.



