Intuitively I've always been a bit skeptical of quantization. Wouldn't there be some loss in precision? I could imagine the error function increasing when you use these kinds of techniques.
John Carmack pointed out (and I learned it here on HN) that what training really needs is the *sign* of each parameter's gradient. I.e., you can quantize gradients to -1, 0, and 1 and still have the neural network learn much of the dataset.
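For a toy least-squares problem, sign-only updates look roughly like this (a minimal NumPy sketch in the spirit of signSGD, not Carmack's actual code):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 8))
    true_w = rng.normal(size=8)
    y = X @ true_w

    w = np.zeros(8)
    lr = 0.01
    for step in range(2000):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # full-precision MSE gradient
        w -= lr * np.sign(grad)                # keep only the sign: -1, 0, +1
    print(np.abs(w - true_w).max())            # ends up within a few multiples of lr

The updates oscillate around the optimum instead of converging exactly, but the sign alone drives the weights most of the way there.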
Why isn't John Carmack working for OpenAI? Hell, why did he waste years at Meta to work on a VR headset and NOT AI? He even announced he wants to focus on AGI but he missed out on literally all the action.
That game engine was over 3 decades ago! John is one of the sharpest minds I've ever seen. If he's passionate about AGI, he surely has a much deeper understanding of what he's doing than the AI trendies on social media.
> It is interesting that things still train even when various parts are pretty wrong — as long as the sign is right most of the time, progress is often made.
It does increase the “error” (meaning the model is less likely to predict the next word correctly when compared against a dataset), but the degradation is smaller than your intuition would lead you to believe.
Quantization does reduce the quality of the outputs. But the point is that you save enough memory that you can cram a larger model into the same hardware, and that more than compensates for the lost precision.
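Quick back-of-the-envelope memory math on why (the parameter counts here are just illustrative):

    def weight_bytes(n_params, bits_per_weight):
        return n_params * bits_per_weight / 8

    gib = 1024 ** 3
    print(weight_bytes(13e9, 16) / gib)  # 13B params at fp16:  ~24.2 GiB
    print(weight_bytes(13e9, 4) / gib)   # 13B params at 4-bit:  ~6.1 GiB
    print(weight_bytes(30e9, 4) / gib)   # 30B params at 4-bit: ~14.0 GiB

So the weights of a 4-bit 30B model take less memory than those of an fp16 13B model, which is the trade being made.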
Yes, each weight will not be able to "learn" as much if it has fewer bits of precision. But the idea is that you can use more weights, and the big question is whether these low-precision weights make the model more accurate as a whole.
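To make the per-weight precision loss concrete, here's a rough sketch of symmetric low-bit quantization (a generic scheme, not any particular library's implementation):

    import numpy as np

    def quantize(w, bits=4):
        qmax = 2 ** (bits - 1) - 1             # e.g. 7 for signed 4-bit
        scale = np.abs(w).max() / qmax
        q = np.clip(np.round(w / scale), -qmax, qmax)
        return q.astype(np.int8), scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.default_rng(0).normal(size=1024).astype(np.float32)
    q, scale = quantize(w, bits=4)
    err = np.abs(dequantize(q, scale) - w)
    print(err.mean(), np.abs(w).mean())        # rounding error vs. typical weight magnitude

The rounding error per weight is on the order of half the quantization step, which is exactly the precision each individual weight gives up.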