Yeah. I'm not sure if anything with millions or billions of parameters will ever...

etiam · 2024-09-12T14:57:02 1726153022

Good points.

Personally I'm still basically with Geoff Hinton's early conjecture that people will have to choose whether they want a model that's easy to explain or one that actually works as well as it could.

I'd imagine the really big whiteboard would often be understandable in principle, but most people wouldn't be very satisfied at having the model go "Jolly good. Set aside the next 25 years in your calendar then, and tell me when you're ready to start practicing the prerequisites!".

On the other hand, one might question how often we really understand something complex that has ostensibly been "explained" to us, rather than just glossing over real understanding. A lot of the time people seem to act as if they don't care about really knowing it, and just (hopefully!) want to get an inkling of what's involved and make sure that the process could be demonstrated not to be seriously flawed.

The models are being held to standards that are typically not applied to people or to most traditional software. But sure, there are also some real issues about reliability, trust and bureaucratic certifications.

scarmig · 2024-09-12T15:09:46 1726153786

I came across "Learning XOR: exploring the space of a classic problem" the other day: https://www.maths.stir.ac.uk/~kjt/techreps/pdf/TR148.pdf

Even something with three units and two inputs is nontrivial to understand on a deep level.
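For a concrete picture, here is a minimal sketch of one such network: two inputs, two hidden units, one output unit. The hand-picked weights and the step activation are my own assumptions for illustration, not taken from the linked report, and this is only one of many weight settings that compute XOR.

```python
def step(x):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if x > 0 else 0

def xor_net(a, b):
    # Hidden unit 1 behaves like OR: fires if at least one input is 1.
    h_or = step(a + b - 0.5)
    # Hidden unit 2 behaves like AND: fires only if both inputs are 1.
    h_and = step(a + b - 1.5)
    # Output unit fires when OR is on but AND is off, i.e. XOR.
    return step(h_or - h_and - 0.5)

# Truth table over all four input combinations.
table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
print(table)
```

This particular solution is easy to read because the weights were chosen by hand. The hard part is characterizing everything a trained network might land on instead.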

crazygringo · 2024-09-12T15:36:17 1726155377

> Are we ever really going to understand why it produces the numbers it does?

I would expect so, because we can categorize things hierarchically.

A medium-sized library contains many billions of words, but even with just a Dewey decimal system and a card catalog you could find information relatively quickly.

There's no inherent difficulty in understanding what a billion terms do, if you're able to just drill down using some basic hierarchies. It's just about finding the right algorithms to identify and describe the best set of hierarchies. Which is difficult, but there's no reason to think it won't be solvable in the near term.
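A rough back-of-the-envelope sketch of the drill-down arithmetic, assuming a clean hierarchy with a fixed branching factor exists at all (which is the hard part). `levels_needed` is a hypothetical helper, not something from any library.

```python
def levels_needed(n_items, branching):
    """Smallest depth d such that branching**d >= n_items."""
    if branching < 2:
        raise ValueError("branching factor must be at least 2")
    levels, capacity = 0, 1
    while capacity < n_items:
        capacity *= branching
        levels += 1
    return levels

# A billion terms with a Dewey-style 10-way split at each level.
depth_decimal = levels_needed(10**9, 10)
# The same billion terms with a coarser 1000-way split at each level.
depth_wide = levels_needed(10**9, 1000)
print(depth_decimal, depth_wide)
```

So the number of steps to reach any single term is small. This says nothing about whether a meaningful hierarchy can be found, only that navigating one would be cheap.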

thesz · 2024-09-12T20:52:35 1726174355

KANs have an O(N^(-4)) scaling law, where N is the number of parameters. MLPs have O(N^(-1)) scaling or worse.

Where you need an MLP with tens of billions of parameters, you may need a KAN with only thousands.
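A minimal sketch of that arithmetic, under the loud assumption that both laws have the same constant factor (the O() notation hides it, and real constants will differ). `equivalent_params` is a hypothetical helper for illustration.

```python
import math

def equivalent_params(n_mlp, mlp_exp=1.0, kan_exp=4.0):
    """Parameter count at which loss ~ N**(-kan_exp) matches an MLP with
    n_mlp parameters and loss ~ N**(-mlp_exp), assuming equal constants."""
    # Solve n_kan**(-kan_exp) == n_mlp**(-mlp_exp) for n_kan.
    return n_mlp ** (mlp_exp / kan_exp)

# An MLP with ten billion parameters.
n_kan = equivalent_params(1e10)
print(round(n_kan))
```

With equal constants this comes out in the hundreds; differing constants could easily shift it into the thousands or further. Either way the gap is many orders of magnitude, if the exponents hold in practice.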

afiori · 2024-09-12T14:18:15 1726150695

I found these articles very interesting in the context of future ways to understand LLMs/AIs:

https://www.astralcodexten.com/p/the-road-to-honest-ai

https://www.astralcodexten.com/p/god-help-us-lets-try-to-und...