I don't think it is, as somebody who's spent maybe 100 combined hours reading AI...

famouswaffles · on Aug 11, 2023

Nobody said anything about Magic.

>There is no magic here at the lowest level – you can interrogate the math at each step and it'll make sense.

See that's the thing. You can't unless "making sense" has lost all meaning.

That you can see a bunch of signals firing or matrices being multiplied does not mean they "make sense" or are meaningful to you. Lol level gibberish is still gibberish.

Our ability to divine the purpose of activations of anything but the extremely small scale is atrocious.

charcircuit · on Aug 12, 2023

>Our ability to divine the purpose of activations of anything but the extremely small scale is atrocious.

The value of each parameter is chosen to minimize the loss. This applies to every single weight of the model. Not all weighs affect loss the same amount which is why concepts like pruning exist.

famouswaffles · on Aug 12, 2023

>The value of each parameter is chosen to minimize the loss

Vague and fairly useless. What is it doing to minimize loss ?

>Not all weighs affect loss the same amount which is why concepts like pruning exist.

Only weights with values close to or at zero get pruned. It's not because we know what each weight does and can tell what would work otherwise.

charcircuit · on Aug 12, 2023

>Vague and fairly useless.

When creating a model your goal is to find one with minimal loss. Being able to figure how to improve a model by finding weights that reduce the loss is not a vague or useless idea.

>What is it doing to minimize loss?

The value helps us get to a location in the parameter space with lower loss.

>Only weights with values close to or at zero get pruned.

Weights near 0 don't change the results of the calculations they are used in my much which is why they don't effect loss very much.

famouswaffles · on Aug 12, 2023

>When creating a model your goal is to find one with minimal loss. Being able to figure how to improve a model by finding weights that reduce the loss is not a vague or useless idea.

I'm sorry but did you bother reading the previous conversation ? We were talking about how much we know what weights do during inference. "It reduces loss" alone is in fact very vague and useless for interpretability.

>The value helps us get to a location in the parameter space with lower loss.

What neuron(s) is responsible for capitalization in GPT? You wouldn't get that simply from "reduces the loss". Our understanding of what the neurons do is very limited.

>Weights near 0 don't change the results of the calculations they are used in my much which is why they don't effect loss very much.

I understand that lol.

"This value is literally 0 so it can't affect things much" is a very different understanding level from "this bunch of weights are a redundancy because this set already achieves this function that this other set does and so can be pruned. Let's also tune this set so it never tries to call this other set while we're at it. "

charcircuit · on Aug 12, 2023

>What neuron(s) is responsible for capitalization in GPT?

It doesn't matter. Individual things like capitalization are vague and useless for interpretability. We know that incorrect capitalization will increase loss, so the model will need to figure how to do it correctly.

>Our understanding of what the neurons do is very limited.

The mathematical definition is right in the code. You can see the calculations they are doing.

>this bunch of weights are a redundancy because this set already achieves this function that this other set does and so can be pruned. Let's also tune this set so it never tries to call this other set while we're at it.

They are equivalent. If removing something does not increase loss then it was redundant behavior at least for the dataset that it is being tested against.

famouswaffles · on Aug 16, 2023

>It doesn't matter. Individual things like capitalization are vague and useless for interpretability. We know that incorrect capitalization will increase loss, so the model will need to figure how to do it correctly.

It matters for the point I was making. Capitalization is a simple example. There are far vague functions we'd certainly like the answers to.

>They are equivalent. If removing something does not increase loss then it was redundant behavior at least for the dataset that it is being tested against.

The level of understanding for both is not equivalent sorry.

At this point, you're just rambling on about something that has nothing to do with the point I was making. Good Day