
Before I read this, is this yet another paper where physicists believe that modern NNs are trained by gradient descent and consist only of linear FNNs (or some other simplification), and can therefore be explained by <insert random grad-level physics method here>? Because we've already had hundreds of those.

Or is this more substantial?



What do you mean by "modern NNs" that differ from basic FNNs?

How do you train a modern NN if not through backpropagation?


FNNs are easy to analyze for sequential data because they assume no relationship between the elements. Hence, these NNs are a frequent target of simplified analyses. However, they also never led to the exciting models we now call AI.

Instead, real models are an eclectic mix of attention or other sequential mixers, gates, FFNs, norms, and positional tomfoolery.
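
To make that mix concrete, here's a toy PyTorch sketch (my own illustration, not anything from the paper): a plain FNN applies the same map to each element with no interaction between them, while even a single transformer-style block mixes tokens with attention and wraps an FFN in norms and residuals. Positional handling and gating are left out to keep it short.

    import torch
    import torch.nn as nn

    class PlainFNN(nn.Module):
        # What simplified analyses usually study: stacked linear layers,
        # applied to each element independently.
        def __init__(self, d):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        def forward(self, x):          # x: (batch, seq, d); tokens never interact
            return self.net(x)

    class TransformerishBlock(nn.Module):
        # One block of the "eclectic mix": attention (token mixing) + FFN + norms.
        def __init__(self, d, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
            self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
            self.norm1, self.norm2 = nn.LayerNorm(d), nn.LayerNorm(d)
        def forward(self, x):          # x: (batch, seq, d); tokens do interact
            a, _ = self.attn(x, x, x)
            x = self.norm1(x + a)      # residual + norm
            return self.norm2(x + self.ffn(x))

    x = torch.randn(2, 5, 32)          # (batch, seq, d)
    print(PlainFNN(32)(x).shape, TransformerishBlock(32)(x).shape)

The token mixing is exactly the part the simplified analyses tend to drop.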

In other words, everything that makes AI models great is what these analyses usually skip. Of course, they still wildly claim generalized insights about how AI really works.

There’s a dozen papers like that every few months.


> believe that modern NNs are trained by gradient descent

Are they not?

Genuine question. I'm very new to machine learning and neural networks.


>> > believe that modern NNs are trained by gradient descent

>> Are they not?

While technically true, that answer offers almost zero insight into how they work. Maybe another way to say it: during inference there is no gradient descent happening - the network is already trained. Even setting aside that "gradient descent" might be an overgeneralization of the training process, it tells you nothing about how ChatGPT plays chess or carries a conversation.

Telling someone what methodology was used to create a thing says nothing about how it works. Just like saying our own brain is "a product of evolution" doesn't tell you how it works. Nor does "you are a product of your own life experience" put psychologists out of business. "It's just gradient descent" is a great way to trivialize something that nobody really seems to understand yet.
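
A toy sketch of that distinction (illustrative PyTorch with made-up data, nothing specific to any real model): gradient descent only shows up in the training loop; at inference the trained weights just run forward, and that's where all the behavior people actually ask about lives.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)

    # Training: gradient descent on a loss, using random toy data.
    x, y = torch.randn(64, 8), torch.randn(64, 1)
    for _ in range(100):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()   # compute gradients
        opt.step()        # take one descent step

    # Inference: no gradients, no descent - the weights are fixed.
    with torch.no_grad():
        prediction = model(torch.randn(1, 8))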


Their proposal is actually that gradient descent is a poor representation of the NN's learning behavior, and that this NFM instead tracks the model's "learning" better than simply its loss over training does.
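
If "NFM" here means a per-layer neural feature matrix of the form W^T W (that reading is my assumption - the comment doesn't define it), tracking it next to the loss during training would look roughly like:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    x, y = torch.randn(64, 8), torch.randn(64, 1)

    def first_layer_nfm(m):
        # Assumed definition: W^T W for the first linear layer's weights.
        W = m[0].weight.detach()
        return W.T @ W                 # (8, 8) feature matrix

    for step in range(200):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
        if step % 50 == 0:
            print(step, loss.item(), torch.linalg.norm(first_layer_nfm(model)).item())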


Why is applying a physics lens to NNs a bad thing?


It isn't, but the kinds of papers being referred to make so many assumptions that their conclusions lack external validity for actual deep learning. In other words, they typically don't say anything about state-of-the-art deep networks.


It's yet another one... it just adds to the pile.



