
Better title:

"Artificial General Intelligence – We don't know the heck where this is going but here are some thoughts"



I've been forced to work with a team of very, very smart PhDs from Ivy League universities to help put together an explanation deck for the C-suite. They've informally told me that the neural-net AI tech is beyond human understanding. Everybody can only explain very small pieces of it, and no one knows how the pieces work together.


Yes. In a recent HN submission, ChatGPT was listed as a “scientific advance” of the past year. While it’s certainly some kind of advance, to me it seems to be more on the engineering side than on the scientific understanding side.


Definitely engineering. It’s not entirely wrong to say that the two reasons it took us until 2022 to make a ChatGPT are 1) the computing power needed and 2) the size of the training corpus needed. The same goes for other generative AI – it took a corpus of a couple billion images to train a Stable Diffusion model.


You may argue that it took a leap of insight to get to transformer models, though.


That was not an innovation of ChatGPT.


It's pretty clear that ChatGPT is being used here as a synecdoche for recent LLMs, and transformer LLMs in particular.


"Transformer is All You Need" is from 6 years ago. It isn't an advancement that happened last year.


If we discovered a species of parrot that could learn to use language in the manner of recent LLMs, that would count as a scientific advance (though not a breakthrough, at least until we achieved a good grasp of how it is possible).

Science advances firstly by finding something in want of an explanation, and then by coming up with one.


I don’t follow. IMO coming up with a working explanation is a necessary part of scientific advance. And that’s what we’re currently missing with LLMs, and with your parrot example.


Case in point: when Zwicky observed that galaxies in the Coma cluster were moving faster than could be explained in terms of what we know, that was an advance - an increase in our scientific knowledge. When we come up with a satisfactory explanation of why that is the case, then we will also have an advance - an increase in our scientific understanding. You can't get to the latter until you have advanced to the former, and we will probably need further advances in our scientific knowledge before we can understand the phenomenon Zwicky identified.


I get where you're coming from, but we might also be like ants walking over an iPhone, wondering where the vibration is coming from. They might eventually figure it out, but if so it will be after an extremely long time, and they would probably do better to focus on other things at this very moment if they seek enlightenment.


Luckily there are lots of humans and we can focus on a lot of different things.


To explain one thing: neural networks are referred to as AI today (for a long time they were just "machine learning"), but there's a substantial consensus that they won't be AGI (Artificial General Intelligence, "human-like intelligence", etc.) and a nearly universal belief that they aren't AGI now. That scientists don't understand their internal processes doesn't change this, and it isn't necessarily related, directly, to humans not knowing how to create AGI.


It's not beyond human understanding, unless you mean that one must know everything from every research paper ever released. At its core, you are just finding a well-performing model using gradient descent. Gradient descent is not beyond human understanding.
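For concreteness, here's a minimal sketch of the core loop (toy made-up data, plain Python, nothing like the scale of a real model): fit a single parameter by repeatedly stepping against the gradient of the error.

    # Toy example: fit y = w*x by gradient descent on the mean squared error.
    xs = [1.0, 2.0, 3.0]
    ys = [2.0, 4.0, 6.0]          # the "right" answer is w = 2
    w, lr = 0.0, 0.05
    for step in range(100):
        # d/dw of mean((w*x - y)^2) is mean(2*(w*x - y)*x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad            # step downhill on the loss surface
    print(w)                      # converges to ~2.0

Real training is this same loop, scaled up to billions of parameters and run over batches of data. The loop itself isn't the mystery.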


Gradient descent in isolation is obviously not what they are alluding to. What the models are doing inside the box and what any of those millions or billions of weights mean or do is beyond human understanding.


I don't think it is, as somebody who's spent maybe 100 combined hours reading AI papers, mostly focused on NLP and image classification.

You have a dataset, symbolically represented in 1s and 0s. You have an objective function (e.g. classify the object as belonging to one of N categories).

The purpose of the collective neurons in the network is to "encode" the input space in a way that satisfies the objective function. In the same way that we "encode" higher-level concepts into shorthand representations.

Gradient descent is the optimization function we use to develop this encoding.

Beyond this, there are all kinds of tricks people have developed (interesting activation functions for neurons, grouping + segregating neurons, introducing a dimension of recurrence/time, dataset pre-processing, using bigger datasets, having another model generate data that's deliberately challenging for the first model) to try to converge to a more robust/accurate encoding, or to try to converge to a decent encoding at a faster rate.

There is no magic here at the lowest level – you can interrogate the math at each step and it'll make sense.
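For example, here's a toy two-layer net in numpy (made-up sizes and random data, not any real architecture): every intermediate quantity is a small array you can print and check by hand.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3))           # 4 toy inputs, 3 features each
    y = np.array([0.0, 1.0, 1.0, 0.0])    # toy binary labels
    W1 = 0.1 * rng.normal(size=(3, 5))    # layer 1 weights
    W2 = 0.1 * rng.normal(size=(5, 1))    # layer 2 weights

    # Forward pass: ordinary, inspectable arithmetic.
    h = np.maximum(0, x @ W1)             # hidden activations (ReLU)
    p = 1 / (1 + np.exp(-(h @ W2)))       # predicted probabilities (sigmoid)
    loss = np.mean((p[:, 0] - y) ** 2)    # the objective being minimized

    # Backward pass: the chain rule, one small matrix at a time.
    dz2 = (2 * (p[:, 0] - y)[:, None] / len(y)) * p * (1 - p)
    dW2 = h.T @ dz2
    dz1 = (dz2 @ W2.T) * (h > 0)
    dW1 = x.T @ dz1

    # One gradient step. Any of these arrays can be printed and checked.
    W1 -= 0.1 * dW1
    W2 -= 0.1 * dW2
    print(loss)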

The "magic" is that we have zero epistemology to explain why tricks work, other than "look, ma test results". We know certain techniques work, and we have post-hoc intuitive explanations, but we're mostly fumbling our way "forwards" via trial and error.

This is "science" in the 17th century definition of the term, where we're mixing chemicals together and seeing what happens. Maybe we'll have a good theoretical explanation for our experimental results 100 years from now, if we're still around.


Nobody said anything about magic.

>There is no magic here at the lowest level – you can interrogate the math at each step and it'll make sense.

See, that's the thing. You can't, unless "making sense" has lost all meaning.

That you can see a bunch of signals firing or matrices being multiplied does not mean they "make sense" or are meaningful to you. Low-level gibberish is still gibberish.

Our ability to divine the purpose of activations at anything but the extremely small scale is atrocious.


>Our ability to divine the purpose of activations at anything but the extremely small scale is atrocious.

The value of each parameter is chosen to minimize the loss. This applies to every single weight of the model. Not all weights affect the loss by the same amount, which is why concepts like pruning exist.


>The value of each parameter is chosen to minimize the loss

Vague and fairly useless. What is it doing to minimize loss?

>Not all weights affect the loss by the same amount, which is why concepts like pruning exist.

Only weights with values at or close to zero get pruned. It's not because we know what each weight does and can tell what would still work without it.


>Vague and fairly useless.

When creating a model, your goal is to find one with minimal loss. Being able to figure out how to improve a model by finding weights that reduce the loss is not a vague or useless idea.

>What is it doing to minimize loss?

The value helps us get to a location in the parameter space with lower loss.

>Only weights with values close to or at zero get pruned.

Weights near 0 don't change the results of the calculations they are used in by much, which is why they don't affect the loss very much.
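Concretely, magnitude pruning is just this (toy made-up weight matrix, not a real model): zero out the near-zero weights and check that the output barely moves.

    import numpy as np

    rng = np.random.default_rng(1)
    W = rng.normal(size=(8, 8))
    W[rng.random((8, 8)) < 0.5] *= 0.001         # pretend training left many weights near zero
    x = rng.normal(size=8)

    before = W @ x
    pruned = np.where(np.abs(W) < 0.01, 0.0, W)  # drop the near-zero weights
    after = pruned @ x
    print(np.max(np.abs(before - after)))        # tiny change in output, hence tiny change in loss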


>When creating a model, your goal is to find one with minimal loss. Being able to figure out how to improve a model by finding weights that reduce the loss is not a vague or useless idea.

I'm sorry, but did you bother reading the previous conversation? We were talking about how much we know about what weights do during inference. "It reduces loss" alone is in fact very vague and useless for interpretability.

>The value helps us get to a location in the parameter space with lower loss.

What neuron(s) is responsible for capitalization in GPT? You wouldn't get that simply from "reduces the loss". Our understanding of what the neurons do is very limited.

>Weights near 0 don't change the results of the calculations they are used in by much, which is why they don't affect the loss very much.

I understand that lol.

"This value is literally 0 so it can't affect things much" is a very different understanding level from "this bunch of weights are a redundancy because this set already achieves this function that this other set does and so can be pruned. Let's also tune this set so it never tries to call this other set while we're at it. "


>What neuron(s) is responsible for capitalization in GPT?

It doesn't matter. Individual things like capitalization are vague and useless for interpretability. We know that incorrect capitalization will increase loss, so the model will need to figure out how to do it correctly.

>Our understanding of what the neurons do is very limited.

The mathematical definition is right in the code. You can see the calculations they are doing.

>this bunch of weights is redundant because this other set already achieves the same function, so it can be pruned. Let's also tune this set so it never tries to call that other set while we're at it.

They are equivalent. If removing something does not increase the loss, then it was redundant behavior, at least for the dataset it is being tested against.


>It doesn't matter. Individual things like capitalization are vague and useless for interpretability. We know that incorrect capitalization will increase loss, so the model will need to figure out how to do it correctly.

It matters for the point I was making. Capitalization is a simple example. There are far vaguer functions we'd certainly like the answers to.

>They are equivalent. If removing something does not increase the loss, then it was redundant behavior, at least for the dataset it is being tested against.

The level of understanding for the two is not equivalent, sorry.

At this point, you're just rambling on about something that has nothing to do with the point I was making. Good day.


Anyone satisfied with "it's gradient descent" as an explanation isn't displaying much curiosity.


It is true and worth repeating. Science progresses by interpreting data, and we now have easy access to a behemoth that needs interpreting. An AI winter shouldn't come anytime soon.


…assuming we make significant progress in explaining the data. Science means coming up with theories based on observations, and then testing those theories by checking the further predictions they make against experiment. It remains to be seen how much success there will be in that regard for LLMs.



