
There seems to be a new model every single day. How do people have time to keep track of everything going on in AI?


From decades of observing at a distance and observing observers at a distance, I think it's safe to say that, like fusion, there are walls that AI runs into, not unlike the risers on a staircase, and when we collectively hit one, there's a lot of scuttling back and forth. A lot of movement, but no real progress. If that plateau goes on too long, excitement (and funding) dries up and things die down.

Then someone figures out how to get past the current plateau, and the whole process repeats. That could be new tech, a new architecture, or it could be old tech that was infeasible and had to wait for Moore's Law.

Right now we are on the vertical part of the sawtooth pattern. Everyone hopes this will be the time that takes us to infinity, but the old-timers are just waiting for everyone to crash into the new wall.


Why should things dry up when, unlike fusion, AI is already usable by millions daily? Even if progress stalls a bit, the product, the fine-tunes, or just normal incremental progress will still be super useful; the "too soon" point has already been passed.


A lot of previous plateaus in AI are usable and used by billions daily, for example, giving good navigation routes on your phone, managing NPCs in a video game, showing ads, or recommending movies.

It's not that they don't have value -- they do, and in the trillions of dollars -- but once understood, they move from "AI" to "algorithms" and stop being exciting.

The current progress feels different to me, though. The current step in capability is much higher than previous ones, as is the potential disruption.


I think what makes the current iteration of AI different is that we don't understand how the emerging abilities work.

A map navigation algorithm: we understand it, we know where the limit is (basically it cannot do anything that isn't map navigation), so it stops being exciting.

GPT: we don't understand it, we don't know where the limit is... And it doesn't seem it will stop being exciting until we do.


People say this a lot - that we don't understand this or that, but I'm not really sure what they mean. We know exactly how these algorithms work. We know every calculation - the maths is not particularly difficult, we understand how the training process leads to information being stored in the weights, we know how inference works. What more would you want to understand before you would agree we understood it?
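
To be concrete about what I mean by "we know every calculation", here's a bare-bones sketch of scaled dot-product attention in NumPy (toy shapes, illustrative only, not any particular model's code):

    import numpy as np

    def attention(Q, K, V):
        # similarity of each query position to each key position
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        # softmax over the key dimension
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        # weighted sum of the value vectors
        return weights @ V

    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))   # 4 tokens, 8 dims
    out = attention(Q, K, V)                                # shape (4, 8)

Every step is ordinary linear algebra; none of the individual operations is mysterious.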


> We know exactly how these algorithms work.

We have no idea how these algorithms work.

> We know every calculation - the maths is not particularly difficult,

We do know that.

> we understand how the training process leads to information being stored in the weights, we know how inference works.

We do not know that.

> What more would you want to understand before you would agree we understood it?

Let me give an analogy:

We have an almost perfect understanding of transistors. If you hand me a Qualcomm mobile chipset in a black box, I'll have little or no understanding of how that allows me to make phone calls. Back in the day, I understood the x86 instruction set very well. However, if you gave me the binary of a video game, I'd have no idea how it worked. Neuroscientists understand the mathematics of how neurons work, imperfectly (but pretty well). For the sake of argument, we can pretend the models are perfect. We understand the neural wiring of simple organisms perfectly. We still have very little idea of how the human brain works.

The algorithms in deep learning are evolved and have billions of parameters. We understand the general topology, and the math of individual neurons, but we have absolutely no idea how the things work as a system. Anyone who tells you they do is lying (very likely with no ill intent; they're probably deluding themselves as well).

The people doing deep learning are, by and large, not brilliant mathematicians of the type who did earlier AI. The math is simple compared to most of the convex optimization algorithms which came before (and could probably be made much better if those were applied). Even at the human level, a lot of work in deep learning is:

- randomly tweaking parameters, topologies, and algorithms

- developing intuition (NOT theory) for which ones work better, and

- bullshitting explanations for why that might be (which would, at best, pass for a hypothesis in any scientific process)

It's hard for me to overstate how little we know about how or why these things work. We just set up a general framing which evolves well, and evolved it. An analogy would be if we set up a random number generator to write a piece of code, ran it 10^10^10 times, and picked the result which made the best wavelet transforms. We'd have no clue how it works. The only difference is (1) we have algorithms which are more tractable than randomly picking algorithms, and (2) we set up neural networks which evolve better than code, largely by virtue of being continuous rather than discrete.
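
To make that analogy concrete, here's a toy sketch of blind search (swapping the wavelet transform for a much simpler target, and far fewer draws, just to keep it readable):

    import math, random

    def random_program(n_terms=8):
        # a "program" here is just a random list of polynomial coefficients
        return [random.uniform(-1, 1) for _ in range(n_terms)]

    def run(program, x):
        return sum(c * x**i for i, c in enumerate(program))

    def fitness(program):
        # negative squared error against sin(x) on a grid of points
        xs = [i / 10 for i in range(-10, 11)]
        return -sum((run(program, x) - math.sin(x)) ** 2 for x in xs)

    # blind search: generate candidates, keep whichever scores best
    best = max((random_program() for _ in range(20_000)), key=fitness)
    # 'best' was selected, not designed; staring at its coefficients
    # tells you very little about why it scores well

Real training is gradient descent rather than pure random search, which is exactly point (1) above, but the "selected, not designed" part carries over.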


I'm still struggling to understand what counts as 'knowing how it works' for you.

In my view, if you randomly generated a piece of code to do a task, you know how it works - you can see the algorithm right there in front of you. If you randomly generated it and checked each instance until you found a good one, you even know how you got it and why it's good at the task (because you checked it and threw away the ones that weren't). Obviously if you start not knowing what the code is, you have a simple, surface level lack of understanding which you can resolve by tracing it. Once you've done that, you understand everything there is to understand about it. The fact that it produces nice wavelet transforms is a simple product of how it was found.

What more do you want to understand about it? What more is there to understand about it?


I have plenty of pieces of code I've written, decades ago, where I know what they do, but have NO idea how they work.

If I want to understand a piece of code, I need to read it and understand it. Modern models have billions of parameters and are trained over >10^20 computations. That's more than I can ever hope to read.

> because you checked it and threw away the ones that weren't

I know what it does under specific circumstances. I don't know what it does elsewhere. We have a pretty good understanding of how GPT-4 works on training data, but we have a very poor understanding of what it does for the countless other uses we see. Code I write, I analyze carefully for corner cases.

If we develop an AI which has a corner case of "exterminate humanity" which wasn't in the training set, that's, well, very possible.

I've trained plenty of neural networks (even once coding in machine code straight to custom hardware, back in the day). I can't say I understand how very many of them work, though.

> you can resolve by tracing it

You can't trace through 200B parameters or through 10^20 computations. That's beyond human capacity. We have no idea how it works, and we have a very poor understanding of emergent behaviors.
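
Just to put rough numbers on it (back-of-the-envelope only):

    params = 200e9                        # ~200B parameters
    rate = 1.0                            # one parameter inspected per second, generously
    years = params / rate / (60 * 60 * 24 * 365)
    print(f"{years:,.0f} years")          # ~6,300 years of nonstop reading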

Evolution "trains" biological organisms to survive to have as many babies as possible. Vengeance? Love? Loyalty? Pain? Hate? Emergent behaviours.


> I know what it does under specific circumstances. I don't know what it does elsewhere.

No, but if you get given a circumstance, you know how to work out what it does in that circumstance. Are you saying that you need to keep every input -> output mapping in your head to feel you understand a piece of code? I feel like I understand multiplication pretty well, but there are many multiplication calculations you could give me where I wouldn't know the answer without a lot of thinking. There's some I couldn't work out by myself in my lifetime. That doesn't stop me feeling like I understand multiplication pretty well.

> Vengeance? Love? Loyalty? Pain? Hate? Emergent behaviours.

Sure, and arguably at the level we're talking about, those are descriptive rather than explanatory. 'Vengeance' isn't something a neuron knows about, nor is it a biological mechanism in our cells; it's how we describe high-level behavior resulting from the interactions of the cells. It's an abstraction. If you had the accurate model you were talking about earlier, you'd be able to work out that given the right input, a particular behavior is output. That others might call that behavior 'vengeance' makes not a single iota of difference to your ability to predict the behavior of the system.

Are you saying that you need to have developed high-level descriptions of the behavior of a system in order to feel you understand it? What if there are no high-level descriptions? In the hypothetical scenario where we hit on an algorithm randomly, there's no requirement that it translates to any specific high-level concepts; there's just input, output, and the algorithm, all of which we can understand.

Or perhaps you mean that you already have a set of categories for output behavior and to truly understand something you need to be able to categorise the inputs and know which broad input categories result in which output categories? I could probably accept that as a broad definition of understanding, but there's a lot of flex there in terms of exactly what level of granularity you're requiring.


For multiplication, I can work out any problem, and I have a sense of what it will do under any circumstance.

> Are you saying that you need to have developed high level descriptions of the behavior of a system in order to feel you understand it?

Yes. That's almost the definition of "understanding."

> What if there are no high level descriptions?

There are things we don't or can't understand. That's approximately Gödel's Theorem. That likely includes some phenomena in fluid mechanics and in quantum mechanics. It may or may not include large-scale deep learning models.

It's okay to admit we don't, or can't, understand something.

> Or perhaps you mean that you already have a set of categories for output behavior and to truly understand something you need to be able to categorise the inputs and know which broad input categories result in which output categories?

There are different levels of understanding. However, with LLMs, I don't have a clear sense of under what conditions one might decide to, for example, eradicate humanity. I'd say that suggests I have a very limited understanding of them. I don't think there are many people with a better understanding than mine, and no one with a good understanding.

I feel like I understand a multiplication algorithm well enough to know it won't do that ever, however. If I multiply two numbers, I won't get a humanity-ending answer out.

I don't know if deep learning models have some analogue to emotions. I do know multiplication doesn't.

And so on.


>you know how it works - you can see the algorithm right there in front of you.

Seeing the algorithm in front of you doesn't mean you know how it works. It's gibberish code. If you'd never learnt C or any other programming language in your life, I could show you the C code for a popular application. You could inspect it all you like. You would still understand nothing. The best you could do is say, "this code is running this application".

In the real world, you can just pick up a C book and start learning. In this instance, no one on earth has learnt C and there are no books on it.

And neural network calculations are not just one unvarying "algorithm".


Knowing the algorithm literally means that you could theoretically reproduce it yourself, step by step (absent practical worries about longevity). Each of the steps is simple and well understood. What to do at each step is simple and well understood. What more is there to understand?


By that definition, I know COBOL too, since I can look up programs coded in it. I can reproduce the program too by hand-typing it and running it elsewhere. Banks should hire me ASAP /s.

I genuinely don't get what is so hard to understand here. You don't know the algorithm. You can see it. That's all. You don't suddenly understand information just because you can see and copy it. Would certainly be nice though.


What I'm trying to work out is what you mean by 'understand'. When it comes to an algorithm, what is it that you need to know beyond how to execute it in order to believe you understand it?


Being able to make predictable changes directly to the algorithm would be a starter.


Yes. The thing that makes the current generation of AI different is that the architectures scale. Another $10 million in training effort WILL yield improvement. And Moore’s law pairs nicely with scaling behavior. In other words, there is currently no end in sight. Plus, algo advancements like this make things happen ever faster. Plus, increased VC money means more money to throw at hardware and more folks trying new things in software. Soon we’ll be replaced :(
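
For a rough sense of what "the architectures scale" means, loss is commonly modeled as a power law in parameters and data (Chinchilla-style; the constants below are illustrative placeholders, not fitted values):

    def predicted_loss(N, D, E=1.7, A=400.0, B=400.0, alpha=0.34, beta=0.28):
        # E: irreducible loss; N: parameter count; D: training tokens
        # the constants are placeholders for illustration only
        return E + A / N**alpha + B / D**beta

    # scaling up parameters and data together keeps nudging the loss down
    for N, D in [(1e9, 2e10), (1e10, 2e11), (1e11, 2e12)]:
        print(f"N={N:.0e}, D={D:.0e}, loss~{predicted_loss(N, D):.2f}")

The exact exponents differ between papers, but the shape of the curve is why "just add compute" keeps working for now.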


Depends on what you are looking for. I have this hesitation too. What we have and are on track for is useful and cool, but how far will we get in this spurt before we are back to slight incremental gains?

Implementation-wise in business, we are very early, though. It feels like email in 1995; we have barely scratched the surface of what LLMs can mean for business and everyday life.


Because suddenly the tech moves from world-transformative to world-enhancing. The potential profits drop from trillions to mere billions. From immortality to a slightly longer lifespan.


Thanks for putting this so eloquently. That's exactly how I feel as well.


We know for a fact that there is no wall at least up to GPT-4, and open source still has a long way to go to get there.


I know. The new Reddit look sucks, big time. But the subreddits still give you good insights into the latest developments. I am playing around a lot with image-related stuff around Stable Diffusion. The ComfyUI subreddit gives me something new daily. Now, after a few weeks, I think I have a fairly good understanding of what is hot: checkpoints, IP-Adapters, face models, etc. Just play around and you will get a grasp of it. I guess it's similar with text generation.


FYI: if you want a faster experience, https://old.reddit.com/ still works perfectly


I sorta keep Hugging Face open next to HN now. Follow folks like "TheBloke" and a few others and you'll know what's up.


TheBloke just quantizes existing models, so what is the point of following him?


Because he quantizes up and coming models. You pay attention to him because he's a proxy for what's currently hot.
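
For anyone wondering what quantizing actually does: it's essentially rounding the weights into a cheaper datatype so the model fits in less memory. Here's a naive int8 sketch of the idea (real schemes are more sophisticated than this):

    import numpy as np

    w = np.random.randn(4096).astype(np.float32)   # pretend these are fp32 weights

    # symmetric int8 quantization with a single per-tensor scale
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)        # stored: 1 byte per weight instead of 4

    w_hat = q.astype(np.float32) * scale           # dequantized on the fly at inference
    print("max error:", float(np.abs(w - w_hat).max()))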


You look at the comments here to see if it's any different or getting rave reviews. Most models are crap; the new Mistral one from yesterday seems to be pretty good. But most models are not very practical or useful for anything other than amusement right now. I imagine in a year we'll get close to GPT-4-level models locally, with spec requirements low enough that a 20-series NVIDIA card can run them.


It's almost like everything's accelerating exponentially.


I honestly don't think I could if I still had a job that wasn't in this field.



