You need AGI for good self-driving because driving requires predicting the actions of other human drivers at an extremely high level. This is second-nature for humans so we barely notice that we are doing it, but it is extraordinarily difficult for non-humans.


I think the reality is somewhere in the middle. You need to be able to accurately predict behavior of humans _following some conventions_, and to be wary of the behavior of humans when they violate those conventions.

An example I saw:

- At the start of a construction area, a guy wearing a hi-viz vest holds a stop sign. A self-driving car stops at the sign.

- The guy _lowers_ the sign a bit while looking over his shoulder down the street towards others on his crew.

At this point a _human_ guesses that the sign is lowered only b/c the guy has seen that the car stopped, and expects the car to stay stopped until some further signal (e.g. a waving gesture, or flipping the sign to show the "slow" side). The human driver understands that stop sign guy is looking to coordinate with someone else nearby. There's a "script" for this kind of interaction.

... but the self-driving car starts moving as soon as the road crew guy lowers the sign. In this case nothing seriously bad happened. But it was not following The Conventions.

Perhaps this doesn't take full general intelligence -- but it takes more reasoning about what people are doing than the cars currently seem to have, which is why they sometimes drive into a zone the fire department is actively using to fight a fire and get in the way.


No you don’t. I can feed images of crazy driving scenarios into LLaVA and get reasonable responses. That’s a general purpose LLM with $500 worth of fine tuning running locally on my PC. You should look into what can be done with the current state of the art LLMs. Your intuition for what’s possible is out of date.

If I can do that with open source LLaMA variants, I can only imagine what’s possible if you have an actual annotated dataset of driving scenarios. Imagine a LLaMA model that’s been fine-tuned for lane selection, AEB, etc.
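
Rough sketch of the kind of query I mean, using the stock LLaVA 1.5 checkpoint on Hugging Face (the model id, image file, and prompt here are placeholders, not my actual fine-tuned setup):

    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    model_id = "llava-hf/llava-1.5-7b-hf"    # assumed public checkpoint
    processor = AutoProcessor.from_pretrained(model_id)
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto")

    frame = Image.open("dashcam_frame.jpg")  # hypothetical dashcam still
    prompt = ("USER: <image>\nA worker in a hi-viz vest is holding a stop sign "
              "and looking over his shoulder. What should the car do?\nASSISTANT:")

    inputs = processor(images=frame, text=prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=80)
    print(processor.decode(out[0], skip_special_tokens=True))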


That's a nice conjecture, we will see in the coming years if it plays out.


Are you getting six nines of accuracy on that with good latency? Did you watch the “how our large driving model deals with stop signs” talk from Tesla's AI department? Given the multiplicative effect of driving decisions and the weird real world out there, it has to be extremely reliable and robust to be a good driver as the miles mount up.
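
The compounding is what kills you. Back-of-the-envelope, with made-up numbers:

    # Per-decision reliability compounds over a trip (all numbers illustrative).
    p_correct = 0.999          # 99.9% accurate on any single driving decision
    decisions_per_mile = 10    # rough guess
    miles = 100

    p_clean_trip = p_correct ** (decisions_per_mile * miles)
    print(f"Chance of a 100-mile trip with zero bad decisions: {p_clean_trip:.1%}")
    # ~36.8% -- three nines per decision is nowhere near enough.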


The reason you would insert an LLM into the vision stack is to deal with the weird and unexpected. Tesla’s current stop sign approach is to train a classifier from scratch on thousands of stop sign images. It’s not surprising that architecture can’t deal with stop signs that fall outside the distribution.

LLMs with vision work completely differently. You’re leveraging the world model, built from a terabyte of text data, to aid your classification. The classic example of an image they handle well is a man ironing clothes on the back of a taxi. Where traditional image classifiers wouldn’t have a hope of handling that, vision LLMs describe it with ease.

https://llava.hliu.cc/
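
You can see the closed-set failure mode with any stock fixed-label classifier; it has no way to even express the weird part of a scene (model name and image file below are placeholders):

    from transformers import pipeline

    # Any fixed-label classifier will do; this one only knows ~1000 ImageNet classes.
    clf = pipeline("image-classification", model="google/vit-base-patch16-224")
    preds = clf("man_ironing_on_taxi.jpg")   # hypothetical image file

    for p in preds[:3]:
        print(p["label"], round(p["score"], 3))
    # It has to answer with something like "taxi" or "iron" -- the unusual part of
    # the scene isn't expressible in its label set, whereas a vision LLM asked
    # "what's unusual here?" can describe it in free text (see the demo above).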


This is overcome with superhuman sensing, reaction time, and better visual angles.

e.g. automatic radar-based deceleration already helps many human drivers avoid collisions. It'll help the robot too (rough sketch below).

The rest of driving is relatively simple, methodical, and slow.
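
For the radar point, the braking decision itself is basically a time-to-collision threshold once you trust the range and closing-speed measurements. A minimal sketch, with made-up numbers:

    # Minimal sketch of radar-based automatic emergency braking: brake when the
    # estimated time-to-collision drops below a threshold.
    # All numbers are illustrative, not from any real AEB spec.

    def should_brake(range_m: float, closing_speed_mps: float,
                     ttc_threshold_s: float = 1.5) -> bool:
        """Return True if estimated time-to-collision is below the threshold."""
        if closing_speed_mps <= 0:          # not closing on the target
            return False
        time_to_collision = range_m / closing_speed_mps
        return time_to_collision < ttc_threshold_s

    # 30 m ahead, closing at 25 m/s -> 1.2 s TTC -> brake
    print(should_brake(30.0, 25.0))   # True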


But also ambiguous, and from time to time requiring judgment. Should I let that dumb ass driver go next or pull around them? I agree it’s insane not to allow automated driving to use more sensors than humans have. I wish I had vision that could cut thru rain and glare.


That is a conjecture which does not represent the current state of the technology.


I don't think anybody has demonstrated convincingly that a self-driving car would specifically need AGI to achieve what really matters: statistically better results on a wide range of metrics. I don't expect SDCs to solve trolley problems (I don't expect human drivers to solve them either) or to deal with truly exceptional situations. To me that's just setting up an unnecessarily high bar.


While this might be true for the truly general case (though I’d bet it’s not), when you have a very constrained operating area it’s a lot less true.

Waymo in Phoenix and the current Cruise cars in SF seem like good counterexamples.

The bar is also a lot lower - human drivers are pretty bad.



