I originally had “radar/LIDAR” everywhere you see “LIDAR” in that comment, but it got really unwieldy halfway through. I think what I said generalizes from the specific example of LIDAR to other forms of sensing pretty well anyway, so you can just sub in radar if you want. The general principle is “vision” (in the sense of cameras feeding 2D image data into something that is probably a neural network) vs “everything else”. I would have said cameras vs sensors, but some of those other sensors use the visible light spectrum and are themselves called cameras. I like your use of “optical”; that might be the cleanest way to point at what I meant.

I broadly agree with your second point, about vision-only presenting big computational challenges. I think you do get some easy wins that bring down the challenge a bit - e.g. you don’t need to model human brains, you just need to model whatever the brain is doing when it’s driving; also, the fact that we can teach people to drive without understanding what their brain is doing is a reassurance that we can teach a neural network to drive without understanding what it is doing either, so it frees us from some of the modeling of thought processes as well. But it is still a big computational challenge. I heard that Tesla has a server farm with thousands of Nvidia A100s; if true, that could make a dent in the problem for sure.

And yeah, I also wouldn’t say vision is the be-all and end-all when it comes to driving. (It’s a pity that we can’t easily integrate LiDAR, radar, and other sensors into the human brain so we could use them like we do sight and sound in order to drive better.)

My point is more that roads come in all shapes and types and sizes, but one consistent thing about them is that they’re all designed so that humans can use vision to drive on them. Like, you don’t know if future roads/signs/cars will be built in ways that are hard to read with LiDAR, but you can be pretty confident they won’t be built to be hard to see. Road builders, car makers, everyone involved in the driving industry is designing for vision. It’s implicit, and it’s aimed at human vision, but it’s one of the few universal constraints on driving.

That’s what I mean when I say it’s a standards-based argument: vision is sort of a “universal interface” for roads. Another “universal interface” for roads might be wheels (with traction), or more specifically tires. You don’t need to have rubber tires, or even wheels at all, to drive on roads - but if you do have tires, you can be pretty confident that you can drive on pretty much any road you come across.



This is a compelling argument at the surface level (that roads are designed for humans with vision) that quickly breaks down when you examine how Tesla constructs their self-driving system.

Quick disclaimer: this doesn't reflect the views of my employer, nor does any of what I'm saying about self-driving software apply specifically to our system. Rather, I am making broad generalizations about robotics systems in general, and about Tesla's system in particular based on their own Autonomy Day presentations.

When you drive on the road as a human, you rely a lot more on intuition and feel than on exact measurements. This is exactly the opposite of how a self-driving car works. Modern robotics systems work by detecting every relevant actor in the scene (vehicles, cyclists, pedestrians, etc.), measuring their exact size and velocity, predicting their future trajectories, and then making a centimeter-level plan of where to move. And they do all of this tens of times per second. It's this precision that we rely on when we make claims about how AVs are safer drivers than humans. To improve performance in a system like this, you need better, more accurate measurements, better predictions, and better plans. Every centimeter of accuracy is important.
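To make the contrast concrete, here's a stripped-down sketch of that measure/predict/plan loop. Everything in it is hypothetical and wildly simplified (constant-velocity prediction, a single candidate path, names and numbers I made up); the point is just how explicit the measurements are:

    # A toy, heavily simplified version of the loop described above: take each
    # actor's measured position and velocity, roll the measurements forward to
    # predict where it will be, then check a candidate ego path against those
    # predictions. All names and numbers here are my own placeholders.
    from dataclasses import dataclass

    @dataclass
    class Track:              # one detected actor, as output by perception
        x: float              # position in the ego frame, meters
        y: float
        vx: float             # measured velocity, meters/second
        vy: float

    def predict(track, horizon_s=3.0, dt=0.1):
        """Constant-velocity rollout: positions over the next few seconds."""
        steps = int(horizon_s / dt)
        return [(track.x + track.vx * i * dt, track.y + track.vy * i * dt)
                for i in range(steps)]

    def plan_is_clear(ego_path, predictions, margin_m=1.5):
        """Reject a candidate ego path if any predicted actor gets too close."""
        for (ex, ey), actors_at_t in zip(ego_path, zip(*predictions)):
            for ax, ay in actors_at_t:
                if (ex - ax) ** 2 + (ey - ay) ** 2 < margin_m ** 2:
                    return False
        return True

    # One tick of the loop (in reality repeated tens of times per second):
    tracks = [Track(x=20.0, y=0.0, vx=-2.0, vy=0.0)]        # lead car closing at 2 m/s
    predictions = [predict(t) for t in tracks]
    ego_path = [(10.0 * i * 0.1, 0.0) for i in range(30)]   # ego going 10 m/s, straight
    print(plan_is_clear(ego_path, predictions))             # -> False: too close within 3 s

Note how every stage consumes numbers with units; there's no "feel" anywhere in the pipeline.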

By contrast, when you drive as a human it really is as simple as "images in, steering angle out". You just eyeball (pun intended) the rest. At no point in time can you look at the car in the lane next to you and tell its exact dimensions or velocity.

Now perhaps with millions of Nvidia A100s we could try to get to a system that's just "images in, steering angle out", but so far that has proven to be a pipe dream. The best research in the area doesn't even begin to approach the performance we're able to get with the more classical robotics stack described above, and even Tesla isn't trying to learn it all end to end.
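For what it's worth, the purest form of that idea looks something like the toy model below, loosely in the spirit of NVIDIA's 2016 end-to-end driving experiments and definitely not anything Tesla actually ships: a single network mapping a camera frame straight to a steering angle, with everything else left implicit in the weights.

    # Toy "images in, steering angle out" model: one network regressing a
    # steering angle straight from a camera frame, with no explicit detection,
    # measurement, prediction, or planning stages. Illustrative only.
    import torch
    import torch.nn as nn

    class EndToEndDriver(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
                nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
                nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
                nn.Conv2d(48, 64, 3), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.head = nn.Sequential(nn.Linear(64, 50), nn.ReLU(), nn.Linear(50, 1))

        def forward(self, frames):                    # frames: (batch, 3, H, W)
            return self.head(self.features(frames))   # -> (batch, 1) steering angle

    model = EndToEndDriver()
    frame = torch.rand(1, 3, 66, 200)                 # one dashcam-sized RGB frame
    steering_angle = model(frame)                     # everything else is implicit in the weights

Getting something of roughly this shape to drive as reliably as the explicit stack is the part that looks suspiciously close to AGI.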

That isn't to say it's impossible (obviously, humans do it) but I think one could make a strong argument that "images in, steering angle out" is like epsilon close to just solving the problem of AGI, and perhaps even a million A100s wouldn't cut it ;)


That's not really true. Humans, at critical moments, do make implicit and even explicit plans of movement and follow them. We don't use literal velocity measurements for other objects, true, but in making those plans we do sometimes anticipate their locations at various points in the future, which is really what matters.

The best human drivers do this not at the centimeter level, but at the millimeter level. Look at downhill (motor)bike racing, Formula 1, WRC, etc. These drivers can execute millimeter-accuracy maneuvers, planned well in advance, at over 100 km/h.


Yeah, that's kind of what I was trying to say. You're right in that we predict the actions of others, but we don't do it in the same way. Even when we execute millimeter-level maneuvers, we aren't explicitly measuring anything... Like, if you were to ask a driver for instructions on how to repeat that maneuver, they wouldn't be able to tell you; they just have a "feel" for it.

Basically, humans are really, really good at guesstimating with great accuracy (but poor reproducibility), and since we don't use explicit measurements in the first place, having better measurement accuracy wouldn't really help us be better drivers on average (it does help in certain scenarios like parking, though, where knowing the number of inches remaining to an obstacle can be very useful).

But for everyday driving at speed, we wouldn't even be able to process measurements in real time if someone were providing them to us. AVs are different, and that's basically the gist of what I was trying to say. Because they actually do use, rely on, and process measurements in real time, improving their measurement accuracy (i.e. switching from approximate camera-based depth to cm-level accurate depth from a LiDAR) can have a meaningful impact on the final performance of the system.
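To put a rough number on that last point, here's a toy Monte Carlo comparison (the noise figures are my own back-of-the-envelope assumptions, not anyone's spec): estimating a lead vehicle's closing speed from two consecutive range readings, once with camera-style percentage depth error and once with centimeter-scale LiDAR-style error.

    # Toy illustration: estimate a lead car's closing speed from two range
    # readings taken 0.1 s apart. Noise figures are rough assumptions of mine
    # (~5% of range for camera depth, ~3 cm for LiDAR), not anyone's spec.
    import random

    def closing_speed(true_range=30.0, true_speed=5.0, dt=0.1, noise=None, trials=10_000):
        estimates = []
        for _ in range(trials):
            r0 = true_range + noise(true_range)
            r1 = (true_range - true_speed * dt) + noise(true_range)
            estimates.append((r0 - r1) / dt)       # estimated closing speed, m/s
        mean = sum(estimates) / trials
        spread = (sum((e - mean) ** 2 for e in estimates) / trials) ** 0.5
        return round(mean, 2), round(spread, 2)

    camera = lambda r: random.gauss(0, 0.05 * r)   # ~5% of range
    lidar = lambda r: random.gauss(0, 0.03)        # ~3 cm, range-independent

    print("camera:", closing_speed(noise=camera))  # spread on the order of 20 m/s
    print("lidar: ", closing_speed(noise=lidar))   # tight around the true 5 m/s

The raw depth error in the camera case is only a meter or two, but differencing it over a tenth of a second blows it up into tens of m/s of velocity noise, and that's exactly the kind of input the prediction and planning stages have to live with.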



