Tesla’s insistence on using vision alone is pretty dumb. Elon and Andrej Karpathy argued that since humans can drive using just vision, that’s how we should do it in self-driving cars, but I think that’s a flawed argument. The proper question to ask would be: if given additional senses, wouldn’t humans use them for safer driving?
Also, humans do sometimes use additional senses when driving. For example, I've had to make a left turn at a T intersection onto the cross street, where I had a stop sign and the cross traffic did not.
This was in California's Central Valley, where it can get very foggy, making it very hard to see traffic until it's almost in the intersection.
It was a quiet rural area though and by opening the windows on both sides and turning off the radio I could hear traffic quite a bit before I could see it. I'd sit at the stop listening until I'd heard a car or two go by to be sure that it was quiet enough that I could hear them. Once I'd calibrated my senses to that day's current conditions I was able to make the turn.
Listening for cars in fog is great until a Prius comes along, and coastal areas of California are their natural habitat. But, yeah, if I had extra senses lying around, I'd use them for all sorts of things.
Vehicle noise is dominated by road noise beyond about 25 mph -- that's why the mandatory sound on new EVs or Priuses typically cuts out around 20-25 mph.
Presumably this intersection is at high enough speed on the cross traffic that it's the road noise tzs is listening for, not engine noise.
That's why in the EU they have introduced requirements for electric cars to make artificial sounds. Sometimes one of them drives past and it sounds like a spaceship from a sci-fi movie lol.
This is one of the most retrograde pieces of legislative nonsense imaginable.
Just when significantly cutting noise pollution from motor vehicles in urban areas is finally within our grasp we toss it away by forcing them to make stupid and annoying UFO noises on the grounds of nebulous safety concerns.
There were any number of ways of solving this that would have been less annoying and better for people's health[0].
For one thing, most people are able to use their eyes and will learn soon enough that EVs don't make much noise at low speeds and will keep an eye out for them. How do I know this? I live in Cambridge, UK, which is brimming with cyclists. They don't make much noise either, but you learn to look out for them (and very quickly too).
And for those who are partially sighted or blind some sort of warning device + appropriate signal could no doubt have been engineered and legislated.
But, no, we've gone for stupid noises instead.
[0] We now, of course, know that noise pollution does in fact cause health issues and, I'd argue, these outweigh the safety argument.
I don't think you would make this argument if you had seen up close the damage a car can do to a pedestrian. The problem isn't noise. It's that there are two-ton vehicles traveling close to unprotected people. Anything that can mitigate that is a good thing. The argument about noise pollution causing health issues is ridiculous in comparison.
I've heard that some EV drivers turn off the sound because it annoys them. They are potentially setting themselves up for a lifetime of remorse.
It's not ridiculous. You need to consider the numbers involved. If one extra person breaks his leg because he didn't look and see the quiet car before crossing the street, that's a fair trade-off for reducing the noise pollution for a million people.
Cars are 'less noisy' vs all the things that you mentioned simply because they are omnipresent. They wear you down, grind away at your tranquility, but because they're always there, they do so without you even noticing. A peaceful evening walk is made wearisome by the constant trickle of cars intruding into your hearing. Particularly so in the USA, because the cars there (just like everything else) are vast. Car use in cities is a blot on humanity. I wish they would all just go away and be saved for the weekend road trip, or moving house, but not for groceries or commuting.
I have hyperacusis (hypersensitivity to noise) and insomnia, and I wish all noisy things would go away. I’m one of those people whose health is severely impacted by noise, and I spend a lot of money trying to get away from it. I know that most people don’t have these conditions, so they don’t understand how inconsiderate they’re being. It is unrealistic of me to expect others to be more accommodating. If anything, people are getting much noisier.
I would be exceedingly happy to live in a place where electric cars are the biggest generators of noise pollution. I’ve lived in Europe in walkable cities with little car traffic and it’s worse: drunk revellers singing at the top of their lungs at all hours, parties going until late at night, and barking dogs in the early morning. Makes me pine for an HOA-controlled gated suburbia with strict noise controls. Most traffic noise, and especially electric car noise, can be covered up by a noise generator. One place I lived installed an artwork that lit up to different levels depending on the noise pollution; instead of encouraging people to be quieter it did the opposite, as they competed on who could make the most noise.
You are utterly overblowing one part (noise pollution from EVs, which is really just a slightly louder hum, for whatever personal reasons you have) and ignoring the additional, massive and immediate benefit of actually saving lives. I definitely appreciate the added noise, and so do my small kids; they don't have to learn this from having a schoolmate killed by an ultra-quiet car, and the same goes for e.g. the elderly. We're not living on this planet alone, did you notice?
Noise pollution from e.g. ambulances, sports cars, basically any motorbike, old cars, trucks and so on is much, much bigger. Where is your outrage for those?
And no, noise pollution from the cars discussed here is not causing massive health issues that outweigh people getting maimed and killed by them; that's just your personal preference (like not owning a car because you are young without a family, etc.) that you would like to push on the whole of society for whatever personal reasons.
Making life much more annoying in exchange for a few fewer kids getting run over is not automatically a great tradeoff. As a society, we should be thinking harder about these sorts of conundrums. No one wants to get run over, and yeah, it's a bad outcome, but how far would you go? How many times do you have to get woken up by an Amber alert before you turn it off?
Maybe his personal reason is that he's absolutely capable of getting out of the way of a car, but he doesn't like noise. That's a reasonable preference. If you could absolutely guarantee that a kid ends up under a car without the noise, fine, but I doubt that's true.
Sure, we don't live on this planet alone, but that seems like more justification for not intentionally making our shared space miserable, not less.
>And for those who are partially sighted or blind some sort of warning device + appropriate signal could no doubt have been engineered and legislated.
What do you have in mind? I can’t think of anything as effective as sound—the blind person needs no special hardware to perceive it, and it can be easily spatially localized.
They turn off over ~18 mph, though. After that, tire noise is much louder. As loud as an ICE car.
So, realistically, you’d hear a Prius if it’s going normal road speeds.
Side note: I wish the sound was more pleasant. When multiple cars are moving in a lot I call it the “choir of the damned”. They’re also louder than ICE cars and that’s kinda lame, we moved backwards WRT noise.
I live in the EU, and EVs are much quieter than ICE cars at low speeds in my day-to-day experience. The volume levels need to be raised significantly from my perspective; they're dangerously low here. There are quite a lot of kids on the roads here, and the difference in audibility between ICE and EV cars is concerning.
Are you saying the typical EVs you hear are _louder_ than ICE cars where you live?
As to the pleasantness of the noise - yeah, that seems to be a manufacturer's choice, or perhaps even driver choice? And let's hope just like annoying ring-tones of years past that the current selection dies out soon...
My RAV4 Prime is much louder than an ICE at low speeds, especially in reverse. It’s so loud it’s sparked a bunch of videos and discussion on how to lower the volume: https://m.youtube.com/watch?v=q1UqicqdzFE
Some have figured out how to disable it completely (just pulling the noisemaker causes a fault code to trigger) but quieting it down to roughly the volume of an ICE seems more reasonable to me.
That noise sounds bad, and unreasonable. I suppose I've just been lucky not to live near anybody that has a car making that kind of noise.
Irrespective of the volume, it is remarkable that any car designer thought this was the right _kind_ of noise for the car to emit. Somebody chose that soundtrack, and you kind of have to wonder why...
> Are you saying the typical EVs you hear are _louder_ than ICE cars where you live?
ICE cars are pretty quiet where I live. Either that or there is so much ambient noise that I've lost some of my hearing but either way I don't hear much of an engine noise with recent cars.
Oh they're quiet here too - it's just that (most) EVs tend to be almost inaudible. They'll make a very quiet whirring noise sometimes, but the volume is so low that you'll easily miss em.
I don't know the people driving these cars well; perhaps they modded them (seems unlikely, given the neighborhood), or perhaps they're old enough to have escaped new requirements; regardless - they're too quiet here in my sample size of 1 neighborhood in NL.
Volvo plug-in hybrids don't make any noise when driving in EV-only mode. They are super stealthy, you have to be super careful when driving around a car park because no one is aware you're around them.
Regulation in the U.S. requires it: “After several additional delays, the National Highway Traffic Safety Administration issued its final ruling in February 2018. It requires hybrids and electric vehicles travelling at less than 18.6 mph (30 km/h) to emit warning sounds that pedestrians must be able to hear over background noises. The regulation requires full compliance in September 2020, but 50% of "quiet" vehicles must have the warning sounds by September 2019.” https://en.m.wikipedia.org/wiki/Electric_vehicle_warning_sou...
Before this, a Prius or any modern car would be dead silent at a stop because of stop-start technology. And even those small 1.6L engines don't make much noise when idling.
Now you hear all these cars making their high-pitched UFO sound. And it is VERY irritating.
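For what it's worth, the rule itself boils down to a simple speed threshold. A toy sketch of that logic - the 18.6 mph figure is from the NHTSA ruling quoted above, while the hysteresis band (to keep the sound from fluttering on and off right at the cutover) is purely my assumption, not anything in the regulation:

```python
AVAS_ON_MPH = 18.6    # NHTSA threshold: warning sound required below this speed
HYSTERESIS_MPH = 1.0  # assumed band to avoid rapid on/off toggling near the cutover

def avas_should_sound(speed_mph: float, currently_sounding: bool) -> bool:
    """Decide whether the pedestrian warning sound should be playing."""
    if currently_sounding:
        # Keep sounding until we're clearly above the threshold.
        return speed_mph < AVAS_ON_MPH + HYSTERESIS_MPH
    # Start sounding again only once we're clearly below it.
    return speed_mph < AVAS_ON_MPH - HYSTERESIS_MPH

# A car accelerating from a stop keeps the sound on until ~19.6 mph:
states = []
sounding = True
for v in (0, 10, 18, 19, 20, 25):
    sounding = avas_should_sound(v, sounding)
    states.append(sounding)
```

So the irritating part really is confined to parking lots and residential streets; at road speeds the synthetic sound is off and tire noise takes over.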
I once placed a snot rocket on a Prius windshield from my bicycle. It snuck up on me. So I did the responsible thing and bailed into a neighborhood so I wouldn't have to look them in the eye.
Quiet vehicles should emit a peace cry, and I should look before I rocket.
As a cyclist who once got that stuff in his face: yes, please look first. Expecting others to make a noise just in case they might get a bioweapon dumped on them is not nice.
I think the idea of EVs making additional noise is flawed. The important thing is that people should be looking when using roads, not just relying on hearing. Deaf people are allowed to drive, and they can certainly cycle safely as well, as long as they just look around a bit more to keep aware of what's around them.
Having fake noise just encourages pedestrians to keep looking at their phones and not use their eyes when crossing roads or cycle lanes and they can injure themselves or others by doing that.
Also, there's far too much noise in busy areas as it is, so it seems unhelpful to deliberately add extra noise to the environment.
I think there's a reason why a new sensor suite is rumoured to be imminent. Much better cameras + radar? Key word being "Project Highland".
It won't be a retrofit for older cars, which tells me current owners won't get to experience the next generation of FSD that it will make possible.
I never bought mine (a '22 LR3 with Ryzen, with the earlier-gen radar now disabled, plus USS - still in use, fortunately) for the FSD anyway, so I don't mind. I won't blame those who might, though! (This is presently all speculation/rumours until officially confirmed.)
Must be out of the loop. HW4, which is higher resolution cameras and Phoenix HD Radar, has been in S/X for months and started showing up in (at least Fremont) Model Y's with a build date around May 25th. Highland has nothing to do with this, and Model 3 sales are still doing fine, so they don't need to drop anything yet to boost model 3 demand.
Dropping prices doesn’t even have to be about demand. A lot of rivals like Lucid and Rivian are struggling mightily in this market. You can pinch them out. You may also just want to share production efficiency gains with the customer for the same reason.
A 3k price drop shows they only need a little bit of a demand boost, especially since the discount is only on inventory cars. Highland will be a massive demand bump for them, since it being new likely pushes some Y purchasers to the 3.
Is this true though? Earlier today mine picked up on emergency lights that were several hundred feet further away in traffic than I would have seen them. It seems able to enhance the images the cameras capture.
Could also be aware of the position of fixed road equipment via mapping software. It's more plausible than the cameras having some kind of super vision.
Realistically the training data contained some amount of emergency lights through fog, so it can identify faint emergency lights through fog as real emergency lights and will appropriately display the warning on-screen.
Also, Waze and Google Maps have had user-reported speed traps for some time. So one Tesla driving by can inform the network of the police car, and others don't need to observe it themselves to know it's there.
In my view, the higher-level issues with the FSD Beta program are:
- A failure by Tesla to view the system they are developing as what it really is - a physical safety-critical system, not "an AI". Those two are distinct systems because, with physical safety-critical systems, the totality of the system's safety components cannot be fully expressed in software - neither initially nor continuously.
- To build on that point, Tesla is not allowing the Operational Design Domain (ODD), via a robust, well-maintained validation process, to determine the vehicle hardware as the ODD demands. Instead, Tesla is trying to "front run" it (ignore the demands of the ODD) by largely focusing on hardware costs. The tension from failing to recognize that is, in part, why Tesla has a long history of being forced to (somewhat clandestinely) change the relevant sensor and compute hardware on their vehicles while promising to "solve FSD" (whatever that means) by the end of every year since around 2015 or so.
I'm pointing out that calling anything "AI" is both pointless and meaningless. It's a buzzword for board members and shareholders to throw around, since they refer to it as the latest LLM technology, while the phrase just means any complex business logic generated by a program.
It’s generally accepted to mean the use of neural networks which Tesla is obviously using. Good luck even identifying a stop sign with “complex business logic” or “if/else”
Most important road signs have rather distinct shapes, standardized sizes and are angled towards oncoming traffic. Having an object with known shape aligned almost perfectly towards the camera is basically the best case for many primitive object detection algorithms.
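To illustrate how little machinery the basic shape check needs once a candidate contour has been extracted, here's a toy sketch. Everything here is made up for illustration (the function names, the 15% side-length tolerance); a real pipeline would obviously do far more (color segmentation, perspective correction, occlusion handling):

```python
import math

def side_lengths(pts):
    """Lengths of the edges of a closed polygon given as (x, y) vertices."""
    n = len(pts)
    return [math.dist(pts[i], pts[(i + 1) % n]) for i in range(n)]

def looks_like_stop_sign(pts, tol=0.15):
    """Very rough check: 8 vertices with roughly equal side lengths.
    Assumes the contour has already been extracted and simplified
    from a segmented red region; this only classifies the polygon."""
    if len(pts) != 8:
        return False
    sides = side_lengths(pts)
    mean = sum(sides) / len(sides)
    return all(abs(s - mean) / mean < tol for s in sides)

# A regular octagon on the unit circle passes; a square does not.
octagon = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4)) for k in range(8)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
```

That's the "best case" in a nutshell: a known shape, facing the camera, reduces to counting vertices and checking regularity.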
True, but it’s equally important that a self-driving car be able to recognize a stop sign that is bent from a previous accident and facing an arbitrary angle (as well as one that is angled towards the car’s lane but applies to a different road).
And stop signs that have been altered in some way. For example, rural stop signs that are peppered with holes from pot shots must still be recognized. Snowy stop signs with the bottom half obscured by accumulated drift. Signs with a non-red sticker reading “WAR” placed below the word “STOP”.
And that’s not even getting into cases where you conditionally act like there’s a stop sign. The city of Houghton, MI has major streets along the side of a hill, and minor streets going up and down the hill. Every winter, sand is put down for traction, and every spring it is cleaned away. If there’s a late-season snowstorm after the spring cleaning, cars going downhill on the minor streets physically cannot stop, so everybody on the major streets looks uphill before crossing.
Short of location-dependent fine-tuned models, I’m not sure how machine learning could replicate the logic of “if snowy in late spring, grant right-of-way to cars headed downhill”.
They're "artificial neural networks" and it would seem to me it's recognizing stop signs by comparing them to images of stop signs. So I tend to lean toward "AI" is the latest "buzzword". I think in truth it's more akin to a search engine reacting to inputs, but from sensor data, than anything close to real "intelligence" of any kind.
I can see how it appears to be intelligent, but it lacks reasoning, creativity, and critical thinking.
“Creative” doesn’t necessarily mean “generating new behavior”, but can also mean “generating new hypotheses”. Suppose you see a group of young kids playing in a yard. One tosses a ball up into the air, and the rest run towards it. The first to reach the ball throws it back into the air, and the rest run toward it again.
It requires creativity to recognize the rules of the game as “try to be the first to reach the ball”, to recognize that the thrower may not have time to carefully aim, and that the others might chase the ball regardless of its location. Only if all three of those creative leaps are made, then logical deduction can take over to conclude “if the ball goes in front of me, stop before a kid does the same”.
Also, humans and other animals that rely on vision have eyelids and tear ducts and are able to blink and get stuff out of an eye.
Poor, poor Tesla cameras freak out as soon as the sunshine is too bright or there's snow or rain or ice or mud in the way. You'd think if they're going to rely on vision, every camera mounted on the car would have a way to "squint" in blinding-light conditions, or "wipe" the lens or something when smudges, rain, snow, ice, mud, or bug-splat blocks the view. But then, Tesla is insanely cheap, and all that would require parts, and that would impact margins, and that would impact stock price, and so, this is why we can't have nice things ....
Yeah, human eyes cover a huge dynamic range compared to traditional cameras, which have all sorts of issues with either too little or too much light (blooming, lens flares). Are a Tesla's cameras of the same quality as the human eye? Can they see in the dark just as well?
Not only can Tesla's cameras not do that as well, they may also just shut down if it's too cold, even if they're not covered in snow.
Current self driving systems sometimes fail in perfect weather conditions on correctly marked, empty roads. There's a long road ahead if it's supposed to actually work in the real world.
All the sources I'm able to find say there are no cameras in existence that are as good as human vision. Human vision is quite good and adaptive to real world conditions of all sorts.
Would be nice if someone was prototyping things like that. I'd imagine you could sell actual self-driving cars for $200k or maybe a bit more. (Cost of a decent luxury car + 1-2 years of a dedicated full-time chauffeur.)
They go the boring path.
Work together with regulators.
Prove to them that whatever they are doing is actually safe to use.
Don’t oversell to their customers.
They go the way of building and retaining trust - with customers and regulation bodies.
Doing something cheaper than competitors is a good thing when you’re achieving the same goal. Doing something cheaper than competitors when there’s a trade-off to the buyer fills out options in the market. Doing something cheaper than competitors when there’s an externality (e.g. a self-driving car that fails to recognize pedestrians) is morally condemnable.
But human eyes often look away from the road, close during a sneeze, etc., and have a very narrow viewing angle compared to a car surrounded 360 degrees by cameras... so there are plusses and minuses.
Human vision isn't that perfect for driving when it's looking at a mobile phone.
Yeah it's shocking to me how many people overlook this. Even if we pretended that the Tesla sensor suite was capable of FSD, it's not FSD if you have to disengage when the lens gets mud on it. Sensor cleaning is an integral part of actually being able to have driverless operation. When I worked at Argo we spent a lot of time making sure that we were designing failsafe methods for detecting and dealing with obstructions (https://www.axios.com/2021/12/15/self-driving-cars-clean-dir...).
Also, I think the biggest discovery is that the “brain” part of the human “eyes plus brain” system is extremely hard, and a “sensor which can see depth” probably makes the brain part easier.
That said, Tesla was never in a position to use LiDAR because it has generally been extremely expensive. Solid state lidars are supposedly now hitting low volume testing for 2025 production years. Tesla is a mass manufacturer not a self driving start up, so there was never really an option for them to offer LiDAR without an extremely expensive self driving package.
One thing however that was obviously wrong was Elon’s promises, which were extremely misleading and helped build his fortune thanks to the misunderstanding. (Assuming this inflated stock values)
With solid state LiDAR supposedly becoming available for $500 in the next two years (a promise we have heard since 2016 but one that seems possibly to finally be coming true) we may never end up seeing if Tesla could have ever done it with pure vision - they could go with solid state LiDAR for forward facing driving in the next few years.
That said, over promising is going well for them. Perhaps they will just keep doing that.
If Tesla caves and ships LiDAR, I don’t see how they can get out of refunding all previously sold FSD packages. They can continue working on camera-only FSD but it will be immediately apparent that there’s a massive gulf in safety and performance between that and the LiDAR-equipped option.
Tesla of course claims that the HW3 cars will still get FSD at some point, but unless they somehow figure out how to bend light, that blind spot will continue to be an issue on older cars.
That's not what the article you linked says at all. Some HW4 vehicles that are shipping now are shipping without Radar (likely due to supply chain or cost issues). But HW4 was designed with a Radar in mind, it's present in the code, registered with the FCC and there are physical connectors for it on the HW4 compute module. You can see photos of the interior of the Radar in the thread from Green that I linked earlier or in this article (https://www.teslarati.com/tesla-hardware-4-hd-radar-first-lo...)
The fact that some early HW4 units will ship with the updated camera suite but not the radar only further supports my point that some users are going to be left with inferior sensing systems despite having paid the same $10k for FSD as everyone else. The whole "radar will be used to train and improve the vision" argument is just nonsense made up by Elon & Tesla fans. A properly functioning radar-camera sensor fusion system will be superior in every way to a camera-only solution. And there will be zero Teslas that actually achieve "full self driving" (i.e., you going to sleep in the back seat and waking up at your destination) until Tesla adds things like a cleaning system to their existing camera setup, for example. The hardware is simply inadequate.
I'm not putting into question the fact that having a radar is significantly superior to vision only, that's obvious.
I'm saying that, as we see, HW4 and radar are two distinct hardware configurations: saying that there will be a large sore spot because HW4 cars have radar is objectively false as not all of them do.
I’m super confused… you’re agreeing that having a radar is superior but you’re disagreeing that it’s going to be a sore spot? How is that possible. Regardless of how you name the configurations (HW4, HW4.5 whatever, the naming is irrelevant) the point is that there will be multiple configurations of sensing suites, some of which will be objectively better than others. That’s the sore spot.
The fact that HW4 and radar are separate configurations by name is not important. HW3 is also included in the mix (you can buy FSD on it) and it has totally different camera placement. So radar aside there’s still a difference in the sensor suite.
People who paid $10k and were promised “FSD” and future hardware upgrades have every right to be pissed off about this.
Well, it’s cheaper to retrofit a $500 LiDAR than it is to refund a $10,000 option. I’m curious how they handled upgrades to the more powerful computer they released. And we can expect they will release another more powerful computer again. The same issues would apply, but they have no choice but to deal with it if they ever want to ship a fully functional system.
Retrofits might cost less than a full $10,000 refund, but they're definitely going to be expensive enough that I could imagine Tesla doing everything in their power to avoid them.
Installing the LiDAR unit in the front bumper would require replacing the entire existing front bumper, since there's no handy fake grille or whatever you can pop out and replace. You'll also have to paint the bumper to match and blend it into the surrounding panels, since these cars have been in the sun long enough that a fresh-from-the-factory bumper will have an obvious color mismatch. If you don't, you'll have pissed-off car owners to deal with and, in all likelihood, another class-action suit.
A roof mount avoids that, but has its own issues. You'll need techs to drill a freaking hole in the roof, mount the LiDAR unit, hope that they manage to seal the penetration well enough that water won't leak into the car, and then run cables through the interior. A whole hell of a lot of cars will leak, even if the work is performed to spec, so you'll have to extend warranty coverage to deal with it. Plus, you've now got an ugly box on the roof that may or may not match the paint properly, whereas presumably the new cars will at least integrate the unit's lines into the body panels so it's not quite as obvious. The same thing goes for sticking it on the hood. That's to say nothing of any hardware replacements and the fiddly bits that'll be necessary for retrofits.
> One thing however that was obviously wrong was Elon’s promises, which were extremely misleading and helped build his fortune thanks to the misunderstanding.
I'm wondering why there isn't a law firm that wants to make a fortune by starting a class action suit.
Our Odyssey uses a combination of multiple radar sensors and a camera to provide excellent sensing. From what I have read, millimeter wave is the best of both worlds between LiDAR and radar.
The last link reads mostly like a press release from the manufacturer. I suspect there are reasons why it is inferior to LiDAR not listed on that page. LiDAR traditionally can be kind of impractical but solid state LiDAR will help a lot with that.
If you actually try FSD beta you'll very quickly realize that the vast majority (over 95% probably) of disengagements are because the planner is dumb, not because of vision. In other words, on the screen it sees everything correctly, it just decides to do something dumb.
So currently the vision stack is not greatly holding them back.
The full system for humans is “vision + brain” and for self-driving it’s “sensors + planner”.
The Waymo/Cruise philosophy is that since we don’t know how to make the planner human-brain-level, we should shift as much of the load as possible to the sensors, where we have the ability to use things that humans don’t have, like lidar and radar.
To me, Tesla FSD going vision-only is a bet on the progress of AI planning models. If the planner reaches a human-equivalent level, then human-equivalent sensors are fine. Time will tell if this is a good bet, but so far it’s not.
This is 100% true. Lidar improves accuracy by millimeters up close, inches at 10-50 feet away, and feet beyond that. That accuracy is more than sufficient. Recognition and classification of objects is not improved at all (the part that matters). And, like the parent post said, Tesla classifies everything very, very well; the real issue is that the planner acts completely crazy all the time and is scary.
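The range dependence falls out of angular resolution: lidar returns are samples on an angular grid, so the spacing between adjacent points grows linearly with distance. A back-of-the-envelope sketch, assuming a 0.1° angular resolution (a plausible ballpark for automotive units, not any specific sensor's spec):

```python
import math

def lateral_spacing_m(range_m, angular_res_deg=0.1):
    """Approximate spacing between adjacent lidar returns at a given range,
    assuming an illustrative 0.1 deg angular resolution (small-angle approx)."""
    return range_m * math.radians(angular_res_deg)

# Point spacing: ~2 mm at 1 m, ~2 cm at 10 m, ~17 cm at 100 m.
spacings = {r: round(lateral_spacing_m(r), 3) for r in (1, 10, 100)}
```

Millimeter-scale detail up close, decimeter-scale gaps at highway distances - which is roughly the degradation described above.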
There are two decisions when driving: go ahead around an obstacle or stop. Even as a human, I do not need super high resolution to identify the objects around me, as long as I can identify whether they are in my path or not.
While our eyes can do that pretty reliably, we are organic and get tired - how many hours can one drive until this becomes almost impossible? I had a situation where I started hallucinating and believing something was in the street, so I did a full stop - nothing happened, but it was quite intense. Imagine the other way around - not stopping and hitting something.
A normal radar + some low-level ASIC programming would do that without getting tired. My Audi from 2014 is quite good at that, and I actually rely on this feature all the time.
Humans have two eyes that move in sync and can measure distance using auto-focus. They can both point at an object and autofocus to figure out how far that object is. I don't see Teslas using moving binocular cameras with tenth-of-a-second refocusing in order to judge distances. I don't see how it is the same thing. Of course we can play GTA with just vision, but I'd argue the average person crashes in GTA more than they do in a real car.
I once barely survived a ride with a taxi driver in Tijuana who claimed to have learned to drive playing GTA. Based on his real-world driving it seemed plausible, although we did not have to visit a Pay 'n' Spray at any point.
I never got to see him play GTA, so I can’t verify or refute your hypothesis.
I never bought that argument, considering roads live in 3-dimensional space and our eyes and brain are constantly trying to decipher 2D images into depth. Seems like an extra hop that would be better cut out.
I agree with your central point but take issue with this characterisation of human vision. For people who have two functioning eyes, the perception of depth is baked in. Our subjective experience of a 2d image is an illusion. In fact, much of our vision isn’t quite what we think; for example, what we think we’re seeing in our peripheral vision may actually get filled in based on inference and prediction.
> For people who have two functioning eyes, the perception of depth is baked in.
Actually, my understanding is that the depth perception induced by binocular vision is relevant only within a relatively short range (like, single-digit number of feet away), which makes it relatively useless for long-distance depth perception needed for driving.
So it's not useless for e.g. pulling into a parking spot or steering around a close vehicle.
(I'm crosseyed and don't benefit from binocular depth cues. For the most part I do alright, though rarely I'm comically off when someone throws me a ball or I'm picking up something close to me).
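The falloff is easy to quantify: for an object straight ahead, the disparity angle between the two eyes scales as baseline/distance, so the binocular depth signal weakens in inverse proportion to range. A small sketch, assuming a typical ~6.5 cm interpupillary distance (the exact figure varies person to person):

```python
import math

BASELINE_M = 0.065  # typical human interpupillary distance (assumed)

def disparity_arcmin(distance_m):
    """Approximate binocular disparity angle, in arcminutes, for an
    object straight ahead (small-angle approximation)."""
    return math.degrees(BASELINE_M / distance_m) * 60

# Disparity shrinks as 1/distance: strong up close, weak at road distances.
near = disparity_arcmin(1.0)   # ~223 arcmin at arm's length
far = disparity_arcmin(50.0)   # ~4.5 arcmin at 50 m
```

Fifty times the distance means one-fiftieth the disparity, so the stereo cue that dominates at arm's length contributes very little to judging a car half a block away - motion parallax and size cues do most of the work there.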
I have one eye. Can confirm it's the same for me. Also always comically off with baseball and tennis. Pouring tea is also tricky for me unless I am holding both the pot and the cup.
Rotating your head will give some lateral motion (your eye moves in space since the centre of rotation is your neck). You can see this easily by turning your head while focusing on an object in the foreground, your perspective on it will shift.
Side to side motion is actually what I meant and what I do sometimes in poor weather driving or when I have branches up close and blocking my vision while trying to see beyond them.
It's difficult to accept Elon and Andrej's reasoning. I suspect that if you asked a team of engineers to research designing a self-driving car, they wouldn't come back with the argument to use vision because "humans can do it with eyes." I'd expect a list of options with the pros and cons of each approach, along with an estimated timeline and cost.
That doesn't nullify their argument, though. If Lidar were free, it doesn't mean you need to have Lidar to achieve the same level of performance as humans, and having Lidar doesn't mean the data is so clean that the decision-making aspect of self-driving becomes solvable in a weekend.
- why would the goal be "the same level of performance as humans"?
For context, some towns are actively removing cars from whole areas not just for pollution impact but also for pedestrian safety. Moral issues aside, the status quo is just not enough, it needs to be way better.
- achieving the same level as humans being possible in theory doesn't mean we'll get there in practice.
Having enough hardware to realize something doesn't help if the software is not up to the task. And assuming they "just" solve the software issue could be like assuming 18th century people would "just" discover relativity.
Software becoming as good as human in video processing just feels like a "general AI is around the corner" kind of expectation.
> Having enough hardware to realize something doesn't help if the software is not up to the task. And assuming they "just" solve the software issue could be like assuming 18th century people would "just" discover relativity.
This is what I mean with the last part of my argument. Lidar is supposedly an extremely thorough 3d depth map hopefully capturing at hundreds or thousands of FPS. But even if you have this data, the actual bulk of the problem with current self-driving isn't solved, that being the "business logic" for how to navigate the world smoothly and efficiently and to 'communicate' with other road users.
There is another argument: this will be cheaper to produce, and therefore cheaper at retail for the same margins, and therefore will sell many more instances, and therefore will save many more lives.
It's possible that the richest person on Earth is more concerned with doing good slash achieving his goals vs obtaining more currency/profit, which it would seem would have little to no marginal utility to him.
I am pretty sure that Cruise, Waymo, and Tesla are each and all doing everything they can to "make sure the darn things actually work". It's literally an existential crisis for them if they do not.
They are all, in the terminology, presently "default dead" until they figure it out.
That’s like saying “if cooking is just following directions in a recipe, we should just follow directions in a recipe.”
The result is subpar food because most recipes have a 1% problem called “seasoning”.
The “seasoning” of driving is the completely unpredictable 1% of situations you find yourself in behind the wheel, where you just have to draw on your intuition and gut instinct. It's the reason we need nothing short of AGI for _completely_ self driving vehicles.
I do think, though, trucking is ripe for AI disruption.
Humans also use inertial sensation, vibration, force feedback through the steering wheel, and their “cameras” are constantly changing their focus and position in space to construct a rich context of the environment.
Now, all of these things except the last one have some representation through an electronic or electromechanical sensor, but gluing them all together into what it takes to deal effectively with the intersection of vehicle dynamics and environmental dynamics is very hard.
Humans also don’t have 8 eyes facing every direction at all times. They also get drunk/tired/impatient/angry etc. The reality is the entire argument is silly: both are very different, and the Musk/Karpathy argument is misrepresented here. Saying humans only use vision was a response to “it's not possible with only vision”, not a statement that human vision is good enough and there's no need to do better. The 8-camera surround is leaps better than human vision; where they lack is processing the signal. The human brain does that better. But if you have better inputs (we do already) and you believe you can one day match on the processing part, you’ll one day get a much better result. One that's suited to the vision-based roads we have now and scales to literally anywhere, not geo-constrained like Waymo.
Indeed, but humans also have an incentive to drive well, embodied by local traffic police and local laws, and even before passing their driving test they're made aware of the penalties for not driving well (which, let's remind ourselves, range from "mild ticking off"/"pay $$$" through "forfeit driving licence for a time" all the way to "forfeit liberty for a time")
Where are these incentives for self-driving algorithms?
If your algo breaks the law to a sufficient level, is someone(something?) prevented from driving for a time? Is that really going to be just that one vehicle, or should it be all vehicles with that same algo? If something really bad happens, who is charged; in the worst case, who might end up going to jail?
We all know CEOs tend to believe "this time it's different", that they're special, and that the annoying rulebook is to be viewed as guidance at best. VW/Martin Winterkorn, anyone?
> Where are these incentives for self-driving algorithms?
Surely the equivalent is the reward during training?
> If your algo breaks the law to a sufficient level, is someone(something?) prevented from driving for a time? Is that really going to be just that one vehicle, or should it be all vehicles with that same algo? If something really bad happens, who is charged; in the worst case, who might end up going to jail?
Personal opinion:
The algorithm should learn from the fleet and be shared by the fleet; therefore all accidents should be treated like aircraft crashes and investigated extremely thoroughly, with the goal of eliminating the root cause.
If that cause was CEO demanding corners be cut to boost shareholder value then jail them; if it's that the algorithm had, say, never seen a flying shark drone[0] before, and misclassified it as a something it needed to take evasive manoeuvres to avoid and that led to a crash, then perhaps not (except anything I suggest probably should be in their list of things to check for, so even then perhaps it would still be a CEO-at-fault example…)
> Surely the equivalent is the reward during training?
Surely the counter-example is when a self-driving vehicle drives straight into a stationary fire truck?[0]
If a human driver did this more than once (and lived to tell the tale!) - yet had no explanation other than "Of course I saw it, but I wasn't sure what it was and didn't realise I needed to avoid hitting it <shrug>" - wouldn't they lose their driving licence fairly quickly?
You asked for the incentives for AI; the equivalent isn't the same as for humans.
The nature of the AI doesn't include a concept of prison or licensing, so it can't be threatened with it, for the same reason I can't threaten a human driver with Af'nek-leigh D'Och entRah'negh.
I can however 'punish' (air-quotes necessary because it might not feel like anything) an AI by altering the weights and biases of its network — once done, it then thinks differently.
Don't anthropomorphise it, that's a category error.
Also, the field of "how does it even?" is tiny, which is itself a reason to not grant them control of vehicles, but that's a separate issue.
> You asked for the incentives for AI; the equivalent isn't the same as for humans. The nature of the AI doesn't include a concept of prison or licensing, so it can't be threatened with it [..]
There certainly should be incentives for the humans creating an AI, though.
> Don't anthropomorphise it, that's a category error.
Volkswagen [human!] engineers created the illegal defeat devices in Dieselgate, under the supervision of their [human!] managers. The device is illegal, we punish the humans in charge when laws are broken, not the devices themselves. It should be the same with AI.
If this means software engineering becomes a field where you need mandatory liability insurance to work on AI, is that a bad thing?
In the glorious words of Stelios Haji-Ioannou, "If you think safety is expensive, try [having] an accident"
A camera that is actually better than the human eye is pretty difficult to find, and they cost around $2000 each; even then you'll have worse peak resolution in the day and worse motion characteristics at night. Human eyes are pretty good!
I am sympathetic to this view (I would really love to see just how safe it’s possible to get), but I think the Musk/Karpathy-style argument for vision-only self-driving is quite strong, and it only seems flawed because it has been incorrectly simplified as “humans do driving with ~only vision -> computers should do driving with only vision”.
The proper argument is “humans do driving with ~only vision -> roads are therefore universally designed and built to be driven by vision -> computers should do driving with only vision”. It is essentially a standards-based argument: since vision is the universal standard for driving, computers must be able to drive using just vision.
So vision is always going to be the core of self-driving. Why not augment with LIDAR anyway?
Well, in situations where vision and LIDAR are both right, you didn’t need LIDAR; in situations where vision is right and LIDAR is wrong, you didn’t need LIDAR and it potentially made you worse off; in situations where vision is wrong and LIDAR is right, you need to spend more on improving your vision; and in situations where both vision and LIDAR are wrong, you need to spend more on improving both, but improving vision is a higher priority. These are all the possible outcomes and none of them make a compelling case for investing in LIDAR.
> I think the Musk/Karpathy-style argument for vision-only self-driving is quite strong

> humans do driving with ~only vision -> roads are therefore universally designed and built to be driven by vision -> computers should do driving with only vision
What is 'should', is it a moral imperative? Is it a social obligation? Who made this argument, a catholic priest?
Where is the consideration of this argument from an engineering perspective - an analysis of advantages and disadvantages, a consideration of cost vs. benefit? Where is the assessment that, for example, 50% of human crashes are due to poor visibility or spatial awareness, and a comparison of how well a computer handles them?
If I posted this vacuous, unsupported argument here, I would be laughed at, and rightly so.
But if Elon announces something, there is always 10% of the population willing to defend it, no matter how dumb it is.
I don't get it. Do you have an experience in designing navigation systems? In Stereo vision systems? In computer vision? Or is this just a "Musk bad therefore idea he has is bad" counterreaction to what he said?
Karpathy is one of the world's top self driving engineers. This isn't a vacuous argument. People are driving with just vision every single day. The part we're missing is the ChatGPT moment on the computational side.
I have produced 3D maps with lidars, drone mounted near-infrared cameras and with thermal infrared cameras.
You can tell apart grass and green carpet with a simple formula. You can count trees without machine learning. You can detect which plants are wilting, and tell land that is wet from land that is dry. All of that is easy with the right sensors, because they have more data than an RGB camera can produce.
I know people who work with multispectral imagery; they can tell you that pixel N45 contains a specific substance - concrete, steel or wood - just from spectra alone. They don't need to know what the pixels around it are showing, or classify objects.
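The grass-vs-carpet example really is about as simple as a normalized band ratio. Here's a tiny sketch of the standard NDVI formula; the reflectance values are made up for illustration:

```python
# NDVI (Normalized Difference Vegetation Index): live vegetation reflects
# strongly in near-infrared, so it scores high; green paint or carpet is
# green in RGB but flat in NIR. Reflectance values are made up for illustration.

def ndvi(nir: float, red: float) -> float:
    return (nir - red) / (nir + red)

samples = {
    "healthy grass": (0.50, 0.08),  # big NIR bump, strong red absorption
    "wilting plant": (0.30, 0.15),  # weaker NIR response
    "green carpet":  (0.12, 0.10),  # no NIR bump at all
}

for name, (nir, red) in samples.items():
    print(f"{name:13s} NDVI = {ndvi(nir, red):+.2f}")
```

The ordering (grass > wilting plant > carpet) falls out of the formula alone, no classifier needed, which is the commenter's point about richer sensors.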
Agreed, I have a similar background with both LiDAR and vision for 3d reconstruction and mapping systems, plus I've designed some fairly impactful commercial multispectral software which is now widely used in the agricultural space. And vision can give you perfectly sufficient data to build world models and to localise yourself rapidly and robustly. What I believe is missing on the Tesla side is primarily on the navigation and 'social interaction' component of driving.
It's not like Waymo dropped a LiDAR onto the roofs of their vehicles and started driving unsupervised in traffic the next day. Nor Cruise, nor Uber. The sensing is just a small part of the whole system.
"It is difficult to get a man to understand something, when his salary depends on his not understanding it" seems apt here regarding both Karpathy and Elon. You can call me skeptical, but when there are millions and billions of dollars on the line for the two respectively, I don't know if I believe in Karpathy's expertise (which is in AI, not self-driving per se) and personal integrity sufficiently to believe he is doing what he considers to be the right thing vs. putting profit ahead of human lives.
Do you have any expertise on self driving, remote sensing, computer vision, navigation systems or anything else on this topic? Do you have a Tesla with the FSD package and participate in the beta program?
From what I've heard firsthand Autopilot still steadily improves, irrespective of what people say about their favourite sensing modalities...
radar is not lidar and is present on lots of vehicles that do L2/L3 driving except newer Tesla. optical sensors do not inherently tell you distance as a function of their sensing, whereas radar does.
a vision only approach _may_ be possible at some time, but only with a strong computational model of the human brain and thought process.
also, most people drive poorly— i wouldn’t say vision is the be-all-end-all of autonomous driving. it’s also clear that waymo and cruise have taken a full sensor based approach and are successful, whereas tesla is not.
I originally had “radar/LIDAR” everywhere you see “LIDAR” in that comment but it got really unwieldy halfway through. I think what I said generalizes from the specific example of LIDAR to other forms of sensing pretty well anyway, so you can just sub in radar if you want. The general principle is “vision” (in the sense of cameras feeding 2D image data into something that is probably a neural network) vs “everything else”. I would have said cameras vs sensors but some of the sensors use the visible light spectrum and so their sensors are called cameras. I like your use of “optical”, that might be the cleanest way to point at what I meant.
I broadly agree with your second point, about vision-only presenting big computational challenges. I think you do get some easy wins that bring the challenge down a bit: e.g. you don’t need to model human brains, you just need to model whatever the brain is doing when it’s driving. Also, the fact that we can teach people to drive without understanding what their brain is doing is a reassurance that we can teach a neural network to drive without understanding what it is doing either, so it frees us from (some of) the modeling of thought processes as well. But it is still a big computational challenge. I heard that Tesla has a server farm with thousands of Nvidia A100s; if true, that could make a dent in the problem for sure.
And yeah, I also wouldn’t say vision is the be-all and end-all when it comes to driving. (It’s a pity that we can’t easily integrate LiDAR, radar, and other sensors into the human brain so we could use them like we do sight and sound in order to drive better.)
My point is more that roads come in all shapes and types and sizes, but one consistent thing about them is that they’re all designed so that humans can use vision to drive on them. Like, you don’t know if future roads/signs/cars will be built in ways that are hard to read with LiDAR, but you can be pretty confident they won’t be built to be hard to see. Road builders, car makers - everyone else involved in the driving industry is designing for vision. It’s implicit, and it’s aimed at human vision, but it’s one of the few universal constraints on driving.
That’s what I mean when I say it’s a standards-based argument: vision is sort of a “universal interface” for roads. Another “universal interface” for roads might be wheels (with traction), or more specifically tires. You don’t need to have rubber tires, or even wheels at all, to drive on roads - but if you do have tires, you can be pretty confident that you can drive on pretty much any road you come across.
This is a compelling argument at the surface level (that roads are designed for humans with vision) that quickly breaks down when you examine how Tesla constructs their self-driving system.
Quick disclaimer that this doesn't reflect the views of my employer, nor does any of what I'm saying about self-driving software apply specifically to our system. Rather I am making broad generalizations about robotics systems in general, and about Tesla's system in particular based on their own Autonomy Day presentations.
When you drive on the road as a human, you rely a lot more on intuition and feel than exact measurements. This is exactly the opposite of how a self-driving car works. Modern robotics systems work by detecting every relevant actor in the scene (vehicles, cyclists, pedestrians etc.), measuring their exact size and velocity, predicting their future trajectories, and then making a centimeter level plan of where to move. And they do all of this 10s of times per second. It's this precision that we rely on when we make claims about how AVs are safer drivers than humans. To improve performance in a system like this, you need better more accurate measurements, better predictions and better plans. Every centimeter of accuracy is important.
By contrast, when you drive as a human it really is as simple as "images in, steering angle out". You just eyeball (pun intended) the rest. At no point in time can you look at the car in the lane next to you and tell its exact dimensions or velocity.
Now perhaps with millions of Nvidia A100s we could try to get to a system that's just "images in, steering angle out" but so far that has proven to be a pipe dream. The best research in the area doesn't even begin to approach the performance that we're able to get with our more classical robotics stack described above, and even Tesla isn't trying to end-to-end learn it all.
That isn't to say it's impossible (obviously, humans do it) but I think one could make a strong argument that "images in, steering angle out" is like epsilon close to just solving the problem of AGI, and perhaps even a million A100s wouldn't cut it ;)
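For what it's worth, the classical detect/predict/plan cycle described above can be caricatured in a few lines. Everything here is a hypothetical toy, not any real AV stack's API:

```python
from dataclasses import dataclass

@dataclass
class Track:
    """A tracked actor: position along the lane (m) and velocity (m/s)."""
    x: float
    v: float

def predict(track: Track, horizon_s: float, dt: float) -> list[float]:
    """Constant-velocity rollout of future positions (a toy 'prediction' stage)."""
    steps = round(horizon_s / dt)
    return [track.x + track.v * dt * (i + 1) for i in range(steps)]

def min_gap(ego: Track, lead: Track, horizon_s: float = 3.0, dt: float = 0.1) -> float:
    """Toy 'planning' input: smallest predicted gap to the lead car over the horizon."""
    return min(l - e for e, l in zip(predict(ego, horizon_s, dt),
                                     predict(lead, horizon_s, dt)))

# Ego at 15 m/s closing on a 12 m/s lead car 30 m ahead; a real stack
# would rerun something like this many times per second per actor.
print(f"min gap over 3 s: {min_gap(Track(0.0, 15.0), Track(30.0, 12.0)):.1f} m")
```

The point is that every stage consumes explicit measurements: position, velocity, predicted gap. That's why measurement accuracy propagates directly into plan quality, unlike the human "eyeball it" approach.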
That's not really true. Humans, at critical moments, do make implicit and even explicit plans of movement and follow them. We don't use literal velocity measurements for other objects, true, but in making those plans we do sometimes anticipate their locations at various points in the future, which is really what matters.
The best human drivers do this not at the centimeter, but at the millimeter level. Look at downhill (motor)bike racing, Formula 1, WRC, etc. These drivers can execute millimeter-level maneuvers, planned well in advance, at over 100 km/h.
Yeah that's kind of what I was trying to say. You're right in that we predict the actions of others, but we don't do it in the same way. Even when we execute millimeter level maneuvers, we aren't explicitly measuring anything... Like if you were to ask a driver for instructions on how to repeat that maneuver they wouldn't be able to tell you, they just have a "feel" for it.
Basically humans are really really good at guesstimating with great accuracy (but poor reproducibility) and since we don't use basic measurements in the first place, having better measurement accuracy wouldn't really help us be better drivers on average (it does help for certain scenarios like parking though, where knowing the # of inches remaining to an obstacle can be very useful).
But for everyday driving at speed, we wouldn't even be able to process measurements in real time even if someone was providing them to us. AVs are different and that's basically the gist of what I was trying to say. Because they actually do use, rely on, and process measurements in real time, improving their measurement accuracy (ie. switching from camera based approximate depth, to cm level accurate depth from a LiDAR) can have a meaningful impact on the final performance of the system.
This doesn't sound like much of a barrier to me. If you're a human training the LiDAR system, couldn't you just consult the image or video to help label whatever the LiDAR is seeing?
A good supervised learning process requires teaching humans to label consistently.
Imagine trying to write down precise instructions to train hundreds or thousands of humans to label many different types of objects using a tool like the above. Now hire, train, and manage those humans.
Compare that to having the humans draw rectangles around 2d color pictures of cars.
Also note that such tools need to be built and improved.
Is it possible to transfer learning from vision to LIDAR? Maybe if it's possible to map visual images to LIDAR images and vice-versa (by running a car with both cameras and LIDAR and learning their associations)
Probably everybody does it for themselves. So Tesla tries vision-only FSD, while Waymo, I guess, does what you suggest. However, seeing with lidar is not like vision only. Maybe at some point they'll share their code as OSS; probably not, or only very late.
Isn't the typical training data used in self-driving basically things like object labeling/segmentation and motion prediction? I'm not sure why that would be significantly different for visual vs depth-map data.
> Elon and Andrej Karpathy argued that since humans can drive using just vision
Is this maliciously specious or am I missing something? I drive using vision plus decades of life experience and all the tacit knowledge, judgement, and reasoning ability that comes from that. We have not reproduced any of that with math, and getting/stalling 90% of the way there with mimicry is not good enough.
I don't think it's super surprising that someone who sells a product called "Full Self Driving" that isn't fully self driving would also happen to lack rigor in their scientific claims.
Even just comparing sensors, the mediocre cameras Tesla uses are absolutely pathetic compared to the ability of a human eye. And I'm guessing it'll be some years yet before we have some kind of parity there. Not to mention the computing power to process the data.
It has more to do with training a neural network. The environmental cues for driving on a road are optimized for human vision. This becomes important when you think about it from a machine learning perspective: fewer inputs are better for many reasons. Non-visual inputs will sometimes disagree with the visual inputs during training, which leads to a worse model.
If everyone could take some deep breaths and press pause on their emotional response to Elon Musk (and not assume everyone who happens to agree with him has a Musk tattoo), they would find plenty of rational arguments from an engineering perspective.
I'm still annoyed they got rid of the ultrasonic sensors on the latest model. I test drove a 2022 (or maybe a 2021) and the park assist was pretty good. And then I get the 2023 delivered and there's no park assist for the first 6 months because they removed the ultrasonics but the vision-only software wasn't ready yet. A vision-based park assist finally came in via an OTA update but it's nowhere near as precise as the ultrasonic version was. Like, the estimates of how much distance is in front of me seem to jump around a lot more than I'm actually moving, and it sometimes reports it's degraded when trying to pull out of a tight spot.
Humans not only use vision, they use sensor fusion. Combining what you see, hear, touch, etc. Your body can perceive acceleration, for example.
On top of that, you have theory of mind. For example, you have 4 cars next to you, all of them with opaque windows that do not let you see the driver:
A) a loud sports car with a bunch of modifications, decals and racing related stuff
B) a grandma car with cat related stickers
C) an unmaintained car with collision damage, and loud music coming from it
D) a family station wagon with a baby on board sticker and other family related stuff
Your mind will process what it sees and quickly assign each one of those cars a different personality. A and C will likely be perceived as riskier cars, B and D will be likely perceived as safer cars. You will avoid A and C and remain unconcerned about B and D.
The problem with the self-driving cars right now is that they only perceive the road as bodies that move.
I just imagined myself driving without sound. That seems crazy to me. I need to hear cars, kids playing, etc.
And you're right that we subconsciously assign risk values to each car. A heavily modified BMW with decals? Could be an irresponsible young male adult trying to show off on the road. I should probably be prepared to brake or let him go first.
And assumptions like that are probably a great source of accidents; our mind needs to take shortcuts like that and isn't always right. Grandma's car got sold last week, and now daddy is alone in the car, late for work, and racing to get in front of you.
I'd rather have a computer keep track of everybody just the same but with millisecond reaction to all changes. Something that I can't do lacking eyes all around and processing power.
Reminds me when I drove a cab in my early 20s. I had a psychological bead on other drivers, I knew what they were going to do before they knew it. (And to some extent I still do, I just try not to be a dick about it.)
AI will probably eventually get very good at picking up these behavioral tendencies. Or at least better than people who aren't driving all day.
Humans use a lot of reasoning though. I once saw a guy approaching an intersection where he had the red light and from far away I could see he was jamming out to the music and in his own world. I didn’t go on green (getting honked at by the guy behind me) and watched while the guy blew straight though the red light and slammed on the brakes when he was almost fully through it.
It's fairly obvious that cars cannot think at the level of humans and are at a disadvantage sometimes. We also don't need fully autonomous driving to prevent careless mistakes.
The argument is more complicated than Elon makes it out to be. "Humans can drive with vision alone => computers can drive with vision alone" implies computers can do anything the human brain can do. It's not a given, and it's certainly not true for the compute power in a Tesla. It's completely possible that all of the following are true:
* Humans can drive with vision alone
* A sufficiently advanced compute system can drive with cameras alone
* Tesla's FSD computer cannot match human performance without additional sensors
I've watched FSD videos quite a lot, and I've never seen an issue related to sensing / visual detection. It always draws the road and other cars with decent accuracy. All issues happen in the planner, i.e. the car makes a wrong decision, not because it can't see something, but because it doesn't know what to do in that situation.
The hard problems in autonomous driving are not related to sensing, but deciding what to do in weird or complex situations.
See this video for an example of a typical problem in recent FSD (and you can see from the screen that it's not related to sensing / detection): https://youtu.be/eY3z1kgX5hY?t=74
It has trouble detecting speed bumps and potholes. I'm not sure if this is because the vision sensors fail, or because they have not been programmed to detect/display these features properly. Whatever the reason, the planner then accelerates right into them.
Sensor fusion is actually a hard problem. Yes, more different kind of sensors can lead to poorer results. Imagine having two different views in the world at unsynced points in time, and making decisions on that. It isn’t weird there might be a focus on LIDAR, or vision, but not both at the same time, at least for real time decision making.
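A tiny concrete example of the unsynced-clocks problem: before you can even average a camera distance estimate with a lidar return, you have to resample one stream onto the other's timestamps. All values below are made up:

```python
# Two sensors measuring the same distance on unsynchronized clocks:
# resample the camera stream onto the lidar timestamps before fusing.
# All timestamps and distances are made up for illustration.

def interp(samples: list[tuple[float, float]], t: float) -> float:
    """Linear interpolation of (timestamp, value) pairs at time t."""
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            return v0 + (t - t0) / (t1 - t0) * (v1 - v0)
    raise ValueError("t outside sample range")

camera = [(0.000, 10.00), (0.033, 9.80), (0.066, 9.60)]  # ~30 Hz ticks
lidar = [(0.010, 9.95), (0.050, 9.70)]                   # its own clock

for t, d_lidar in lidar:
    d_cam = interp(camera, t)
    # Naive fusion once both are on the same clock: just average them.
    print(f"t={t:.3f}s camera~{d_cam:.2f} lidar={d_lidar:.2f} "
          f"fused={(d_cam + d_lidar) / 2:.2f}")
```

And this is the easy version: real systems also have to handle latency, dropped frames, and sensors that disagree for physical reasons, which is where the "two views of the world" problem really bites.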
I don't have to imagine this. My brain is doing sensor fusion every moment of every day. A lot of the time that involves conflicting data, and your brain has to decide on the most reasonable interpretation. When it's not at its best you get things like optical illusions, nausea, etc.
It's a hard problem to solve, but that doesn't mean it's a bad idea. I think most people would agree that human beings are better off with the overlapping set of sensors that are available to us compared to the alternative.
I work for Google and I like what Waymo has accomplished (disclaimer: what I work on is nowhere near Waymo). Tesla is making a different bet with different engineering resources. And why would Google give up their secret sauce to Tesla?
Cannot agree more. That argument was not only flawed but outright false. We use at least hearing to help us drive. I'm not sure I'm alone in this: after closing all the windows, I feel hugely different in the car, as if I were disconnected from the outside.
This argument is really dumb. I understand deaf people can drive, too, but the additional auditory input is very helpful. When an emergency vehicle is far behind me, I hear the sirens and know to look in the mirror and move right to let it pass me. I can hear the bells of a railroad crossing even if I cannot see its lights blinking due to the curvature of the road, and start reducing my speed so I approach it smoothly. At some stupid junctions, like the one described below, I hear a car approaching well before I can see it.
That said, there is no point making a self driving rig if it’s going to only be as good as the best humans are. For adoption, it must be provably better in any situation imaginable: moving obstacles, weather, dust, dark drunk humans in the night, emergency vehicles, no lanes drawn on the road, read all road signs correctly (say my Yaris gives me mistaken readings where maximum mass is confused with speed, it doesn’t know what “built-up” area means with regard to speed limits, it sometimes reads a sign that belongs to an adjacent road). For it all to work, you need more input than vision. And redundancy. Lots of redundancy.
This is a mischaracterization of what he is saying. Humans are unable to drive without vision. Even Chris Urmson agrees that LiDAR is a crutch that measures distance as opposed to computing it (albeit it only works in perfect weather). Musk is saying that if you are going to use a sensor to measure distance, don’t use photons in the visible light spectrum. Instead use photons that can penetrate objects (microwaves) to capture information that you wouldn’t have. The challenge is building a high-precision RADAR, which Tesla is attempting to do. Some HW4 vehicles have it but it’s unclear whether it is required or not. Ultimately, building a L4 AV requires solving very difficult problems in computer vision, which is exactly what they are trying to do
Musk did not publicly endorse high resolution radar until recently, and using high resolution radar is not a unique feature of Teslas. Waymo currently has 5 high resolution radars on their cars.
Given what is public now, I'd speculate that perhaps they were working on replacing radar all this time. Then they used vision as an excuse to dump the existing radar early when there was a parts shortage.
I see Musk acting a lot like Jobs in certain ways. They were both very cagey about the future direction of their products. Although Musk didn't exactly backtrack on radar, he wasn't transparent about Tesla's plans either. You don't telegraph your moves to your competitors. Similarly, I'd be very surprised if Tesla wasn't keeping an eye out for lidar crossing a specific cost/function threshold. They're not saying that aloud, though.
This is anachronistic. Musk was making the vision argument to claim that Teslas sold at the time (which lacked any kind of LiDAR or radar) had all of the hardware they needed to do FSD, which was sold as "just a few months away via software update". I believe that they still make this claim officially, since it's important to deny that they committed false advertising on this front.
Tesla has been working on building a high-res radar since 2018 and has yet to officially announce anything. Tesla dropped radar from new vehicles in mid-2021.
Musk has been claiming since 2018 that Teslas sold at the time have all the hardware needed to be fully independent driverless taxis. He made this claim very very clearly in conferences and marketing materials. And Teslas sold at the time only had cameras.
That he later may have changed his tune is probably true, as reality does eventually catch up even with someone like him.
Why is it self-evident? If your argument is that people can drive with only vision, let me stop you right there and point to the fact that people are terrible drivers, and the toll is millions of injured and dead a year.
The main argument after a collision? "I didn't see them!"
It's smart if you realize that they never had a choice. No matter what Musk says, they could do vision-only or nothing at all. Google started their program in 2009 and had a pile of cash. Tesla started their program in 2015-ish, and until 2019 they were in a precarious financial position. So they never had the money or time to take Google head on. With vision, they could at least use their position to their advantage.
And it's a good bet. There's tons more to self-driving than perceiving the world (lidar does not help a driver decide what to do in a novel and ambiguous situation; that's not a perception problem), and Tesla's vision is quite good based on all the FSD videos on YouTube.
Forgive me if this seems like a knee-jerk response, but literally zero human drivers use "vision alone" to drive. Humans drive with a spatio-temporal model of the world, visual, auditory, and haptic feedback, logical/symbolic rules about driving norms, emulative models of other drivers/agents in the driving environment, ethical judgments about what it is ok/not ok to collide with... and so on and so forth.
Reducing that to "vision" doesn't even make superficial sense.
I always wondered, and don't remember seeing it specified one way or another, do self-driving AIs use any sort of temporal modelling for the environment?
A model that can predict the behaviour of the objects the visual part detects seems orders of magnitude better than one that just does visual detection from scratch. It seems like a huge wasted opportunity to have to model the world from zero for every frame received from the cameras.
Objects that pop in and out of the field of vision due to occlusion or other reasons seem to trip up Teslas (in at least some of the reported incidents), but even so it's hard to believe that they didn't implement such an obvious improvement.
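For what it's worth, the standard way to carry an object's state between frames (rather than re-detecting "from zero") is a tracking filter. Here's a minimal, illustrative sketch of a 1-D constant-velocity Kalman-style tracker; all the numbers and the simplified scalar form are my own assumptions, not anything from Tesla's stack:

```python
# Hypothetical sketch: a 1-D constant-velocity Kalman-style tracker.
# State is (position x, velocity v); p is a scalar position uncertainty.
# All noise values (q, r) and measurements are illustrative.

def kalman_step(x, v, p, z, dt=0.1, q=0.01, r=0.5):
    """One predict/update cycle given a new measured position z."""
    # Predict: carry the object forward, even through a brief occlusion.
    x_pred = x + v * dt
    p_pred = p + q
    # Update: blend the prediction with the new detection.
    k = p_pred / (p_pred + r)          # Kalman gain
    x_new = x_pred + k * (z - x_pred)
    v_new = v + k * (z - x_pred) / dt  # crude velocity correction
    p_new = (1 - k) * p_pred
    return x_new, v_new, p_new

# A tracked object moving at roughly 1 m/s; noisy detections each frame.
x, v, p = 0.0, 1.0, 1.0
for z in [0.12, 0.21, 0.33, 0.39]:
    x, v, p = kalman_step(x, v, p, z)
```

The key property is the predict step: if a detection is missed for a frame because the object is occluded, the tracker still produces a sensible position estimate instead of the object "popping out" of the world model.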
It remains to be seen: Musk is making a bet that Tesla could win big or lose here. It is an interesting bet to be sure, but Waymo's bet is as well (sensor fusion, and LIDAR will become affordable over time). I like that we have some diversity in the bets being made at least.
> Elon and Andrej Karpathy argued that since humans can drive using just vision, that’s how we should do it in self driving cars, but I think that’s a flawed argument.
Of course. The real argument is "LiDAR is expensive in comparison, and after scamming people for almost a decade, we have to be careful what kind of money we ask for". LiDAR was never considered overkill by Elon.
That was just their reply to "is it even possible?". Their argument is that vision actually works better than other sensors: the signal it receives is higher quality; you just have to know how to process it the way the human brain does. It's a harder solution but, if solved, a better one.
Humans also use sound as in tires squealing, something going "thump, thump, thump", someone screaming, horns, sirens, "crash, tinkle, tinkle, tinkle", etc.
Humans also use their sense of motion from the car sliding, rocking, tilting, jerking, etc.
Then humans integrate all such input, combine that with what they know about how cars work, traffic, people, the road, diagnose what has happened, apply years of prudent judgment, and then decide what to do.
E.g., maybe they have an ice chest with 20 pounds of ice which has melted and now, due to the motion of the car, has tilted, spilled, and is about to get the dog soaked with ice water. Good luck with the self-driving with a good response to such a scenario.
Humans can drive using just vision, but the autopilot doesn't have the processing capabilities of a human by a long, long margin. So the smartest thing to do would be to compensate for the worse processing capabilities with better senses.
I'm not privy to the actual decisions behind Tesla's tech, versus the PR I occasionally run across. Does this mean they are dedicated to only using traditional vision comparable to humans, or is that just their focus at the current time? I can appreciate there's only so much tech that can be reliably developed for mass production at a given time, but Elon also tends to make some odd choices/statements out of principle.
Obviously they know that. They are just betting that they can do it with vision only, to save on expenses, and also work with the way their cars work today.
I agree. Vision only approaches are cheaper but far more difficult in the long run.
But that wouldn't be a problem if Tesla didn't advertise their cars as containing all the necessary hardware for self driving. If Tesla admitted that cameras were insufficient for self-driving, they would open themselves up to legal liability.
My question to Musk would be, "Is there one cheap device that could be deployed widely on roads that would improve self-driving cars enough to push the technology into viability by a substantial margin?"
Not to perform the main driving task though. Deaf people tend to be able to get a driver's license no problem, and proprioception is mostly irrelevant unless you need to know where your feet are to ensure they're on the correct pedals.
I believe self-driving can be achieved through standards, e.g. if roads become digital. Street signs and cars could communicate the current state of traffic in real time with other cars. It should be left to the state to detect certain events, e.g. traffic jams, or whether a bicycle or pedestrian is about to cross the road. This way, liability could be fairly split between law- and car-makers.
That sounds great if we want to rebuild our entire road infrastructure. So using this for self driving will never happen since it would require every country to implement a very costly standard.
Sounds great in practice but isn’t realistic in the slightest
Not sure that a rebuild is necessary, could be a simple as beacons on the back of emergency vehicles, 'qr' coded signs, and reflective road markers. Tesla already detects traffic cones...
>> Sounds great in practice but isn’t realistic in the slightest
And that's also why a "self driving" car isn't realistic.
I think the proper term for what we have now should, at most, be something more akin to "Assisted Driving Features", because right now it's far too blurry.
This implies that they can create a piece of software as intelligent as the human brain. Absolutely absurd reasoning. If they are being serious with such a statement then they are complete fools. More than likely they are just making excuses for why they are being cheap. Still fools, but maybe not complete fools.
It doesn't imply that at all. That's like saying openai is creating a human level intelligence with chatgpt. Emulating a single function a human is able to perform really well is not the same as aiming for human level intelligence.
This has nothing to do with ChatGPT. They claim they only need vision because a human only needs vision. That statement is false in and of itself, because humans have other senses, and they can't know how much of the human brain's complexity is required to make vision-only processing work. If they can't at least replicate that level of intelligence, they have no business making such a claim.
Also, humans are pretty horrible at driving. They constantly commit traffic violations and kill each other. Driving kills about 1 in 103 people in their lifetime.
There is no other day to day activity where this level of risk is considered acceptable.
What are you talking about? Humans are shockingly good drivers. It is an average of ~80,000,000 miles, or ~5,000 years of regular driving, between fatalities, and that includes the motorcyclists, drunks, and people who do not wear their seatbelts, who account for ~70% of all deaths if I recall correctly. If you are an average driver who does not drive drunk and wears a seatbelt, and you had started driving when agriculture was invented, you would not be expected to have gotten into a fatal accident yet.
Anybody who says humans are bad drivers is almost certainly underestimating the difficulty of replacing humans by a factor of 1000x.
We can compare against another mode of human-controlled transportation. There are 1.37 deaths per 100 million passenger-miles driving in the US [1]. In comparison, there are ~0.2 deaths per 10 billion passenger-miles flying. Converting into the same units, there are 137 deaths per 10 billion passenger-miles driving. So you are 685X more likely to die while driving/riding in a car than flying. That's almost three orders of magnitude worse! Humans are pretty terrible drivers in comparison to how good we are at flying.
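Unit conversion is where comparisons like this usually go wrong, so here's the arithmetic spelled out, using the figures exactly as quoted in the comment above:

```python
# Figures as quoted: 1.37 driving deaths per 100M passenger-miles,
# ~0.2 flying deaths per 10B passenger-miles (illustrative, from the comment).
driving_deaths_per_mile = 1.37 / 100_000_000
flying_deaths_per_mile = 0.2 / 10_000_000_000

# Convert both to the same units: deaths per 10 billion passenger-miles.
driving_per_10b = driving_deaths_per_mile * 10_000_000_000  # ~137
flying_per_10b = flying_deaths_per_mile * 10_000_000_000    # ~0.2

ratio = driving_per_10b / flying_per_10b  # ~685x
```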
Pilots have mandatory sleep cycles, drug tests, significantly greater initial training, backup pilots, and dedicated airspace. If you could wipe out all of the tired, drunk/high, and teenagers from the road, I bet the driving stats would look significantly better.
We don't ask pilots to do basically anything. They are given dedicated lanes and have constant radar monitoring them anywhere near an airport where they may be expected to somehow come in contact with another plane.
Compared to sharing roads going opposite directions at high speeds inches away from each other, it is no contest that humans driving is the much more impressive number.
You should compare with GA to get closer to apples-and-apples. Comparing a highly regulated industry with everybody from 16-year-olds to 90-year-olds over an extreme range of experience and health isn't going to give you a useful result.
And even GA pilots are probably in much better physical shape than the general public that drives cars.
Flying (the type done commercially) is a much easier task than driving, except perhaps for takeoff and landing, where the majority of miles are not spent.
The pilot can basically sleep most of the way.
I don’t know how we could say humans are “good” or “bad” drivers. What are we comparing against? We’re the only thing that drives cars (other than a few self driving cars).
If someone had said "trains are safer than cars" I'd agree with that comparison, but the question is whether humans are good at driving cars. I don't know if we're particularly better at driving cars than trains; trains are just better designs.
I mean, you wouldn’t say humans are good at tic-tac-toe, and bad at chess, right? Chess is just a much harder game.
> It is an average of ~80,000,000 miles, or ~5,000 years of regular driving, between fatalities
That's actually not a very big number. 5000 years of regular driving is about the lifetime driving for 100-150 people. Which means one of them will have a fatal accident within their lifetime.
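A rough check of that "100-150 people" framing, with my own assumed values for annual mileage and driving years per lifetime (the comment doesn't specify them):

```python
# Figure from the thread: ~80M miles between fatalities.
miles_between_fatalities = 80_000_000

# Assumed values, not from the thread:
miles_per_year = 15_000          # typical "regular driving" annual mileage
driving_years_per_lifetime = 45  # roughly age 18 into one's 60s

years_between_fatalities = miles_between_fatalities / miles_per_year  # ~5,300
lifetimes = years_between_fatalities / driving_years_per_lifetime     # ~120
```

So ~5,000 years of driving is indeed on the order of 100-150 driving lifetimes, which is how "impressively rare per mile" and "1 in ~100 per lifetime" can both be true.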
Why are you counting only fatalities? They are low because modern cars have lots of safety features. In 2021 there were 5,400,000 medically consulted injuries related to motor vehicles.
Apparently it’s 45,404. That’s still a little bit less than people killed by a gun the same year (48,830), and killed by opioids (nearly twice as many: 80,411). Let’s not forget an estimated 300k/year deaths related to obesity. Source: https://www.cdc.gov/nchs/fastats/injury.htm.
Whataboutism, but it puts those numbers in perspective.
The obesity deaths are comorbidity, which is… not to say that obesity killed them, but that it may have been a factor. E.g. obese 55 year olds that have a heart attack are in there, even though it is entirely possible for non-obese individuals to have heart attacks at young ages.
Guns… most of those are suicides, and that stat is uniquely American in terms of developed countries.
Opioids… also almost entirely self-inflicted (not willingly, but getting flattened by a bus is different than getting accidentally hooked)
Car stats also don’t count the incredible number of people who have lifelong or major injuries due to cars, which I would imagine is much higher. I feel pretty comfortable saying that a majority of people I know have been injured by cars, something that isn’t true for opioids or guns.
>Guns… most of those are suicides, and that stat is uniquely American in terms of developed countries.
That seemed unbelievable to me, so I had to do some checking. The CDC[0] put 48k suicides in 2021, and attributed 55% to guns. That leaves us with 26.4k suicide gun deaths. While still a large number, a significant portion of GP's 48k gun deaths was not self-inflicted.
Human vision is simply converting rays of visible light sensed by our light sensors (aka eyes) into electric signals which are subsequently converted into an image by our brain which is then processed.
Given that, perceiving the environment with radar, lidar, visible light, infrared, and so on is equivalent to human vision.
As far as I'm aware, Tesla uses more than just visible light sensors. Am I wrong in my understanding?
Humans aren't capable of emitting an electromagnetic pulse, sensing the reflection, and using the time of flight to calculate distance between themselves and an object. So, no, lidar and radar aren't equivalent to human vision even if you extend the idea of human vision over a wider range of the EM spectrum.
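The time-of-flight measurement is trivial arithmetic but is exactly the thing eyes can't do. A minimal illustration (numbers are just an example):

```python
# Time-of-flight ranging: distance = (speed of light * round-trip time) / 2.
# This is the direct measurement lidar/radar make and passive vision cannot.
C = 299_792_458.0  # speed of light in vacuum, m/s

def tof_distance_m(round_trip_seconds: float) -> float:
    """Distance to a target given the round-trip time of a reflected pulse."""
    return C * round_trip_seconds / 2

# A return pulse arriving 200 ns after emission puts the target ~30 m away.
d = tof_distance_m(200e-9)
```

A camera has to infer that 30 m from parallax, learned priors, or known object sizes; the active sensor just measures it.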
> Given that, perceiving the environment with radar, lidar, visible light, infrared, and so on is equivalent to human vision.
When paired with a general intelligence evolved over a couple million years, and visual sensors that have much more dynamic range than commercially available sensors, yeah.
The problem is without those 2 things Teslas crash into fire trucks.
And different bands of EM have different properties so eyes aren’t equivalent to radar. Otherwise we would be able to see around corners.
And if humans had more sensory data we would definitely be integrating it into our driving. Otherwise ADAS tech wouldn’t be so commonplace in 2023.
Park sensors, for example, use radar but they are not equivalent to human vision because humans can't see the back of the car, and they can't measure the distance as accurately.
I meant that they are equivalent in a very fundamental sense.
You have input, taken by sensors, converted into usable form by a processor. There is no fundamental difference between visible light and radio waves, per se.
Thanks, I don't care to follow car news so I wasn't aware of that.
Though it seems Tesla reincorporated radar into its post-'21 models, so the premise of this sub-thread is at best outdated and at worst a half-truth. Oh well.