I never know what to expect anymore. We live in a world where computers can describe paintings and write sonnets about them but a half-trillion dollar car company can't figure out how to parallel park with eight cameras.
DriveGPT as it hits 2 parked cars and runs over a dog: "Apologies for the confusion earlier. You are correct that the cars were in the planned local planner path..."
Yeah, and I'm still completely lost as to why resolution is such a limiting factor. If you know you're drawing a soccer ball why is a 512x512 soccer ball so much easier than a 1024x1024 soccer ball?
There are a few cases where people have used ChatGPT to generate SVG[0], with mostly unimpressive results. I'm sure sooner or later models will be developed specifically for creating vector drawings instead of raster, including with the ability to apply textures and gradients.
Also, the resolution of Stable Diffusion's output isn't much of a limitation if you're willing to use other tools to massage the output into something professional-quality. See [1]
It's not lidar they need. BMW, Mercedes, Porsche, etc. all can park themselves almost perfectly every time. Teslas can't, and when they can, they take 5x as long, if the computer even bothers to recognise the space.
It's software. Original Teslas with AP1 park better than Tesla's own in-house software on their latest AP.
Remember that "cameras" aren't as good as human perception because human eyes interact with the environment instead of being passive sensors. (That is, if you can't see something you can move your head.)
Plus we have ears, are under a roof so can't get rained on, are self cleaning, temperature regulating, have much better dynamic range, wear driving glasses…
Which sounds like a lot until you realize 1) we drive over three trillion miles a year in the US, and 2) the majority of those accidents are concentrated in a fraction of all drivers. The median human driver is quite good, and the state-of-the-art AI isn't even in the same galaxy yet.
I keep hearing this argument over and over, but I find it uncompelling. As a relatively young person with good vision, who has never been in an accident after many years of driving, and who doesn't make the kinds of absurd, simple mistakes I've seen self-driving cars make, I would not trust my life to a self-driving car.
Asking people to accept a driverless car based on over-arching statistics is papering over some very glaring issues. For example, are most accidents caused by "average" drivers, or by drivers who are young / old / intoxicated / distracted / have bad vision? Are accidents randomly distributed (i.e. any driver is just as likely as the next to get in one)? Driverless cars seem to have accidents at random, in unpredictable ways, but human drivers can be excellent (no accidents, no tickets ever) or terrible (drive fast, tickets, high insurance, accidents, etc). The distribution of accidents among humans is not close to uniform, and is usually explainable. I wouldn't trust a poor human driver on a regular basis, nor would I trust an AI, because I'm actually a much better driver than both (no tickets, no accidents, can handle complex situations the AI can't). Are human accidents being treated as homogeneous (e.g. is ramming full speed into a parked car counted the same as a fender-bender)? I see 5.8M car crashes annually, but deaths remain fairly low (~40k, 0.68%), vs 400 driverless accidents with ~20 deaths (5%); I'm not sure we're talking about the same type of accidents.
tl;dr: mixing non-homogeneous groups of drivers, and taking global statistics over all accidents and drivers, papers over the complexity of driving and how good a large portion of drivers are. Using those statistics to justify unreliable and comparatively dangerous technology would be a strict downgrade for most good drivers (who are most of the population).
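The fatality-rate comparison above is easy to sanity-check with a few lines of arithmetic. This uses the figures quoted in the comment (5.8M crashes, ~40k deaths, 400 driverless accidents, ~20 deaths), which I haven't verified independently:

```python
# Figures as quoted in the comment above -- not independently verified.
human_crashes = 5_800_000   # annual US car crashes (quoted)
human_deaths = 40_000       # annual US traffic deaths (quoted)
av_accidents = 400          # reported driverless accidents (quoted)
av_deaths = 20              # deaths in those accidents (quoted)

# Per-incident fatality rate for each group.
human_fatality_rate = human_deaths / human_crashes
av_fatality_rate = av_deaths / av_accidents

print(f"human: {human_fatality_rate:.2%} of crashes are fatal")        # 0.69%
print(f"driverless: {av_fatality_rate:.2%} of accidents are fatal")    # 5.00%
```

By these numbers the quoted per-incident fatality rate for driverless accidents is roughly 7x the human one, though the denominators count very different things (a small set of reported driverless incidents vs. all human crashes), which is exactly the non-homogeneity problem the comment is pointing at.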
It's all trade-offs. I'm just spitballing here, but if you have limited resources, you can either spend cash/time on lidar, or invest in higher-quality mass-produced optics, or better computer vision software. If you get to a functional camera-only system sooner, everyone might be better off, since you can deploy it more rapidly.
Manufacturing capacity of lidar components might be limited.
Another might be reliability/failure modes. If the system relies on lidar, that's another component that can break (or brownout and produce unreliable inputs).
So in a vacuum, yeah, a lidar+camera system is probably better, but who knows with real-life trade-offs.
(again, I just made these up, I do not work on this stuff, but these are a few scenarios I can imagine)
While ultrasonic sensors would be fine for parking, they don't have very good range so they aren't much help in avoiding, for example, crashing into stationary fire trucks or concrete lane dividers at freeway speeds.
From my experimentation, LLMs tend to kind of suck at rhyme and meter, and all but the simplest types of poetry, so even if you'd specified it probably wouldn't have been able to deliver.
This is definitely something they could be trained to be much better at, but I guess it hasn't been a priority.