
Mind blown, of course.

Two things are interesting:

- No audio -- that must have been hard to add, or else it would have been there.

- Spelling is probably still hard to do (the familiar DALL-E problem), e.g. in a video showing a car driving past a billboard with specified text.



My intuition is that training on audio will be trivial if they can accomplish this for video. Maybe I'm wrong.



