Hacker News | bildung's comments

It's probably reasonable to take a step back here and ask: Why is this not a universal problem? It's not as if every jurisdiction outside the US simply lets criminals run away.


Do you have an example of a good implementation of ai captions? I've only experienced those on youtube, and they are really bad. The automatic dubbing is even worse, but still.

On second thought this probably depends on the caption language.


I'm not going to defend the youtube captions as good, but even still, I find them incredibly helpful. My hearing is fine, but my processing is rubbish, and having a visual aid to help contextualize the sound is a big help, even when they're a bit wrong.

Your point about the caption language is probably right though. It's worse with jargon or proper names, and worse with non-American English speakers. If they don't even get all the common accents of English right, I have little hope for other languages.


Automatic translation famously fails catastrophically with Japanese, because it's a language that heavily depends on implied rather than explicit context.

The minimal grammatically correct sentence is simply a verb, and it's an exercise to the reader to know what the subject and object are expected to be. (Essentially, the more formal/polite you get, the more things are added. You could say "kore wa atsu desu" to mean "this is hot." But you could also just say "atsu," which could also be interpreted as a question instead of a statement.)

Chinese seems to have similar issues, but I know less about how it's structured.

Anyway, it's really nice when Japanese music on YouTube includes a human-provided translation as captions. Automated ones are useless, when they don't give up entirely.


I assume people talk about transcription, not translation. Translation in youtube ime is indeed horrible in all languages I have tried, but transcription in english is good enough to be useful. However, the more technical jargon a video uses, the worse transcription is (translation is totally useless in anything technical there).


Automatic transcription in English heavily depends on accent, sound quality, and how well the speaker is articulating. It will often mistake words that sound alike to make nonsensical sentences, randomly skip words, or just insert random words for no clear reason.

It does seem to do a few clever things. For lyrics it seems to first look for existing transcribed lyrics before making its own guesses (timing, however, can be quite bad when it does this). Outside of that, AI transcription is like an alien who has read a book on a dead language and is transcribing based on what the book says the word should sound like phonetically. At times that can be good enough.

(A note on sound quality: it's not the perceived quality. Many low-res videos have perfectly acceptable, if somewhat lossy, sound quality, but the transcriber goes insane. It seems to prefer 1080p videos with what I assume is a much higher bit-rate for the sound.)


In the times I have noticed the transcription being bad, my speech comprehension itself is even worse. So I still find it useful. It is no substitute for human-created (or at least curated) subtitles by any means, but better than nothing.


Do you have an example? YT captions being useless is a common trope I keep seeing on reddit that is not reflected in my experience at all. Feels like another "omg so bad" hyperbole that people just dogpile on, but would love to be proven wrong.


Captions seem to have been updated sometime between 7 and 15 months ago. Here's a reddit post from 7 months ago noticing the update: https://www.reddit.com/r/youtube/comments/1kd9210/autocaptio...

and here's Jeff Geerling 15 months ago showing how to use Whisper to make dramatically better captions: https://www.youtube.com/watch?v=S1M9NOtusM8

I assume Google has finally put some of their multimodal LLM work to good use. Before that, they were embarrassingly bad.


Interesting. I wonder if people saying that they are useless base it on experiences before that and have had them turned off since.


There are projects that will run Whisper or another transcription service locally on your computer, which has great quality. For whatever reason, Google chooses not to use their highest quality transcription models on YouTube, maybe due to cost.


I use Whisper running locally for automated transcription of many hours of audio on a daily basis.

For the most part, Whisper does much better than stuff I've tried in the past like Vosk. That said, it makes a somewhat annoying error that I never really experienced with others.

When the audio is low quality for a moment, it might misinterpret a word. That's fine, any speech recognition system will do that. The problem with Whisper is that the misinterpreted word can affect the next word, or several words. It's trying to align the next bits of audio syntactically with the mistaken word.

With older systems, you'd get a nonsense word where the noise was but the rest of the transcription would be unaffected. With Whisper, you may get a series of words that completely diverges from the audio. I can look at the start of the divergence and recognize the phonetic similarity that created the initial error. The following words may not be phonetically close to the audio at all.
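
To make the difference concrete, here is a toy sketch of the two behaviours described above. It is not Whisper's code or API: the words, the scores, and both decoder functions are made up for illustration. The only assumption carried over from the real system is that an autoregressive decoder picks each word conditioned on the words it has already emitted, while an older frame-by-frame system scores each slot on its own.

```python
# Toy illustration only: hand-picked scores, not Whisper's real model or API.
# Each position lists how well each candidate word matches the audio.
ACOUSTIC = [
    {"the": 1.0},
    {"cat": 0.45, "cap": 0.55},  # noisy moment: the wrong word scores slightly higher
    {"sat": 0.6, "fits": 0.4},
    {"down": 0.6, "well": 0.4},
]

# Hypothetical "language model": how plausible a word is after the previous one.
BIGRAM = {
    ("the", "cat"): 0.5, ("the", "cap"): 0.5,
    ("cat", "sat"): 0.9, ("cat", "fits"): 0.1,
    ("cap", "sat"): 0.1, ("cap", "fits"): 0.9,
    ("sat", "down"): 0.9, ("sat", "well"): 0.1,
    ("fits", "down"): 0.1, ("fits", "well"): 0.9,
}


def decode_independent(acoustic):
    """Pick the best-sounding word for each slot, ignoring context."""
    return [max(scores, key=scores.get) for scores in acoustic]


def decode_autoregressive(acoustic, bigram):
    """Pick each word by audio score times plausibility after the previous pick."""
    words = []
    for scores in acoustic:
        if not words:
            best = max(scores, key=scores.get)
        else:
            prev = words[-1]
            best = max(scores, key=lambda w: scores[w] * bigram.get((prev, w), 0.0))
        words.append(best)
    return words


independent = decode_independent(ACOUSTIC)
autoregressive = decode_autoregressive(ACOUSTIC, BIGRAM)
print(" ".join(independent))     # one wrong word, the rest follows the audio
print(" ".join(autoregressive))  # the wrong word drags the following words with it
```

In this toy setup the context-free decoder gets one word wrong at the noisy slot and then recovers, while the context-conditioned decoder keeps choosing words that fit the mistaken one even though the audio scores favour the correct words.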


Try Parakeet, it's more state of the art these days. There are others too like Meta's omnilingual one.


Ah yes, one of the standard replies whenever anyone mentions a way that an AI thing fails: "You're still using [X]? Well of course, that's not state of the art, you should be using [Y]."

You don't actually state whether you believe Parakeet is susceptible to the same class of mistakes...


¯\_(ツ)_/¯

I haven't seen those issues myself in my usage, it's just a suggestion, no need to be sarcastic about it.


It's an extremely common goalpost-moving pattern on HN, and it adds little to the conversation without actually addressing how or whether the outcome would be better.


Try it, or don't. Due to the nature of generative AI, what might be an issue for me might not be an issue for you, especially if we have differing use cases, so no one can give you the answer you seek except for yourself.


Apparently it's also shit. There was a discussion about it a few days ago that contains multiple project maintainers pointing out deepwiki didn't get their repos at all https://news.ycombinator.com/item?id=45884169


As counterpoints to illustrate China's current development:

* China has produced more PV panel capacity in the first half of this year than the US has installed, all in all, in all of its history

* China alone has installed PV capacity of over 1000 GW today

* China has installed battery electrical storage of about 100 GW / 300 GWh today and aims to have 180 GW in 2027
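
For a sense of scale, the battery figures above can be turned into two quick ratios. This is just arithmetic on the numbers quoted in this comment, nothing independently sourced, and the variable names are only for illustration.

```python
# Figures as quoted in the comment above; not independently verified here.
power_gw = 100         # installed battery power today
energy_gwh = 300       # installed battery energy today
target_power_gw = 180  # stated aim for 2027

# Hours the installed fleet could discharge at full rated power.
duration_hours = energy_gwh / power_gw

# Relative increase in power capacity needed to reach the 2027 aim.
growth_needed = (target_power_gw - power_gw) / power_gw

print(f"{duration_hours:.1f} h at full power, {growth_needed:.0%} more power needed by 2027")
```

So the quoted numbers correspond to roughly three hours of storage at full output, with an 80% increase in power capacity planned by 2027.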


You just need to pay 60€ or so for a business license and off you go. A GmbH (a corporate structure with limited liability, somewhere between an LLC and Corp) is not needed if you want to start a) now and b) for almost zero cost.


I also think they have to be substantially cheaper than nvidia to have any chance, but the pro 6000 with 96G is already available at 7-8k - so half the price would have to be significantly below 4k.


Huh didn’t know that, nice. Intel’s still in trouble then :) IMHO they’ll try to sell the increased ram as worth the ‘premium’ (or, worth the ‘reduced not-nvidia penalty’)


The CCC hasn't been only about computers since inception. They were clearly already political in the 80s, just have a look at this zine: https://ds.ccc.de/pdfs/ds024.pdf


Words are just noises. Think of them as pointers. They point to a concept in the brain. What concept that may be differs from person to person. But as long as the words point to something, they aren't used wrongly.

The idea that there was some point in history where the pointer target was officially designated to be x is just false. That point in time never existed.


My point isn't that the use of the phrase is wrong, the point is that the colloquial understanding of the phrase is a bad concept.

[All] lawyers are bad CEOs is a statement that was made. Evidence to the contrary was presented. "The exception proves the rule" was used to dismiss that evidence.

It's used in a similar way as "God works in mysterious ways".


I think we witnessed a profound paradigm shift last week - China is the new global driving force to avert climate catastrophe. They probably peaked CO2 output last year. Last year they reached their 2030 target for renewables as share of total energy production. Almost 60% of new cars sold there are electric. And China produced more PV cells in the first half of this year alone than have been installed in the US in sum, ever.

And now they stated a public CO2 reduction goal for the first time.

I suspect people in the US haven't really noticed this as much because the 100%+ tariffs on cars and PV isolate the country from the dramatic changes happening everywhere else. Here in Germany I can buy 2 kWp in panels plus an inverter for under 400€.


From the start of the book:

"Each time you see a word that is highlighted, [...] it means that this term is a lexicon enabled term. By clicking on that term, you will see a page listing all other uses of that term within the book."


Very helpful for people who need to be constantly reminded to have top-of-mind instant access to the definition of words like "thing" and "what", at all times.

