
FWIW, AssemblyAI has great transcript quality in my experience, and they support streaming: https://www.assemblyai.com/docs/walkthroughs#realtime-stream...



We're using AssemblyAI too, and I agree that their transcription quality is good. But as soon as Whisper supports word-level timestamps, I think we'll seriously consider switching, as the price difference is large ($0.36 per hour vs $0.90 per hour).
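
For scale, here is a rough back-of-the-envelope sketch using the two hourly rates quoted above. The 100 hours of audio per month is a made-up workload, and the function name is just for illustration:

```python
def transcription_cost(duration_minutes, price_per_hour):
    """Cost in dollars for audio of the given length at a flat hourly rate."""
    return duration_minutes / 60 * price_per_hour

# Hypothetical workload: 100 hours of audio per month (an assumption).
monthly_minutes = 100 * 60

for rate in (0.36, 0.90):  # the two per-hour prices quoted in the comment
    cost = transcription_cost(monthly_minutes, rate)
    print(f"${rate:.2f}/hour -> ${cost:.2f} per month")
```

The gap grows linearly with volume, so it only matters much if you transcribe a lot of audio.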


Both of those prices strike me as quite high, given that Whisper can be run relatively quickly on commodity hardware. It's not like the bandwidth is significant either; it's just audio.


It's pretty great from my perspective. I've been creating little supplemental ~10 minute videos for my class (using Descript; I should probably switch to OBS), and the built-in transcription is both wonderful (that it has it at all and is easy to fix) and horrible (the number of errors is very high). I'd happily pay a dime to have a higher quality starting transcription that saves me 5 minutes of fixing...


Try my app: https://apps.apple.com/app/wisprnote/id1671480366

It has great quality transcription from video and audio (English only, sorry if that doesn't work for you!). It uses Whisper.cpp plus VAD (voice activity detection) to skip silent / non-speech sections, which normally introduce errors. Give it a try and let me know what you think! :)
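
To give a rough idea of what VAD-style filtering means, here is a minimal sketch of a simple energy-threshold detector. This is not the app's actual method (the comment doesn't say which VAD it uses). The function name, the 20 ms frame size, and the 0.02 threshold are all illustrative assumptions, and real detectors are usually more sophisticated:

```python
import math

def speech_segments(samples, sample_rate, frame_ms=20, threshold=0.02):
    """Return (start_sec, end_sec) spans whose per-frame RMS meets the threshold.

    samples: sequence of floats in [-1.0, 1.0].
    frame_ms and threshold are illustrative defaults, not tuned values.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    segments = []
    start = None  # sample index where the current loud span began
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms >= threshold:
            if start is None:
                start = i
        elif start is not None:
            segments.append((start / sample_rate, i / sample_rate))
            start = None
    if start is not None:
        # Audio ended while still loud: close at the end of the last full frame.
        end = (len(samples) // frame_len) * frame_len
        segments.append((start / sample_rate, end / sample_rate))
    return segments

# Synthetic test signal: 1 s silence, 1 s of a 440 Hz tone, 1 s silence.
rate = 16000
silence = [0.0] * rate
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
audio = silence + tone + silence

segments = speech_segments(audio, rate)
print(segments)
```

Only the spans returned here would be handed to the transcription model. The silent stretches are skipped entirely, so the model never gets a chance to produce text for them.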


A plug here but check out https://vidcap.app/

It’s based on a fine-tuned Whisper, and you’d get unlimited transcriptions for $4.99/month


Why do you need word-level timestamps? I don't understand what that's for...



