
You split the audio up and send it over in a loop, passing the transcript of the last call as the prompt for the next one. See item 2 here: https://platform.openai.com/docs/guides/speech-to-text/promp...
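A minimal sketch of that loop, with a hypothetical `transcribe` callable standing in for the actual API call (audio chunking and the OpenAI client are left out):

```python
def transcribe_chunks(chunks, transcribe):
    """Transcribe chunks in order, feeding each call the previous
    chunk's transcript as the prompt so context carries across splits.

    `chunks` is a list of audio segments; `transcribe(chunk, prompt=...)`
    is whatever wraps the actual speech-to-text request.
    """
    texts = []
    prompt = ""
    for chunk in chunks:
        text = transcribe(chunk, prompt=prompt)
        texts.append(text)
        prompt = text  # the n-1 transcript becomes the next call's prompt
    return " ".join(texts)
```

With the real API, `transcribe` would upload the chunk and pass `prompt` through to the transcription endpoint's prompt parameter.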



And:

> we suggest that you avoid breaking the audio up mid-sentence as this may cause some context to be lost.

That's really easy to put in a document but much harder to do in practice. Granted, it might not matter much in the real world; not sure yet.

Still, this will require more hand holding than I'd like.


I doubt breaking up mid-sentence will matter much if you pass in the previous transcript as the prompt and split on word boundaries. This is how Whisper handles long audio internally.

It's not absolutely perfect, but splitting on a word boundary is one line of code with the same package their docs use: https://github.com/jiaaro/pydub/blob/master/API.markdown#sil...

25 MB is also a lot: that's 30 minutes to an hour of MP3 at reasonable compression. A 2-hour movie would only split into about three chunks.


In case it helps, I just wrote a script that splits the audio and uses the prompt parameter to provide context from the n-1 segment's transcription: https://gist.github.com/patrick-samy/cf8470272d1ff23dff4e2b5...


The page includes a five-line Python example of how to split audio without breaking mid-word.



