What hardware are you running? Parakeet runs on NVIDIA and Mac, and it’s way faster than Whisper. I’ve had issues with training Qwen3 (and even Qwen2.5, though there I think I was masking stop tokens wrong). I’ve had success with Gemma 3 though, and it comes in some really small sizes (270M and 1B). Maybe 270M for just transcript cleaning? I wonder if the 1B model can handle the transcript analysis…
Unfortunately I have zero experience with the Jetson family, and Parakeet itself is a pain to get running IMO. I took the easy option and used the ONNX version.
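
On the stop-token masking: I can't say for sure this is what went wrong in my runs, but one common way to get it wrong is masking labels wherever the token equals the pad id when the pad id is the same as the EOS id. That hides the real EOS from the loss, so the model never learns to stop. A minimal sketch of masking by position instead is below. The token ids are made up, `build_labels` and `pad` are hypothetical helper names, and it isn't tied to any particular trainer; the only convention it relies on is that -100 is the default ignore index for PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def build_labels(prompt_ids, response_ids, eos_id):
    """Return (input_ids, labels) for one training example.

    Prompt tokens are masked out of the loss. Response tokens and the
    trailing EOS are kept, so the model is trained to emit the stop token.
    """
    input_ids = list(prompt_ids) + list(response_ids) + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids) + [eos_id]
    return input_ids, labels


def pad(input_ids, labels, length, pad_id):
    """Right-pad both lists to `length`.

    Only the positions added here are masked. Masking by position rather
    than by `token == pad_id` keeps the real EOS in the loss even when
    pad_id and eos_id are the same number.
    """
    n = length - len(input_ids)
    return input_ids + [pad_id] * n, labels + [IGNORE_INDEX] * n


if __name__ == "__main__":
    # Hypothetical ids: prompt = [1, 2, 3], response = [4, 5], EOS = pad = 0.
    ids, labels = build_labels([1, 2, 3], [4, 5], eos_id=0)
    ids, labels = pad(ids, labels, length=8, pad_id=0)
    print(ids)
    print(labels)
```

If the label at the EOS position comes out as -100 in your own pipeline, that would be the first thing I'd check.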