You wouldn't have to any more than you need 0 through N frames in memory to calculate frame N+1. Whatever your decoding state completes at frame N can be considered a key frame.
It doesn’t have to be every frame though. Pre calculate to 25%/50%/75%, then as I’m playing, save key frames for more incremental points, and if I start scrubbing, calculate more around that region.
Edit: this doesn’t have to happen synchronously either, it can occur in a background thread or passively.