I think the innovation is using caching to make the cost of the approach manageable. The way they implemented it, each time you create a chunk you ask the LLM to generate an atomic chunk from the whole document context. You need to do this for all tens of thousands of chunks in your data, which costs a lot. By caching the documents, you save on those costs.
I don't understand how that helps here. They're not regenerating each chunk every time; this is about caching the model's state after running a large document through it. You can only do this kind of thing if you have access to the model itself, or if the API you use provides it.
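Concretely, when the API does provide it, each per-chunk call can mark the full document as a cacheable prefix so the document tokens are only processed at full price once. A minimal sketch, assuming an Anthropic-style messages API with `cache_control` blocks; the model id, prompt wording, and function name here are illustrative, not from the thread:

```python
import anthropic

client = anthropic.Anthropic()

def contextualize_chunk(document: str, chunk: str) -> str:
    """Ask the model to situate one chunk within the full document.

    The document is placed in a cacheable system block, so repeated
    per-chunk calls against the same document can reuse the cached prefix
    instead of reprocessing the whole document every time.
    """
    response = client.messages.create(
        model="claude-3-5-haiku-latest",  # illustrative model choice
        max_tokens=300,
        system=[
            {
                "type": "text",
                "text": "Here is the full document:\n\n" + document,
                # Mark the long, repeated prefix as cacheable.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[
            {
                "role": "user",
                "content": (
                    "Write a short context that situates this chunk within "
                    "the document above, for use in retrieval:\n\n" + chunk
                ),
            }
        ],
    )
    return response.content[0].text
```

Only the first call for a given document pays the full input price for the document tokens; later chunk calls that land within the cache lifetime read the cached prefix at a much lower rate, which is what makes running this over tens of thousands of chunks affordable.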