You run the corpus through the model piecemeal, recording the model's interpretation for each chunk as a vector of floating point numbers. Then when performing a completions request you first query the vectors and include the closest matches as context.