I prefer local models for data extraction or processing jobs of 10k records or more; at that scale, hosted services get slow and brittle.
Mistral 7B fine-tunes (OpenChat is my favorite) just chug through the data and get the job done.
Details: I use vLLM to run the models, and GPT-4 to condense information into the complex prompts that the local models then execute.
I think the situation will just keep getting better month by month.
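For the curious, the batch setup above can be sketched roughly as follows. This is a minimal illustration, not my exact pipeline: the model name, the extraction task, and the `build_prompt` helper are all assumptions standing in for whatever condensed prompt you end up with.

```python
"""Sketch: batch extraction over many records with a local model via vLLM."""

def build_prompt(record: str) -> str:
    # Hypothetical condensed instruction (the kind you might draft with
    # GPT-4 once, then reuse verbatim across every record).
    return (
        "Extract the company name and contact email from the text below. "
        "Answer as JSON with keys 'company' and 'email'.\n\n"
        f"Text: {record}"
    )

if __name__ == "__main__":
    # Needs a GPU machine with vLLM installed: pip install vllm
    from vllm import LLM, SamplingParams

    records = [
        "Acme Corp, reach us at sales@acme.example",
        "Contact Globex via info@globex.example for quotes",
    ]
    # Any Mistral 7B fine-tune works; OpenChat is one example.
    llm = LLM(model="openchat/openchat-3.5-0106")
    params = SamplingParams(temperature=0.0, max_tokens=128)

    # Pass the whole list at once: vLLM batches requests internally
    # (continuous batching), which is what makes 10k+ records tractable.
    outputs = llm.generate([build_prompt(r) for r in records], params)
    for out in outputs:
        print(out.outputs[0].text)
```

The key point is handing vLLM the entire prompt list in one `generate` call rather than looping one record at a time, so the engine can keep the GPU saturated.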