Is there an easy way to play with these models, as someone who hasn't deployed them? I can download/compile llama.cpp, but I don't know which models to get/where to put them/how to run them, so if someone knows about some automated downloader along with some list of "best models", that would be very helpful.
For LLaMA, try the 4-bit quantized versions of the small models, like the 7B, in GGML format. Those will run on your local CPU. Google those terms too. You can look on Hugging Face for the actual model files to download, then load one and send prompts to it.
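To make that concrete, here's a rough sketch of the llama.cpp workflow. The build steps and CLI flags are the common ones from the repo's README, but the download URL is a placeholder (browse Hugging Face yourself for a current 4-bit GGML upload of the model you want; exact file names vary):

```shell
# Build llama.cpp (CPU-only build; needs git, make, and a C/C++ toolchain)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Download a 4-bit quantized 7B model in GGML format from Hugging Face.
# <user>/<repo> and the file name are placeholders -- find a real quantized
# upload on Hugging Face and copy its actual download link.
curl -L -o models/llama-7b-q4_0.bin \
  "https://huggingface.co/<user>/<repo>/resolve/main/llama-7b.ggmlv3.q4_0.bin"

# Run inference on the CPU: -m = model file, -p = prompt, -n = tokens to generate
./main -m models/llama-7b-q4_0.bin -p "Hello, how are you?" -n 128
```

The 4-bit 7B file is a few GB, so the download is the slow part; inference itself runs at a usable speed on a recent laptop CPU.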
If you want to try out the Llama-2 models (7B, 13B, 70B), you can get started very easily with Anyscale Endpoints (~2 min). https://app.endpoints.anyscale.com/
I usually run them on Google Colab, and occasionally on a GPU VPS from Lambda Labs. The Hugging Face model card usually includes a complete Python example script for loading and running the model.
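For reference, those model-card snippets usually boil down to something like this sketch using the transformers library. The model id and generation parameters here are illustrative; gated models like Llama-2 additionally require accepting Meta's license on Hugging Face and authenticating with `huggingface-cli login`:

```python
# Typical Hugging Face transformers loading pattern, as seen on model cards.
# Requires: pip install transformers torch accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated model: needs license acceptance + HF authentication first.
model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" (via accelerate) places weights on GPU if one is available
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("What is the capital of France?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

On Colab the free T4 GPU can't hold the full-precision 7B weights, so model cards often add `torch_dtype` or quantization options to `from_pretrained`; check the specific card for what it recommends.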