Is there an easy way to play with these models, as someone who hasn't deployed them? I can download/compile llama.cpp, but I don't know which models to get/where to put them/how to run them, so if someone knows about some automated downloader along with some list of "best models", that would be very helpful.
For LLaMA, try the 4-bit quantized versions of the small models, like the 7B, in GGML format. Those will run on your local CPU. Google those terms too. You can look on Hugging Face for the actual model files to download, then load one and send prompts to it.
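To make that concrete, here's a rough sketch of the llama.cpp workflow. The build steps and CLI flags are the common ones from the repo's README, but the download URL is a placeholder (browse Hugging Face yourself for a current 4-bit GGML upload of the model you want; exact file names vary):

```shell
# Build llama.cpp (CPU-only build; needs git, make, and a C/C++ toolchain)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Download a 4-bit quantized 7B model in GGML format from Hugging Face.
# <user>/<repo> and the file name are placeholders -- find a real quantized
# upload on Hugging Face and copy its actual download link.
curl -L -o models/llama-7b-q4_0.bin \
  "https://huggingface.co/<user>/<repo>/resolve/main/llama-7b.ggmlv3.q4_0.bin"

# Run inference on the CPU: -m = model file, -p = prompt, -n = tokens to generate
./main -m models/llama-7b-q4_0.bin -p "Hello, how are you?" -n 128
```

The 4-bit 7B file is a few GB, so the download is the slow part; inference itself runs at a usable speed on a recent laptop CPU.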
If you want to try out the Llama-2 models (7B, 13B, 70B), you can get started very easily with Anyscale Endpoints (~2 min). https://app.endpoints.anyscale.com/
I usually run them on Google Colab, and occasionally on a GPU VPS from Lambda Labs. The Hugging Face model card usually includes a complete Python example script for loading and running the model.
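For reference, those model-card snippets usually boil down to something like this sketch using the transformers library. The model id and generation parameters here are illustrative; gated models like Llama-2 additionally require accepting Meta's license on Hugging Face and authenticating with `huggingface-cli login`:

```python
# Typical Hugging Face transformers loading pattern, as seen on model cards.
# Requires: pip install transformers torch accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated model: needs license acceptance + HF authentication first.
model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" (via accelerate) places weights on GPU if one is available
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("What is the capital of France?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

On Colab the free T4 GPU can't hold the full-precision 7B weights, so model cards often add `torch_dtype` or quantization options to `from_pretrained`; check the specific card for what it recommends.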