You have two options:
1- Find a port of it on GitHub that runs locally on the phone (it will be quantized and not that useful right now, at least until 2024, when Qualcomm ships their new phones) - see https://github.com/Bip-Rep/sherpa
*The better one:* 2- Host the model on a server in the cloud (or locally on your computer with tunneling) using oobabooga's text-generation-webui (launched with the --api flag) and talk to that self-hosted LLaMA directly, as in the sketch below. This way you aren't limited to the 7B/13B versions and can run the 70B...
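
For option 2, here is a minimal sketch of what talking to the self-hosted instance can look like. It assumes the OpenAI-compatible API that newer text-generation-webui builds expose on port 5000 when started with --api; the host URL, endpoint path, and payload fields are assumptions and may differ depending on your version and tunnel setup.

```python
# Minimal sketch: query a self-hosted text-generation-webui started with --api.
# Assumes the OpenAI-compatible completions endpoint on port 5000; adjust the
# host/port to your cloud server or tunnel URL.
import requests

HOST = "http://localhost:5000"  # replace with your server or tunnel address

def ask_llama(prompt: str) -> str:
    # Send a plain completion request; field names follow the OpenAI-style schema.
    response = requests.post(
        f"{HOST}/v1/completions",
        json={
            "prompt": prompt,
            "max_tokens": 200,
            "temperature": 0.7,
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["text"]

if __name__ == "__main__":
    print(ask_llama("Explain quantization in one sentence."))
```

Your phone app (or anything else) then only needs to make that HTTP call, so the heavy model stays on the server and the client stays thin.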