You have two options:
1- Find a port of it on GitHub that runs locally on the phone (it will be quantized and not that useful right now, at least until 2024, when Qualcomm ships their new phones) - see https://github.com/Bip-Rep/sherpa
*The better one:* 2- Host the model on a server in the cloud (or locally on your computer with tunneling) using oobabooga's text-generation-webui (launched with the --api flag) and talk to that self-hosted LLaMA directly, as in the sketch below. This way you aren't limited to the 7B/13B versions and can run the 70B...
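
For option 2, here is a minimal sketch of what talking to the self-hosted instance can look like. It assumes the OpenAI-compatible API that newer text-generation-webui builds expose on port 5000 when started with --api; the host URL, endpoint path, and payload fields are assumptions and may differ depending on your version and tunnel setup.

```python
# Minimal sketch: query a self-hosted text-generation-webui started with --api.
# Assumes the OpenAI-compatible completions endpoint on port 5000; adjust the
# host/port to your cloud server or tunnel URL.
import requests

HOST = "http://localhost:5000"  # replace with your server or tunnel address

def ask_llama(prompt: str) -> str:
    # Send a plain completion request; field names follow the OpenAI-style schema.
    response = requests.post(
        f"{HOST}/v1/completions",
        json={
            "prompt": prompt,
            "max_tokens": 200,
            "temperature": 0.7,
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["text"]

if __name__ == "__main__":
    print(ask_llama("Explain quantization in one sentence."))
```

Your phone app (or anything else) then only needs to make that HTTP call, so the heavy model stays on the server and the client stays thin.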