A Petals dev here. Recent models indeed outperform BLOOM with fewer parameters (for English). However, the largest LLaMA still doesn't fit into one consumer-grade GPU, and these models still benefit from increasing the number of parameters. So we believe that the Petals-like approach is useful for the newer models as well.
We have guides for adding other models to Petals in the repo. One of our contributors is working on adding the largest LLaMA right now. I doubt that we can host LLaMA in the public swarm due to its license, but there's a chance that we'll get similar models with a more permissive license in the future.
> I doubt that we can host LLaMA in the public swarm due to its license
Is there anything in the license that specifically forbids distributed usage? If not, you could run it on Petals and just specify that anyone using it must do so for research purposes (or whatever the license terms require).