
llama.cpp probably won't be getting Jamba support anytime soon: https://github.com/ggerganov/llama.cpp/issues/6372#issuecomm...

There is an MLX Mamba implementation, but nothing for Jamba either: https://github.com/alxndrTL/mamba.py/tree/main/mlx

You could run it in PyTorch on CPU, and since only ~12B parameters are active per forward pass it might even be relatively fast (8 tok/s?), but a q4 quant would also easily fit on 2x3090s and should run at >60 tok/s.
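If you go the 2x3090 route, something like this is a rough starting point (a sketch, not tested; it assumes the Transformers Jamba port and the ai21labs/Jamba-v0.1 checkpoint on Hugging Face, with 4-bit loading via bitsandbytes so the weights shard across both 24 GB cards):

  # rough sketch: 4-bit Jamba across two GPUs via bitsandbytes
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

  model_id = "ai21labs/Jamba-v0.1"  # assumed checkpoint name

  quant_config = BitsAndBytesConfig(
      load_in_4bit=True,
      bnb_4bit_compute_dtype=torch.bfloat16,
  )

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      quantization_config=quant_config,
      device_map="auto",  # spread layers across both GPUs
  )

  inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
  out = model.generate(**inputs, max_new_tokens=64)
  print(tokenizer.decode(out[0], skip_special_tokens=True))

Swap device_map="cpu" (and drop the quantization config) for the CPU path, at the cost of speed.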



