
Yes, and it's the same author, this time published on The Gradient (the earlier link was to the personal blog). The Gradient, by the way, are amazing curators of AI news in general and have one of the better podcasts I'm aware of, interviewing developers in the trenches.

Adding: this resurgence of interest in Mamba is also due to some actual SOTA progress with SSMs, like the new model AI21 Labs released this week [1], and we're likely to see others merging different architecture layers (it's a 52B MoE with 12B params active during inference, blending both Mamba and transformer layers).

>As the first production-grade model based on Mamba architecture, Jamba achieves an unprecedented 3X throughput and fits 140K context on a single GPU.

[1] https://www.ai21.com/jamba
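For anyone puzzled by "52B params but 12B active": in a mixture-of-experts model, each MoE layer holds many expert sub-networks but routes each token through only a few of them, so the parameters counted toward total size far exceed those touched per forward pass. A toy sketch of that bookkeeping (the numbers are made up for illustration, not Jamba's actual layer sizes):

```python
# Toy illustration of total vs. active parameter counts in a
# mixture-of-experts (MoE) layer: all experts contribute to the
# total size, but only top_k experts run per token.
# All numbers here are hypothetical, not Jamba's real configuration.

def moe_param_counts(shared, num_experts, expert_size, top_k):
    """Return (total, active) parameter counts for one MoE layer."""
    total = shared + num_experts * expert_size   # everything stored
    active = shared + top_k * expert_size        # touched per token
    return total, active

total, active = moe_param_counts(
    shared=2_000_000_000,      # non-expert (attention/SSM) params
    num_experts=16,            # experts held in the layer
    expert_size=500_000_000,   # params per expert
    top_k=2,                   # experts routed per token
)
print(f"total={total:,} active={active:,}")
# total=10,000,000,000 active=3,000,000,000
```

The same accounting, scaled across all the layers, is how a model ends up with a 52B total footprint while inference only pays the compute cost of roughly 12B parameters.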


