
I'd be curious to know whether anyone has tried a hybrid approach: a Mamba-like architecture for longer-term recall, combined with a transformer for short-term memory.


Yep, https://arxiv.org/abs/2402.04248 tried a hybrid called MambaFormer, which seemed to perform well.


Maybe a fun Karpathy video here...



