Hacker News
OnnxStream running TinyLlama and Mistral 7B, with CUDA support (github.com/vitoplantamura)
17 points by Robin89 on Jan 14, 2024 | 2 comments


This is a pretty interesting idea for diffusion, but much less so for LLMs, where (after prompt ingestion) the entire set of weights has to be cycled through for essentially every generated token.
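A rough back-of-the-envelope calculation illustrates the point. This is a minimal sketch; the model size and bandwidth figures below are illustrative assumptions, not measurements of OnnxStream:

```python
# Back-of-the-envelope: if weights are streamed from storage on every
# decoding step, generation speed is capped by storage bandwidth.
# All figures are assumed for illustration, not measured.

model_size_gb = 4.0   # assumed: a ~7B model quantized to roughly 4 bits/weight
storage_bw_gbps = 3.0 # assumed: sequential read bandwidth of a decent NVMe SSD

# Autoregressive decoding touches essentially all weights once per token,
# so each token costs at least one full pass over the weights.
seconds_per_token = model_size_gb / storage_bw_gbps
print(f"~{seconds_per_token:.2f} s/token, i.e. ~{1 / seconds_per_token:.2f} tokens/s")

# A diffusion model, by contrast, runs a fixed number of denoising steps
# (e.g. 20-50) per image, so re-streaming the weights a few dozen times is
# a one-off cost per image rather than a per-token cost.
```

Under those assumptions the floor is on the order of a second per token, which is why weight streaming hurts LLM decoding far more than it hurts diffusion.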


This works very well on my PC! I've got an i3-12100F CPU, 16GB RAM, and an RTX 2060. Running it with `llm --cuda 4` answers very fast, though with a lot of hallucination :|



