
Author here. llamafile will work on stock Windows installs using CPU inference. No CUDA or MSVC or DLLs are required! The dev tools are only required to be installed, right now, if you want to get faster GPU performance.


My attempt to run it with my VS 2022 dev console and a newly downloaded CUDA installation ended in flames: the compilation stopped with "error limit reached", and it fell back to a CPU run.

It does run on the CPU though, so at least that's pretty cool.


I've received a lot of good advice today on how we can potentially improve our Nvidia story so that nvcc doesn't need to be installed. With a little bit of luck, you'll have releases soon that get your GPU support working.


The CPU usage is around 30% when idle (not handling any HTTP requests) under Windows, so you won't want to keep this app running in the background. Otherwise, it's a nice effort.



