
Author here. llamafile will work on stock Windows installs using CPU inference. No CUDA or MSVC or DLLs are required! The dev tools are only required to be installed, right now, if you want to get faster GPU performance.


My attempt to run it with my VS 2022 dev console and a newly downloaded CUDA installation ended in flames: the compilation stopped with "error limit reached", and it fell back to a CPU run.

It does run on the CPU though, so at least that's pretty cool.


I've received a lot of good advice today on how we can potentially improve our Nvidia story so that nvcc doesn't need to be installed. With a little bit of luck, you'll have releases soon that get your GPU support working.


The CPU usage is around 30% when idle (not handling any HTTP requests) under Windows, so you won't want to keep this app running in the background. Otherwise, it's a nice effort.



