Who cares about executable size when the models are measured in gigabytes lol. I would prefer a Go/Node/Python/etc server for an HTTP service even at 10x the size over some guy's bespoke C++ any day of the week. Also, measuring the size of an executable after zipping is a nonsense benchmark in and of itself.
Nitro already beats those: a 3 MB executable that ships an OpenAI-compatible HTTP server and keeps the model loaded persistently.
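For what it's worth, "OpenAI-compatible" mostly means you can point the stock openai client at it and go. A rough sketch of that, assuming a local endpoint; the port, path, and model name below are guesses for illustration, not Nitro's documented defaults:

    # Minimal sketch: stock OpenAI Python client aimed at a local
    # OpenAI-compatible server. base_url/port/model are assumptions.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:3928/v1",  # assumed local endpoint
        api_key="not-needed-locally",         # local servers typically ignore the key
    )

    resp = client.chat.completions.create(
        model="local-model",  # placeholder model name
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)

The upside of that compatibility is you can swap the local server for the hosted API (or any other compatible backend) by changing only base_url and the model name.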