It is not clear that a simple/small model with inference running on home hardware is energy- or cost-efficient compared to the scaled-up, batched inference of a large model. There are dozens of optimizations available when you split an LLM into many small components across separate accelerator units and manage the KV cache at data-center scale; these are simply not possible at home and would be a waste of effort and energy until you are serving thousands to millions of requests in parallel.
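
A rough back-of-envelope sketch of the batching argument (every number below is an illustrative assumption, not a measurement; the point is only that a shared accelerator amortizes its fixed power draw over many concurrent requests):

    # Back-of-envelope energy-per-token comparison.
    # All figures are assumed, illustrative values.

    def joules_per_token(power_watts: float, tokens_per_second: float) -> float:
        """Energy drawn per generated token, in joules."""
        return power_watts / tokens_per_second

    # Home setup: one consumer GPU, small model, batch size 1.
    home_power_w = 350.0        # assumed whole-system draw
    home_tok_s = 40.0           # assumed decode speed for a single request

    # Data-center setup: one accelerator serving many requests at once, so
    # weight loads, KV-cache traffic, and idle overhead are amortized
    # across the whole batch.
    dc_power_w = 700.0          # assumed accelerator draw
    dc_batch = 64               # assumed concurrent requests per accelerator
    dc_tok_s_per_req = 30.0     # assumed per-request decode speed under load
    dc_tok_s_total = dc_batch * dc_tok_s_per_req

    print(f"home:        {joules_per_token(home_power_w, home_tok_s):.2f} J/token")
    print(f"data center: {joules_per_token(dc_power_w, dc_tok_s_total):.2f} J/token")

With these made-up numbers the home setup lands around 8.75 J/token while the shared accelerator lands around 0.36 J/token, purely because the same power draw is divided over a much larger aggregate token rate.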
