Your home setup is much less efficient than production inference in a data center. An open-source implementation of SDXL-Lightning runs at about 12 images per second on a TPU v5e-8, which draws ~2 kW at full load. That works out to roughly 170 J per image, or about 1/400th of a phone charge.
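For reference, a quick back-of-envelope sketch of that arithmetic (the ~18.5 Wh phone battery capacity is my assumption, typical of a recent flagship; the power and throughput figures are the ones quoted above):

    # Sanity check of the per-image energy claim.
    tpu_power_w = 2000        # ~2 kW for a TPU v5e-8 host at full load (quoted above)
    images_per_s = 12         # SDXL-Lightning throughput (quoted above)
    phone_battery_wh = 18.5   # assumed capacity of a full phone charge

    j_per_image = tpu_power_w / images_per_s                       # ~167 J
    fraction_of_charge = j_per_image / (phone_battery_wh * 3600)
    print(f"{j_per_image:.0f} J per image, ~1/{1/fraction_of_charge:.0f} of a phone charge")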
These models do not appear out of thin air; add in the training cost in terms of power. Yes, it's capex rather than opex, but it's not free by any means.
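To make that concrete, here is a purely hypothetical amortization sketch; both input numbers are placeholders I made up for illustration, not measured figures for any real model:

    # Amortizing training energy over generated images (placeholder numbers only).
    training_energy_kwh = 500_000            # assumed total training energy (placeholder)
    images_served_lifetime = 1_000_000_000   # assumed images generated over the model's lifetime (placeholder)

    amortized_j_per_image = training_energy_kwh * 3.6e6 / images_served_lifetime
    print(f"amortized training energy: ~{amortized_j_per_image:.0f} J per image")
    # With these placeholders: ~1800 J/image, i.e. training energy can dwarf the
    # per-image inference energy unless the model serves a very large number of requests.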
Plus, not all of these models run on optimized TPUs; they mostly run on NVIDIA cards, which aren't nearly as efficient.
Otherwise I could argue that running these models is essentially free, since my camera can do face recognition and tracking at 30 fps without a noticeable power draw, using a dedicated, purpose-built DSP for that stuff.
https://cloud.google.com/blog/products/compute/accelerating-...
https://arxiv.org/pdf/2502.01671