Just been dabbling with local models, and while the several models I've tried generate decent sentences when quantized, they suffered heavily at following instructions and picking up details.
So a larger but fairly aggressively quantized model can perform worse than a smaller variant of the same model with only light quantization, even though the larger one still uses more memory in total.
I suspect some of this is because the models weren't trained at the quantization levels I used. In any case, don't get blinded by parameter count alone; compare actual performance.
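To make the memory point concrete, here's a rough back-of-the-envelope sketch of weight memory as parameters times bits per weight. The model sizes and quantization levels are illustrative assumptions, not measurements of any particular model:

```python
def weight_gb(params_billion: float, bits: int) -> float:
    # Approximate weight memory in GB: params * bits / 8 bytes per weight.
    # Ignores KV cache, activations, and quantization-format overhead.
    return params_billion * 1e9 * bits / 8 / 1e9

# Hypothetical comparison: a big model at 4-bit vs a small one at 8-bit.
large_q4 = weight_gb(70, 4)  # 70B params, aggressive 4-bit quant -> 35 GB
small_q8 = weight_gb(7, 8)   # 7B params, light 8-bit quant -> 7 GB
print(f"70B @ 4-bit ~ {large_q4:.0f} GB, 7B @ 8-bit ~ {small_q8:.0f} GB")
```

So even heavily quantized, the larger model can dominate on memory while still losing on instruction-following quality.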