> Why would the good models (that are barely okay at coding) be big, if it was currently possible to build good models, that are small?

Because nobody has tried yet using recent developments.

> but there is no reason to assume that people who work on small models find great optimizations that frontier models makers, who are very interested in efficient models, have not considered already.

Sure there is: they can iterate faster on small model architectures, try more tweaks, and train more models. Maybe the larger companies "considered it", but a) they are more risk-averse due to the cost of training their large models, and b) that doesn't mean their conclusions about a particular consideration are right; empirical data decides in the end.