There's also less special sauce in the text models themselves these days; the proprietary edge is more in the pre-training data and the training stack (e.g. how to get 10k GPUs/TPUs running together smoothly). Multi-modal models (or adjacent ones like Sora) are less likely to be open sourced in the near term.
There's also a lot of work to make the actual infrastructure and lower-level management of large GPU/TPU fleets open as well - my team focuses on making the infrastructure piece at least a bit more approachable on GKE and Kubernetes.
The actual training is still done by a fairly small pool of very experienced people, but that's getting better. And serving models gets faster every day - you can often just build on Triton and TensorRT-LLM or vLLM and see significant wins month to month.
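To make the serving point concrete, here's a rough sketch of what "just build on vLLM" looks like in practice, using its offline batch API - facebook/opt-125m is only a stand-in here, swap in whichever open model you actually care about:

    from vllm import LLM, SamplingParams

    # Placeholder model - any HF model vLLM supports works here.
    llm = LLM(model="facebook/opt-125m")

    # Sampling settings are illustrative, tune for your use case.
    params = SamplingParams(temperature=0.8, max_tokens=64)
    outputs = llm.generate(["The future of open models is"], params)

    for out in outputs:
        print(out.outputs[0].text)

The nice part is that the engine-level improvements (continuous batching, paged attention, newer kernels) land upstream, so the same few lines tend to get faster release over release without you doing anything.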