It's because Google has this exact same problem with their AI models. They would probably have to double their compute capacity if a billion of their customers started using them (my made-up numbers). Inference uses hundreds of GB of GPU RAM, and I'm guessing they don't have enough GPUs to do that and still have a Google Cloud.
It's different from OpenAI because the existing user base is something like a billion users.