I think it's the small TPM (tokens-per-minute) limits. I'll be way under the 10-30 requests per minute while using Cline, but input tokens appear to count toward the rate limit, so I find myself limited to one message a minute if I let the conversation go on too long, ironically because of Gemini's long context window. AFAIK Cline doesn't currently offer an option to cap the context below the model's capacity.
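The math behind that "one message a minute" is easy to sketch. This is just back-of-the-envelope arithmetic with made-up numbers (the 250k TPM cap and token counts are illustrative assumptions, not Google's actual quotas); the point is that each request resends the entire conversation, so the context size, not the request count, becomes the bottleneck:

```python
# Illustrative only: the TPM cap and token counts are assumptions,
# not actual Gemini quota values.

def max_messages_per_minute(tpm_limit: int, context_tokens: int,
                            new_tokens_per_msg: int) -> float:
    """Messages that fit in one minute when every request resends the full context."""
    tokens_per_request = context_tokens + new_tokens_per_msg
    return tpm_limit / tokens_per_request

# Hypothetical 250k TPM cap:
early = max_messages_per_minute(250_000, context_tokens=10_000, new_tokens_per_msg=500)
late = max_messages_per_minute(250_000, context_tokens=240_000, new_tokens_per_msg=500)
print(f"{early:.1f} msgs/min with a small context, {late:.1f} once it balloons")
```

Under these assumed numbers you get ~24 messages/minute early in a session but barely 1/minute once the context nears the model's capacity, which matches the behavior described above.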
There is no reason to expect the other entrants in the market to drop out and give Google monopoly power. The paid tier is also among the cheapest. People say it's because they built their own inference hardware and are genuinely able to serve it cheaper.
I use Gemini 2.5 Pro Experimental via OpenRouter in my OpenWebUI for free. I was using Sonnet 3.7, but I don't notice much difference, so I just default to the free one now.