The biggest reason I avoid Gemini (and all of Google's models I've tried) is that I cannot get them to produce the same code I'd write myself, while with OpenAI's models it's fairly trivial.
There is something deeper in the model that seemingly can't be steered/programmed with system/user prompts, and it still produces kind of shitty code for some reason. Or maybe I just haven't found the right way of prompting Google's stuff, but the same approach works for OpenAI, Anthropic, and others, so I'm not sure what to make of it.
I'm having the same issue with Gemini as soon as the context length exceeds 50k-ish tokens. At that point, it starts to blurt out random code of terrible quality, even with clear instructions, and it often mixes up various APIs. I've spent a lot of time instructing it not to write such code, with plenty of few-shot examples, but it doesn't seem to help. It's like it gets "confused".
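For reference, the few-shot setup I mean looks roughly like this: a minimal sketch in OpenAI-style chat-message format, where style examples are placed ahead of the real task. The style rule and the example pair here are made-up placeholders, not actual prompts from my sessions:

```python
# Sketch of a few-shot style-steering prompt in chat-message format.
# The rule text and example pair below are hypothetical placeholders.

STYLE_RULES = (
    "Prefer small pure functions, explicit error handling, "
    "and the standard library over ad-hoc helpers."
)

FEW_SHOT_EXAMPLES = [
    # (user request, assistant answer demonstrating the desired style)
    ("Parse a JSON config file into a dict.",
     "import json\n\n"
     "def load_config(path: str) -> dict:\n"
     "    with open(path) as f:\n"
     "        return json.load(f)\n"),
]

def build_messages(task: str) -> list[dict]:
    """Assemble a chat message list: system rule, few-shots, then the task."""
    messages = [{"role": "system", "content": STYLE_RULES}]
    for request, answer in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": request})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": task})
    return messages

msgs = build_messages("Write a retry wrapper for HTTP GET.")
print(len(msgs))  # → 4: system + one user/assistant pair + final task
```

In my experience OpenAI and Anthropic models pick up the style from a handful of pairs like this; with Gemini the effect seems to fade once the surrounding context gets long.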
The large context window is a huge advantage on paper, but the model doesn't seem able to use it effectively. Would you say that OpenAI models don't suffer from this problem?