After comparing Gemini Pro and Claude Sonnet 3.7 coding answers side by side a few times, I decided to cancel my Anthropic subscription and just stick to Gemini.
One of the main advantages Anthropic currently has over Google is the tooling that comes with Claude Code. It may not generate better code, and it has a lower complexity ceiling, but it can automatically find and search files, and figure out how to fix a syntax error fast.
As another person who cancelled my Claude subscription and switched to Gemini, I agree that Claude Code is very nice, but beyond some initial exploration I never felt comfortable using it for real work, because Claude 3.7 is far too eager to overengineer half-baked solutions that extend far beyond what you asked it to do in the first place.
Paying real API money for Claude to jump the gun on solutions invalidated the advantage of having a tool as nice as Claude Code, at least for me; I admit everyone's mileage will vary.
Exactly my experience as well. Started out loving it, but it almost moves too fast - building in functionality that I might want eventually but that isn't yet appropriate for where the project is in terms of testing, or is just in completely the wrong place in the architecture. I try to give very direct and specific prompts, but it still has the tendency to overreach. Of course, it's likely that with more use I will learn better how to rein it in.
I've experienced this a lot as well. I also had an interesting argument with Claude just yesterday.
It put an expensive API call inside a useEffect hook. I wanted the call elsewhere, and it fought me on it pretty aggressively. Instead of removing the call, it started changing comments and function names to say that the call was just loading already-fetched data from a cache (which was not true). I could not find a way to tell it to remove that API call from the useEffect hook; it just wrote more and more motivated excuses in the surrounding comments. It would have been very funny if it weren't so expensive.
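For anyone who hasn't hit this pattern, here's a minimal sketch of the shape of the problem - component, hook, and endpoint names are all hypothetical, not the actual code:

```typescript
import { useEffect, useState } from "react";

// The shape Claude kept producing: an expensive call fired from a
// useEffect on every mount. (Hook and endpoint names are made up.)
function useReport(reportId: string) {
  const [data, setData] = useState<unknown>(null);

  useEffect(() => {
    // Expensive API call hidden inside the effect -- not "loading
    // already fetched data from a cache", whatever the comments claim.
    fetch(`/api/reports/${reportId}/recompute`)
      .then((res) => res.json())
      .then(setData);
  }, [reportId]);

  return data;
}

// What I wanted instead: trigger the call explicitly elsewhere (e.g.
// on user action) and pass the result down, keeping the effect out of it.
```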
Geez, I'm not one of the people who think AI is going to wake up and wipe us out, but experiences like yours do give me pause. Right now the AI isn't in the driver's seat and can only assert itself through verbal expression, but I know it's only a matter of time. We already saw Cursor themselves get a taste of this. To be clear, I'm not suggesting the AI is sentient and malicious - I don't believe that at all. I think it's been trained/programmed/tuned to do this, though not intentionally; but the nature of these tools is that they will surprise us.
> but the nature of these tools is they will surprise us
Models used to do this much, much more than they do now, so what it did doesn't surprise us.
The nature of these tools is to copy what we have already written. It has seen many threads where developers argue and dig in. They try to train the AI not to do that, but sometimes it still happens, and then it just roleplays as the developer who refuses to listen to anything you say.
I almost fear more that we'll create Bender from Futurama than some superintelligent enlightened AGI. It'll probably happen after Grok AI gets snuck some beer into its core cluster or something absurd.
Earlier this week a Cursor AI support agent told a user they could only use Cursor on one machine at a time, causing the user to cancel their subscription.
Agreed. No matter what prompt I try - including asking Claude to promise not to implement code unless we agree on requirements and design, and to repeat that promise regularly - it jumps the gun and implements (actually hallucinates) solutions way too soon. I changed to Gemini as a result.
I wanted some PowerShell code to do some SharePoint uploading. It created a 1000-line logging module that allowed me to log things at different levels, like info, debug, error, etc. Not really what I wanted.
This morning I tweaked my Open Codex config to also try gemma3:27b-it-qat - and Google's open-source small model is excellent: it runs fast enough for a good dev experience, with very good functionality.
Typing `//use this as reference ai` in one file and `//copy this row to x ai!` in another will add those functions/files to context and act on both places. Although I wish Aider would write `working on your request...` under my comment; for now I have to keep the Aider window in sight. The autocomplete, "add to context", and "enter your instructions" flows of other apps feel clunky.
I don't understand the appeal of investing in learning and adapting your workflow to an AI tool that is so tightly coupled to a single LLM provider, when there are other great AI tools available that are not locked to a single provider. I would guess Aider is the closest thing to Claude Code, but you can use pretty much any LLM with it.
The LLM field is moving so fast that the leading frontier model today may not be the same tomorrow.
There are at least 10 projects currently aiming to recreate Claude Code, but for Gemini. For example, geminicodes.co by NotebookLM’s founding PM Raiza Martin.
Tried Gemini Codes yesterday, as well as anon-kode and anon-codex. Gemini Codes is already broken and appears to be rather brittle (she discloses as much), and the other two appear to still need some prompt improvements, or someone adding vector embeddings, for them to be useful.
Perhaps someone can merge the best of Aider and codex/claude code now. Looking forward to it.
Google needs to fix their Gemini web app at a basic level. It's slow, gets stuck on "Show thinking", and rejects 200k-token prompts that are sent in one shot. AI Studio is in much better shape.
+1 on this. Improving Gemini apps and live mode will go such a long way for them. Google actually has the best model line-up now but the apps and APIs hold them back so much.
Uploading files on Google is now great. I uploaded my Python script and the text data files I was using the script to process, and asked it how best to optimize the code. It actually ran the Python code on the data files, recommended changes, and then, when prompted, ran the script again to show the new results. At first I thought it might be hallucinating, but no, the data was correct.
Yes. Any API key is allowed, and you can assign different LLMs to different modes - architect, code, ask, debug, etc. It is great for cost optimization.
Only Claude (to my knowledge) has a desktop app which can directly, and usually quite intelligently, modify files and create repos on your desktop. It's the only "agentic" option among the major players.
"Claude, make me an app which will accept Stripe payments and sell an ebook about coding in Python; first create the app, then the ebook."
It would take a few passes but Claude could do this; obviously you can't do that with an API alone. That capability alone is worth $30/month in my opinion.
But there are third-party options available that do the very same thing (e.g. https://aider.chat/ ), which allow you to plug in a model of your choice, or even a combination thereof (e.g. DeepSeek as architect and Claude as code writer).
So the advantage of the model provider offering such a thing doesn't matter, no?
> It would take a few passes but Claude could do this;
I'm sorry, but absolutely nothing I've seen from using Claude indicates that you could give it a vague prompt like that and have it actually produce anything worth reading.
Can it output a book's worth of bullshit with that prompt? Yes. But if you think "write a book about Python" is where the state of the art in language models is, in terms of the prompt you need to get a coherent product, I want some of whatever you are smoking, because that has got to be the good shit.
It looks the same, but for some reason Claude Code is much more capable. Codex got lost in my source code and hallucinated a bunch of stuff; Claude, on the same task, just went to town, burned money, and delivered.
Of course, this is only my experience, and Codex is still very young. I really hope it becomes as capable as Claude.
Part of it is probably that Claude is just better at coding than what OpenAI has available. I am considering trying to hack support for Gemini into Codex and play around with it.
Also, the "project" feature in Claude improves the experience significantly for coders, since you can customize your workflow. It would be great if Gemini had this feature.
Yes, IME, Anthropic seemed to be ahead of Google by a decent amount with Sonnet 3.5 vs 1.5 Pro.
However, Sonnet 3.7 seemed like a very small increase, whereas 2.5 Pro seemed like quite a leap.
Now, IME, Google seems to be comfortably ahead.
2.5 Pro is a little slow, though.
I'm not sure which model Google uses for the AI answers on search, but I find myself using Search for a lot of things I might ask Gemini (via 2.5 Pro) if it was as fast as Search's AI answers.
I've been using Gemini 2.5 and Claude 3.7 for Rust development, and I have been very impressed with Claude - which wasn't the case for some architectural discussions, where Gemini impressed me with its structure and scope. OpenAI 4.5 and o1 have been disappointing in both contexts.
Gemini doesn't seem to be as keen to agree with me, so I find it makes small improvements where Claude and OpenAI will go along with my initial suggestions until specifically asked to make improvements.
I have noticed Gemini not accepting an instruction to "leave all other code the same but just modify this part" on code that included use of an alpha API with a different interface than the current API Gemini knows. No matter how I prompted 2.5 Pro, I couldn't get it to respect my use of the alpha API; it would just conclude I must be wrong.
So I think patterns from the training data are still overriding some actual logic/intelligence in the model. Or the Google Assistant fine-tuning is messing it up.
I have been using Gemini daily for coding for the last week, and I swear that they are pulling levers and A/B testing in the background. Which is a very Google thing to do. They did the same thing with Assistant, which I was a pretty heavy user of back in the day (I was driving a lot).
I have had a few epic refactoring failures with Gemini relative to Claude.
For example: I asked both to change a bunch of code into functions to pass into a `pipe` type function, and Gemini truly seemed to have no idea what it was supposed to do, and Claude just did it.
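Roughly the shape of the ask, as a simplified, hypothetical TypeScript example (not the actual codebase):

```typescript
// A generic pipe that threads a value through a list of functions.
const pipe = <T>(...fns: Array<(x: T) => T>) => (x: T): T =>
  fns.reduce((acc, fn) => fn(acc), x);

// Before: one inline blob, e.g. dedupe(normalize(parse(input))).
// After: each step becomes a named function passed into pipe.
const parse = (s: string) => s.trim();
const normalize = (s: string) => s.toLowerCase();
const dedupe = (s: string) => [...new Set(s.split(" "))].join(" ");

const clean = pipe(parse, normalize, dedupe);
console.log(clean("  Foo foo BAR  ")); // "foo bar"
```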
Maybe there was some user error or something, but after that I haven’t really used Gemini.
I’m curious whether the people using Gemini and loving it are using it mostly for one-shotting, or if they’re working with it more closely, like a pair programmer? I could buy that it could maybe be good at one but bad at the other.
This has been my experience too. Gemini might be better for vibe coding or architecture or whatever, but Claude consistently feels better for serious coding. That is, when I know exactly how I want something implemented in a large existing codebase, and I go through the full cycle of implementation, refinement, bug fixing, and testing, guiding the AI along the way.
It also seems to be better at incorporating knowledge from documentation and existing examples when provided.
My experience has been exactly the opposite - Sonnet did fine on trivial tasks, but couldn't, e.g., fix a bug end-to-end (from bug description in the tracker to implementing the fix and adding tests) properly, because it couldn't understand how the relevant code worked, whereas Gemini would consistently figure out the root cause and write a decent fix & tests.
Perhaps this is down to specific tools and their prompts? In my case, this was Cursor used in agent mode.
Or perhaps it's about the languages involved - my experiments were with TypeScript and C++.
> Gemini would consistently figure out the root cause and write a decent fix & tests.
I feel like you might be using it differently to me. I generally don't ask AI to find the cause of a bug, because it's quite bad at that. I use it to identify relevant parts of the code that could be involved in the bug, and then I come up with my own hypotheses for the cause. Then I use AI to help write tests to validate these hypotheses. I mostly use Rust.
I used to use them mostly in "smart code completion" mode myself until very recently. But with all the AI IDEs adding agentic mode, I was curious to see how well that fares if I let it drive.
And we aren't talking about trivial bugs here. For TypeScript, the most impressive bug it handled to date was an async race condition: a missing await caused a property to be overwritten with an invalid value. For that one I actually had to do some manual debugging and tell it what I observed, but given that info, it was able to locate the problem in the code all by itself, fix it correctly, and come up with a way to test it as well.
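For a sense of the shape of that bug, here's a heavily simplified sketch (all names are hypothetical, not the actual code):

```typescript
class Settings {
  theme = "light";

  async load(): Promise<void> {
    await new Promise((r) => setTimeout(r, 10)); // simulated slow fetch
    this.theme = "dark"; // value from the server
  }
}

function init(userOverride: string) {
  const s = new Settings();
  s.load(); // BUG: missing await -- load() is still in flight below
  s.theme = userOverride; // override applied "after" load... or so we think
  // ~10ms later load() resolves and silently clobbers the override:
  setTimeout(() => console.log(s.theme), 50); // "dark", not the override
}

init("high-contrast");
```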
For C++, the codebase in question was gdb, the bug was a test issue, and it correctly found problematic code based solely on the test log (but I had to prod it a bit in the right direction for the fix).
I should note that this is Gemini Pro 2.5 specifically. When I tried Google's models previously (for all kinds of tasks), I was very unimpressed - it was noticeably worse than other SOTA models, so I was very skeptical going into this. Indeed, I started with Sonnet precisely because my past experience indicated that it was the best option, and I only tried Gemini after Sonnet fumbled.
I use it for basically everything I can, not just code completion, including end-to-end bug fixes when it makes sense. But most of the time even the current Gemini and Claude models fail with the hard things.
It might be because most bugs that you would encounter in other languages don't occur in Rust in the first place, because of the stronger type system. The race condition you mentioned wouldn't be possible, for example. If something like that does occur, it's a compiler error, and the AI fixes it while still in the initial implementation stage by looking at the linter errors. I also put a lot of effort into using coding patterns that do as much validation as possible within the type system. So in the end, all that's left are the more difficult bugs where a human is needed to assist (for now at least; I'm confident the models are only going to get better).
Race conditions can span across processes (think async process communication).
That said I do wonder if the problems you're seeing are simply because there isn't that much Rust in the training set for the models - because, well, there's relatively little of it overall when you compare it to something like C++ or JS.
I've found that I need to point it to the right bit of logs or test output and narrow its attention by selectively adding to its context. Claude 3.7 at least works well this way; if you don't, it'll fumble around. Gemini hasn't worked as well for me, though.
I partly wonder if different people's prompt styles will lead to better results with different models.
I also cancelled my Anthropic subscription yesterday - not because of Gemini, but because it was the absolute worst time for Anthropic to limit their Pro plan to upsell their Max plan, when there is so much competition out there.
Manus.im also does code generation in a nice UI, but I’ll probably be using Gemini and DeepSeek.
Google has killed so many amazing businesses -- entire industries, even, by giving people something expensive for free until the competition dies, and then they enshittify hard.
It's cool to have access to it, but please be careful not to mistake corporate loss leaders for authentic products.
It's not free. And it's legit one of the best models. And it was a Google employee who was among the authors of the paper most recognized as kicking all this off. They give somewhat limited access in AI Studio (I have only hit the limits via API access, so I don't know what the chat UI limits are). Don't they all do this? Maybe with harder limits and no free API access. But I think most people don't even know about AI Studio.
True. They are ONLY good when they have competition. The sense of complacency that creeps in is so obvious as a customer.
To this day, the Google Home (or is it called Nest now?) speaker is the only physical product I've ever owned that lost features over time. I used to be able to play the audio of a YouTube video (like a podcast) through it, but then Google decided that it was very, very important that I only be able to play a YouTube video through a device with a screen, because it is imperative that I see a still image when I play a long-form history podcast.
Obviously, this is a silly and highly specific example, but it is emblematic of how they neglect or enshittify massive swathes of their products as soon as the executive team loses interest and puts their A team on some shiny new object.
The experience on Sonos is terrible. There are countless examples of people sinking thousands of dollars into the Sonos ecosystem, only for the new app update to render their setups useless.
I'm experiencing the same problem with my Google Home ecosystem. One day I can turn off the living room lights with the simple phrase "Turn off living room lights," and then randomly, for two straight days, it doesn't understand the command.
Preach it, my friend. For years on the Google Home Hub (or Nest Hub or whatever) I could tell it to "favorite my photo" of what was on the screen. This allowed me to incrementally build a great list of my favorite photos in Google Photos and added a ton of value to my life. At some point that broke, and now it just says, "Sorry, I can't do that yet". Infuriating.
The usage limit for the experimental models gets used up pretty fast in a vibe-coding situation. I found myself setting up an API account with billing enabled just to keep going.
How would I know if it’s useful to me without being able to trial it?
Google's previous approach (Pro models available only to Gemini Advanced subscribers, and Advanced trials that can't be stacked with paid Google One storage - or rather, they convert the already-paid storage portion into a much shorter paid Advanced subscription!) was mind-bogglingly stupid.
Having a free tier on all models is the reasonable option here.
In this case, Google is a large investor in Anthropic.
I agree that giving away access to expensive models long term is not a good idea on several fronts. Personally, I subscribe to Gemini Advanced and I pay for using the Gemini APIs.
EDIT: a very good deal at $10/month is https://apps.abacus.ai/chatllm/ , which gives you access to almost all commercial models as well as the best open-weight models. I have never come close to using up my monthly credits with them. If you like to experiment with many models, the service is a lot of fun.
The problem with tools like this is that somewhere in the chain between you and the LLM are token-reducing "features" - whether it's the system prompt, a cheaper LLM middleman, or some other cost-saving measure.
You’ll never know what that something is. For me, I can’t help but think that I’m getting an inferior service.
You can self host something like https://big-agi.com/ and grab your own keys from various providers. You end up with the above, but without the pitfalls you mentioned.
big-AGI does look cool, and it supports a different use case. ABACUS.AI takes your $10/month and gives you credits that go toward their costs of using OpenAI, Anthropic, Gemini, etc. Use of smaller open models consumes very few credits.
They also support an application development framework that looks interesting, but I have never used it.
You might be correct about cost-saving techniques in their processing pipeline. But they also add functionality: they bake web search into all models, which is convenient. I have no affiliation with ABACUS.AI; I am just a happy customer. They currently let me play with 25 models.
Just look at Chrome to see Bard/Gemini's future. HN folks didn't care about Chrome then, but now cry about Google's increasingly hostile development of Chrome.
Look at Android.
HN behaviour is more like a kid who sees the candy, wants the candy, and eats as much as they can without worrying about the damaging effect all that sugar will have on their health. Then the diabetes diagnosis arrives, and they complain.
It's placing the instructions AND the user query at both top and bottom. So if you have a prompt like this:
[Long system instructions - 200 tokens]
[Very long document for reference - 5000 tokens]
[User query - 32 tokens]
The key-values for the first 5200 tokens can be cached, and it's efficient to swap out the user query for a different one: you only need to prefill 32 tokens and then generate output.
But the recommendation is to use the layout below, where you can only cache the first 200 tokens and need to prefill 5264 tokens every time the user submits a new query:
[Long system instructions - 200 tokens]
[User query - 32 tokens]
[Very long document for reference - 5000 tokens]
[Long system instructions - 200 tokens]
[User query - 32 tokens]
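In numbers, the per-query difference looks like this - a quick sketch of the arithmetic, using the token counts above:

```typescript
// Token counts from the example layouts above.
const SYS = 200, DOC = 5000, QUERY = 32;

// Layout A: [system][document][query] -- the system+document prefix
// is identical across queries, so its key-values can be cached.
const cachedA = SYS + DOC;                  // 5200 tokens reused
const prefillA = QUERY;                     // 32 tokens per new query

// Layout B: [system][query][document][system][query] -- the query
// appears right after the system block, so only that block is stable.
const cachedB = SYS;                        // 200 tokens reused
const prefillB = QUERY + DOC + SYS + QUERY; // 5264 tokens per new query

console.log({ prefillA, prefillB }); // { prefillA: 32, prefillB: 5264 }
```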