I don't know if I'm doing something wrong, but every time I ask Gemini 2.5 for code it outputs SO MANY comments. An exaggerated amount of comments. Section comments, step comments, block comments, inline comments, the whole gang.
I usually remove the comments by hand. It's actually pretty helpful, it ensures I've reviewed every piece of code carefully, especially since most of the comments are literally just restating the next line, and "does this comment add any information?" is a really helpful question to make sure I understand the code.
I've found that heavily commented code can be better for the LLM to read later, since it pulls explanatory comments into context at the same time as the code, similar to pulling in @docs, so maybe it's doing that on purpose?
No, it's just bad. I've been writing a lot of Python code the past two days with Gemini 2.5 Pro Preview, and all of its code was like:
```python
import logging

def whatever():
    # --- SECTION ONE OF THE CODE ---
    ...
    # --- SECTION TWO OF THE CODE ---
    try:
        ...  # [some "dangerous" code]
    except Exception as e:
        logging.error(f"Failed to save files to {output_path}: {e}")
        # Decide whether to raise the error or just warn
        # raise IOError(f"Failed to save files to {output_path}: {e}")
```
(it adds commented out code like that all the time, "just in case")
The training loop asked the model to one-shot working code for the given problems without being able to iterate. If you had to write code that had to work on the first try, and where a partially correct answer was better than complete failure, I bet your code would look like that too.
In any case, it knows what good code looks like. You can say "take this code and remove spurious comments and prefer narrow exception handling over catch-all", and it'll do just fine (in a way it wouldn't do just fine if your prompt told it to write it that way the first time, writing new code and editing existing code are different tasks).
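For instance, after that kind of edit pass, the catch-all from the snippet above would come back looking something like this (a sketch; `write_files` is a hypothetical helper, and `output_path` comes from the surrounding code):

```python
import logging

def save_files(output_path):
    try:
        write_files(output_path)  # hypothetical helper doing the actual I/O
    except OSError as e:
        # Catch only the failure we actually expect from file I/O,
        # log it, and re-raise so the caller decides what to do.
        logging.error(f"Failed to save files to {output_path}: {e}")
        raise
```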
It's only an example; there's plenty of irrelevant stuff that LLMs default to which is pretty bad Python. I'm not saying it's always bad, but there's a ton of not-so-nice or subtly wrong code generated (file and path manipulation, for example).
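A classic instance of the subtly-wrong path handling (my own illustration, not from any model output in particular): `os.path.join` silently discards everything before an absolute component, which generated code rarely guards against:

```python
import os

os.path.join("/srv/uploads", "report.txt")   # '/srv/uploads/report.txt'
os.path.join("/srv/uploads", "/etc/passwd")  # '/etc/passwd' -- base dir silently dropped
```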
There are a bunch of stupid behaviors of LLM coding that will be fixed by more awareness pretty soon. Imagine putting the docs and code for all of your libraries into the context window so it can understand what exceptions might be thrown!
Copilot and the like have been around for 4 years, and we’ve been hearing this all along. I’m bullish on LLM assistants (not vibe coding) but I’d love to see some of these things actually start to happen.
I feel like it has gotten better over time, but I don't have any metrics to confirm this. And it may also depend on what language/libraries you use.
It just feels to me like trying to derive correct behavior without a proper spec so I don't see how it'll get that much better. Maybe we'll collectively remove the pathological code but otherwise I'm not seeing it.
It's certainly annoying, but you can try following up with "can you please remove superfluous comments? In particular, if a comment doesn't add anything to the understanding of the code, it doesn't deserve to be there".
I'm having the same issue, and no matter what I prompt (even stuff like "Don't add any comments at all to anything, at any time") it still tries to add these typical junior-dev comments where it's just reiterating what the code on the next line does.
I prefer not to do that as comments are helpful to guide the LLM, and esp. show past decisions so it doesn't redo things, at least in the scope of a feature. For me this tends to be more of a final refactoring step to tidy them up.
I always thought these were there to ground the LLM on the task and produce better code, an artifact of the fact that it will autocomplete better based on past tokens. Similarly, I always thought this is why ChatGPT starts every reply by repeating exactly what you asked.
Comments describing the organization and intent, perhaps. Comments just saying what a "require ..." line requires, not so much. (I find it will frequently put notes on the change it is making in comments, contrasting it with the previous state of the code; these aren't helpful at all to anyone doing further work on the result, and I wound up trimming a lot of them off by hand.)
I have the same issue, plus unnecessary refactorings (that break functionality). It doesn't matter if I write a whole paragraph in the chat or the prompt explaining that I don't want it to change anything else apart from what is required to fulfill my very specific request. It will just go rogue and massacre the entire file.
This has also been my biggest gripe with Gemini 2.5 Pro. While it is fantastic at one-shotting major new features, when wanting to make smaller iterative changes, it always does big refactors at the same time. I haven't found a way to change that behavior through changes in my prompts.
Claude 3.7 Sonnet is much more restrained and does smaller changes.
This exact problem is something I’m hoping to fix with a tool that parses the source to an AST and then has the LLM write code to modify the AST (which you then run to get your changes) rather than output code directly.
I’ve started in a narrow niche of Python/Flask webapps and am constrained to that stack for now, but if you’re interested, I’ve just opened it for signups: https://codeplusequalsai.com
Would love feedback! Especially if you see promising results in not getting huge refactors out of small change requests!
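The general shape, with just the stdlib, is roughly this (a sketch, not my actual implementation; the rename is a made-up example change): the LLM writes an `ast.NodeTransformer` instead of raw source, and you run it over the parsed tree:

```python
import ast

source = open("app.py").read()
tree = ast.parse(source)

# The kind of code the LLM writes: a transformer that renames one
# function and leaves everything else untouched.
class RenameHandler(ast.NodeTransformer):
    def visit_FunctionDef(self, node):
        if node.name == "old_handler":
            node.name = "new_handler"
        return node

new_tree = ast.fix_missing_locations(RenameHandler().visit(tree))
print(ast.unparse(new_tree))  # ast.unparse requires Python 3.9+
```

One caveat with plain `ast` is that unparsing drops comments and formatting, which is part of why concrete-syntax-tree libraries like libcst exist.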
Interesting idea. But LLMs are trained on vast amounts of "code as text" and a tiny fraction of "code as AST"; wouldn't that significantly hurt the result quality?
Thanks, and yeah, that is a concern; however, I have been getting quite good results from this AST approach, at least for building medium-complexity webapps. On the other hand, this wasn't always true... the only OpenAI models that really work well are the o3 series. Older models do write AST code but fail to do a good job because of the exact issue you mention, I suspect!
Having the LLM modify the AST seems like a great idea. Constraining an LLM to only generate valid code would be super interesting too. Hope this works out!
Asking it explicitly once (not necessarily every new prompt in context) to keep output minimal and strive to do nothing more than it is told works for me.
Really? I haven't tried Gemini 2.5 yet, but my main complaint with Claude 3.7 is this exact behavior - creating 200+ line diffs when I asked it to fix one function.
This is generally controllable with prompting. I usually include something like, “be excessively cautious and conservative in refactoring, only implementing the desired changes” to avoid it.
I've used it via Google's own AI Studio, via my own library/program using the API, and finally via Aider. All of them lead to the same outcome: large chunks of changes to a lot of unrelated things ("helpful" refactors that I didn't ask for) and tons of unnecessary comments everywhere (like those comments you ask junior devs to stop making). No amount of prompting seems to address either problem.
Tell it not to write so many comments then. You have a great deal of flexibility in dictating the coding style and can even include that style in your system prompt or upload a coding style document and have Gemini use it.
Ok, so saying "Implement feature X" leads to a ton of comments. How do you rewrite that prompt so it doesn't include "don't write comments" but still gets output without comments? "Write only source code, no plain text with special characters at the beginning of the line", or what are you suggesting in practical terms?
I also include something about "Target the comments towards a staff engineer that favors concise comments that focus on the why, and only for code that might cause confusion."
I also try to get it to channel that energy into the docstrings, so it isn't buried in the source.
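The why-versus-what distinction, for the record (made-up snippet):

```python
retries = 3  # set retries to 3  <- the "what" comment LLMs love; pure noise

# The upstream API intermittently 502s during deploys; three attempts
# with backoff has been enough in practice.  <- the "why" a reviewer wants
retries = 3
```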
This is sort of LLM specific. For some tasks you can include the word "comment" as long as you give the instruction at both the beginning and the end of the prompt. This is very model dependent. Like:
Refactor this. Do not write any comments.
<code to refactor>
As a reminder, your task is to refactor the above code, and do not write any comments.
Yes, my suggestion is that negations can work just fine, depending on the model and task, and instead of avoiding negations you can try other prompting strategies, like emphasizing what you want at the beginning and at the end of the prompt.
If you think negations never work tell Gemini 2.5 to "write 10 sentences that do not include the word the" and see what happens.
"Implement feature X, and as you do, insert only minimal and absolutely necessary comments that explain why something is being done, not what is being done."
I usually ask ChatGPT to "comment the shit out of this" for everything it writes. I find it vastly helps future LLM conversations pick up all of the context and why various pieces of code are there.
If it is ingesting data, there should also be a sample of the data in a comment.
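Something like this at the top of a loader, for example (the file name and fields are made up):

```python
import json

# Sample line from events.jsonl, so future LLM passes (and humans) see the shape:
# {"user_id": 42, "event": "click", "ts": "2024-05-01T12:00:00Z"}
def load_events(path):
    with open(path) as f:
        return [json.loads(line) for line in f]
```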
Same experience. Especially the "step" comments about the performed changes are super annoying. Here is my prompt-rule to prevent them:
"5. You must never output any comments about the progress or type of changes of your refactoring or generation.
Example: you must NOT add comments like: 'Added dependency' or 'Changed to new style' or worst of all 'Keeping existing implementation'."
Depends on what you mean by "defensive". Anticipating error and non-happy-path cases and handling them is definitely good. Also fault tolerance, i.e. allowing parts of the application to fail without bringing down everything.
But I've heard "defensive code" used for the kind of code where almost every method validates its input parameters, wraps everything in a try-catch, returns nonsensical default values in failure scenarios, etc. This is a complete waste because the caller won't know what to do with the failed validations or thrown errors, and it's just unnecessary bloat that obfuscates the business logic. Validation, error handling and so on should be done in specific parts of the codebase (bonus points if you can encode the successful validation or the presence/absence of errors in the type system).
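Sketched out, the contrast looks like this (hypothetical names; `TIERS` and the `user` objects are stand-ins):

```python
TIERS = {"free": 0.0, "pro": 0.1}

# Defensive bloat: swallow everything, return a made-up default.
def get_discount_defensive(user):
    try:
        return TIERS[user.tier]
    except Exception:
        return 0.0  # caller can't distinguish "no discount" from "bad input"

# Validate at the boundary, then trust the data: an unknown tier is a
# bug, so let the KeyError surface where it happens.
def get_discount(user):
    return TIERS[user.tier]
```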
Lots of hasattr(...) rubbish. I've increased the amount of prompting, but it still does this; basically it defers its lack of compile-time knowledge to runtime: "let's hope for the best and see what happens!"
Trying to teach it FAIL FAST is an uphill struggle.
Oh and yes, returning mock objects if something goes wrong is a favourite.
It truly is an Idiot Savant - but still amazingly productive.
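For the record, the pattern being complained about, next to the fail-fast version (illustrative only):

```python
# What it writes: probe with hasattr and fall back to a mock value.
def get_name_hopeful(obj):
    if hasattr(obj, "name"):
        return obj.name
    return "Unknown"  # hides the real bug behind a plausible-looking value

# Fail fast: a missing attribute is a programming error; crash here,
# at the point of the mistake, not three layers downstream.
def get_name(obj):
    return obj.name
```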
Does the code consist of many large try/except blocks that catch the generic "Exception"? Gemini seems to like doing that. (I thought it was bad practice to catch the generic Exception in Python.)
Catching the generic exception is a nice middleground between not catching exceptions at all (and letting your script crash), and catching every conceivable exception individually and deciding exactly how to handle each one. Depends on how reliable you need your code to be.
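The defensible version of that middle ground is catching broadly only at a boundary, e.g. so one bad item doesn't kill a whole batch (a sketch; `process` and `jobs` are placeholders):

```python
import logging

def process(job):
    ...  # placeholder for real work that may raise anything

jobs = ["a", "b", "c"]
for job in jobs:
    try:
        process(job)
    except Exception:
        # Log the full traceback but keep going; one bad item
        # shouldn't abort the whole batch.
        logging.exception(f"job {job!r} failed; continuing")
```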
Maybe these comments actually originate from annotated training data? If I were adding code annotations for training data, I'd expect comments like these: not much value for me, but they give the model more contextual understanding…
2.5 is the most impressive model I've used, but I agree about the comments. And when refactoring code it wrote before, it just adds more comments; it becomes like archaeological history. (Disclaimer: I don’t use it for work, but to see what it can do, so I try to intervene as little as possible and get it to refactor what it thinks it should.)
My custom default Claude prompt asks it to never explain code unless specifically asked to. Also to produce modern and compact code. It's a beauty to see. You ask for code and you get code, nothing else.
I really liked the Gemini 2.5 Pro model when it was first released - the upload code folder was very nice (but they removed it). The annoying thing I find with the model is that it does a really bad job of formatting the code it generates... I know I can use a code formatting tool, and I do when I use Gemini output, but otherwise I find Grok much easier to work with, and it yields better results.
> I really liked the Gemini 2.5 Pro model when it was first released - the upload code folder was very nice (but they removed it).
Removed from where? I use the attach code folder feature every day from the Gemini web app (with a script that clones a local repo, deleting .git and anything matching a gitignore pattern).
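A minimal version of that kind of prep script, for anyone curious (the paths are made up; copying only tracked files via `git ls-files` means .git and anything gitignored never gets copied):

```python
import shutil
import subprocess
from pathlib import Path

def export_repo(repo: Path, dest: Path) -> None:
    # Tracked files only: gitignored files are untracked, so they're skipped.
    tracked = subprocess.run(
        ["git", "-C", str(repo), "ls-files"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    for rel in tracked:
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(repo / rel, target)

export_repo(Path("~/code/myproject").expanduser(), Path("/tmp/myproject-upload"))
```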
It just got removed from the Add menu for me too. Now I have to click "Import Code" and then the "Upload Folder" button in the dialog. Maybe you got this rollout much earlier than I did?
It’s annoying, but I’ve done extensive work with this model and leaving the comments in for the first few iterations produced better outcomes. I expect this is baked into the RL they’re doing, but because of the context size, it’s not really an issue. You can just ask it to strip out in the final pass.
So many comments, more verbose code and will refactor stuff on its own. Still better than chatgpt, but I just want a small amount of code that does what I asked for so I can read through it quickly.
That’s been my experience as well. It’s especially jarring when asking for a refactor as it will leave a bunch of WIP-style comments highlighting the difference with the previous approach.
It's trained on the Google style I guess. Google code always feels excessively commented, to the point where I delete comments from Google samples so I can read the code.
I have a feeling this may be a Cursor issue; perhaps Cursor's system prompt asks for comments? Asking in the AI Studio UI for code and ending the prompt with "no code comments" has always worked for me.
Another great tell of code reviewers YOLO-ing it: LLMs usually put the full file name/path at the top of the output, so if you see a file with the file name/path on the first line, that's probably LLM output.
And comments are bad? I mean, you could tell it not to comment the code, or to self-document with naming instead of inline comments. It's an LLM; it does what you tell it to.