Plotting the source code “TODO” history of the most popular open source projects

snorberhuis · on May 17, 2021

I always suggest that TODOs be replaced during a code review by one of two things: 1. A ticket number that will be picked up shortly, if it should still be part of a larger change. In a healthy team, this is done within two weeks and you know where to perform changes when you pick it up. 2. No TODO at all, but an explanation of your current understanding of what is wrong and what should be done. This way you can refresh the knowledge if the code is ever touched again. With a simple TODO, this knowledge is usually not written down.

hinkley · on May 17, 2021

I don't think I can agree with this, since some TODOs have a different target audience.

The kind of TODO you're talking about is splitting a ticket into parts so you can hit an artificial deadline (ie, it's no longer 'done done', it's just 'done'). If the artificial deadline is your boss, then we're in a bad place. If it's another team needing a feature, that's pipelining and that's often okay.

For the TODOs that make it to PR without human error, I write most of them for the next person who adds functionality to an area, to either encourage them to do so or at least not make things worse. But sometimes it's for the person who hits the Rule of 3.

Those TODO's should be addressed in six months, not two weeks, and having someone call me on them in a PR is not particularly helpful. No, I'm not going to quadruple the scope of this story because you don't like the word TODO.

stevenhuang · on May 17, 2021

Or do both. Might as well add TODO: to make it stand out as a thing that can be improved while also making it greppable.

mihi · on May 17, 2021

Might also be cool to automatically create these tickets (when committed to master?). Then you don't forget, and even if they end up not that detailed, you at least get a nice list of all of them.

actinium226 · on May 17, 2021

TODO: Automatically create tickets based on TODO comments.

We'll get to it. Someday.

ableal · on May 17, 2021

https://en.wiktionary.org/wiki/round_tuit

There used to be an ASCII art version occasionally slapped on Usenet posts: "here's a round tuit, you can go ahead."

hinkley · on May 17, 2021

Caption: "An artist's impression of a round tuit."

nrub · on May 17, 2021

One option if you're on github, https://github.com/marketplace/actions/todo-to-issue

mihi · on May 20, 2021

That is really cool! Thanks for mentioning.

efnx · on May 18, 2021

If you want a cli tool for this I have one here https://github.com/Schell/todo_finder

You can output to markdown or to GitHub issues with a token.

I used to operate a service that did this for you.

jgwil2 · on May 17, 2021

I like this idea, but I would add that my preference for scenario 2 is to add HACK (or FIXME) along with the explanation instead. This way it is still searchable, but you make it a bit more clear that there is no obvious fix at the moment.

schleiss · on May 17, 2021

OP here. I used `git log -G TODO --reverse -p -- . > ~/Desktop/test.txt` and used the results in PHP to aggregate the data as I couldn't think of the bash one liners in the other comments :(

pc86 · on May 17, 2021

Stupid question, is "TODO" in this instance case sensitive?

tastroder · on May 17, 2021

It's case sensitive by default, you can add -i/--regexp-ignore-case to disable that.

thamer · on May 17, 2021

You might also want -w for whole words, otherwise commits by your colleague Todor might cause some confusion.
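A minimal sketch of the two matching options discussed here, using plain `grep` on stdin rather than git, and assuming a grep that supports `-w` (GNU and BSD grep both do). The function names are made up for illustration.

```shell
# Count lines on stdin that contain TODO as a whole word.
# -w: whole-word match, so "Todor" and "TODOs" are not counted.
count_todo_words() {
    grep -cw 'TODO'
}

# Same, but also case-insensitive (-i), so "todo" and "Todo" count too.
count_todo_words_nocase() {
    grep -ciw 'TODO'
}
```

For example, `cat src/*.c | count_todo_words` prints a single number for the concatenated files.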

kuu · on May 17, 2021

Interesting to see that most of them are almost always growing. It would be interesting to compare them to some other metrics, such as TODO/lines_of_code or TODO/num_contributors, to compare the TODOs with the size of the project. I guess that as a project gets bigger, it also gets more TODOs.

cies · on May 17, 2021

> TODO/lines_of_code

At least this is needed for any meaningful comparison between projects (or even within projects themselves, as some double in SLOC count over a few months).
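One rough way to get a TODO/lines_of_code figure is sketched below. It assumes raw line counts are an acceptable stand-in for SLOC (blank lines and comments are counted too), and `todo_density` is a made-up helper name.

```shell
# Print the number of lines containing TODO per 1000 lines of the given files.
# Usage: todo_density file...
todo_density() {
    cat -- "$@" | LC_ALL=C awk '
        /TODO/ { todos++ }
        END {
            if (NR == 0) { print "0.00"; exit }   # no input: avoid dividing by zero
            printf "%.2f\n", todos * 1000 / NR
        }'
}
```

Running it against the same set of files at different commits would give a normalized series instead of a raw count.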

orthonormal · on May 17, 2021

Ignoring the numbers on vertical axis, plots look like they are normalized to fit the plotting area. TODO/LoC should basically have the same form.

kuu · on May 17, 2021

I don't think so. These plots are normalized by the max number of TODO's, but that may vastly differ from max TODO/LoC of each project.

cies · on May 17, 2021

Indeed! When those numbers jump up/down in the graphs, there's prolly a code merge/purge cause to it.

travisgriggs · on May 17, 2021

It was interesting to see that Swift has 2K+. Seemed kinda high when I consider its youth and uptake relative to some of its plotted peers.

I don't know whether to suspect that's because it has

a) an overly parliamentary development process that just creates lots of bookkeeping side effects

b) a very aspirational development community that is busy writing tons of "try to take over the world" goals to improve in various and sundry ways

c) a lot of short-sighted/highly focused language evolutions that leave a long trail of todos because that kind of "we have no big picture" creates a lot of corner/edge cases that need "todo" signs to document them

d) something else?

JulianMorrison · on May 17, 2021

Interesting that some of them don't grow, or don't grow much, over time.

Someone at PostgreSQL and Django is actually reading and fixing the TODOs.

jsmcgd · on May 17, 2021

It would also be interesting to plot the number of TODOs against the size of the code base too. One would assume that as a project grows, the number of outstanding TODOs would grow too. Where and when this isn't true, might reveal something more interesting.

WrtCdEvrydy · on May 17, 2021

We took some action on this internally at a place I worked.

We had a couple of projects that had unit tests neglected so we enforced that you had to round up to the next closest 1% on the package.json for your merge to be approved (as well as adding some unit tests)

In 3-4 weeks, code coverage slowly went from 20% to 73%.

hinkley · on May 17, 2021

I bet it's more to do with rate of growth than total volume.

I believe that one of the cognitive dissonances with people who think a lot of code is good news is that they become overwhelmed by how much they would actually write if they stuck to their convictions and so they start using TODOs to make themselves feel better about doing the wrong thing.

Projects that grow slower I suspect have fewer TODOs.

anarazel · on May 17, 2021

I suspect that in postgres' case that is just because a TODO list was moved out of the code...

zufallsheld · on May 17, 2021

While it's probably the case for postgres or Django, todos getting less could also mean high code churn.

macksd · on May 17, 2021

Yeah, I noticed several cases where the to-dos drop dramatically. I wondered if that was because there was a module that had a lot of to-dos and was also low quality, and was the subject of a complete rewrite or replacement at some point. Golang was also interesting because of a huge increase followed by an almost equally huge decrease a short time later, both of them seemingly vertical. I wonder if something got reverted, or if they actually went back and addressed a bunch of to-dos.

QuercusMax · on May 17, 2021

Might be some type of automation / tool adding a bunch of TODOs (as part of a migration?) which are then automatically removed in a later step.

shireboy · on May 17, 2021

Well, I'm glad I'm not the only one who never gets around to my //TODO's.

One related tip for devs: I've started adding "You are here" as a placemark for where I'm working in the code. So for example, on friday, if I want to pick up quickly next monday, I add "//TODO: YOU ARE HERE Finish doing foo". Then on monday, I search for "You are here" and pick up where I left off easier. Saves me a few cycles, though if I'm honest, I have quite a few of those hanging out too.

berkes · on May 17, 2021

I've been annotating my work several times a day, with `INK`, from "leave some water in the well", a productivity hack from Hemingway[0].

I forgot how I went from "Water in the well" to "ink the well", though. It's been a while since I started doing it, and I wrote a blog-post[1] with some scripts and helpers that I still use.

[0] https://www.fastcompany.com/3021905/hemingways-secret-to-mai...

[1] https://berk.es/2012/05/30/leave-some-ink-in-the-well/

hinkley · on May 17, 2021

> in other words, never end a day’s work without knowing how you are going to start the next day.

This turns out to be very hard advice for developers to follow. I don't say that as a complaint about Hemingway, but as a complaint about developer neuroses.

I have tried many, many times to convince people to associate their 'sense of completion' not with the act of getting their changes into master but the act of committing their changes (or pushing it to a branch, if you are PR-driven). It works less than half the time, and almost always with the more junior people...

So many incidents of someone staying late to finish something, pushing it, then coming in late the next day (because they stayed late) to a bunch of upset coworkers who had to clean up the mess they made.

Completionism will be the death of us all.

berkes · on May 17, 2021

> but the act of committing their changes (or pushing it to a branch, if you are PR-driven)

I'm not sure if I understand you entirely correctly, but it seems the "you have to commit" is a large part of the problem.

You don't.

Your hard drive is not going to crash between tonight and tomorrow. And if it may, and this is crucial, git is not a backup system, so get a proper backup system in place.

Commit when you have a coherent piece of work done. Or maybe commit a "WIP: working on foo, halted for emergency hotfix X" if stashing is out of the question.

Part of my "INK" workflow is not committing to RCS. Leave the dirty state as another mental nudge what you were working on; and add your current memory-state as annotations in the code behind an INK.

I've mentored a junior and he was the opposite: he never committed until he was finished, sometimes the work of multiple weeks, and then he committed it with "finished the FooBar". We agreed he would try to commit at least daily. So then he made one commit a day. "17:00 going home. commit" in the logs. Every day. Needless to say, we did not keep him for long.

hinkley · on May 17, 2021

> Commit when you have a coherent piece of work done.

No, that's the problem.

Commit when you are available to deal with the consequences of commit. Having a coherent piece of work reduces, but does not eliminate the chance for problems. If you leave before the code builds, you've created problems for someone else to fix. If you haven't budgeted time for the rollback to go wrong on the first try, you're creating problems for other people.

This is why reliability and responsiveness of CI/CD systems matter much more than most people allow. On a good project, pushing things after 4 is probably a bad idea. On less well run systems, anything after 3 might be risky. On a bad one, 2:30 might be pushing things. So depending on meetings and lunch time there may be very small intervals where you can push things, and when the clock is ticking we begin to rationalize, which just makes the likelihood of failure increase.

berkes · on May 18, 2021

> If you leave before the code builds, you've created problems for someone else to fix.

I'm curious about your workflow now. How can it be someone else's problem if you do not commit (and/or don't push) your work? Is everyone working only on the master/main/develop branch? Do you work on a shared drive? Is your entire team working off a Dropbox or network drive (I've seen this, I needed eye bleach)?

account42 · on May 18, 2021

A commit does not have any consequences. A push may.

superdimwit · on May 17, 2021

even better, don't leave it as a comment. Leave it uncommented, as a syntax error, so when you try and run your code on Monday morning you are reminded of it!

dep_b · on May 17, 2021

Just today I announced a sweep through all of the TODOs to either turn them into issues or stories, or remove them. The biggest problem in Xcode is that they clog up the warning list and the real warnings get swamped by them. But it's funny to see that Rust has fewer outstanding TODOs than our brand new 45KLoC project.

the_duke · on May 17, 2021

TODOS are great documentation. They often reveal design decisions, suboptimal implementations and the thought process of the creator.

This information is often completely lost in issues that no one will ever look at again.

I also like to do TODO sweeps, but with a bias towards rewriting them into documentation or leaving them in if they are actionable.

nemo1618 · on May 17, 2021

Instead of TODO, I often add a NOTE instead now, e.g.:

   NOTE: Attempting to re-synchronize here would cause an infinite loop because...

This prefix distinguishes such info from normal comments, which exclusively pertain to the code-as-written.

stevenhuang · on May 17, 2021

Agreed but that's not quite what's being discussed here.

Notes are great but the context is to make a note about something that can be improved in the future (something actionable), hence it being a TODO.

lucb1e · on May 17, 2021

Huh, if I add a TODO to typical code then it's something that ideally needs to be done there (e.g. "todo handle errors from invalid inputs" -- it'll work if you don't but... it's not ideal), or in LaTeX reports it's something that needs to be resolved still before shipping to the customer (e.g. "todo set \endDate variable"). Never is it a design decision or documenting my thought process. Why would that ever be labeled todo or, if you mean that todos indirectly convey that, why would having todos throughout the code as a means of conveying design decisions be "great documentation"?

"This information is often completely lost in issues" - of course, because an "issue" is "a vital or unsettled matter" and are not meant to be revisited once resolved. Writing design decisions or thought processes in issues seems equally weird to using TODOs to document that. They're kept around because it costs nothing and you might want to refer back to it for details on some past problem, but I've never heard of that being considered documentation.

smitop · on May 17, 2021

The Rust graph is inaccurate since Rust uses FIXME instead of TODO. Almost all of the usage of "TODO" in rustc relates to the todo!() macro, usually to stub out parts of tests.

est31 · on May 17, 2021

Yeah you get vastly different numbers if you grep for FIXME (instead of the hundred or so todo's):

    $ rg FIXME compiler | wc -l
    1036
    $ rg FIXME | wc -l
    18656

cerved · on May 17, 2021

Can't you configure Xcode to filter out such warnings?

TODOs offer valuable insight but they may not warrant turning into issues or stories. Removing valuable information because it's cluttering seems like a shame. Better if you can filter it in such a way that it doesn't clutter.

moring · on May 17, 2021

Quick idea: You could use a different wording that doesn't get recognized by Xcode but sticks out in a similar way. It would take some time to get used to it though.

dep_b · on May 17, 2021

I think it's actually my linter generating these warnings! But I do like them to be resolved so I consider them issues.

cerved · on May 17, 2021

everybody wants todos resolved but chances are that if a decision is made to fix, make issues or nuke them, they'll just get nuked instead

nickjj · on May 17, 2021

> Just today I announced a sweep through all of the TODO's and to either turn them into issues, stories or remove them.

I think it makes sense to get rid of inline TODOs.

On new unshipped projects I've gone down the road many many times where it starts with a TODO comment above a chunk of code and then it turns into a few lines of context or a summary of my thoughts. Then comes linking to references, recapping my thought process on the "why", potentially writing a couple of versions of the code and keeping them commented out, or even having a chat with a friend over IRC and copy / pasting the conversation next to the code.

Now you open a file and suddenly it's your code mixed with a massive brain dump of notes, research and a ton of other things that have no business being in your code base and of course your intent is to remove all of that stuff once you get the well thought out implementation but eventually all of this stuff builds up. Then it happens across multiple files and eventually it becomes really hard to figure out what you need to do.

Putting all of that stuff into a kanban board has been a huge win for me. Now I just drop all of that contextual info into a "research" list and I pick things off that list when I'm ready to really do them.

I made a video about this process a while back at https://youtu.be/HHOkcCqsipE?t=77. It shows an example of the before and the after.

lurchedsawyer · on May 17, 2021

Sorry to break it to you but these TODOs are a measure of technical debt you've accrued by limiting dev work to issues and stories.

WrtCdEvrydy · on May 17, 2021

"How no one could have seen how we got hacked (then we checked Git and found a `TODO: Fix Authentication`)"

lurchedsawyer · on May 17, 2021

"Yes, but authentication won't make the boat go faster"

twelvechairs · on May 17, 2021

Is it controversial to say that those with low TODOs are pretty clearly the cleanest packages I enjoy working with most? (Postgres, Rust, Django, VueJS, maybe Python)

mceachen · on May 17, 2021

Use and semantics of TODOs are decidedly not consistent across these projects.

Just because you (and I) see correlation with our expected biases shouldn't be construed as proof: merely interesting chart wiggles.

I think the shape (monotonic increase or sawtooth) can be used to see how a project handles either missing features or technical debt.

hinkley · on May 17, 2021

But if the semantics speak to project philosophy differences that result in a worse or better experience, then it is important that they don't have the same semantics.

actinium226 · on May 17, 2021

Well, is the converse true? Are the packages with high TODOs the dirtiest packages that you least enjoy working with?

hinkley · on May 17, 2021

"I'll do it later" is one of the more challenging personality faults to deal with in coworkers. There are very few people you can trust to make that statement, while most of the rest behave as if you should trust them as well, even though everybody knows they won't in fact do that later.

Not only will they not do that later, but at some point they will compare their productivity to someone who ended up having to 'do that later' for them, causing their other work to suffer.

_mlxl · on May 17, 2021

I was just thinking the other day that searching for TODO is probably a very good way to search a project for potential bugs or security issues. E.g., I see a bunch of todos in the Firebase iOS SDK that look kind of interesting to an attacker. Without looking into how the methods are called I can't say if they are actually exploitable (and I am sure Firebase is fuzzed to high hell) but it was a little seed planted in my head.

_kbh_ · on May 17, 2021

for a great example of this just have a look at the following macOS privesc the source of which came with the handy comment "deal with OOB".

https://blog.zecops.com/research/from-a-comment-to-a-cve-con...

lucb1e · on May 17, 2021

Am security tester. Can confirm.

Sometimes the vulnerabilities are just handed to you on a dark-themed platter and I don't look them in the mouth.

mekkkkkk · on May 17, 2021

The growing number is hardly surprising, yet I don't know exactly what the implications are. There are a lot of different categories of TODOs, ranging from harmless ("it would be nice if this was improved at some point") to critical ("this is a really bad solution that needs to be fixed ASAP"). I wonder if these repos have official definitions of what a TODO entails.

mnahkies · on May 17, 2021

Perhaps we should be including a priority in such comments, eg:

TODO: high: don't use bogosort

Would only be useful if it became a de facto standard though
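If such a convention existed, tallying by priority would be a short pipeline. The sketch below assumes the `TODO: <priority>: <text>` shape from the example; only `high` appears in the comment, so the `medium` and `low` labels and the function name are assumptions.

```shell
# Read source text on stdin and print "<priority> <count>" for TODO comments
# written as "TODO: high: ...", "TODO: medium: ..." or "TODO: low: ...".
# TODOs without one of these labels are ignored.
todo_priorities() {
    sed -nE 's/.*TODO: *(high|medium|low): .*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Fed the line `TODO: high: don't use bogosort`, it prints `high 1`.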

esyir · on May 17, 2021

At that point, make an issue and add it to the tracker.

schleiss · on May 17, 2021

Yes, I agree. Priority and some context on what to do instead. Because as soon as it's a standard, big software vendors like Jetbrains could automatically categorize and mark the various lines.

spockz · on May 17, 2021

Back when I used Eclipse there was another level called “FIXME”. Maybe it works in IntelliJ as well.

tirpen · on May 17, 2021

It does.

At least it works in PyCharm, so most likely it works in all IntelliJ based IDEs.

ExtraE · on May 17, 2021

Yes, it does. In fact, you can configure it to use whatever you want. (I have it also match BUG as another priority)

kevingadd · on May 17, 2021

Some projects use FIXME for things that need more urgent improvement to distinguish them from 'would be nice' TODOs. I wonder how many of these repos that is the case for?

Siira · on May 17, 2021

I use a priority number after my tags: @todo2, @todo9, etc.

egypturnash · on May 17, 2021

Damn, what happened to Typescript in May/June 2018? It jumped from around 730 TODOs to 3000, and has never really come back down.

Wikipedia's list of versions suggests that this was probably related to version 3 happening in July 2018.

dkarras · on May 17, 2021

I'd like to see a todos added / todos disappeared graph that would account for the velocity of development of a project. For some, while the todos grow, the relative "todo per LOC" might decrease etc.
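A rough sketch of the added/removed split, assuming the input is unified-diff text such as the output of `git log -G TODO -p`. The name `todo_churn` is made up, and a removed line whose own content starts with `-- ` would be mistaken for a file header and skipped.

```shell
# Read unified-diff text on stdin and print how many lines containing TODO
# were added and how many were removed.
todo_churn() {
    awk '
        /^\+\+\+ / || /^--- / { next }     # skip "+++ b/file" and "--- a/file" headers
        /^\+/ && /TODO/ { added++ }
        /^-/  && /TODO/ { removed++ }
        END { printf "added=%d removed=%d\n", added, removed }'
}
```

Usage would look like `git log -G TODO -p | todo_churn`; running it per time window would give the velocity-style graph described here.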

jpswade · on May 17, 2021

Todos are generally terrible practice in code, as they often don’t give any indication how they are going to get to done.

They never get prioritised and very rarely does anyone get to do them.

So what’s the point? It feels like a todo is really only there to serve as an excuse for suboptimal solutions.

FartyMcFarter · on May 17, 2021

> It feels like a todo is really only there to serve as an excuse for suboptimal solutions.

That's not the only way to use a TODO, but even if it was, it's still valuable to mark suboptimal code.

Imagine you're investigating a performance bug and see a comment in related code which says "TODO: use a faster sorting method", that's probably going to be helpful.

pjc50 · on May 17, 2021

Delivering something is far more preferable to never delivering a perfect solution, most of the time. TODO is a way of dealing with guilt about imperfection while actually shipping.

_Understated_ · on May 17, 2021

I dunno... I think they have their place.

I'm working on a personal project right now and in order to see how things look, if they work etc, I've got a bunch of vanilla js frontend code and //TODO in a few places to call an API and get actual data. It works great for now as I've hardcoded everything, got it looking broadly like the finished product, and it means that I just have to do the API calls (and programme the API too, of course).

I use them a lot.

eXpl0it3r · on May 17, 2021

On a personal project, I can see this working, on a team project, you're not gaining anything with TODOs, because the chances that you or your colleagues actually go back and fix/implement the TODO are close to zero.

Not only are you rarely gonna have the time to fix the TODO, but weeks, months, years later when you run into a TODO in your code, you have no idea what the actual requirements were, why it was not implemented, why it hasn't hurt anyone and whether anyone is actually needing it. Thus the TODO comment will remain forever, as you can't figure out what to do about it, without investing a lot of time and energy on requirement engineering.

Personally, I've started to block PRs with TODO comments that aren't directly mentioning the future implementation story/bug. As such, even if the TODO is forgotten in some way, you at least will find a reference point to what should have been done here.

berkes · on May 17, 2021

> because the chances that you or your colleagues actually go back and fix/implement the TODO are close to zero.

This depends entirely on the team, company and/or work, though. It certainly is not a given.

> TODO in your code, you have no idea what the actual requirements were, why it was not implemented, why it hasn't hurt anyone and whether anyone is actually needing it

This depends on the task you are TODO-ing. Sure, if it is a "TODO: seems broken, fix." or "TODO: make sure that users don't see this", you are putting not just the wrong things in TODOs you are not giving them enough context. Compare that with a "TODO: this duplicates the routine in FooBars#bar_bar, but we cannot move this to a generic helper until the BarBar can handle both ActiveUsers and PendingUsers. Once that polymorphism is implemented, this can be DRYd up", which gives context, predicaments, and communicates that the author knows it is suboptimal, and explains how the author would've fixed it.

eXpl0it3r · on May 18, 2021

It certainly all "depends". My main point is, that if you don't actively plan in to fix your TODOs, then they usually won't be fixed, which you can also kind of see in a lot of the charts where the amount of TODOs just goes up. And as such the question arises whether you really gain anything from them.

I much rather have someone finalize their implementation and create follow up stories/tasks to indicate what needs to be done next, than having hints of what should've/could've been done and nobody ever going back and cleaning those up.

mgfist · on May 18, 2021

Here's my most common use case:

1) Need to do a hotfix on some bug. Fastest fix is to just turn off something.

2) I turn it off (maybe commenting out the line) then add a TODO above it with the ticket number that corresponds to the ticket for turning it back on once the issue has been investigated.

3) Once I start working on the ticket to turn the thing back on, having the todos makes it easy to know exactly what to do (context isn't lost) and I don't end up missing some things because I can just search all of the TODOs.

4) If a TODO is missed, we have a script that will re-open any ticket for which a TODO ticket number is still in the codebase. I.e., let's say we have "TODO: xy-123 ...". If I close xy-123 without deleting that line, the script will re-open the ticket and comment saying that there is a remaining todo.
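The check in step 4 could be sketched roughly as follows. This is not the script described, only an illustration that assumes the `TODO: xy-123 ...` shape; the function name is hypothetical and the actual re-opening of the ticket depends on the tracker, so it is left out.

```shell
# Succeed (exit 0) if the text on stdin still contains a TODO that mentions
# the given ticket id. The trailing group keeps xy-123 from matching xy-1234.
# Usage: todo_mentions_ticket xy-123 < file
todo_mentions_ticket() {
    grep -qiE "TODO:? *$1([^0-9]|\$)"
}
```

For example, `git grep -I TODO | todo_mentions_ticket xy-123 && echo "xy-123 still has a TODO"`.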

marcosdumay · on May 17, 2021

> they often don’t give any indication how they are going to get to done

It would be a really bad practice to merge procedures and policy with technical artifacts. TODOs embedded in the code are technical artifacts, they say what should change, they really shouldn't say how and when.

> They never get prioritised and very rarely does anyone get to do them

Well, you are faulting the tool for your development practices. If your team doesn't look at TODOs, you indeed should avoid them. But that doesn't mean anything for other people.

At my workplace embedded TODOs would be bad too. At personal projects I find them quite useful with a similar life-cycle to warnings: you keep them there while the feature is being developed, but they must be gone by the time it's complete. Other people have different practices, and may successfully use them in different ways.

the__alchemist · on May 17, 2021

I'd rather have a todo (that marks the line with a nice bright color in my IDE) than the immediate alternatives of #1: Not implementing a feature because it has room for improvement, or #2: Leaving out the deficiency marking because it makes the code look more complete.

kozziollek · on May 17, 2021

Cool! That date axis though...

PHP's 3 years 2011-2014 are much shorter than the 2 years 2014-2016. NodeJS's years 2017-2020 are ~50% longer than 2013-2017.

yiyus · on May 17, 2021

I guess that each data point is a commit and they just made more commits in the 2014-2016 period than in 2011-2014. But it's just a guess.

pabs3 · on May 17, 2021

I wonder what other strings people use like this. So far I have seen FIXME TODO HACK XXX BROKEN.

Anthony-G · on May 17, 2021

With Vim, each filetype has its own set of rules for syntax highlighting (mostly contributed by third-party maintainers). A quick grep through the `/usr/share/vim/vim82/syntax` directory shows the most popular set of such strings that are to be syntax highlighted by Vim are:

    TODO FIXME XXX

Other strings that appear are:

    BUG NOTE CHECK DEPRECATED HACK TBD FIX TEMP REFACTOR REVIEW HACK Todo

I guess these strings are most likely a mix of popular idioms for programming languages or particular to the habits/culture of the individual contributors to the syntax files.
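To see which of these markers a codebase actually uses, a small counter along these lines might help. It assumes a grep with `-o` and `-w` (GNU and BSD grep both have them); the marker list is a subset of the strings mentioned here and `count_markers` is a made-up name.

```shell
# Read source text on stdin and print "<marker> <count>" for each marker word
# found. -o prints each match on its own line; -w requires whole words.
count_markers() {
    grep -owE 'TODO|FIXME|XXX|HACK|BUG|NOTE' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

It is meant to be fed file contents, for example `cat src/*.c | count_markers`, since file names that contain a marker word would otherwise be counted as well.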

GrumpySloth · on May 17, 2021

At work we use "TODO(JIRA-XXXX):", where JIRA-XXXX is the Jira ticket for the TODO. Every TODO needs an accompanying Jira ticket. Otherwise it won't pass code review.
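A review rule like this can be approximated mechanically. The sketch below assumes ticket keys look like uppercase letters, a dash and digits (for example `JIRA-1234`); the function name is made up and this is not the check the commenter's team uses.

```shell
# Print TODO lines on stdin that lack a "TODO(KEY-123)" style ticket reference.
# The exit status is non-zero when any such line is found, so it can gate a review.
untracked_todos() {
    ! grep -w 'TODO' | grep -vE 'TODO\([A-Z]+-[0-9]+\)'
}
```

For example, `git grep -I -w TODO | untracked_todos` lists the offending lines and fails if there are any.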

jrochkind1 · on May 17, 2021

If it's got an accompanying JIRA ticket, what do you experience as the value of also including the `TODO` in a source comment, over just the jira ticket alone?

[edit: reasonable answers below, thanks!]

gbear605 · on May 17, 2021

Suppose a future developer is working on a separate ticket that touches the same code. If there’s an inline TODO, the future developer knows that the TODO needs something changed, which can help them understand how the code works and they might wind up resolving that TODO as part of the second ticket. If there isn’t an inline TODO, the issue might be resolved without that first ticket ever being touched. I see it inevitably leading to a lot of dead meaningless tickets crowding up the backlog. If an even later developer then was assigned one of those unknowingly-resolved tickets, they might spend a significant amount of time looking through the codebase to find where the code needs to be fixed, only to realize later that they’re looking for nothing.

GrumpySloth · on May 17, 2021

When reading the source code, you immediately see an acknowledgement of the deficiencies, instead of assuming that everything is as it should be, or needing to investigate what is ok and what is not. It also maintains a link between the ticket and the location in source code throughout future source code changes.

Izkata · on May 17, 2021

From the opposite direction as the other responses, I've recently been running across years-old weirdness that involves digging through commits and merges to find the original Jira issue to explain it, then adding a comment and reference to it in the code. They're not TODOs, they're explanations for future devs about why something was done in an odd way, so explicitly distinguishing a TODO would be better for those cases.

DominikD · on May 17, 2021

I tend to use my initials for the TODOs I introduced.

anoncow · on May 17, 2021

The Golang Todo chart is interesting. It has a sharp peak in April 2018. Linux and Swift seem to be the most in number and uniform in growth.

forgotpwd16 · on May 17, 2021

I wonder what happened in Go there.

kgravenreuth · on May 17, 2021

most likely code generation?

thenoblesunfish · on May 17, 2021

Anyone got a (git-based) one liner to get this info for an arbitrary repo?

cerved · on May 17, 2021

I'll throw this into the mix

  git --no-pager grep -I --full-name --line-number 'TODO' |\
   sed 's/\(^[^:]*\):\([0-9]\+\):.*/\1\n\2/' |\
   xargs -d '\n' -n2 sh -c 'git --no-pager blame "$0" -L $1,$1'

Not blazing fast but I think it does okay-ish.

What it does:

1) git-grep for files that are checked in, not "binary" that contain the string 'TODO'

2) sed away the actual line contents (git-grep doesn't seem able to only output file:line-nr)

3) use xargs and sh to call git blame on that file:line-nr

This shows the last time the TODO line was modified, i.e. it may have been created 10 years ago but somebody modified the line yesterday

edit: one might want to throw in --cached to git-grep to search the index and not just the current working-tree

pizza234 · on May 17, 2021

You've asked for it ;)

This prints the years - you can then group and plot them as you wish (it should be fairly easy, but I wrestled enough with git). It's a non-rigorous script (e.g. it assumes nobody's email/name includes a `YYYY-MM-DD`-like string, and that filenames don't include the colon character):

    grep -P '\bTODO\b' -n -R -- * | awk -F: '{ system("git blame "$1" -L "$2","$2) }' | perl -lne 'print /(\d{4})-\d\d-\d\d/'

The working is actually fairly simple:

- grep prints filenames and lines

- awk captures the filename and line, and executes a git blame on it

- perl matches the year and prints it

In the Perl matching expression, month and day are not strictly necessary, but disambiguate potential 4-digit numbers in the email/name.

I'm very underwhelmed by the lack of customization of the `git blame` command - `--porcelain` is also uncustomizable, which makes things even uglier.

Note that `git blame` also mishandles some edges (printing "fatal: file [...] has only 1 line").
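For the grouping step mentioned above, a minimal sketch could look like the following. It assumes the years arrive one per line on stdin (as the pipeline above prints them); `count_by_year` is a hypothetical helper name, not part of any tool. Lines that are not a four-digit year (such as the blank lines the Perl step emits when nothing matches) are skipped.

```shell
#!/bin/sh
# count_by_year (hypothetical helper): reads one year per line on stdin and
# prints "<year> <count>" lines, sorted by year.
count_by_year() {
    # Keep only lines that are exactly four digits, tally them, then sort.
    awk '/^[0-9][0-9][0-9][0-9]$/ { n[$0]++ }
         END { for (y in n) print y, n[y] }' |
    sort -n
}
```

Appending `| count_by_year` to the one-liner should give a small table that can be fed to whatever plotting tool you prefer.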

peff · on May 17, 2021

What do you want `blame --porcelain` to do that it doesn't? Using:

    git blame --line-porcelain "$1" -L "$2,$2" |
    perl -MPOSIX=strftime -lne '/^author-time (\d+)/ and print strftime("%Y", localtime($1))'

I suppose it would be a little more convenient if you could ask `git blame` to format the whole line itself, but that wouldn't be part of the `--porcelain` output.

All that said, that pipeline is quite slow on something like linux.git, as it runs a series of blames which will walk over the same history many times. I think:

    git log -STODO --format=%ad --date=short

would be much faster (it's not _quite_ the same thing, as it counts TODOs which went away, but is a reasonable variant).

pizza234 · on May 17, 2021

> What do you want `blame --porcelain` to do that it doesn't? Using:

It increases the complexity, due to time conversion. One can certainly solve the general problem by throwing enough awk/perl/sed at it, but an option to customize the blame output would make it significantly more ergonomic (and the oneliner much simpler).

unwind · on May 17, 2021

As a starting point,

    git log --format=format:"%at %H" | sort -n

gives a list, with the oldest entry first, of just "timestamp hash" pairs, one per line.

You can then use e.g.

    date -I --date='@1620720025'

to convert the timestamp back to a human-readable date in ISO format, i.e. "2021-05-11".

The next step would be to loop over the list, checkout each revision, grep and wc the TODOs, and collect into date buckets. Anyone? :)
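One possible sketch of that loop, under a few assumptions: it uses `git grep` against each commit instead of checking revisions out (so the working tree stays untouched), it counts matching lines rather than individual occurrences, and when several commits share a date it keeps the count from the last one listed. `todo_history` is a hypothetical helper name. It runs one `git grep` per commit, so expect it to be slow on large histories.

```shell
#!/bin/sh
# todo_history (hypothetical helper): run inside a git repository.
# Prints "<YYYY-MM-DD> <count>" per author date, oldest first.
todo_history() {
    git log --reverse --format='%H %ad' --date=short |
    while read -r hash day; do
        # git grep -c prints "<hash>:<path>:<count>" per file; sum the counts.
        # -I skips binary files. Zero matches yields a count of 0.
        count=$(git grep -I -c -e TODO "$hash" </dev/null |
                awk -F: '{ s += $NF } END { print s + 0 }')
        printf '%s %s\n' "$day" "$count"
    done |
    # Bucket by date: later commits on the same day overwrite earlier ones.
    awk '{ last[$1] = $2 } END { for (d in last) print d, last[d] }' |
    sort
}
```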

valyagolev · on May 17, 2021

here you go

  git rebase -i --exec 'ack TODO | wc -l >> log' HEAD~20

(create a temp branch first!, then substitute your starting revision, and save-quit the editor that'll pop up)

Kipters · on May 17, 2021

I thought Golang's number of TODOs was much much lower than the others until I looked at the scale

garblegarble · on May 17, 2021

I'd be quite interested in seeing data on the age of TODOs over time - for instance, are there lots of old TODOs sitting around gathering dust while newer TODOs get fixed, or are they getting worked through?

atoav · on May 17, 2021

Django doing really well (or nobody dares to write TODOs anymore)

Gravityloss · on May 17, 2021

What about amount of TODOs per character or line of code?

staticshock · on May 17, 2021

Agreed, this could make the graphs much more comparable and maybe reflective of project culture. Comparing something like the linux kernel to VueJS is nonsensical without any normalization with respect to overall repo size.
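A rough sketch of such a normalization, assuming "size" means lines in tracked, non-binary files and that a line containing TODO counts once: `todo_density` is a hypothetical helper name, and the empty pattern passed to `git grep` is used here as a way to match every line.

```shell
#!/bin/sh
# todo_density (hypothetical helper): run inside a git repository.
# Prints TODO lines per 1000 lines of tracked, non-binary text.
todo_density() {
    # git grep -c prints "<path>:<count>" per file; sum the last field.
    todos=$(git grep -I -c -e TODO | awk -F: '{ s += $NF } END { print s + 0 }')
    # An empty pattern matches every line, giving per-file line counts.
    lines=$(git grep -I -c -e '' | awk -F: '{ s += $NF } END { print s + 0 }')
    awk -v t="$todos" -v l="$lines" 'BEGIN {
        if (l > 0) { printf "%.2f\n", 1000 * t / l } else { print "0.00" }
    }'
}
```

Vendored or generated code would still skew the number, so a pathspec to exclude such directories is probably worth adding for a real comparison.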

rplnt · on May 17, 2021

Golang has/had TODOs in generated code I assume?

wiz21c · on May 17, 2021

What about FIXME ?

varjag · on May 17, 2021

Tangential: magit-todos is a great package for Emacs/magit users to keep track of todos in a project.

javier10e6 · on May 17, 2021

TODO: Look Ma Im doling out work for whoever reads this. Are you the intern? You are it.

38932ur98u · on May 17, 2021

Would be interesting to add major release versions to the graphs as well

geon · on May 17, 2021

How did golang gain and lose 12k todos in an instant?