
I'll ask again: What is the objective and repeatable metric of programmer productivity? One of the reasons I avoid writing about productivity is that I have no idea how to quantify it.

But if someone can show me how to measure it, I'm confident I can show him how to get paid by it.



Delivering a quality product within time and budget constraints, consistently. Oh, but that's a team effort, you say? Well, that's the working environment, isn't it? Projects are an awfully coarse measurement, you say? Well, that's the only deliverable that matters.

I recall reading years ago (in Peopleware or one of its contemporaries) about a company's evaluation of one of its coders. She was definitely mediocre by every measure they had. But someone noticed that every project she was on succeeded, over many projects and many years. Though she wasn't a monster at the keyboard, something she brought to the team engendered success. How productive was she? Would you want to hire her and have her on your team?

You must measure what you actually care about. Measuring things that you think are factors is fine and noble, but if you're not measuring the actual "product" of "productivity" then you'll never know how well your factors correlate with the real goal.


But someone noticed that every project she was on succeeded, over many projects and many years. Though she wasn't a monster at the keyboard, something she brought to the team engendered success.

Correlation does not equal causation. Another possible explanation: She was a monster at predicting project success and worked her way onto projects that were going to succeed with or without her.


"Correlation does not equal causation."

Sure, but if causation is effectively impossible to rigorously determine, correlation can end up being all you've got. If I have the choice between someone whose presence correlates with project success and someone whose presence does not, my inability to be rigorously sure about causation isn't going to make me lose much sleep at night when I choose the correlative one.

I'm coming to dislike the citing of "correlation does not equal causation" when there's no way to determine causation at all, and when scientific certainty isn't the question at hand. At that point it's an excessively powerful criticism, one that can't be discharged, so is it really a useful criticism at all?


I'm coming to dislike the citing of "correlation does not equal causation" when there's no way to determine causation at all

This is perfectly understandable. However, this particular discussion is one where the difference between correlation and causation is appropriate. We are talking specifically about paying programmers by their "productivity." If you want to say that "productivity" is defined as the correlation between a person and project success, regardless of whether there is a causal relationship or not, and regardless of whether they engage in programming activities, project management activities, picking-good-projects activities, discussion activities, or even just making everyone else espresso so they can produce working code, that's fine.

But what we're saying in that case is that we can't measure the productivity of a programmer, we can't establish a relationship between programming activities on the scale of a single person. I agree that the correlations you can observe are perfectly useful for management and that one can deliver great (or working, or valuable) software without an objective metric of programmer productivity. I agree that this elusive metric may not be necessary. It may not even be useful, as I tried to demonstrate elsewhere when I discussed Ned, Fred, Ed, and Jed.

But that really underscores my point: We can't tie compensation to programmer productivity because we can't measure it. Your point seems to be that we don't have to tie compensation to programmer productivity, that we can tie it to correlation with project success, for example.

Fine with me, I'd say we're in violent agreement and that our stances are compatible.


Yes, I wrote on the assumption we aren't going to establish causation, so I was begging your question.


Sure, in general. In this case managers were fairly familiar with her and the teams and posited that the success was due to what she added to team discussions. I think they were probably correct.

OTOH, having a project success divining rod could have its own value.


"Correlation does not imply causation, but it does waggle its eyebrows and gesture furtively while mouthing 'look over there!'"

http://xkcd.com/552/

Even if she was just a monster at predicting project success, I'd still want her on my team. Can you imagine how useful it would be to have someone able to consistently predict project success working with you?

(Interestingly, there's a less flattering interpretation of the data as well. Given a large enough organization that promiscuously shuffles people onto new projects, some of which randomly succeed, someone is going to have randomly ended up on all the successful projects. We'd then look at them and say "Look how awesome they are!", when really they just got lucky over and over again and we're looking because they got lucky. This requires that the organization size be large relative to the number of possible combinations of projects, though, which becomes increasingly unlikely over time. It's like the people who look at Warren Buffett and proclaim "he just got lucky for 40 years in a row", then do out the math and realize that the chance of someone being that consistently lucky is several million to one.)
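That back-of-envelope math can be sketched directly. A minimal illustration (the population size and the per-year odds here are my own illustrative assumptions, not figures from any study):

```python
# Chance that at least one of n people gets "lucky" k years in a
# row, if each year is an independent coin flip with probability p.
def p_someone_lucky(n, k, p=0.5):
    p_one = p ** k                  # one specific person lucky every year
    return 1 - (1 - p_one) ** n     # at least one of n people

# One specific person going 40 straight lucky years: about 9e-13.
# Even with a million people flipping, the odds that *someone*
# manages a 40-year streak are still around one in a million.
print(p_someone_lucky(1, 40))
print(p_someone_lucky(1_000_000, 40))
```

The point of the sketch is the second number: the "someone got lucky over and over" story requires the population to be enormous relative to the streak length, which is why it stops being plausible as the streak grows.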


So ask her if she's willing to be on this project! Seriously though, ask your developer why there is a correlation between them and project success - if they can tell you, even if it shows they should be a business analyst, you're on to something very useful and valuable.


"Correlation does not equal causation. Another possible explanation: She was a monster at predicting project success and worked her way onto projects that were going to succeed with or without her."

Fair point, but what would you pay for knowing in advance what projects are going to succeed and by inference which may fail...


And how would you measure Intuition then? Maybe she had terrible intuition but is just a fantastically productive programmer.

Correlation not implying causation is a big deal because it's possible to draw (probably exponentially) many alternative causal chains besides the one along which you're discovering correlation.

If the above isn't the case, and it's at least theoretically possible to design experiments that rule out the alternative chains, then correlation does[1] equal causation.

[1] Sort of. See pretty much anything written by Judea Pearl.


It's at least worth running more experiments.


"I recall reading years ago (in Peopleware or one of its contemporaries) about a company's evaluation of one of its coders. She was definitely mediocre by every measure they had. But someone noticed that every project she was on succeeded, over many projects and many years."

She was the Shane Battier of software.

(I happen to be reading a book on basketball right now.)


Interesting you should mention it. Shane Battier got traded to Memphis, and for the first time in their history they have a playoff win and are well within sight of a series win over the top-ranked Spurs.


There is none. In fact, this whole "good programmers are 10x-100x more productive than average programmers" claim is treated as gospel around here, but I don't trust it either as a first principle or as an obvious assumption.

In my experience, there are good programmers and bad programmers, just like good project managers, bad project managers, good people managers, bad people managers, etc.

A bad employee is bad for your company period.


Not just here, but industry-wide. The x10 figure comes from studies. Brooks talks about them (The Mythical Man-Month), and I think Peopleware mentions some. The question then turns to what those studies are actually measuring...

But that's just for programming. When you get into the application of software - unmet needs - you can easily get x100 or far far higher. This is because the value of software is more closely related to the need it meets (that exists in users) rather than any quality of the software (that exists in code).


The only objective and repeatable metric of programmer productivity is if a single programmer is assigned the task of delivering a tool or component, from scratch, by himself. His productivity is the inverse of the time to delivery.

And that doesn't take into account maintenance.


that doesn't take into account maintenance

This excellent insight takes us to a place where productivity is even harder to measure: How does one measure the value of a piece of software? Software that has subtle bugs flying under the radar of our test suite has some kind of negative value. Software that lowers the "productivity" (however we measure it) of future programmers who need to extend or change it has negative value associated with it. How do we measure that?

Imagine the exact same formally specified requirements handed to four different programmers:

The first programmer, "Ned," does the job in a straightforward fashion, and delivers working code passing all tests.

"Fred" does the job using uncommon techniques (parser combinators, for example) in less time and produces less code.

"Ed" proposes that if some of the requirements are relaxed, there is an open source solution that can do the job with trivial integration.

The last programmer, "Jed," sees some commonality with some existing code and proposes that the scope be expanded to produce a new piece of infrastructure (message queues, web services, SOAP, &c) solving a more general problem.

How do we judge Ned, Ed, Fred, and Jed?


Fortunately in the real world such situations never arise except at a few exceptional companies. At most companies the difference between the top and bottom developers is very noticeable.

Ned would have to help Jim with his work because Jim is really not qualified to be employed as a developer, but he eventually accomplishes the bare minimum by using a lot of other people's time. There are probably a couple of Jims and a few more people who accomplish the bare minimum if given enough time. Then you have Bob, whom everyone sees as some kind of hero because he works 10 hours every day and "finishes" a lot of work. Unfortunately most of Bob's work requires constant maintenance, often by Bob. Somehow this makes Bob seem like even more of a hero.

Most developers are aware of who on their team is more productive even if they cannot quantify it.


By how fun they are.

As people.

I'm convinced the only real-world metric that matters is "how much other people want to work with you". Being an easygoing, fun person facilitates that dramatically.


I think this is true up to a certain point, then stops mattering. Once someone is at the 'pleasant enough to be around' level, I'd much prefer he/she be more intelligent or skilled than more fun if we're working together. This is partly because lower key, quieter, more intelligent people can often turn out to be more interesting and fun in the long run than people who have great social skills and confidence but less depth in their thinking and personalities.


If we extend "fun to work with" to include "quieter, more intelligent" people, I think palish's definition may work rather well. That group definitely falls into "people I want to work with."


Weirdly enough, I just finished reading about how NBA players overwhelmingly wanted to play with Bill Russell over Wilt Chamberlain, as part of an argument that Russell was the better player.


All (but not only) people with good people skills and zero productivity would be motivated to say this.

Edit: I think I might have misunderstood what palish meant by "By how fun they are." to exclude productivity and focus entirely on personality. My mistake, if so.


That might be true in the very short term, but in the medium- or long-term, no one actually enjoys working with "fun" people who have zero or negative productivity.


In which case you join me in disagreeing with the statement that 'the only real-world metric that matters is "how much other people want to work with you'."


No, he's saying that metric has components that include things like 'actually does their job'.


Okay, fair enough. I shouldn't have taken that sentence out of the original context, which said "By how fun they are." Or maybe I just totally misunderstood palish's point; actually rereading, that seems more likely than not. Apologies.


How do we judge Ned, Ed, Fred, and Jed?

We can't judge them in a vacuum. What is the rest of the team like? Are they all of the same mindset as Ned? Can they understand Fred's code? Which requirements did Ed relax and what is the quality of the open source code? Does what Jed suggested make sense?


We can't judge them in a vacuum

Does this imply that programmer "productivity" cannot be judged in a vacuum, no matter how rigorously we attempt to specify the task?


In my experience, programmers' competences aren't so broad that we can make the decisions you presented. Repeatable measures are easier when there are repeatable conventions for delivering work.

IOW, either I'm supervised by someone who should know the relative merits, or I'm judged by the final results. In either case, I'm not paid more, because I believe the real value is more in the errors I prevent than in the lines I write.


In my experience, programmers' competences aren't so broad that we can make the decisions you presented.

Are you suggesting that in your experience, programmers never use unusual techniques, never push back on requirements, and never engage in refactoring or infrastructure construction?

Repeatable measures are easier when there are repeatable conventions for delivering work.

If wishes were horses, beggars would ride. Sure it's easier when there are repeated conventions. But are there actually repeated conventions? You could create an environment with repeated conventions by firing Fred, Ed, and Jed. Now you can measure everything with ease. Are you better off?


In my current job my boss has much control over my work, he is happy with my performance, he knows that I'm way over average, but (please, believe me on this) he's powerless to improve my situation.

In other jobs I've been much more autonomous and performed much better than now, but nobody noticed, because there wasn't anyone to compare me to, bosses didn't really understand what I was doing (just that it worked), and especially because they didn't know what could have gone wrong that I got right.

Some time ago I was in a different environment, an intermediate situation. My autonomy was limited, but not so much. Bosses were very experienced and knew how difficult my work was. I got raises and a promotion. I was better off then, and I'd say so was the company.


> And that doesn't take into account maintenance.

Which is the elephant in the development room. IMO, maintainability is exactly what good programmers should be judged on. Any programming project of any meaningful size and usefulness will spend at least an order of magnitude more time in maintenance and enhancements than in initial development. For the largest projects (Google, Facebook), maintainability becomes 1000 or more times more important than the original development time.

Problem is, you can't quantify maintainability. What a good programmer contributes to maintainability and enhanceability is the absence of certain problems or classes of problems: everything from choosing an exotic language that nobody else can maintain, to the choice and organization of source control. We take that sort of thing for granted in HN-type projects, but problems like "nobody knows how to compile that" are a really big deal at corporate enterprise scales, especially at companies that are not fundamentally technologists, say finance or medical or shipping. An individual programmer is more likely to reap a reward if he looks like he's playing the hero on a bad platform and ecosystem ("only this guy has the wizardry to manage that!") than if he built a good, understandable platform in the first place.


Very true. A friend of mine, who worked for a very large company at the time, once described the perfect system for achieving recognition as a developer. Firstly, write code that's full of bugs and is hard to maintain. Secondly, once the code is shipped and starts going wrong, make sure you step up to the plate and fix all the bugs that you created. That way, you get kudos for delivering code rapidly and for all the customer support work you do. Contrast that with the guy who writes good quality code who doesn't ship as rapidly and who doesn't have half as much customer contact.


I don't have the citation on hand, but this goes along with a study I read about customer loyalty. Customers who have a problem which is promptly corrected by the company are more loyal than customers that never have any problems.


I think that makes a lot of sense - relationships build loyalty and trust (as long as they are positive in nature). Buying something and never interacting with the producer doesn't build much of a relationship - it's merely a transaction.


I think there's an awful lot of truth in that. You could make the argument that customers of Enterprise Software (that we love to deride) are not paying for the awesome quality of the software so much as for the assurance that the company will pull out all the stops to support them if anything does go wrong with the software. On a psychological level, that leads to a good relationship. On a practical level, that means that the customer is paying not for a great experience when things are going well but for a good experience when things go wrong.


Unfortunately employers (myself included) like to have different programmers working on different tasks, where it can be difficult to measure non-equivalent tasks.


I disagree. It's faster to write garbage than it is to write good, clean code.


I'm not sure I agree with this. Granted, good design takes some extra work. But for the most part, in my experience, coders who write bad code also do it slowly (perhaps because they don't understand it), while those who write clean code are able to move more quickly. As a project grows, you have to refer back to your own work more often. The price for poor code starts to be paid almost as soon as the cursor leaves the page.


Productivity is a function of efficiency and effectiveness: efficiency being a measure of how fast things get done, and effectiveness a measure of how well they get done (meeting the requirements, including all features, the number of bugs/issues that crop up, the quality of the code).

This is essentially the same thing you are saying, just broken down. Still, it's difficult (not impossible) to measure one programmer against the next in terms of productivity. Using these metrics it is quite possible to measure a programmer against himself over time.

The problem is that measuring the necessary factors affects the level of productivity being measured.
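As a toy illustration of that decomposition (the way of operationalizing the two factors below is my own assumption, not the commenter's):

```python
# Sketch: efficiency = tasks completed per hour,
# effectiveness = fraction of completed tasks meeting the quality bar,
# productivity = their product, i.e. quality-weighted tasks per hour.
def productivity(tasks_done, hours, tasks_meeting_bar):
    efficiency = tasks_done / hours
    effectiveness = tasks_meeting_bar / tasks_done
    return efficiency * effectiveness

# 10 tasks in 40 hours, 8 of them up to standard: 0.2 good tasks/hour.
print(productivity(10, 40, 8))
```

Note how this is only meaningful for comparing a programmer against himself over time, as the comment says: "tasks" are not equivalent units across people or projects.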


"This is essentially the same thing you are saying, just broken down."

Actually, I took his meaning as productivity being only a measure of how fast the work gets done, your efficiency, and not of how well the code is written, your effectiveness.

His comment: "And that doesn't take into account maintenance.", implies to me that code quality is not a factor.

Unfortunately, this is the measure that seems to predominate, especially with non-technical managers. If it takes twice as long to do it right, all the non-technical manager sees is that it took twice as long. The subsequent reduction in maintenance time and cost from doing it right doesn't seem to get noticed.

Of course, the ideal is to get the job done fast and right.


So much of it comes down to managers not being able to tell good code from bad. If you look at quality as a crapshoot (it's genuinely hard to tell architecture astronauts from good code) then indeterminate quality quickly beats indeterminate quality slowly.


I worked at an enterprise software company with a Scrum system that used two-week sprints. This means we would estimate all the tasks for the next two weeks in a big meeting at the beginning of the sprint, then at the end of the sprint we'd have another big meeting to close out the tasks and do any analysis on met/missed targets.

Tasks tended to get separated so there wasn't too much direct collaboration between coders within a sprint, and the estimations were mostly a group effort, but the person being assigned a task had final say to tweak the numbers. In this environment, it was fairly easy to see who the more productive programmers were. Some people got their tasks done quickly and could easily take on parts of other features, and other people were usually late and needed others to help them finish their features. If you were a programmer there, you could easily rank the programmers by productivity. I bet QA could do the same if they tried.

However, management focused on it being a team effort. As long as we finished everything up by the end of the week they did their best to reward everyone and fire no one.


A friend of mine used this successfully with groups practicing Test Driven Development:

    - First, institute "Test First" development
    - Randomly pick some fraction of new tests to be reviewed 
        (Demand a minimum quality level.)
    - Measure productivity in terms of new passing tests completed
        with some multiplier for the quality score
This system could be gamed, but it would require a conspiracy consisting of a large fraction of the team, and any system could be gamed in that situation. This is the only method I know of that works when analyzed logically and has been shown to work in practice.
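A hypothetical sketch of how that scoring might be tallied (the quality scale and the neutral multiplier for unreviewed tests are my own assumptions, not part of the original scheme):

```python
# Each test is (passing, quality), where quality in [0, 1] is only
# known for the fraction of tests randomly picked for review.
def productivity_score(tests, reviewed):
    """reviewed: set of indices of the randomly sampled, reviewed tests.
    Unreviewed passing tests count with a neutral multiplier of 1.0."""
    score = 0.0
    for i, (passing, quality) in enumerate(tests):
        if passing:
            score += quality if i in reviewed else 1.0
    return score

tests = [(True, 0.75), (True, 0.5), (False, 0.0), (True, 1.0)]
print(productivity_score(tests, reviewed={0, 1}))  # 0.75 + 0.5 + 1.0 = 2.25
```

Because the reviewed subset is chosen randomly, padding the suite with shallow tests risks a low quality multiplier on whichever ones get sampled, which is what makes the scheme hard to game without a team-wide conspiracy.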


Which would be rewarded in your situation, building 30 really simple pages by hand or spending 1/2 that much time to code something which generated those pages automatically?

Once again, the best programmers write as little code as possible, which allows them to focus on quality over quantity. It's easy to turn 10 lines of code into 20 classes and gain nothing; it's much harder to see those 10 lines of code once someone has written the 20 classes.


Which would be rewarded in your situation, building 30 really simple pages by hand or spending 1/2 that much time to code something which generated those pages automatically?

This method doesn't have web pages in mind. It's more aimed at things like domain models for, say, energy trading.

Writing tests for 20 shallow, repetitive classes would result in 20 shallow looking test classes, and that programmer would be called on it.

Web pages are really a narrow area of programming.

Once again, the best programmers write as little code as possible, which allows them to focus on quality over quantity.

How many functional specs are they accomplishing while they are doing this? I've known programmers who've created "entire new functional sections" in their app with 25 lines of code. Your issue is addressed by paying attention to functional specs.


Look, all metrics can be gamed.

Take an existing project, find all the places that link to each section of code, and you have some idea how reusable things are. Tell people you're doing this ahead of time and you promote spaghetti code. Ask people how difficult an objective is and you get a wide range of biases based on how you use the information, etc.

The secret is not how to get the most reliable data; it's how to get the best outcomes, including the way people try to game the system.


Look, all metrics can be gamed.

Look, now it's obvious you didn't carefully read the original comment! (Left as exercise.)


I'd be concerned that the approach you laid out is something of a proxy for lines of code delivered.

My most productive days are the ones where I've removed huge blocks of unnecessary code.


I'd be concerned that the approach you laid out is something of a proxy for lines of code delivered.

Absolutely incorrect. Did you actually read my proposal? If the tests are of high quality, then the code passing them will be substantive. Also, in new development, what matters is functional specs delivered, and in a properly run TDD project, these two are strongly related.


My thought was that judging productivity by lines of code or by tests written (whatever the quality) is judging by what was done rather than by what should be done.

An extreme example of this sort of thing would be a programmer who looks at what needs to be done and says: "We don't need to write ANY code there's an open source app/library for doing exactly what we need here."

By making that suggestion it's quite possible that they've saved their company months of work (versus implementing everything themselves). However, in a purely Number of Tests written * Quality of tests metric they're a miserable failure.

I think TDD and tests are a solid way to write software, but I don't think they're a great way to judge programmer productivity.


KLOC mentality applied to TDD.


That would be correct, except that the tests are peer reviewed in my proposal. You can't "code me a new minivan" in this situation, unless the test review process gets corrupted.


I agree, but then why aren't the salaries of top management decided using the same criteria?


I'm of the opinion that programming is more an art than a science, so to me asking the objective metric of programmer productivity is like asking the objective metric of painter productivity.

In my experience, if you're technical you just kind of know if a programmer is good or not. If you're not technical, you find a technical person you trust and ask them.



