Like every software development principle, this phrase needs to be explained with more context and considered with more subtlety than the usual "it's best practice" advice, for a number of reasons (some of which are stated in the article). I think the following two are the most important:
Firstly, if you want to actually understand how the 'wheel' is invented then yes, you should re-invent it. The process of re-invention involves discovering what actually goes into some of the tools you use. Even if you never use your re-invented wheel in public (often advisable), the process of learning is invaluable in understanding the tools you do use.
Secondly, however, what wheel are we even talking about? The wheel is a timeless design, seemingly perfectly suited to its task. The software libraries and tools that are usually picked as targets for 'not re-invention' are not wheels. They're higher-level abstractions that pre-suppose certain ways of working. There's no timeless design here, just a bunch of arbitrary decisions about how something should work at a higher level, with some amount of the decision-making you'd otherwise have to do already done for you. Is this a bad thing? Of course not. But it's important to understand that all of the 'wheels' are just this, and are not magical black boxes that can't be understood or shouldn't be looked at.
There are good times to not immediately go and re-implement *and publish* existing tools (emphasis on the publish; you should do things for learning), but understanding why you're choosing to do or NOT do a 'reinvention' is crucial.
The usual allegoric rebuttal is to show how many types of wheel there are, and how wheels have improved over time. The wheel as a basic concept is a rotating circular thing mounted on a platform for moving. The first wheels were made of wood; at some point spokes were introduced, then other materials. Many more innovations (many of them mutually exclusive) are required to realize a Formula 1 car, or a rally car, or a bus, or a plane.
There is so much work put into bicycle wheels, and you can carefully select a wheel depending on whether you are building a time trial bike, a regular road bike, a commuter bike, a mountain bike…
If we could create physical things as easily as software, we’d absolutely see bicycle hobbyists and certainly little shops designing their own wheels.
Sometimes when reinventing the wheel, you realize that pot-holes suck. Learn to appreciate the wheel and reinvent the road. Learn to appreciate the road and reinvent the rocket.
I guess I'm walking home.
This is well-put. I think it speaks to "it's the journey, not the destination", not learning to ski by only reading books, and Chesterton's Fence[0], off the top of my head.
Stone, then wood, then wood with spokes, then wood with spokes and iron trim, then we eventually added rubber, rubber tubing, then all metal spoke with rubber. For Mars rovers they made new types of air-less wheels.
The saying ‘do not reinvent the wheel’ is just silly
Re-invented or re-implemented? The design was always the same just the materials have changed (and maybe there's something about motor racing and new wheels being available every year...)
> this phrase really needs to be explained with more context
No, absolutely not. This is a first person problem.
The primary reason to reinvent wheels is to provide the most immediate and/or portable solution to a problem. By immediate I mean only from the perspective of the product.
That is a first person problem because many people cannot (whether through neurological impairment or otherwise) imagine any operating condition beyond the efforts of their own individual labor. That is where the cliché of not reinventing wheels is most often used as an empty defensive argument.
I'll add "reduce code size and complexity" to the list of benefits. A python library to calculate a simhash, or track changes on a django model, or auto-generate test fixtures, will often be 90% configuration cruft for other use cases, and 10% the code your app actually cares about. Reading the library and extracting and fine-tuning the core logic makes you responsible for the bugs in the 10%, but no longer affected by bugs in the 90%.
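To make the simhash case concrete, here's roughly how small the core logic can be once the configuration cruft is stripped away (an illustrative sketch, not taken from any particular library):

```python
import hashlib

def simhash(tokens, bits=64):
    # For each bit position, add +1 if that bit of the token's hash is set,
    # -1 otherwise; the sign of each running total gives one fingerprint bit.
    totals = [0] * bits
    for tok in tokens:
        h = int.from_bytes(hashlib.md5(tok.encode()).digest()[:8], "big")
        for i in range(bits):
            totals[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, t in enumerate(totals) if t > 0)

# Similar inputs yield fingerprints with small Hamming distance
a = simhash("the quick brown fox jumps over the lazy dog".split())
b = simhash("the quick brown fox jumped over the lazy dog".split())
hamming = bin(a ^ b).count("1")
```

That's the 10% the app cares about; everything else in a general-purpose library (feature weighting, tokenizer plugins, hash choices) is configuration for other people's use cases.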
Hard agree. A library should not inflict complex use cases' complexity on simple use cases, but sometimes they do, either because they're poorly designed or because they're overkill for your use case. But often I see pain and complexity excused with "this is the library that everybody else uses."
Sometimes a simple bespoke solution minimizes costs compared to the complexity of using a massive hairball with a ton of power that you don't need.
One big caveat to this: there's a tendency to underestimate the cost and complexity of a solution that you, personally, developed. If new developers coming onto the project disagree, they're probably right.
The big caveat is a big one. Choose your battles wisely!
There are plenty of things that look simpler than an established library at first glance (I/O of specialized formats comes to mind quickly). However, a lot of the complexity of that established library can wind up being edge cases that you actually _do_ care about, you just don't realize it yet.
It's easy to wind up blind to maintenance burden of "just a quick add to the in-house version" repeated over and over again until you wind up with something that has all of the complexities of the widely used library you were trying to avoid.
With that said, I still agree that it's good to write things from scratch and avoid complex dependencies where possible! I just think choosing the right cases to do so can be a bit of an art. It's a good one to hone.
The classic "I'll write my own csv parser - how hard can it be?"
I did as part of my work. It was easy.
To be very clear: the CSV files that are used are outputs from another tool, so they are much more "well-behaved" and "well-defined" (e.g. no escaping in particular for newlines; well-known separators; well-known encoding; ...) than many CSV files that you find on the internet.
On the other hand, some columns need a little bit of "special" handling (you could also do this as a post-processing step, but it is faster to be able to attach a handler to a column to do this handling directly during the parsing).
Under these circumstances (very well-behaved CSV files, but on the other hand wanting the capability to do some processing as part of the CSV reading), any existing library for parsing CSV would likely either be like taking a sledgehammer to crack a nut, or would have to be modified to suit the requirements.
So, writing a (very simple) own CSV reader implementation was the right choice.
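For the well-behaved case described above (known separator and encoding, no quoting or escaping), such a reader with per-column handlers attached during parsing fits in a few lines. All names here are illustrative, not the commenter's actual code:

```python
def read_csv(lines, sep=",", handlers=None):
    # Assumes well-behaved input: fixed separator, no quoting, no escaped
    # newlines. Handlers are applied per column during parsing.
    it = iter(lines)
    header = next(it).rstrip("\n").split(sep)
    handlers = handlers or {}
    rows = []
    for line in it:
        row = dict(zip(header, line.rstrip("\n").split(sep)))
        for col, fn in handlers.items():
            row[col] = fn(row[col])  # e.g. type conversion, normalization
        rows.append(row)
    return rows

rows = read_csv(["id,price\n", "1,2.50\n", "2,3.75\n"],
                handlers={"id": int, "price": float})
# rows[0] == {"id": 1, "price": 2.5}
```

The moment quoting, escaping, or unknown encodings enter the picture, this sketch is exactly the trap the sibling comments warn about; it only works because the inputs are guaranteed by the producing tool.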
You were incredibly lucky. I've never heard of anyone who insisted on integrating via CSV files who was also capable of consistently providing valid CSV files.
> I've never heard of anyone who insisted on integrating via CSV files who was also capable of consistently providing valid CSV files.
To be fair: problematic CSV files do occur. But for the functionality that the program provides, it suffices if, in such a situation, an error message is shown to the user that helps them track down where the problem with the CSV file is. Or, if the reading does not fail, the user can see in the visualization of the read data where the error with the CSV file was.
In other words: what is not expected is that the program has to gracefully
- automatically find out the "intended behaviour" (column separators, encoding, escaping, ...) of the CSV parsing,
And for anyone who's not convinced by CSV, consider parsing XML with a regex. "I don't need a full XML parser, I just need this little piece of data! Let's keep things lightweight. This can just be a regex..."
I've said it many times myself and been eventually burned by it each time. I'm not saying it's always wrong, but stop and think whether or not you can _really_ trust that "little piece of data" not to grow...
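A small illustration of how the regex approach goes wrong even on tiny documents (the document and tag names here are made up):

```python
import re
import xml.etree.ElementTree as ET

doc = '<root><!-- <name>not this</name> --><name attr="x">gadget</name></root>'

# The "lightweight" regex happily matches inside the comment
m = re.search(r"<name>(.*?)</name>", doc)
# m.group(1) == "not this"

# A real parser skips comments and handles attributes
text = ET.fromstring(doc).find("name").text
# text == "gadget"
```

Comments, CDATA sections, attributes, and nesting each break the regex in a different way, and each fix grows the pattern until you've written a bad parser by accident.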
It's easy if the fields are all numbers and you have a good handle on whether any of them will be negative, in scientific notation, etc.
Once strings are in play, it quickly gets very hairy though, with quoting and escaping that's all over the place.
Badly formed, damaged, or truncated files are another caution area— are you allowed to bail, or required to? Is it up to your parser to flag when something looks hinky so a human can check it out? Or to make a judgment call about how hinky is hinky enough that the whole process needs to abort?
Regardless of the format, if you're parsing something and encounter an error, there are very few circumstances where the correct action is to return mangled data.
Maybe? If the dataset is large and the stakes are low, maybe you just drop the affected records, or mark them as incomplete somehow. Or generate a failures spool on the side for manual review after the fact. Certainly in a lot of research settings it could be enough to just call out that 3% of your input records had to be excluded due to data validation issues, and then move on with whatever the analysis is.
It's not usually realistic to force your data source into compliance, nor is manually fixing it in between typically a worthwhile pursuit either.
At my current workplace the word "bespoke" is used to mean anything that is "business logic", and everyone is very much discouraged from working on such things. On the other hand, we've got a fantastic set of home-made tooling and libraries, all impressive software engineering, almost as good as the off-the-shelf alternatives.
> [...] Be wary of abstractions made for fabricated use cases.
Very well put, and I would argue this applies to software development in general. This is one of the biggest differences between my freshly-out-of-college self and me right now, and something I try to teach the engineers I'm trying to grow into "seniors".
Too many times have I seen a lot of effort wasted on trying to build hyper-flexible components ("this can do anything!") to support potential future use cases, which often never come to be. This typically results in components that don't do that much and/or are hard to use.
Design your components as simply as you need them, but no simpler. This typically gives more flexibility to grow rather than accounting for currently-not-needed use cases.
I'm not sure, perhaps this is an issue with how our craft is taught, but I think we're missing something when we talk about the (economic) tradeoffs when making these decisions.
Keeping components simple, decoupled and with minimal dependencies allows for a high degree of optionality. When you pair this with a simple system that is easy to reason about - you're doing huge favours to your future self.
On the other hand, hanging off a bunch of unused features, especially ones that have interacting configuration - that's more like adding lead weights to your future self. It weighs you down and adds friction. And we tend to do a terrible job of predicting our future needs.
Kent Beck does a great job discussing the costs of these tradeoffs in his recent book "Tidy First". It builds upon the YAGNI principle, but adds a level of cost analysis that should allow you to sell these ideas to the managerial level.
I think some of it comes from a sense of admiration or even awe of complex systems. You’ve just been introduced to some of these tools as a college student and you really want to use them as they seem so neat.
But then as you start dealing with over-engineered systems, you become intimately aware of the downsides of poorly abstracted systems, and you start becoming much more careful in your approach.
Somewhere, at a tender age, I read a paean to the Chevy Straight Six engine block, one of the most heavily modified engines of all time. Later on when I read Zen and the Art of Motorcycle Maintenance it had a similar vibe and effect.
I still sometimes use it as an allegory. As it originally shipped it had very low power density. It’s an unremarkable engine. But what it has in spades is potential. You can modify it to increase cylinder diameter, you can strap a giant header on it to improve volume and compression more. You can hang blowers and custom manifolds and more carbs off it to suck out more power. IIRC at the end of its reign they had people coaxing 3, almost 4 times the first gen OEM horsepower out of these things. They had turned it into a beast for that generation of “makers”.
I don’t think there’s an accepted set of concrete criteria for making software that can absorb major design changes later without a great deal of effort and stress. How you write code that can accept an abstraction layer at the last responsible moment.
Some people have an intuition for it, but it’s sort of an ineffable quality, buried in Best Practices in a way that is not particularly actionable.
So people, having been scarred by past attempts to refactor code, reach for the abstraction in fear, just in case, because they don't know what else to do and it's not written down anywhere, but abstractions are.
Definitely. Among other things, this is akin to Work Hardening.
Refactoring tries to avoid this but the slope of that line can still end up being positive and you can’t refactor forever unless you’re very careful. And “very careful” is also not yet quantified.
In scientific software development "don't want to reinvent the wheel" is an oft-repeated mantra that I like to push back on when I hear it. To be fair it's often used in the context of "we'd rather/like to collaborate", rather than an appeal to use "that exact thing".
Re-inventing things independently in parallel (think parallel evolution) is perhaps a strong indication that something interesting is going on. How do we know we got it "right" if we don't converge independently? If we invented a square wheel and stopped because "wheel", we'd be in a horrible place. Science is a process, and the process of reinventing is a great way to realize new things and to train scientists at a low level. I suspect the process of re-inventing is also important in building out our (long-term) ability to depend on our "gut feelings", thus providing the ability to nudge us to experiment along one path or another.
Reinventing certain wheels is arguably the only way to be sure you understand them. For example Monte Carlo sampling implementations.
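Even the simplest Monte Carlo estimator is worth writing yourself at least once: it forces you to confront seeding, convergence (the error shrinks like 1/sqrt(n)), and variance, which are exactly the details that matter in serious sampling code. A minimal sketch:

```python
import random

def estimate_pi(n, seed=0):
    # Sample points uniformly in the unit square; the fraction landing
    # inside the quarter circle of radius 1 approaches pi/4.
    rng = random.Random(seed)  # explicit seeding makes runs reproducible
    inside = sum(1 for _ in range(n)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4 * inside / n
```

Running `estimate_pi(200_000)` lands within a few hundredths of pi; quadrupling n only halves the error, which is the kind of lesson that sticks far better when you see it in your own code.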
The logical conclusion of this mindset is mathematics, where people literally prove all of algebra and calculus to themselves as they learn it. There are good pedagogical reasons for doing this.
I'm seeing scientific software being reinvented as kind of a scattered shitshow, compared to when it was originally all drawn up in-house, most of the time using very few computer languages.
OTOH, if you're going to invent something every day anyway, it's really not that bad to reinvent the same thing more than once, even in the same year ;)
Can be more variety than watching the same movie more than once, plus you might even get more out of it the second time either way.
Now what hits me is when you don't even know if your goal is possible, but it would be good if it was, and it seems more within reach than other impossible things. It takes a lot of effort like this before any payoff. You eventually get a feeling for what to pursue, and for when to wind down a fruitless pursuit of the kind that cannot be brought to bear.
Now with things like the wheel, you know it's possible to begin with, and quite popular already too, so that's a different starting point when you know it's a proven quantity of some kind, you're not taking the same kind of risks.
Then you have to consider if you eventually do make something difficult possible, chances are it's barely possible and kind of touchy. But at least then you know it's possible, that can be one of the best milestones right there. Probably still need to keep on inventing so you can make it even more possible, though.
I think it's most satisfying to have a mix of reinventions along with an ongoing effort on things that haven't been fully invented yet.
- Did YOU invent the last wheel, or any before it? If not, then you will make the same mistakes the last inventor made. Until you make a bunch of wheels, you'll probably suck at it.
- You learn more by studying old wheels than trying to bang one out yourself. Study the principles behind the designs rather than shooting from the hip. This is why we study medicine, science and engineering, and don't try to invent new medicines, sciences, and engineering disciplines from ignorance.
- Novel-ness is only good when it fixes more problems than it introduces. Novel-ness introduces not only "bugs" from a new, untested design, but also the problems of changing people's expectations, requiring new training, and possibly requiring all the other parts to change to fit the new wheel (which is often more work than just dealing with the old shitty wheel!). New things are not inherently good. Incremental change is almost always better, even if it's harder because you have to struggle with all the existing limitations. Your job isn't to do something easy, it's to make a product better.
- Only after you make your new wheel will you find out if it's good or not. Don't go into it assuming what you have is better just because you like the idea of it better. In fact, the more you like the idea, the more you should question it; ego is a dirty liar. Kill your darlings and be prepared to accept others' judgement of the thing, and the reality of it moving on the road.
Especially the last one is just a painful reality of the process. I think it's somewhat similar to the scientific method in that regard. Often your hypothesis is just false, but that does not make the attempt less valid.
I think this ought to be an iterative process: study principles so that you don't start from absolute scratch, then make a wheel that sucks, study some more and do more research yourself, rinse and repeat until satisfied.
There is so much nuance that doesn't get captured in all the study you can do about how a certain thing is made.
I find that for me to deeply understand something, I have to reinvent it. There's SO much nuance not captured in textbooks or papers. It reminds me of the feeling of attending lectures by great teachers: everything makes so much sense until you start the homework assignment.
Creating a lighter, faster wheel that only works with the sort of cart your company builds might invite accusations of "reinventing the wheel", but often it's just "doing engineering".
But when you need other people to work with your wheel it's much harder to find those capable enough/that want to deal with that. Also when shit hits the fan and the wheel reinventor left the company you're gonna wish you had a standard wheel.
Yeah I’ve seen solo wheel reinvention lead to frustration with people leaving. It should be group effort or at least lots of “teach the wheel” (can we switch off this analogy yet :p)
I am a wheel re-inventor. Nice article. The _Specificity_ reason listed is usually the driving factor for me, with the others being downstream effects. In short, the wheels are often built for a different chassis than the one I'm using. Adapting these may be more difficult than making a new wheel.
This. I'm trying to set up a personal developer blog and I have a very specific set of requirements. Tried several static blogging frameworks. Apart from the software bloat, I found myself spending a gratuitous amount of time trying to customize pieces to my needs. Finally got sick of it, ended up writing a python script, and after a few days and < 200 lines of code I have a working prototype that fits my current needs. I will be doing more of the wheel re-inventing for other projects I have in mind. I strongly agree with your observation that adapting generic frameworks to specific needs probably has a longer learning curve than just building a new wheel.
The specificity reason is interesting as it relates to what feels like an assumption in software that all software components are neatly shaped boxes that a) perfectly encapsulate an area of functionality and b) can be placed into neatly shaped holes of 'required functionality', neither of which ever seem to be true.
The boxes are always weirdly shaped with odd edges, and the holes they have to fill are always oddly shaped. The code written to join the two is at minimum glue that seals the two edges, but also usually involves converting one shape to another.
It's funny how often engineers say "it depends"! Even a basic axiom like "don't reinvent the wheel" doesn't always apply. After all, we have entire industries dedicated to doing exactly that! Goodyear spends a lot of time investing in new wheel technology every year.
I laughed out loud and can't agree more. If I have to boil down law school to one sentence, it's this one... it depends on the particular circumstances.
If you're a PL/compiler/GC hacker, then here are wheels you should reinvent in order to even just have a basic idea of WTF is going on in the Big Serious Production Wheels that you might end up being gainfully employed to maintain:
- Invent your own language, and write at least an interpreter for it, to get a feel for what makes a language work at all, or not.
- Invent your own compiler IR. Don't worry if you make a bunch of mistakes. Don't worry about whether you follow my advice for how to do it, or anyone else's advice. Make it your own and learn from your mistakes.
- Invent your own way of doing the major compiler optimizations. Of course there are established ways of doing SSA conversion, CSE, constant prop, regalloc, instruction selection, etc. But you won't know why they are that way unless you try to make your own, and then either succeed because you are smarter than everyone else (it's possible that you are), or succeed because you literally reinvented the wheel (then you understand the compiler's wheels better than your friends because you got there from first principles), or you'll fail (most likely outcome) but then you'll understand why the real wheels work the way that they do better than others.
- Reinvent memory management. Write your own GC or whatever.
That's how I learned the craft. Can't think of a better way to learn.
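The first item on that list is smaller than it sounds. A toy prefix-notation calculator is about the smallest "invent your own language" exercise there is; everything here (syntax, names) is made up for illustration:

```python
def tokenize(src):
    # Pad parens with spaces so split() does the lexing
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    # Recursive descent over a Lisp-ish prefix grammar: NUMBER | (op expr expr)
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop ")"
        return node
    try:
        return float(tok)
    except ValueError:
        return tok  # operator symbol

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": lambda a, b: a / b}

def evaluate(node):
    # Tree-walking interpreter: numbers evaluate to themselves,
    # lists evaluate their children then apply the operator.
    if isinstance(node, float):
        return node
    op, lhs, rhs = node
    return OPS[op](evaluate(lhs), evaluate(rhs))

result = evaluate(parse(tokenize("(* (+ 1 2) 4)")))
# result == 12.0
```

From here, adding variables, functions, and control flow raises exactly the design questions (scoping, evaluation order, error handling) that make real language implementations look the way they do.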
Reinventing the wheel often means breaking things. Innovation often requires getting rid of backwards compatibility. The status quo is promoted by those who have invested in it, so disrupting it can be met with fierce resistance.
When I tell people that file systems are antiquated and need to be replaced with something much better; I often get strong push back.
This is a wheel that I have been reinventing for some time. It's not something that can be fixed with minor tweaks.
> Reinventing the wheel often means breaking things. Innovation often requires getting rid of backwards compatibility. The status quo is promoted by those who have invested in it, so disrupting it can be met with fierce resistance.
This is a terribly simplistic take. To start off, it ignores the fact that major rewrites are known for failing.
The object store I have created offers a fast, convenient way to attach multiple tags to each object. Each tag is a single value which can be a string, number, boolean, datetime stamp, etc..
You can still organize things using folders, but you can query for things based off their tags as well.
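A hypothetical sketch of the idea (the names and API are invented for illustration, not the actual object store): objects carry typed tags, and queries filter on tag values instead of, or alongside, folder paths.

```python
# Each object maps to a dict of single-valued tags (string, number, bool, ...)
store = {
    "report.pdf":  {"project": "apollo", "year": 2023, "final": True},
    "notes.txt":   {"project": "apollo", "year": 2024, "final": False},
    "budget.xlsx": {"project": "gemini", "year": 2023, "final": True},
}

def query(store, **tags):
    # Return objects whose tags match every requested name/value pair
    return sorted(k for k, t in store.items()
                  if all(t.get(name) == value for name, value in tags.items()))

apollo_files = query(store, project="apollo")
# apollo_files == ["notes.txt", "report.pdf"]
finals_2023 = query(store, year=2023, final=True)
# finals_2023 == ["budget.xlsx", "report.pdf"]
```

A real implementation would index the tags rather than scan, but the query model is the point: one object can satisfy many orthogonal queries without living in one canonical folder.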
So, you'd tag all these files with .. what? "pnpm, node_modules"?
find $HOMEBREW_PREFIX/opt/pnpm/libexec/lib/node_modules -type f | wc -l
1450
For your .pdf or .docx that are just lumped into $HOME/Documents, I'm with you, they don't exactly need "folders" but a _filesystem_ based only on tags would be horrific in any python or node shop since their primitive unit is a file within a folder
I decided to reinvent SVG Filters because Safari won't let them be used with HTML canvas elements[1] ... well, at least that's the official reason.
Unofficially I just wanted a decent filter engine[2] that worked well with my canvas library, and there were things like SVG "filter chaining" which I really liked the idea of - but could they be done in a simpler way[3]? Also: proving to nobody that canvas filters can be done (fairly) efficiently without the need for WebGL shaders (because: why not?). And then I discovered I really like coding up filter functions and went a bit mad with them[4][5] and now the hole is so deep the only option left for me is to keep digging ...
[1] - Though I think that's changing this year? May already have changed - but I'm not gonna un-reinvent my filters even if the Safari folks have shipped the fix!
> > Minimize third-party dependencies. Master the platform’s built-ins and accumulate your own toolbox over time.
> I would like to work with this person.
Before you make such a bold claim, consider that this way of doing programming leads to an accumulation of knowledge that is barely transferable if you switch jobs.
I would claim that a central reason why many programmers like third-party dependencies is that being knowledgeable in these is a set of skills that is more marketable and transferable to jobs at other companies. In other words: applying this approach to programming can easily result in a career trap.
You're probably right. Anyway, I will be eternally grateful to the creators of H2 Database for showing that it is very much possible to create an entire database including a web interface without forcing dozens of additional dependencies onto your users.
To me this is one of the differences between making a wheel for learning or a wheel for innovation, as I mention in the blog. The latter can truly be reinvention, while the former is indeed often more like simply making your own wheel.
there is an implicit assumption that we've all settled on what transportation looks like. its a car with 4 wheels, and windows, and a chassis and a glovebox. there is already a whole spectrum of gloveboxes we can buy: open source, SaaS, etc.
why would you make a new glovebox. its got a latch, and a hinge.
The single biggest advantage of a self invented wheel is that you know how to use it. Most lines of code you write, because you know what they mean. Getting to know the pros and cons of somebody else's wheel is often quite an investment.
I follow advice I heard decades ago (I think from John Carmack): implement it yourself and then throw it away.
This is a great way to learn why libraries and tools like compilers are the way they are.
I practiced this in the late 90s by making my own 3D maths library and my own “standard” library.
I ended up using the built-in 3D maths in DirectX and the C++ STL in the commercial game engine code I worked on later. But having practiced on my own libraries helped me understand the standard ones a lot better.
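The sort of thing such a homegrown 3D maths library starts with might look like this (a sketch of the learning exercise, not the DirectX or STL versions mentioned above):

```python
import math
from dataclasses import dataclass

@dataclass
class Vec3:
    x: float
    y: float
    z: float

    def dot(self, o):
        return self.x * o.x + self.y * o.y + self.z * o.z

    def cross(self, o):
        # Right-handed cross product
        return Vec3(self.y * o.z - self.z * o.y,
                    self.z * o.x - self.x * o.z,
                    self.x * o.y - self.y * o.x)

    def length(self):
        return math.sqrt(self.dot(self))

# Cross product of the x and y axes gives the z axis
z = Vec3(1, 0, 0).cross(Vec3(0, 1, 0))
# z == Vec3(0, 0, 1)
```

Writing these by hand (and then matrices, quaternions, and transforms) is precisely what makes the conventions of a standard library (handedness, row vs. column vectors) legible later.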
Yes, it has to be balanced appropriately. I didn't emphasize this a lot in the blog post, but this is really important in a work setting, compared to when working on personal projects. I still think wheel-reinvention has its place in professional teams, but the related challenges should be taken more seriously in that case.
I think these folks, who reproduced the aircraft of the Wright Brothers using similar materials, methods, and tools, are a peak example of the value of reinventing as a path to deeper understanding.
I'm definitely a wheel re-inventor (in the educational and entrepreneurial sense), and I've come across the same learning points myself. Recently I started blogging about my little wheels, and I think it's been one of the most satisfying aspects of working on a project!
A lot of time I find myself reinventing the wheel is because of some framework that has decided that inverted catenary flooring is the future and their provided wheels are excellent for going in the standard use case direction.
The color contrast of the site, especially the background color and the font thickness (on an Android phone at least), is not good. Many visually impaired people would have a very difficult time reading the content.