One of the tricks I like to use to get up to speed quickly is making a hyperlinked dictionary / glossary (a wiki or a GitHub page with anchor links works nicely). The biggest roadblock to understanding the code you read is not knowing the exact meaning of the terminology chosen by the original developers. Building the dictionary helps you internalise it quickly, leading to much less confusion when reading.
Terms present in the UI (or API docs) of the segment you wish to understand are a good starting point.
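The glossary idea above can be sketched as a small script that turns a term list into a GitHub-style page with an anchor-linked index. The terms and definitions here are invented examples, and the `slug` function only approximates GitHub's heading-to-anchor rule:

```python
import re

# Hypothetical terms harvested from a codebase's UI / API docs.
terms = {
    "Segment": "The part of the system whose terminology you are cataloguing.",
    "Pipeline": "The ordered chain of handlers a request passes through.",
}

def slug(heading):
    """Approximate GitHub's anchor rule: lowercase, strip punctuation, spaces to dashes."""
    return re.sub(r"[^a-z0-9 -]", "", heading.lower()).replace(" ", "-")

lines = ["# Glossary", ""]
# Index of anchor links to each term's heading
lines += ["- [%s](#%s)" % (t, slug(t)) for t in terms]
lines.append("")
for term, definition in terms.items():
    lines += ["## " + term, definition, ""]

print("\n".join(lines))
```

Committing the generated markdown to a repo gives you clickable cross-references for free.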
I do something similar when reading terminology used in code but for a different reason. I like to steal the good name ideas and I keep them in a mind map I can later look at for naming inspiration.
You might be interested in this single page app I made to handle similar use cases of automatically hyperlinking documentation:
https://github.com/elusivegames/termify
Sorry there's no live demo, but the readme should explain it easily enough.
Thanks for the link. I enjoyed the breakdown, and can see how it works, but wow, that code would be easier to read split up over three PHP files and a template.
I agree that reading code is not like reading literature, but I do find it useful to compare the two. The book "The Psychology of Computer Programming" talks about how most good novelists read lots of novels, while amateur novelists naively think they have enough inspiration without reading others' ideas. The same thing happens with programming: most really good programmers have read a lot of code in addition to writing it. I have found this to be the case for myself; the more code I read, the more creative inspiration I have available when writing my own code.
I'm just curious here. Does anyone really print out code, and does comprehension really increase? I can imagine all the benefits, but can someone give any first-hand experience where this has actually helped them?
Also, how much better is it than the code map in Sublime Text, or the syntax colouring we have anyway?
Sure, at a prior job we passed out hardcopies of a side-by-side diff before scheduling a peer review (including only the affected methods/functions).
No syntax highlighting, but it was quite useful. Off the top of my head, the two main benefits are: 1) You free up your computer for the review 2) You get a different form factor to interact with.
Regarding the first benefit, having a printout means you can use your computer to run the application, look up docs, and even (ironically) navigate the code in your IDE.
As for the second, having a printout means you can spread out the pages to compare different bits of code, draw diagrams, connect arrows, circle bits of code for notes, etc.
Yes, I do print source code. If it's somebody else's code (or mine from long ago) that I need to rewrite or reorganize, and it fits on several pages of paper, I find it much easier to analyze this way.
I can doodle and write remarks on top of the code, but I think a much more important aspect of printed code is that it has a different friction profile than browsing the same code in an editor (or on a screen in general). This "friction" thing is hard for me to put into words, as it's quite a subtle feeling. I can't freely search the code for identifiers, nor run it, so I'm sort of forced to make notes and remember things; but I can mark parts of the code in different ways, write alternative pseudocode alongside the original, and add TODO, REMOVE, or UNNECESSARY marks.
I do a similar thing for code I need to write, except, obviously, I have nothing to print yet. I write parts of the code on paper (which allows me to omit all the uninteresting trivia). Then I proceed as I would with printed code.
I've done this recently for a few of the open-source third-party frameworks we use. I'll print out the source code and relevant unit test code for a particular class I'm interested in learning about (or in the case of js code I'll usually just print the entire file). I'll start by reading the unit tests first and then read the actual source. The real benefit of doing this is that I can easily write hand-written notes alongside code to come back and review later. I've found this way of note-taking to be superior to other note-taking tools that allow notes alongside code (like org-mode).
I have done, for code that fits on 2-8 pages (basically a demonstration program). I used vim's ":hardcopy", so the printed code was already syntax-highlighted in the way I was used to.
This did let me work on the code when away from my workstation, but I didn't otherwise find it very helpful. If you have lots of code, some form of code navigation is very helpful.
I like to print things out because it makes them very concrete. I can literally see the whole code base I need to become familiar with. It's very useful for working in legacy code bases, or when trying to reverse engineer what is actually happening. The place memory (e.g. remembering that a variable was defined on the bottom of the third page of the printout) helps to situate you.
However, I'd say it's only worth it if you need to do "deep reading" of convoluted code, or if you're in very unfamiliar territory. Also, it doesn't work when the "printout" becomes too large: at roughly 50 lines per page, a full printout of a 30K LOC project runs 600+ pages. No way you're going to read all that!
I feel like most of the code I read at work should never be printed. Even if it is production code, it would make it very "real" and... embarrassing. Haha.
I don't need to read the article to know that, in this case, the best code for rendering plain text as HTML would be no code at all, rather than whatever possibly-brilliant code they actually have. Lol.
Not sure in which context these tips were written. Having been through all of this, and frequently comparing my code-reading abilities with those of my junior colleagues (hey, I even got to a point where I essentially make my living reading other people's code), here are a few more realistic tips (and a shorter list):
1. Actually read the bloody code until you understand well what it does. Don't stop before that, or it won't count.
2. Read as many different projects as you can -- different languages, technologies, paradigms, coding styles.
3. Don't spend time on code-comprehension tools; they won't work (speaking from personal experience. The latest was with "Understand C++" from SciTools, a great tool BTW). Master grep.
4. Drawing flowcharts, running under a debugger, etc. -- all that would be great, but nobody allots time for that anymore (neither you nor your managers).
5. Don't hope for usable docs for external libraries. Exceptions exist, but they're not the rule.
My brain will tune out whatever piece of code I've already read once. So if I initially skim some piece of code, figuring out what it -really- does is HARD.
Every subsequent look at that piece of code will be colored by my initial (mis)understanding of how it works.
Same here. When I'm looking for something and I know the area, I'll read the method (or whatever it is) rather quickly and think, "oh okay, this is what it does, so the problem is not here".
Then, some time later, when I look at it again, I notice that I actually missed a step in that method, a step which did contain the problem and which I did not see when skimming over the code earlier.
3.) While newer grep versions and SSDs have helped with performance, ag (or any of its many reimplementations) tends to return results faster, especially on large code bases.
4.) I only had to resort to flowcharts a couple of times so far, mostly with Java libraries, to externalize the data and control flow because it grew just way too large to keep in my head. A particularly vivid memory I have here is netty (which also has a tendency toward a particularly convoluted and complex control flow, realized by modifying pipelines conditionally in many places).
I'm interested in your take on #4: why wouldn't you allot time to run code in a debugger? Personally I would like all my code to be written against a debugger and with test suites allowing me to step through it.
I've been chided for not doing this in the past (I just read the code usually) and would actually agree with that person in retrospect.
Because, in many cases, getting it into the debugger has a huge fixed cost -- imagine your code runs on an embedded platform: find the right board, get it connected to the infrastructure (getting the right signals in and out), get the debug build of the image, etc. -- you see the point.
More generally, studying the code under debugger means you need particular input data -- the one suitable for the part of the code you study. That can take time to get.
You've made some really good points about reading code in both of your posts. I'm fairly new to it, but I've started a daily habit of reading code everyday and I've realized this is an important skill I've omitted in the past.
Since most of the code I read (C# code) can be run on my platform, I'll sometimes run into that rare piece of code where I can't believe it does what it says it does. In those cases I'll download the repo, debug it, and many times be surprised to find there was a gap in my understanding. Reading code helps me find knowledge gaps I didn't even know were there.
Ah I see. This is actually a very good point. One team I was on would constantly get hung up on this sort of issue. I believe a more general ability to just run a code review and study session would've saved lots of time.
Well, if the data you're ingesting/parsing/etc. is anywhere near well formed, or has a known structural definition, you should be able to mock the code handling it reasonably well, even up to the point of covering many weird edge cases. And even with perfect data up front, you would still encounter bugs in your production code at some point.
In fact, isn't this usually the case with any application not yet in production?
As an example, I had a teammate who would get blocked on needing JSON that matched a production set of data. Yet she knew what the various fields of the data would most likely contain. She ended up writing the code anyway without the "production" data she asked for, and it worked great minus a few small changes. It was ultimately just a psychological block, IMO.
For me, if I have to go through "layers of abstraction hell" to get to the point I want the debugger to be in, I'm better off with pencil and paper. This is especially true with enterprise software.
For a moment I thought this was another of the "meh, it uses JS, I want to surf with my Lynx!!!!" rants but wow that page is really slow. I only counted 9 secs for the original page and 11 secs for your link, still ... for a bit of text? Not even an image? What are they doing? setTimeout before they show the HTML?
Edit: opening the same link again is instant. Weird.
Ward switched to using his project, Smallest Federated Wiki, which basically doesn't do a thing without JS.
In fact, if I remember correctly, the slow load times are because it loads the entire wiki tree in the background to make linking easier for the engine... But a worse experience for the user.
Spinner loads instantly for me. Everything else takes ~10 seconds.
I'm on a 250 megabit connection. I haven't seen a page take this long to load in months.
Once a particular page is visited, it does load quickly. But for new ones it's always the same experience. Unless they're buckling under load of some kind and this provides a fancy cache, it's pretty ridiculous.
---
edit: decided to watch the network on a new page, http://wiki.c2.com/?LiterateProgramming for instance. Most everything downloaded in 10s of milliseconds, but the LiterateProgramming XHR request sat waiting for 6.5 seconds. Maybe they are struggling for some reason?
Hi, I'm working on a static renderer for the c2 source json, AMA.
I'll be hosting a mirror (I think the license allows for that but I'll have to check) and then integrating it with the one hosted on the original domain.
I also kind of want to set up some kind of edge-cached static rendering with loose constraints on how up-to-date pages need to be (because the point is providing a read-only Lynx/curl experience), but I think this might end up being too much work, technically or politically.
Those pages are loading instantly for me at present, but even if they were slow, your comment is far too uncharitable to make for a good HN post. Physically angry? Come on—it's a single programmer's project from decades ago. He's probably tackling the problem of upgrading it by himself. The last thing he needs (or we, for that matter) is an angry chorus of internet entitlement.
If you're hot under the collar, please cool down before commenting here.
On my browser (up-to-date Firefox with uBlock Origin), I can load pages from links from other sites (about 5 seconds to load SoftwareMasterpiece), but trying to follow links within the site gives me nothing but a spinner, no matter how long I wait.
I'm impressed at how the author of this website managed to take plain text content and literally make it slower to load than a 1080p 60FPS video. You really have to try to do something like that, bravo. I normally try to avoid commenting on stuff like this, but that's pretty much all I can say considering it took me actual minutes to load (from Japan).
In lieu of the article's content, I'll spare readers the trouble and provide something better (and faster to load!):
-Read lots of code
There you go, you're well on your way to becoming a code-reading master!
>how the author of this website managed to take plain text content and literally make it slower to load
I didn't comprehensively debug the whole page but in a cursory 2 minute trace in Chrome Dev Tools, I found javascript that makes an XHR request for a 265k text file[1] with 36000 lines. The javascript code then parses the names (which takes ~300ms) and then seemingly does nothing with the information. It doesn't show up anywhere on any subsequent DOM rewrites. The actual visible content that people read is only 10k.
For mobile users, you punish them twice by (1) subtracting 265k from their data plan and then (2) wasting battery power on cpu to parse it for no reason. That one example of useless bloat is probably multiplied dozens of times elsewhere in the lifecycle of the webpage.
It seems like a list of all the pages on the wiki, presumably to turn mentions of those pages in the text into links automatically. Still, it seems better to do this on the server, if possible.
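The server-side alternative could look something like this minimal sketch. The page names, URL scheme, and WikiWord regex here are illustrative assumptions, not c2's actual data format:

```python
import re

# Hypothetical set of known page names (in c2's case, this is the large
# name list the client currently downloads and parses in the browser).
pages = {"LiterateProgramming", "SoftwareMasterpiece"}

# A WikiWord: two or more capitalized word parts run together.
WIKI_WORD = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")

def linkify(text):
    """Wrap any mention of a known page name in an anchor link, server-side."""
    def repl(match):
        name = match.group(0)
        if name in pages:
            return '<a href="/?%s">%s</a>' % (name, name)
        return name  # looks like a WikiWord, but no such page exists
    return WIKI_WORD.sub(repl, text)

print(linkify("See LiterateProgramming for the classic discussion."))
```

Done once at render time, the client receives plain HTML and never needs the page list at all.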
Well I finally loaded it. It's a piece about how the support code (argument parsing, error handling) around the code that "actually does anything" has more lines than the "actually does anything" code.
I must admit, this was my first thought too. "Oh look, c2 wiki is back up! Huh, a loading spinner, it must be doing something fancy... oh look, it looks EXACTLY THE SAME AS BEFORE."
The general content seems on track, if obvious ('find the high level logic', 'draw flowcharts if it's complicated', 'grep for whatever you're looking for') but some comments are out of date ("print the code because there's more code space on your desk" no longer applies, I literally have more square centimeters of monitor than of desk space and that monitor area can scroll!) and some are flat-out wrong ("software doesn't always have an entry point"? Really?)
c2 now uses Federated Wiki, which for engineering reasons (I think), loads the entire wiki tree as a key part of the json that serves as the content for every page.
Looks like new maintainers took control and decided the old codebase wasn't to their liking.
Thing is, these changes come with loads of tradeoffs the newcomers usually are completely oblivious to and the new product is of lesser quality than the original, but they feel good about themselves now :)
I keep JavaScript off on mobile, and around 95% of sites still work. This one, not so much. If your site is just text, please dispense with all the JavaScript.
Given the state and performance of the site, the tag line "The original wiki rewritten as a single page application" is practically satire: it's a SPA now, so of course performance was degraded by a factor of 100+.