Tips for Reading Code (2014) (c2.com)
177 points by _qc3o on Dec 30, 2016 | 69 comments


One of the tricks I like to use to get up to speed quickly is making a hyperlinked dictionary / glossary (a wiki or a GitHub page with anchor links works nicely). The biggest roadblock to understanding the code you read is not knowing the exact meaning of the terminology chosen by the original developers. Building the dictionary helps internalise it quickly, leading to much less confusion when reading.

Terms present in the UI (or API docs) of the segment you wish to understand are a good starting point.
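
To make this concrete, here's a minimal sketch of the idea in Python (the terms, definitions, and make_glossary helper below are hypothetical placeholders, not from any real project):

    # Hypothetical glossary generator: emits a Markdown page whose headings
    # can be referenced with anchor links (e.g. glossary.md#ledger when
    # rendered on GitHub). Terms and definitions are placeholders.
    terms = {
        "Ledger": "Append-only record of every posted transaction.",
        "Posting": "A single debit or credit line inside a transaction.",
        "Settlement": "The batch job that reconciles postings against the bank feed.",
    }

    def make_glossary(terms):
        lines = ["# Glossary", ""]
        for term in sorted(terms):
            lines.append(f"## {term}")  # GitHub renders this heading with an anchor
            lines.append("")
            lines.append(terms[term])
            lines.append("")
        return "\n".join(lines)

    if __name__ == "__main__":
        print(make_glossary(terms))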


I do something similar when reading terminology used in code, but for a different reason: I like to steal good naming ideas and keep them in a mind map I can later look at for naming inspiration.


You might be interested in this single-page app I made to handle similar use cases of automatically hyperlinking documentation: https://github.com/elusivegames/termify

Sorry there's no live demo, but the readme should explain it easily enough.


Thanks for the link. I enjoyed the breakdown and can see how it works, but wow, that code would be easier to read split across three PHP files and a template.


Definitely. The goal was something I could just drop in as one file to a folder of docs, hence the ugliness :)


On the subject of reading code, I thought this was a good article (basically, you can't read it like you read literature):

http://www.gigamonkeys.com/code-reading/


I agree with the opinion that reading code is not like reading literature, but I do find it useful to compare the two. The book "The Psychology of Computer Programming" talks about how most good novelists actually read lots of novels, while amateur novelists naively think they have enough inspiration without reading others' ideas. The same thing happens with programming: most really good programmers have read a lot of code in addition to writing it. I have found this to be the case for myself; the more code I read, the more creative inspiration I have available when writing my own code.


I'm genuinely curious here. Does anyone really print out code so that their comprehension improves? I can imagine all the benefits, but can someone give any first-hand experience where this has actually helped them?

Also, how much better is it than the code map in Sublime Text or the syntax colouring we already have?


Sure, at a prior job we passed out hardcopies of a side-by-side diff before scheduling a peer review (including only the affected methods/functions).

No syntax highlighting, but it was quite useful. Off the top of my head, the two main benefits are: 1) you free up your computer for the review, and 2) you get a different form factor to interact with.

Regarding the first benefit, having a printout means you can use your computer to run the application, look up docs, and even (ironically) navigate the code in your IDE.

As for the second, having a printout means you can spread out the pages to compare different bits of code, draw diagrams, connect arrows, circle bits of code for notes, etc.

It was pretty useful.


Yes, I do print source code. If it's somebody else's code (or mine from long ago) that I need to rewrite or reorganize, and it fits on several pages of paper, I find it much easier to analyze it this way.

I can doodle and write remarks on top of the code, but I think a much more important aspect of printed code is that it has a different friction profile than browsing the same code in an editor (or on a screen in general).

This "friction" thing is hard for me to put into words, as it's quite a subtle feeling. I can't freely search the code for identifiers or run it, so I'm sort of forced to make notes and remember things, but I can mark parts of the code in different ways, write alternative pseudocode alongside the original, and add TODO, REMOVE, or UNNECESSARY marks.

I do a similar thing for code I need to write, except, obviously, I have nothing to print yet. I write parts of the code on paper (which allows me to omit all the uninteresting trivia), then proceed as I would with printed code.


I've done this recently for a few of the open-source third-party frameworks we use. I'll print out the source code and the relevant unit test code for a particular class I'm interested in learning about (or, in the case of JS code, I'll usually just print the entire file). I start by reading the unit tests and then read the actual source. The real benefit of doing this is that I can easily write notes by hand alongside the code to come back and review later. I've found this way of note-taking to be superior to other tools that allow notes alongside code (like org-mode).


I have done this, for code that fits on 2-8 pages (basically a demonstration program). I used Vim's ":hardcopy", so the printed code was already syntax-highlighted the way I was used to.

This did let me work on the code when away from my workstation, but I didn't otherwise find it very helpful. If you have lots of code, some form of code navigation is very helpful.


Vim's :TOhtml can be handy if you need to see syntax-highlighted source on a tablet or some other device that doesn't run Vim.


I do, usually if one of the following is true:

* I haven't worked with the codebase in the last 3 months

* I wasn't the author

* I'm heading into a major refactor

Paper is malleable. I can cut it up and slide pieces around on a desk to show relationships better.

I can write all over the code without forcing any format or alignments.

It helps me understand the control flow, and the exposed API, faster.


I like to print things out because it makes them very concrete. I can literally see the whole code base I need to become familiar with. It's very useful for working in legacy code bases, or when trying to reverse engineer what is actually happening. The place memory (e.g. remembering that a variable was defined at the bottom of the third page of the printout) helps to situate you.

However, I'd say it's only worth it if you need to do "deep reading" of convoluted code, or if you're in very unfamiliar territory. Also, it doesn't work when the "printout" becomes too large—a full printout of a 30K LOC project is about 600+ pages. No way you're going to read all that!


I feel like most of the code I read at work should never be printed. Even if it is production code, it would make it very "real" and... embarrassing. Haha.


I don't need to read the article to know that, in this case, the best reading of their code for rendering plain text in HTML would be to have absolutely no code to read at all, instead of whatever code they have, however brilliant it might be. Lol.


So sad to see c2 turned into a JavaScript-dependent experience.


This article is basically saying use your brain.


Not sure in which context these tips were written. Having been through all of this, and frequently comparing my code-reading abilities with those of my junior colleagues (hey, I even managed to make it to a point where I essentially make my living from reading others' code), maybe a few more realistic tips (and a shorter list):

1. Actually read the bloody code until you thoroughly understand what it does. Don't stop before that, or it won't count.

2. Read as many different projects as you can -- different languages, technologies, paradigms, coding styles.

3. Don't spend time on code-comprehension tools; they won't work (speaking from personal experience -- the latest attempt was with "Understand C++" from SciTools, a great tool BTW). Master grep (a rough sketch of that workflow follows this list).

4. Drawing flowcharts, running under a debugger, etc. -- all of that would be great, but nobody allots time for it anymore (neither you nor your managers).

5. Don't hope for usable docs for external libraries. Exceptions exist, but they're not the rule.
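
For point 3, here is a toy Python stand-in for the grep workflow -- list every file and line where an identifier appears so you can hop between definition and call sites. In practice you would just run grep/ag/ripgrep; the find_identifier helper and the example call below are hypothetical:

    import re
    from pathlib import Path

    def find_identifier(root, identifier, suffixes=(".c", ".h", ".py")):
        """Print every file:line where the identifier occurs, grep-style."""
        pattern = re.compile(rf"\b{re.escape(identifier)}\b")
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file() or path.suffix not in suffixes:
                continue
            text = path.read_text(errors="ignore")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if pattern.search(line):
                    print(f"{path}:{lineno}: {line.strip()}")

    # Hypothetical usage: trace a function across a project tree.
    # find_identifier("path/to/project", "parse_request")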


#1 always gets me.

My brain will tune out whatever piece of code I've already read once. So if I initially skim some piece of code, figuring out what it -really- does is HARD.

Every subsequent look at that piece of code will be colored by my initial (mis)understanding of how it works.


Try reading it backwards. It forces your brain to look at the text differently.

This trick is also useful when proofreading documents you've read many times already.


Same here. When I'm looking for something and I know the area, I'll read the method (or whatever it is) rather quickly and be like, "oh okay, this is what it does, so the problem is not here."

Then, after some time, when I look at it again, I notice that I missed a step in that method, one which actually did contain the problem and which I did not see by skimming over the code earlier.


3.) While newer grep versions and SSDs have helped with performance, using ag or any of its many implementations tends to get results faster, especially on large code bases.

4.) I've only had to resort to flowcharts a couple of times so far, mostly with Java libraries, to externalize the data and control flow, because it grew just way too large to keep in my head. A particularly vivid memory I have here is netty (which also has a tendency toward particularly convoluted and complex control flow, realized by conditionally modifying pipelines in many places).


> using ag or any of its many implementations tends to get results faster, especially on large code bases.

Especially because ag tends to be smart about what it will ignore (the contents of node_modules, etc.).


ag is also multi-threaded, while recursive GNU grep isn't (you'd need a find+xargs workaround).


I like ripgrep; it's instantaneous on our large (>100k LOC) codebase.


I'm interested in your take on #4: why wouldn't you allot time to run code in a debugger? Personally I would like all my code to be written against a debugger and with test suites allowing me to step through it.

I've been chided for not doing this in the past (I just read the code usually) and would actually agree with that person in retrospect.


Because, in many cases, getting the code into the debugger has a huge fixed cost -- imagine your code is supposed to run on an embedded platform: find the right board, get it connected into the infrastructure (getting the right signals in and out), get the debug build of the image, etc. -- you see the point.

More generally, studying the code under a debugger means you need particular input data -- data suitable for the part of the code you're studying. That can take time to get.


You've made some really good points about reading code in both of your posts. I'm fairly new to it, but I've started a habit of reading code every day, and I've realized this is an important skill I've neglected in the past.

Since most of the code I read (C# code) can be run on my platform, I'll sometimes run into that rare piece of code where I can't believe it does what it says it does. In those cases I'll download the repo and debug it, and many times I'm surprised to find there was a gap in my understanding. Reading code helps me find knowledge gaps I didn't even know were there.


Ah I see. This is actually a very good point. One team I was on would constantly get hung up on this sort of issue. I believe a more general ability to just run a code review and study session would've saved lots of time.


But... if you don't test the code with actual reasonable data, how do you know it is correct?

Seems weird to say a team was always hung up on ensuring the correctness of their code. I guess it depends on what you work on to an extent.


Well, if the data you're ingesting/parsing/etc. is anywhere near well formed, or has a known structural definition, you should be able to mock it and exercise the code handling it reasonably well, even up to the point of covering many weird edge cases. And even with perfect data up front, you would still encounter bugs in your production code at some point.

In fact, isn't this usually the case with any application not yet in production?

As an example, I had a teammate who would get blocked on needing JSON that matched a production set of data. Yet she knew what the various fields of the data would most likely contain. She ended up writing the code anyway, without the "production" data she had asked for, and it worked great minus a few small changes. It was ultimately just a psychological block, IMO.
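
To sketch what that looks like in practice (the field names, mock_order, and parse_order below are invented for illustration, not her actual code), knowing roughly what the fields contain is enough to fabricate JSON and exercise the handling code before any production data arrives:

    import json
    import random

    def mock_order(order_id):
        """Generate fake but structurally plausible data, edge cases included."""
        return {
            "id": order_id,
            "customer": {"name": f"customer-{order_id}", "vip": random.random() < 0.1},
            "items": [{"sku": f"SKU-{n}", "qty": random.randint(1, 5)}
                      for n in range(random.randint(1, 4))],
            "notes": None if random.random() < 0.5 else "rush delivery",
        }

    def parse_order(raw):
        """The code under test: parse the JSON and total up item quantities."""
        order = json.loads(raw)
        return order["id"], sum(item["qty"] for item in order["items"])

    if __name__ == "__main__":
        for i in range(3):
            print(parse_order(json.dumps(mock_order(i))))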


For me, if I have to go through "layers of abstraction hell" to get to the point I want the debugger to be in, I'm better off with pencil and paper. This is especially true with enterprise software.


The new c2 browsing experience makes me physically angry. My skin temperature goes up, and my face makes an automatic scowl.

I don't know how they turned the most basic HTML page into something that takes 30 seconds of spinner to show the same thing.

I counted 20 seconds to load http://wiki.c2.com/?SoftwareMasterpiece


For a moment I thought this was another of the "meh, it uses JS, I want to surf with my Lynx!!!!" rants, but wow, that page is really slow. I only counted 9 secs for the original page and 11 secs for your link, but still... for a bit of text? Not even an image? What are they doing? setTimeout before they show the HTML?

Edit: opening the same link again is instant. Weird.


Ironically, this page looks like it doesn't even need JS. It's comical how long it took to load.


Ward switched to using his project, Smallest Federated Wiki, which basically doesn't do a thing without JS.

In fact, if I remember correctly, the slow load times are because it loads the entire wiki tree in the background to make linking easier for the engine... But a worse experience for the user.


Loads instantly for me. Might have something to do with your network or service provider.


Spinner loads instantly for me. Everything else takes ~10 seconds.

I'm on a 250 megabit connection. I haven't seen a page take this long to load in months.

Once a particular page is visited, it does load quickly. But for new ones it's always the same experience. Unless they're buckling under load of some kind and this provides a fancy cache, it's pretty ridiculous.

---

edit: decided to watch the network on a new page, http://wiki.c2.com/?LiterateProgramming for instance. Almost everything downloaded in tens of milliseconds, but the LiterateProgramming XHR request sat waiting for 6.5 seconds. Maybe their server is struggling for some reason?


You're right. It took a while for me on that one, but the second time around it loaded instantly.


Hi, I'm working on a static renderer for the c2 source JSON. AMA.

I'll be hosting a mirror (I think the license allows for that but I'll have to check) and then integrating it with the one hosted on the original domain.

I also kind of want to set up some kind of edge-cached static rendering with loose constraints on how up-to-date pages need to be (because the point is providing a read-only Lynx/curl experience), but I think this might end up being too much work, technically or politically.


Thanks for the great work keeping c2 online! It really is a treasure trove of experience from the old ones.


I'm more of an outside contributor motivated by my strong dislike of the new site, but sure :)


Those pages are loading instantly for me at present, but even if they were slow, your comment is far too uncharitable to make for a good HN post. Physically angry? Come on—it's a single programmer's project from decades ago. He's probably tackling the problem of upgrading it by himself. The last thing he needs (or we, for that matter) is an angry chorus of internet entitlement.

If you're hot under the collar, please cool down before commenting here.


On my browser (up-to-date Firefox with uBlock Origin), I can load pages from links from other sites (about 5 seconds to load SoftwareMasterpiece), but trying to follow links within the site gives me nothing but a spinner, no matter how long I wait.

I don't feel this is what Wiki was meant to be.


The page is not loading... even the cached version from http://webcache.googleusercontent.com/ is not loading!

I am really frustrated with the new c2.com pages :P


It really seems they time-jumped from 1995 to 2016 but made it even worse.


I gave up after 60 seconds on LTE.


I'm impressed at how the author of this website managed to take plain text content and literally make it slower to load than a 1080p 60FPS video. You really have to try to do something like that, bravo. I normally try to avoid commenting on stuff like this, but that's pretty much all I can say considering it took me actual minutes to load (from Japan).

In lieu of the article's content, I'll spare readers the trouble and provide something better (and faster to load!):

-Read lots of code

There you go, you're well on your way to becoming a code-reading master!


>how the author of this website managed to take plain text content and literally make it slower to load

I didn't comprehensively debug the whole page, but in a cursory 2-minute trace in Chrome DevTools I found JavaScript that makes an XHR request for a 265k text file[1] with 36,000 lines. The JavaScript code then parses the names (which takes ~300ms) and then seemingly does nothing with the information. It doesn't show up anywhere in any subsequent DOM rewrites. The actual visible content that people read is only 10k.

For mobile users, you punish them twice: (1) you subtract 265k from their data plan, and then (2) you waste battery power on CPU time parsing it for no reason. That one example of useless bloat is probably multiplied dozens of times elsewhere in the lifecycle of the webpage.

[1] don't click on this if you're on mobile phone: http://wiki.c2.com/names.txt


It doesn't seem to be the JS, though. The request that takes a whoppin' 33 seconds -- server-side -- for me is http://c2.com/wiki/remodel/pages/TipsForReadingCode


It seems to be a list of all the pages on the wiki, presumably so that mentions of those pages in the text can be turned into links automatically. Still, it seems better to do this on the server, if possible.
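
A rough sketch of the server-side alternative (this is not the actual Federated Wiki code; the page titles, regex, and link format below are assumptions): keep the title list on the server and rewrite CamelCase mentions into links while rendering, so the client never has to download the full page list.

    import re

    # Hypothetical subset of existing page titles, kept server-side.
    PAGE_TITLES = {"LiterateProgramming", "TipsForReadingCode", "OverEngineering"}
    WIKI_WORD = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")  # CamelCase wiki words

    def link_wiki_words(text, titles=PAGE_TITLES):
        def replace(match):
            word = match.group(0)
            return f'<a href="/?{word}">{word}</a>' if word in titles else word
        return WIKI_WORD.sub(replace, text)

    if __name__ == "__main__":
        print(link_wiki_words("See TipsForReadingCode, and maybe OverEngineering too."))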


The irony is that they also host an article about Over Engineering:

http://wiki.c2.com/?OverEngineering

I didn't wait for it to finish loading, so I haven't actually read it, though.


Well I finally loaded it. It's a piece about how the support code (argument parsing, error handling) around the code that "actually does anything" has more lines than the "actually does anything" code.


Wow, you mean just like I spend more time reading specs, quoting, researching, schmoozing with clients etc. than "actually doing work"?


He didn't mean the tasks around programming; he meant the bits of code around the main problem domain in your codebase.


I must admit, this was my first thought too. "Oh look, c2 wiki is back up! Huh, a loading spinner, it must be doing something fancy... oh look, it looks EXACTLY THE SAME AS BEFORE."

The general content seems on track, if obvious ('find the high-level logic', 'draw flowcharts if it's complicated', 'grep for whatever you're looking for'), but some comments are out of date ("print the code because there's more code space on your desk" no longer applies; I literally have more square centimeters of monitor than of desk space, and that monitor area can scroll!) and some are flat-out wrong ("software doesn't always have an entry point"? Really?)


The sad thing is that The Wiki (this is the original one) used to be a simple text thingy with instant load.

I have no idea what happened, but I presume the new version just doesn't handle the HN effect well enough.


> doesn't handle the HN effect well enough.

Not at all.

c2 now uses Federated Wiki, which, for engineering reasons (I think), loads the entire wiki tree as a key part of the JSON that serves as the content for every page.


Looks like new maintainers took control and decided the old codebase wasn't to their liking.

Thing is, these changes come with loads of tradeoffs the newcomers are usually completely oblivious to, and the new product is of lesser quality than the original, but they feel good about themselves now :)


I keep JavaScript off on mobile and around 95% of sites still work. This one, not so much. If your site is just text, please dispense with all the JavaScript.


It was taking me a long time as well (Canada), so I just quit. I thought it was NoScript, but I had allowed the page.



"Remodeled" is an understatement. It was destroyed. Luckily we have the archive...


I think he needs some time to improve it. More explanations here:

https://github.com/WardCunningham/remodeling


Given the state and performance of the site, the tagline "The original wiki rewritten as a single page application" is practically satire: it's an SPA now, so of course performance was degraded by a factor of 100+.


I thought this was a fake article because of the spinning gif.


The page doesn't load.



