> If you don’t review, check, and merge docs the same way your org reviews, checks, and merges code, you’re not doing docs-as-code — you’re doing docs-as-bore.
While some WYSIWYG cloud-based docs platforms make it easier to make changes, that's not necessarily what you want. Docs are a critical component of how your users perceive your product - you want checks that they meet certain quality and accuracy standards. Just like your code.
And if you're an engineering-led company, you probably want your docs updates to be coordinated with your product releases. Git is just the logical place to put your docs in that case.
I've even created a company specifically to help with this workflow: https://www.doctave.com
Also, lots of comments here seem to be thinking of docstrings and other in-code documentation. I think that's really a different category that has a different set of goals and issues. This post is specifically about customer-facing documentation.
> you probably want your docs updates to be coordinated with your product releases
People keep saying that, but I can't help but wonder if they're ever on the receiving end of documentation.
Documentation isn't code. It may all be text, and share other similarities, but it's something fundamentally different. You can't substitute code for docs, or docs for code. The above reasoning is what ends up publishing /docs/5.3/chapter1/installing.html. Reading documentation for a specific version of the product is not desirable.
To understand the product, it's very relevant what happens in version 5.4, and I should not have to diff different releases of the docs to find out. The documentation should say clearly that this function will be deprecated, or look different in the next version. In short, documentation should neither be branched with code, nor released with it. It should live in parallel and describe not only how the product works, but more importantly why it works like it does.
Your product looks really interesting. I'm a big advocate for docs-as-code at my company. We use Confluence generally, but our team uses GitLab and Gatsby for our documentation/blogs as we really value the Merge Request workflow.
Our biggest challenge is local development. A WYSIWYG editor is just so useful in that regard. Is that what Doctave Studio is trying to solve?
You're correct - Doctave Studio is for making local development easier. It packages the whole "authoring environment", so it's all you need to start writing.
It's technically not WYSIWYG, but you do get a side-by-side real-time preview of your rendered Markdown content and OpenAPI specs that update as you type.
You get autocomplete, broken link checking, etc. Everything you'd expect from an editor.
Having worked both as a developer and as a technical communicator (for software), I'm thinking that low friction for developers is paramount, and that therefore the way to get developers to put some effort into it is to have documents both (a) written in Markdown (or adoc or rST or typst) and (b) co-located with code under version control. Change the code, change the docs, no screwing around, BUT a quick & simple brain dump can suffice because of [see next paragraph].
To wit: I have yet to hear of a documentation system that provides fully bidirectional updates between such documents-as-code and edits made further downstream along the documentation production pipeline. That is to say, when a TC person or a reviewer makes edits to content, these changes should propagate back to the docs-as-code material.
Then everyone benefits, including senior devs whose scarce time is optimised by having TC people expand and polish their hasty scribblings, and junior devs who have well-maintained documentation at-hand in-place.
Documentation near code is a good idea, but unfortunately it covers only part of the problem.
There is a big portion of documentation that should be available to "other persons", such as architects or project managers, who may not want to visit the codebase.
Another challenge is that for these people, diagrams are typically more useful than text. This still requires some manual effort which is difficult to achieve with Markdown.
I recently started keeping docs in Markdown but with a CD job to render them and publish to Confluence. We only have internal users but many of them are non-technical. This seems to be a sweet spot -- easy for us to update, but also easy for users to find.
It also opens up the possibility of generating the docs themselves from any data that are available at build time. For example, we have a page with a pretty complex Graphviz visualization that gets built from some of the data files we ship with the product. When the data changes, so do these diagrams. They literally can't get out of sync without us noticing (the build breaks). I see more opportunities for this kind of thing all the time now that I'm looking.
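A minimal sketch of that build-time pattern, in Python. The data format (a JSON mapping from a component name to the components it depends on) and the `to_dot` helper are hypothetical; the point is that the generator validates the data and raises, so an inconsistency fails the build instead of shipping a stale diagram. The DOT text it produces would then be rendered by Graphviz, e.g. with `dot -Tsvg`.

```python
import json


def to_dot(components: dict[str, list[str]]) -> str:
    """Render a {component: [dependencies]} mapping as Graphviz DOT source.

    Raises ValueError if an edge points at an undeclared component, so a
    stale or inconsistent data file breaks the docs build.
    """
    referenced = {dep for deps in components.values() for dep in deps}
    missing = sorted(referenced - set(components))
    if missing:
        raise ValueError(f"undeclared components referenced: {missing}")

    lines = ["digraph components {"]
    for name in sorted(components):
        lines.append(f'    "{name}";')
        for dep in components[name]:
            lines.append(f'    "{name}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


# In a real build this would be loaded from a data file shipped with the product.
data = json.loads('{"api": ["db", "auth"], "auth": ["db"], "db": []}')
print(to_dot(data))
```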
I don’t know if it’s built in or a plugin but there is a way to embed a markdown document in version control into a confluence page. I don’t recall if it was any markdown page or had to be in bitbucket.
I used a simple intro and then embedded the docs from version control for a couple chunks of our architecture. Helped a lot.
If you replace the domain of the raw GitHub file with pointillism.io, it will fetch and render right out of your repo. This removes the CICD requirements and ensures currency.
Agreed with your point of accessibility to people who don't live in the code.
But that doesn't preclude documentation near code or Markdown. It just means that you need a CI job to publish the docs stored in the git repo as (for example) HTML or PDF.
For diagrams stuff like PlantUML are great, edit as text, publish as images.
Can confirm. We are using Sphinx (Sphinx-Needs, actually), and I can send deep links to the rendered docs of the develop, release, or feature branches to everyone, including non-technical staff. You just have to know that the "latest of develop" and release links always work; the branch ones are for quick and ephemeral communication only. We include PlantUML and drawio sources along with the code, and PR reviews check that code and docs are updated in sync.
Personally, I think they should get over it. Git is not a "software dev" program. Version control is crucial to pretty much everyone who touches the product. I think therefore everyone should know how to navigate their codebase and use Git, at least a little.
Also, you can use git submodules for your documentation so it's kind of separate from code. The only problem is submodules kind of suck ass, so I don't know if devs would be keen for that.
Regarding para 2, I am assuming that there is something like a CMS in place that can pluck docfiles from version control and massage them and insert them into an outline/ToC. (And then propagate changes back to version control.)
Regarding para 3, there are now many GUI tools for working with Mermaid et al. But are any of them properly integrated into documentation systems?
Architects should definitely be visiting the codebase. Why would project managers be updating docs? I'd have thought stuff like docs translations done by translators would be a better candidate for non-code editing.
The second you put docs into version control next to code, it's no longer low-friction enough. Suddenly you have peer review for docs changes, which is far too much work.
In all my 26 years of working as a software dev, in the ~20 companies that I've worked or consulted for in various countries, never once did I see code documentation work out well.
It wasn't that the will was not there: in almost every single one of the companies, the devs and management agreed that good documentation was important. And every now and then some heroic effort would be made to finally clean up Confluence or whatever system was used at the time. And for a little while that would work, then slowly documentation would become neglected here and there, and in the end it would become so bad that nobody would use it anymore. Except perhaps to showcase your commitment to best dev practices to your manager when it was time for your performance review.
It doesn't matter if documentation has to be added in the code, or to some external system or both. Sooner or later it will be neglected, become out of sync and eventually become worse than no documentation at all.
With the exception of external libraries or APIs, software simply changes too fast. You'd need an army of technical writers to keep all documentation in sync.
And unlike what the author writes, I have actually been working for a couple of companies that gave up and simply used the code as documentation, and surprisingly this worked out pretty well.
Sad to read your n=20 was like that. There are also cases where documentation led companies to growth—one example is Stripe.
Pointing the finger at documentation is easy because it's visible and readily available. Would you say the bit rot phenomenon you described never happened to code, or processes, or UIs? Docs reflect organizations.
> There are also cases where documentation led companies to growth—one example is Stripe.
When you said 'there are also', I thought you were talking about the same thing as the comment you were responding to. I'm not sure how the existence of something totally unrelated to the original comment is relevant.
If people aren't reading your documentation it means they don't trust your documentation.
The solution is to build that trust, by building a culture of active documentation maintenance.
My favorite trick for that is to keep the docs in the same repo as the code and actively enforce relevant documentation updates as part of the code review process.
Once developers learn that new code cannot land on main without accompanying documentation updates, they learn to trust that documentation pretty fast.
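One way to enforce that rule mechanically is a CI step that compares the branch against main. A rough sketch in Python, assuming a hypothetical layout where code lives under `src/` and docs under `docs/`; a real gate would probably also want an override for refactors that genuinely don't change documented behavior.

```python
import subprocess

# Hypothetical repository layout; adjust the prefixes to your own.
CODE_PREFIXES = ("src/",)
DOC_PREFIXES = ("docs/",)


def needs_docs(changed: list[str]) -> bool:
    """True when code changed but no documentation file did."""
    touches_code = any(path.startswith(CODE_PREFIXES) for path in changed)
    touches_docs = any(path.startswith(DOC_PREFIXES) for path in changed)
    return touches_code and not touches_docs


def changed_files(base: str = "origin/main") -> list[str]:
    """Files changed on this branch since it diverged from `base`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def main() -> int:
    """Exit code for the CI job: non-zero fails the check."""
    if needs_docs(changed_files()):
        print("Code under src/ changed but nothing under docs/ did.")
        return 1
    return 0
```

A CI job would call `main()` and exit with its return code; marking that job as a required check is what actually keeps undocumented changes off main.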
> My favorite trick is to [...] actively enforce [...]
If active enforcement were an option, and it led to success, I don't imagine there would be a problem.
My favorite trick is to tell people really nicely they should write comments (preferably doc comments on internally exported interfaces).
Yet, they don't. And so we're back to the dilemma:
If a significant percentage of competent programmers choose to not comment their code, or update the comments that are in the code, the comments will lie over time. And so it is probably better to not include them.
We haven't even reached the point in the conversation where we ask "But why is it that you, dear competent programmers with decades of experience, think we should continue to minimize the number of comments in good conscience?" or ask "Why is it, dear project managers, that we tolerate a total lack of communication through any means other than executable code?" And the answer is probably: Because a lot of autists produce a lot of valuable code, and they don't like to talk if they can avoid it, and we value their work and relay what they made to the wordy-word people.
> If people aren't reading your documentation it means they don't trust your documentation.
I don't trust your documentation unless there's alignment among developers. Alignment is a luxury you don't get in legacy shops.
Many of us "autists" may be of the opinion that the code IS the documentation and describing that in ambiguous natural language for some unspecified audience is just not very useful (and can be actively harmful). Use the source, Luke.
The documentation discussion tends to take for granted that documentation is automatically valuable, and the more documentation there is, the better. I find well-named and well-structured code and actually executable standalone examples are far more valuable for understanding a codebase than some by-rote restating of what the code already says.
If people don't add documentation to their PR, I add that documentation to the PR for them (under my own name as a separate commit), then land the code.
Do that often enough and people get the idea that documentation isn't optional.
Executable tests in Rustdoc are amazing. For those unfamiliar, they are run when you run `cargo test`, and they are included as Markdown code blocks in your crate’s documentation.
They’re not an excellent place for extensive testing. But they are super useful for making sure your documentation examples are updated and functional.
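Python's standard library has a close analog, the `doctest` module, which runs the interactive examples embedded in docstrings and reports any whose output no longer matches. A minimal sketch; the `slugify` function is only an illustration:

```python
import doctest


def slugify(title: str) -> str:
    """Turn a page title into a URL slug.

    >>> slugify("Installing the CLI")
    'installing-the-cli'
    >>> slugify("  Docs   as  Code ")
    'docs-as-code'
    """
    return "-".join(title.lower().split())


if __name__ == "__main__":
    # Prints details of any example that drifted from the code and
    # returns the counts of failed and attempted examples.
    print(doctest.testmod())
```

In CI the same examples can be run with `python -m doctest -v file.py` or `pytest --doctest-modules`.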
#![deny(missing_docs)] is also a great way to ensure you don’t forget to document things.
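Python has no built-in equivalent of `#![deny(missing_docs)]`, but a rough approximation is a check that lists the public functions and classes in a module that have no docstring. The `missing_docs` helper below is hypothetical, not a standard tool; linters such as pydocstyle implement a more complete version of the same idea.

```python
import inspect
import types


def missing_docs(module: types.ModuleType) -> list[str]:
    """Names of public functions and classes defined in `module` that lack a docstring."""
    missing = []
    for name, obj in inspect.getmembers(module):
        if name.startswith("_"):
            continue  # private by convention, roughly like non-`pub` items in Rust
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        if obj.__module__ != module.__name__:
            continue  # imported into this module; documented (or not) elsewhere
        if not (obj.__doc__ or "").strip():
            missing.append(name)
    return missing
```

A CI step could run it over each package module and fail when the returned list is non-empty.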
Doctests are indeed a great format for documentation. They are automatically checked for currentness/correctness, aren't ambiguous like natural language and are close to the implementation.
But I have the feeling that doctests aren't seen as documentation the way that is often desired in the documentation discussions.
I read the article as referring to documentation for end users -- not internal documentation / comments written by developers who thought they knew what the code they wrote was doing.
It is inevitable because some competent programmers deliberately don’t comment their code, read comments, or delete other people’s comments when they are stale.
I personally read and write a lot. I track things in git messages by cross-referencing issues, and I comment my code.
But my point is: when you have a cultural divide among competent programmers on whether to comment, not commenting game-theoretically wins, because the outcome where you have comments that get updated some percent of the time is worse than not having those comments.
Instead, embed what you want to say in a comment in the code itself, or in a test.
Document your libraries and APIs if they are used by people outside your team.
I tried to do it seriously a number of times. Perhaps I am doing it wrong, but my productivity drops like a stone.
Writing down your thoughts as you go and maintaining them takes serious time. The romantic notion of writing (and reading) code like a book is appealing, but writing books is hard and arduous; it should not be underestimated as a craft in its own right, and there is coding to be done.
There is also the question of structuring the literate code. Telling a story of how you are building it or explaining how it works has a very different flow and order to how code is usually structured for good maintainability.
Please correct me if I’m wrong, because I would love to dive into it, but I don’t think there has ever been any major piece of software developed following literate programming (at least as Knuth envisioned it). I also don’t think there is any significant book that contains a sizeable working program embedded in it throughout, that can be compiled and executed as-is.
In practice Knuth was most concerned with embedding short code snippets in his papers and books. Having the whole thing be an actual compilable program was secondary, and it was mostly short academic proof-of-concept prototypes and algorithms.
Don’t get me wrong, I love the concept, that’s why I have given it multiple serious tries over the years, likely I will again, and why I think I have some insight of what happens when you use it for “real-world” work.
NoWeb can support multi-thousand page documents which can compile to tens of thousands of lines of code.
I used it at a deep tech startup I worked at a number of years ago to document the theory behind why the code was doing what it was. Doubly useful since I could just use a regular BibTeX citation system for papers that had done some part of what we were trying to do.
My code became the de facto technical onboarding document, still in use today, despite the fact that none of the code in it has been updated since I left.
Thanks for the insight. Just to be clear, writing documentation after the fact with lots of code snippets is obviously good and is standard practice.
You can take the next step and ensure that the entirety of the code is in your documentation as snippets. This usually doesn’t make much sense, there’s lots of code that it is not worth explaining in a literate style. And what’s the point of the documentation containing the whole program if it is written after the fact and you already have a standard more maintainable codebase as the ground truth? The fact that your literate code didn’t get touched says a lot.
To me the name Literate Programming implies that you write the code in a literate style from the outset. If you make it literate after the fact it is just normal documentation with snippets isn’t it?
>The fact that your literate code didn’t get touched says a lot.
The fact it's still used 5 years later without any of the code still being in production says more.
>To me the name Literate Programming implies that you write the code in a literate style from the outset.
This seems like a fundamental misunderstanding of what it means to write. You should perhaps look at how people who do it for a living write books or articles. The final document has little to do with what you spend most of your time editing.
I absolutely hacked on the tangled source code of the program when trying to fix bugs or extend capabilities. Once I knew what I wanted to do I put it back in the literate program, usually finding a lot more bugs in the process.
> I absolutely hacked on the tangled source code of the program when trying to fix bugs or extend capabilities. Once I knew what I wanted to do I put it back in the literate program
Does NoWeb automate this "untangling" process in any way? I sometimes use Weave.jl [1] when I'm thinking out loud through code, and at times it would be nice to just work on the tangled code, refactor and reorganize things, and have it all untangle back into the original in some way. I have no idea how that would work though, and it would likely be pretty limited even if it existed, but I'm curious what approach you usually take.
No. In noweb programs you insert chunks of code in multiple places, so you get conflicts too often when you try to merge the code back automatically.
Org mode has a function which does this, but they didn't allow for arbitrary chunk nesting the last time I looked.
Emacs has a number of very useful features in the modes for noweb/tex, one of which is jumping to the chunk which the code came from in the tangled source code on the pretty printed PDF. This follows the spirit of what you want. In fact SyncTeX support comes pretty much out of the box for noweb files and makes their editing a breeze, either as text or code.
Of course, if you're not on Emacs, then God help you.
Would you please help me understand your workflow for
> jumping to the chunk which the code came from in the tangled source code on the pretty printed PDF
Do the codeblocks in your pdfs contain hyperlinks back to the org file where they came from?
No, I'm using noweb. There is an option in noweb to add comments to the tangled code with the line and file each piece originates from. Then there's an Emacs mode that lets you jump to that code. I wrote a little function that instead lets you jump to the same line in the PDF using SyncTeX.
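To make tangling with origin comments concrete, here is a toy tangler in Python. It handles only a subset of noweb's syntax (chunk definitions as `<<name>>=`, documentation resumed by a line beginning with `@`, and references that occupy a whole line), and the `# from file:line` comment format is an assumption for illustration, not what noweb itself emits.

```python
import re

# Subset of noweb syntax: "<<name>>=" starts a code chunk, "@" starts documentation.
DEF = re.compile(r"^<<([^>]+)>>=\s*$")
REF = re.compile(r"^(\s*)<<([^>]+)>>\s*$")  # a reference occupying a whole line


def parse_chunks(text):
    """Map chunk name -> [(line_number, line), ...]; repeated definitions are appended."""
    chunks, current = {}, None
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = DEF.match(line)
        if m:
            current = chunks.setdefault(m.group(1), [])
        elif line == "@" or line.startswith("@ "):
            current = None  # back to documentation
        elif current is not None:
            current.append((lineno, line))
    return chunks


def tangle(chunks, root="*", filename="prog.nw", indent="", _stack=()):
    """Expand `root` recursively, marking each contiguous run of lines with its origin."""
    if root not in chunks:
        raise KeyError(f"undefined chunk: {root}")
    if root in _stack:
        raise ValueError(f"recursive chunk reference: {root}")
    out, prev = [], None
    for lineno, line in chunks[root]:
        m = REF.match(line)
        if m:
            out += tangle(chunks, m.group(2), filename, indent + m.group(1), _stack + (root,))
            prev = None  # the parent chunk resumes after the nested expansion
            continue
        if prev is None or lineno != prev + 1:
            out.append(f"{indent}# from {filename}:{lineno}")
        out.append(indent + line)
        prev = lineno
    return out


source = '''Some prose about the program.
<<*>>=
def main():
    <<greet>>
@ More prose.
<<greet>>=
print("hello")
@
'''
print("\n".join(tangle(parse_chunks(source))))
```

An editor command only has to parse the `# from` comment above the cursor to jump back to the literate source; mapping that position on to the PDF is where SyncTeX comes in.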
Perhaps I came across as too critical, I have a lot of respect for what you did and for the craft of technical writing. And I definitely understand that writing is not done linearly and is very iterative.
Correct me if I'm wrong, but it sounds like you were documenting existing code and that the result was very valuable as documentation, but not necessarily as code. You were acting as a technical writer not a programmer, it's a bit of a disconnect to call it Literate Programming, even if you were using Literate Programming tooling (NoWeb).
This kind of documentation is common practice all over industry and it is valued, but I don't think Literate Programming is considered to be widely adopted because of that.
I'm having a hard time even understanding what the question is here.
You seem to be confusing the tools with the work being done.
You can write a prototype of a C function in Python to see if you understand the requirements before you commit to the much harder task of writing it in C. That doesn't mean you're not writing a C program.
The same is true for literate programming. I can write code outside the main literate program when I'm not sure what it's meant to do, before I put it back in.
What I'm saying is simply that as you describe it, you are first writing the code normally, and then separately writing some documentation about it accompanied by code snippets for context.
But if that's Literate Programming then everyone is doing it and it's not a very meaningful label, it's just documentation.
I do get it, the distinction is that you are using NoWeb and you can convert between the documentation and the code, and that the documentation contains the entirety of the code. I suppose that's neat.
At some point, this boils down to a pointless discussion of semantics (my fault). "Literate Programming" as you describe it does not sound like a style of "Programming". Actually, when you reverse the Programming/Writing emphasis, it simply becomes "Technical Writing", which is what everyone does, because that's what's actually needed. And it is done by great writers rather than great programmers (which may describe Knuth, with the utmost respect).
I always interpreted it as writing the text and the code together, logging your thought process, thinking of code like a piece of literature as it is written, rather than adding some documentation to it later. The notion that writing it like this will yield better code, regardless of its value as documentation. I suppose that's why it was unproductive for me, it is a rather romantic interpretation (again, my fault).
I don't know how you code, but the first draft of code is never what ends up in the code base. Neither is the first draft of the documentation. You can write both together, but until you have an idea of what the structure of the code would look like, and how to split it up then you're better off doing multiple drafts.
As always code is read much more often than it is written and literate programming is used for the reading part, not the writing part. The efficiency comes in not having to guess what 0x5F3759DF is there for.
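0x5F3759DF is the constant from the well-known Quake III Arena fast inverse square root. Ported to Python with `struct` for illustration, here it is with the kind of explanation a literate program would keep next to it:

```python
import struct


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for x > 0 using 32-bit float bit tricks."""
    # Reinterpret the float32 bits as an unsigned 32-bit integer. For a
    # positive float those bits behave like a scaled, biased log2(x).
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Halving and negating the "log" approximates log2(1/sqrt(x)); the magic
    # constant restores the exponent bias and offsets the error of treating
    # the mantissa bits as a linear approximation of the logarithm.
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step on f(y) = 1/y**2 - x sharpens the estimate.
    return y * (1.5 - 0.5 * x * y * y)


print(fast_inv_sqrt(4.0))  # prints a value close to 0.5
```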
I have skimmed the TeX literate PDF (I did a number of times in the past too). Frankly isn’t it just like normal code with verbose comments? I have seen lots of code like that and it is not referred to as literate. The only difference is that this is a PDF, which makes it less practical and it is still not particularly readable as a book.
It might have great book-like typography but not the "narrative" structure that helps you properly understand how it works without getting bogged down in details first. There's no coherent outline, no chapters or sections for major systems or design decisions, no overarching overviews, no relating different parts and giving context. There's also no story of how it was built or a log of his thoughts throughout problem-solving process, that would have been another good angle. Instead it's just the code from top to bottom with embedded very local commentary. The code itself is actually rather hard to parse visually by modern typographic standards.
The issue is that probably I am misinterpreting what Knuth intended. The Literate Programming concept was a product of its time, and it has evolved into more practical modern documentation standards that are not so tightly linked to the code and don't exhaustively cover every line. The only problematic thing about it might just be the grandiose name Literate Programming, without that it's mainly good common-sense advice for quality documentation, but not necessarily a practical programming paradigm like the name implies.
Again, I'm having a hard time understanding what the issue is. It seems like you are deeply confused about what literate programming is and how it works.
All of the navigation issues are taken care of by using <<chunks of code>> in a nested structure. You follow the numbers in those, like a choose-your-own-adventure book, to find out whatever you need.
The index has a listing of everything used in the program along with where it was defined and where it was used in case you want to find something specific.
More modern tools, like NoWeb, turn all of this into hyperlinks so you can jump around the pretty-printed version without having to look up page numbers.
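The chunk cross-reference index can be approximated with a short scan of the source. A toy sketch in Python, covering only the `<<name>>=` / `@` subset of noweb syntax, that records the lines where each chunk is defined and the chunks that use it:

```python
import re

DEF = re.compile(r"^<<([^>]+)>>=\s*$")  # start of a code chunk
REF = re.compile(r"<<([^>]+)>>")         # any chunk reference inside code


def chunk_index(text):
    """Return {chunk: {"defined": [line numbers], "used_in": [chunk names]}}."""
    index, current = {}, None

    def entry(name):
        return index.setdefault(name, {"defined": [], "used_in": []})

    for lineno, line in enumerate(text.splitlines(), start=1):
        m = DEF.match(line)
        if m:
            current = m.group(1)
            entry(current)["defined"].append(lineno)
        elif line == "@" or line.startswith("@ "):
            current = None  # documentation resumes
        elif current is not None:
            for name in REF.findall(line):
                users = entry(name)["used_in"]
                if current not in users:
                    users.append(current)
    return index


source = '''Some prose about the program.
<<*>>=
def main():
    <<greet>>
@ More prose.
<<greet>>=
print("hello")
@
'''
for name, info in chunk_index(source).items():
    print(f"<<{name}>> defined on lines {info['defined']}, used in {info['used_in']}")
```

Turning each entry into a hyperlink, as the tools above do, is then a rendering concern rather than a parsing one.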
I have read the paper in the past, I am well versed about WEB, and I believe I have done literate programming at length for a number of non-trivial projects.
I have explained my thinking in a separate comment (apologies for creating two branches). In short, I do think you are right and that I had an overly romantic notion of Literate Programming in mind.
Literate programming, as originally described by Knuth, is a good essential idea embodied as a bunch of accidental instantiations of the idea that have gone badly out of date. Knuth's tools at the time added a layer on top of programming languages that let you rearrange the code in a lot of ways the languages of the day didn't support well or at all. It essentially added an independent concept of "function", and layered its own documentation overlay on top of whatever documentation ability the language already had.
The problem is that, in the meantime, languages got a lot better at functions, got more flexible in their organization, and built in better capabilities for documentation and comments, all in a different direction from the one literate programming took. The result in the modern era is a rather bizarre multi-headed hydra of conflicting ideas about how things should be documented and tested.
If someone wanted to resurrect the ideas, they need to not just try to get people to do what Knuth laid out decades ago super harder... they really need to sit down from the very beginning and work out how to update it in the modern era to be less redundant to what we already have. It could be as simple as taking modern doc strings and upgrading them a bit to allow highly-formatted comments to be embedded into code. Or, instead of trying to "weave" the code into a static book, allow the user to specify an entry point and then follow through everything that happens in the functions that it calls and turn that into a book, e.g., say "I'm going to enter this web framework through this path, tell me everything that happens". Or some other idea I don't have yet. Something that harmonizes with modern languages instead of fighting them.
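The entry-point idea can be prototyped in Python with `sys.settrace`, the hook that debuggers and coverage tools use. The hypothetical `trace_book` below calls an entry function and collects, in first-call order, the docstrings of every function from the same module that it reaches: raw material for such a book, not a finished tool. Because it replaces any active tracer, it won't coexist with a debugger or coverage run.

```python
import sys


def trace_book(entry, *args, **kwargs):
    """Run `entry` and return [(function name, docstring), ...] in first-call order.

    Only functions defined in the same module as `entry` are recorded.
    """
    module_globals = entry.__globals__
    seen = {}

    def tracer(frame, event, arg):
        name = frame.f_code.co_name
        if event == "call" and frame.f_globals is module_globals and not name.startswith("<"):
            if name not in seen:
                func = module_globals.get(name)
                seen[name] = (getattr(func, "__doc__", None) or "").strip()
        return None  # no line-by-line tracing needed

    sys.settrace(tracer)
    try:
        entry(*args, **kwargs)
    finally:
        sys.settrace(None)
    return list(seen.items())


# Hypothetical request-handling code to "read" from its entry point.
def handle_request():
    """Entry point: parse the request, then build a response."""
    return respond(parse())


def parse():
    """Parse the incoming request."""
    return {"ok": True}


def respond(data):
    """Build the response body."""
    return str(data)


for name, doc in trace_book(handle_request):
    print(f"## {name}\n{doc}\n")
```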
which uses a LaTeX package for this which I put together with a bit of help from tex.stackexchange --- the big advantage to it is that it allows editing the documentation/code with "normal" syntax highlighting, the disadvantage is that the .sty file has to be edited/updated to match the files which are being output and I still don't have a good setup for the readme.md
I find having the typeset PDF w/ its hyperlinked ToC and marginalia and indices helps a lot in having a "nice" version which I can look through to remind myself of what was intended at a given point, and most importantly, to find _where_ that was written down. Working on a re-write now --- we'll see if this holds up for that.
Awesome links, thank you. I did come across "Physically Based Rendering" at some point, I forgot about it. This is definitely an excellent example of Literate Programming.
Knuth is not the average programmer by far. And I am not talking about coding skills. Knuth is a writer at heart. He was also from a time where writing code on paper was the norm. Literate programming is good for Knuth, but maybe not for most coders today, who grew up on fast computers and IDEs.
My favourite documentation is minimal running code examples. Give me example inputs to get the job done, i.e. not just "inputs: x is a y", but actually create a minimal version of y and show it going into f(x) and coming back out as a genuine object (as opposed to a mock) that I can inspect/prod until I understand what's happening.
Depends how you write code. When I use Semantic Kernel, my KernelFunctions include a well-defined documentation for the inputs and outputs, then using the System Prompt you can provide the concepts and glue between the various plugins. It is the function specification as a whole. Precision is important, although GPT is not yet perfect -- perhaps in another year or two it will be.
And, again, issue threads are only helpful if the people corresponding in them are good communicators who put in the work to leave a legible record of what happened.
Many Github issue threads that I read are incomprehensible. The thread is not where the story is (it's in the memories of the people involved). Explaining is work. This is why they make kids write reports as assignments in school.
Typically this is what product managers and stuff are for. They stick around pretty much just to organize the Jira and make sure everything is fleshed-out. I think if your git frontend links commits to Jira (ours does) AND you have pretty strict requirements for your Jira items it can work out.
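Part of "pretty strict requirements" can be enforced at commit time. A sketch of a git `commit-msg` hook in Python that rejects messages without an issue key, so every commit can be linked to its ticket; the `PROJ-123` key pattern is an assumption, so adjust it to your tracker's project keys.

```python
import re
import sys

# Assumed key format: uppercase project key, a dash, a number (e.g. PROJ-123).
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def has_issue_key(message: str) -> bool:
    """True if the commit message references at least one issue key."""
    return ISSUE_KEY.search(message) is not None


def main(argv: list[str]) -> int:
    """Hook body: git passes the path of the proposed message file as argv[1]."""
    with open(argv[1], encoding="utf-8") as f:
        message = f.read()
    if has_issue_key(message):
        return 0
    print("Commit message must reference an issue key, e.g. PROJ-123.", file=sys.stderr)
    return 1
```

Saved as `.git/hooks/commit-msg` with a Python shebang and the executable bit set, the script would end with `sys.exit(main(sys.argv))`; a non-zero exit aborts the commit.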
It seems like the “why” is eternally lost on us in this scenario.
If I could wave a magic wand and make all the colleagues verbalise what they’re doing, I would.
In the meantime, I’ll take the downvotes for pointing out that it’s fundamentally caused by a cultural gap, not just “those who don’t get it yet, and those who do.”
Not at any level? My imagination is reeling—I’m thinking of Cold War era Spy Novels describing siloed groups who don’t know what any other group is doing. Each is laser focused on their own tasks and everything is a tactical choice. It’s the great reveal in the Movie adaptation when the separate threads come together and the grand national strategy is revealed. And now I’m also imagining the comedy versions of this genre, where the reveal has no purpose.
At some level there is _structure_, and it can be communicated. If for no other reason than to validate the evident structure in code.
> “…sitting at the same table as the almighty coding knights? […] Remove documentation…and your products cease to exist, their inner workings left to the imagination of…”
The author finished the sentence with “users”, but who are they talking about? Those coding knights and their imagination which sort of parallels my Spy genre example.
That said, the article seems a bit naive about the depth and breadth—reads like Ra-Ra.
Take my opinion with a grain of salt. Today I work for a small private firm, but years ago I worked in a corporate job (with tech writers). I drank the Kool-Aid. I believed the work I was doing was important and critical because someone up the vertical decided it was, and they hired someone who hired someone who hired me to do a job; it's the corporate world.
The tell for preaching to the choir, what triggered my response, are the hyperbolic strong statements ("products cease to exist", "your business don’t crumble overnight", "failures are dramatic") which sound defensive. The rest of the blog post reads as defining what exists and setting standards, and further explanation of why it's all important. This is the breadth and depth. What I'm not reading are on/off-ramps and shortcuts that all of the developers who disagree about the mission critical position of documentation would regard as sensible accommodations.
Today I work in a small firm and we need documentation. What I learned from underpaying small businesses is that documentation is important and has a role (otherwise our small firm wouldn't waste our time on it). But if I took the position argued here, I would be told I was wasting time, that I should focus, get it done, and move on to my job tasks. If you're working closely with writers, then you need to believe what they're doing is important without being told. If you don't believe this, then maybe your role does not need documentation (yet? or your group is small enough and people like the job security or you're just overworked, or something else). When I'm working collaboratively and I hear people tell me what I'm doing is not important, I have to believe there is some truth to it. A limit.
You can't context-out business requirements, IMO. Also code can have bugs. You need more formal requirements definition so you can compare behavior to requirements so you can find out bugs. Otherwise, how do you find the bugs?
I prefer when developers and project managers create massive google docs for specs and descriptions. Double points if you share the document with only a handful of favorite employees. Also, ignore all requests to get permission to this document. Eye roll in all meetings if someone hasn't read this document. You can get to god mode if you hide comments around the doc.
Doing docs-as-code by putting documentation in the codebase is somewhat of an oversimplification.
Usually, you want to use feature flags to decouple changes in Product from Engineering-driven deployments, so that new feature work can be rolled out slowly and its impact on KPIs measured, and so that engineering can de-risk the delivery of large changes by putting those changes behind feature flags with 0% rollout.
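A minimal sketch of the percentage rollout described above, in Python. It assumes the common approach of hashing the flag name and user ID into a stable bucket from 0 to 99; the flag name `new-editor` and the helper names are hypothetical, not taken from any particular feature-flag product.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair deterministically to a bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    # Treat the first 8 bytes as an unsigned integer, then reduce modulo 100.
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """A user sees the feature once their bucket falls below the rollout percentage.

    0% means nobody and 100% means everybody. Because each user's bucket is
    fixed, raising the percentage only ever adds users; nobody flips back off.
    """
    return bucket(flag_name, user_id) < rollout_percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    for pct in (0, 10, 50, 100):
        enabled = sum(is_enabled("new-editor", u, pct) for u in users)
        print(f"{pct:3d}% rollout -> {enabled} of {len(users)} users")
```

Hashing the flag name together with the user ID keeps different flags from always landing on the same early-adopter users.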
The problem is, documentation is much closer to Product than Engineering, meaning that doing docs-as-code correctly means putting your documentation behind the same feature flags. So you should have both a public static site that shows the documentation for features whose flags are at 100% rollout, and documentation inside the application itself, so that each user sees the docs for the feature flags enabled for them.
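To make the docs-behind-flags idea concrete, here is a rough Python sketch. It assumes each doc page is tagged with at most one flag name (the `DocPage` type, page slugs, and rollout numbers are all hypothetical): the public site only publishes pages whose flag has reached 100%, while in-app docs use whatever flags are enabled for the current user.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DocPage:
    slug: str
    flag: Optional[str] = None  # None: documents an unflagged, already-shipped feature


def docs_for_user(pages, enabled_flags):
    """In-app docs: unflagged pages plus pages whose flag is on for this user."""
    return [p.slug for p in pages if p.flag is None or p.flag in enabled_flags]


def public_docs(pages, rollout):
    """Public static site: only pages whose flag has reached 100% rollout.

    `rollout` maps flag name -> rollout percentage; unknown flags count as 0%.
    """
    return [p.slug for p in pages if p.flag is None or rollout.get(p.flag, 0) >= 100]


pages = [
    DocPage("getting-started"),
    DocPage("new-editor", flag="new-editor"),
    DocPage("bulk-export", flag="bulk-export"),
]
rollout = {"new-editor": 100, "bulk-export": 10}

print(public_docs(pages, rollout))  # ['getting-started', 'new-editor']
print(docs_for_user(pages, {"new-editor", "bulk-export"}))  # ['getting-started', 'new-editor', 'bulk-export']
```

A beta user inside the 10% `bulk-export` cohort sees its page in the app, while the public site waits for full rollout.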
There aren't really any frameworks that have this docs-coupled-to-feature-flags pattern available off-the-shelf and most companies don't consider it high-enough priority to build in-house.
No, the closest anyone gets is staging documentation changes until the features reach 100% rollout, then manually pressing a button to try to launch the docs at the same time. But when users can't find docs for new features that have been rolled out only to them, they naturally fall back on talking directly to Engineering, which doesn't scale.
An example of this mismatch is when QA isn't testing documentation changes. Best-practice QA discovery testing gives QA a production tenant with the feature flag enabled only for that tenant. But if QA doesn't have documentation for how the feature is supposed to work, they either end up with completely separate infrastructure that nobody else pays attention to, or they talk directly to Product/Engineering, which somewhat defeats the purpose, since customers can't do that.
I was thinking this might have been a treatise on prompt engineering, or something like Rational Rose.
I was actually pleasantly disappointed. I am glad to see documentation treated as a "first-class" engineering discipline. I would say the same goes for creative authors, such as graphic designers or 3D modelers.
I'm fairly big on good documentation[0] myself, and I feel its absence whenever I look at most codebases these days.
> Documentation is vital software infrastructure
Definitely agree. I am feeling the decline in Apple's documentation (I'm an old-time Apple app developer, and remember Inside Macintosh). It's gotten absolutely awful, these days. I'm just starting on a SwiftUI app (I think I may have finally found a good application for it), and, boy howdy, is that documentation ... less than ideal.
Fortunately, someone else thinks similarly, and did something about it[1]. I hate to rely on these types of things, because they often lose sync with the subject, but it is totally necessary.
This article is awesome and resonates with my experience! I used to work for one of the big cloud vendors. We frequently had doc writers in our sync calls. Quite frankly, they never had a chance! Much of our documentation (our errors, status codes, etc.) was buried in the code, and we (developers) were simply too busy to work effectively with technical writers.
Don't get me wrong here. When I say "busy", I don't mean their work wasn't important (it was!), but we had our own projects, and jumping on a call to improve error messages is not too exciting for most of us, and it's not the stuff that drives promotions. As a developer, it's simply too easy to gatekeep code.
Now follows a shameless plug: I'd be so excited to hear your opinion about a tool I'm building. It's called https://api-fiddle.com. Conceptually, it should help developers work API-first instead of code-first. It's great for devs because it makes the work faster and safer. BUT, an added benefit is that API docs are no longer buried in code, and technical writers have a nice interface to contribute to them (and hold devs accountable!).
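One way to get error messages out of the code and in front of writers, sketched in Python under assumptions of my own: messages live in a catalog (inlined as JSON here for brevity, though it could be a separate file writers edit through the normal review flow), the code refers only to stable error codes, and the customer-facing error reference is generated from the same catalog. The codes, statuses, and texts are hypothetical.

```python
import json

# Hypothetical catalog; in practice this could be its own reviewed JSON file.
CATALOG_JSON = """
{
  "QUOTA_EXCEEDED": {
    "status": 429,
    "message": "You have used {used} of {limit} requests this hour.",
    "fix": "Wait for the quota to reset or request a higher limit."
  },
  "BUCKET_NOT_FOUND": {
    "status": 404,
    "message": "No bucket named '{name}' exists in this project.",
    "fix": "Check the bucket name and the project you are authenticated against."
  }
}
"""
CATALOG = json.loads(CATALOG_JSON)


class ApiError(Exception):
    def __init__(self, code: str, **params):
        entry = CATALOG[code]  # a KeyError here means the code was never documented
        self.code = code
        self.status = entry["status"]
        super().__init__(entry["message"].format(**params))


def render_error_reference() -> str:
    """Build the customer-facing error table from the catalog the code uses."""
    lines = ["| Code | HTTP status | What to do |", "| --- | --- | --- |"]
    for code, entry in sorted(CATALOG.items()):
        lines.append(f"| {code} | {entry['status']} | {entry['fix']} |")
    return "\n".join(lines)


try:
    raise ApiError("QUOTA_EXCEEDED", used=1000, limit=1000)
except ApiError as err:
    print(err.status, err)  # 429 You have used 1000 of 1000 requests this hour.

print(render_error_reference())
```

Writers can then improve wording in the catalog without touching application logic, and the reference page cannot drift from what the API actually returns.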
So, I feel like "Docs-as-Code" has some context I'm missing, so I'm going to comment on docs in general.
I think there are multiple kinds of docs for software.
* Comments explaining a specific section of code.
* API docs describing functions/classes/etc.
* Docs on how to use a library/class/etc. Usually including simple, isolated, examples.
* Tutorials on how to create simplified applications using the developed tools.
* Docs on how to deploy, configure, and maintain an application.
* Docs on how to use an application.
* Docs on how to troubleshoot an application.
* Docs on how to integrate applications.
* And likely others I'm missing.
Personally, I've been seriously frustrated by how bad most open source documentation is (I haven't done much with proprietary code). Case in point: Drupal and Symfony. Trying to use api.drupal.org is not fun, and Symfony's docs always cover the basics, but then there's nothing on pulling everything together into something complicated. So you try to dig into the actual code and end up finding multiple layers of uncommented abstractions. Yes, I can eventually figure out what is going on if I put the effort in, but that's a lot of time that could be saved by a few lines of comments.
I usually end up asking JetBrains AI about what I need, then use what it says after I fix the errors it makes... It's also very good at summarizing everything I'd find if I used a normal search. But that all only works if others have already asked and answered my questions.
Some things I've been trying to do to improve my own code's documentation:
* Unless the line is super obvious, even if I think it is obvious, I try to leave a comment. Yes, it seems pointless, but I have gone back to old code I remember being obvious without said comment enough times that I think it is worth it.
* Avoiding "elegance" in favor of "explicitness". For example, I use full `if` statements instead of ternary operators even when ternary operators would look better. For whatever reason the syntax of ternary operators has never sunk in for me, and the explicitness of `if` is much easier to parse. I also use very descriptive function and variable names. Basically, if I have to think about what something means, I try to change it so I don't have to.
* Splitting functions into smaller functions as much as I can. This means I can use descriptive function names, and I'm pretty sure it's just good practice.
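To illustrate the trade-off in the second point, here are two equivalent versions of a hypothetical shipping rule in Python, whose conditional expression plays the role of a ternary operator. Both return the same values; the explicit version just spells out the branches and leaves room for a comment.

```python
def shipping_cost_terse(order_total: float) -> float:
    # Compact form: Python's conditional expression (its "ternary").
    return 0.0 if order_total >= 50 else 4.99


def shipping_cost_explicit(order_total: float) -> float:
    # Hypothetical rule: orders of 50 or more ship free; others pay a flat 4.99.
    if order_total >= 50:
        return 0.0
    else:
        return 4.99


for total in (20, 50, 75):
    print(total, shipping_cost_terse(total), shipping_cost_explicit(total))
```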
I also have been trying to figure out ways to keep higher level docs closer to my code. I have some ideas, but haven't tried them yet. Has anyone ever written something that detects changes to a method/function, and then when you save your file it pops up asking if related docs need updating? Maybe add comments to the method pointing to where related docs live, and then your IDE/tool uses that to know what docs need updating?
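A rough sketch of that idea in Python, using only the standard library. It assumes a convention (hypothetical, not an existing tool) of writing a `# docs: <path>` comment directly above a function; it fingerprints each function's source with `ast` and `hashlib`, compares two versions of a file, and lists the docs attached to functions that changed. An editor save hook or a Git pre-commit hook could feed it the old and new file contents.

```python
import ast
import hashlib
import re

# Hypothetical convention: a "# docs: <path>" comment on the line above a def.
DOCS_MARKER = re.compile(r"#\s*docs:\s*(\S+)")


def function_fingerprints(source: str) -> dict:
    """Map each function name to (sha256 of its source, linked docs path or None)."""
    tree = ast.parse(source)
    lines = source.splitlines()
    fingerprints = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            segment = ast.get_source_segment(source, node) or ""
            digest = hashlib.sha256(segment.encode("utf-8")).hexdigest()
            # node.lineno is 1-based, so the line above the def is index lineno - 2.
            above = lines[node.lineno - 2] if node.lineno >= 2 else ""
            match = DOCS_MARKER.search(above)
            fingerprints[node.name] = (digest, match.group(1) if match else None)
    return fingerprints


def docs_to_review(old_source: str, new_source: str) -> list:
    """Return sorted (function, docs path) pairs for linked functions that changed."""
    old = function_fingerprints(old_source)
    new = function_fingerprints(new_source)
    flagged = []
    for name, (digest, docs_path) in new.items():
        if name in old and old[name][0] != digest and docs_path:
            flagged.append((name, docs_path))
    return sorted(flagged)


BEFORE = '''
# docs: docs/billing/invoices.md
def invoice_total(items):
    return sum(items)

def helper():
    return 1
'''

AFTER = '''
# docs: docs/billing/invoices.md
def invoice_total(items, tax_rate=0.0):
    return sum(items) * (1 + tax_rate)

def helper():
    return 1
'''

print(docs_to_review(BEFORE, AFTER))  # [('invoice_total', 'docs/billing/invoices.md')]
```

Hashing whole function sources is crude (a reformat also counts as a change), but erring toward "please check the docs" is the point of the prompt.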
"Has anyone ever written something that detects changes to a method/function, and then when you save your file it pops up asking if related docs need updating?"
I've got a partial solution to that: I have automated tests that introspect my code for things that need documentation and then fail if those items aren't at least mentioned in the docs. Works really well.
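Here is one way such a test could look, in Python; this is an assumption about the approach rather than the commenter's actual code. It introspects a module's public functions with `inspect` and reports any name that never appears in the docs text. The `demo` module and the docs string are hypothetical stand-ins.

```python
import inspect
import re
import types


def undocumented_functions(module: types.ModuleType, docs_text: str) -> list:
    """Public functions defined in `module` whose names never appear in `docs_text`."""
    missing = []
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_") or obj.__module__ != module.__name__:
            continue  # skip private helpers and functions imported from elsewhere
        if not re.search(rf"\b{re.escape(name)}\b", docs_text):
            missing.append(name)
    return missing


# Hypothetical module built on the fly so the example is self-contained.
demo = types.ModuleType("demo")
exec(
    "def create_user(name): ...\n"
    "def delete_user(user_id): ...\n"
    "def _hash_password(pw): ...\n",
    demo.__dict__,
)
DOCS = "Call create_user() to add an account."

print(undocumented_functions(demo, DOCS))  # ['delete_user']
```

In a real suite, the check would be an assertion such as `assert not undocumented_functions(your_module, docs_text)`, where the module and the location of the docs are whatever your project uses; run in CI, it fails the build until the docs at least mention every new public function.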
That's a good way to do it. I was actually thinking of a Git hook or something in the CI pipeline as a place to start, so reading about how you implemented it was helpful. Thanks for sharing!
This feels like a job ripe for startup disruption.
LLM documentation generators tend to benefit from context, and nothing provides better context than a mostly functional code base. The best part is the code doesn’t even need to compile for the LLM to build the context needed.
The problem I'm having as a developer with LLM documentation is its reliability, or rather lack of it. Every time there is an assertion, I end up having to double-check it, because they tend to be wrong as often as they're right. Reading imaginary hallucinated documentation is just about as useful as zero documentation.
While I could keep doing this for the rest of my life, my employer doesn't really appreciate the extra expense. A technical writer is much, much cheaper than dozens of developers trying to confirm the docs.
> Reading imaginary hallucinated documentation is just about as useful as zero documentation.
No, it's worse. It's closer to reading outdated documentation that outright lies and gives examples that don't work, and it will cause you to waste hours or days learning things that aren't relevant to the API anymore.
The main problem is that documentation should be written on why code is written the way it is, and why it exists in the first place. This context is typically not available in the code itself. In the best case, this is encoded in requirements and design documentation, but more often than not, the information remains only in the heads of customers, architects, and developers.
The code itself merely describes how something is solved. Summarizing that in documentation can be useful at times, but it is not the full story. Especially for code that lives on for some time, the original design philosophy is often lost, and it is forced in horrible directions.
You're speaking from a position of survivorship. The jobs you've been hired to perform give you confidence. It's all the jobs you haven't been hired for that I'm more focused on.
Code doesn't capture architectural decision making, one of the most important being why other solutions were not used. LLMs would need to be nearly omniscient (understanding underlying hardware, customer needs, budgets even) to derive the reasoning behind those decisions afterwards from code alone.
This completely misses what valuable code documentation is for. Anyone can read the code and figure out what it does. The "what" as documentation can be convenient, but it's not all that useful, especially if it rots over time. Even if it's painful at times, the what is all there in front of you in the code. Valuable documentation explains the why: why one approach was chosen over another, why we need to do this to begin with, why the edge cases are the way they are. This information is not present in the code, and no tool can ever extract it, since it isn't there.
A source code repository is actually a terrible place to get that context. All of the things like decisions and how this relates to the customer are completely missing.
Isn't that what software development is today? A description of a system and what we want it to do. With the advance of compilers, libraries, frameworks, linters, autocomplete systems, and so on, we're already very close to describing the minimum amount of information the system needs in order to produce the correct result. To my knowledge, physically writing the software has not been a bottleneck in a very, very long time.
Right now, it takes skill and labor to move descriptions between representations for business goals, engineering (where we have frameworks and linters, etc.), and external/customer facing documentation.
The customer is faced with an output from the design process. I think that we can turn that around now. Let customers edit part of the documentation, and let the AI adapt the system to their need.
You need good source material, including docs, to have LLMs generate docs that are accurate, reliable, and safe. LLMs have interesting applications in areas like SDK and API docs, for sure, but can't replace an entire function.
Correct, but this is a good starting point for code written after the language models' training-data cutoff, since otherwise you cannot get accurate code from them for newer versions of the library.