I felt most of this was just plain common sense. People read things by headings and subheadings. People look for relevant documentation for product X under product X, not Y. Q&A and code samples prime the LLM for what most developers like myself hunt for: a quick answer or a simple code snippet. The forum part got me, though, since forums tend to be variable in the quality and quantity of their info. If the author(s) suggest forums, why not Discord servers and Gitter chat as well? I know of several projects where the real documentation, examples, and help are locked up in the Discord/Gitter channels. In the same vein, why not GitHub PRs/issues too? Having the LLM diagnose when an issue was cleared up, migration strategies, etc. from GitHub PRs/issues (as I've had to do from time to time) would be great. Of course, GitHub/Discord/Gitter would require some kind of filtering to make sure the data is worth ingesting into the LLM, but if it can identify what's worth ingesting, then perhaps it could also suggest to the documentation team something worth documenting.
You nailed it. All of the above sources are also super helpful for LLMs, but you have to be careful about how you ingest/parse them. For example, for a Discourse forum, only including questions that have been marked "Resolved" by an official team member can work quite well. The same goes for Discord/Slack forums, GitHub Discussions, etc.
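As a rough illustration of that kind of filtering, here's a minimal sketch. The forum URL is hypothetical, and it assumes the discourse-solved plugin is enabled so that "status:solved" works as a search filter:

    # Pull only solved topics from a Discourse forum for ingestion.
    # Assumes the discourse-solved plugin; the forum URL is hypothetical.
    import requests

    FORUM = "https://forum.example.com"

    def fetch_solved_topics(page=1):
        """Return topics the search API reports as solved."""
        resp = requests.get(
            f"{FORUM}/search.json",
            params={"q": "status:solved", "page": page},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json().get("topics", [])

    for topic in fetch_solved_topics():
        print(topic["id"], topic["title"])

From there you'd fetch each topic's accepted answer and feed only that into the index, rather than the whole thread.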
I love that writing LLM-friendly docs is just... writing good docs. There's a ton of overlap between accessibility work and preparing things to be used by LLMs.
I wonder if an unintended side effect of this AI hype cycle is a huge investment in more accessible applications.
One surprising (to me at least) benefit of hooking up an LLM to your docs is that it is actually a really useful way to find gaps in your docs. For example, when an LLM cannot answer a user question, there's a good chance it's because the answer is not documented anywhere.
Confabulation should not be stable. I.e., you can generate the answer, say, three times with a different seed / non-zero temperature, and if the model arrives at different answers each time you can categorize it as confabulation. That likely means, again, that the answer is not present, and humans would likely also come away with different interpretations.
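A minimal sketch of that stability check (the names here are hypothetical; generate stands in for whatever LLM client you use):

    # Sample the same question several times at non-zero temperature and
    # flag disagreement between the samples as likely confabulation.
    import random

    def generate(question, temperature, seed):
        """Hypothetical LLM call; replace with your provider's client."""
        raise NotImplementedError

    def answers_agree(a, b):
        # Naive comparison; in practice use embedding similarity
        # or an LLM judge to compare answers semantically.
        return a.strip().lower() == b.strip().lower()

    def looks_confabulated(question, samples=3):
        answers = [
            generate(question, temperature=0.8, seed=random.randrange(2**32))
            for _ in range(samples)
        ]
        return not all(answers_agree(answers[0], a) for a in answers[1:])

If the samples disagree, treat the question as a documentation gap rather than trusting any one answer.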
Just like how writing good content (used to) rank high in search results. But writing good content is hard. So we'll kid ourselves into writing good content for LLMs. Which is equally hard, but we'll feel like we're getting a leg up over everyone else, who are all also doing the same thing.
It's unfortunate that people are more motivated to write for LLMs, which are then used by humans, than to write for humans to begin with. Especially when the reason to use LLMs in the first place is that, on average, content is subpar, making it difficult to find the good content.
Another case of Tragedy of the Commons Ruins Everything Around Me.
Fair point. Although good writing for humans = good writing for LLMs, and vice versa. So I'm hopeful that this new excitement around AI for docs will, if anything, just encourage folks to put even more effort into writing great docs.
No, it's not a fair point IMHO. LLMs are arguably the best way we've found to organize and represent textual information.
A document you can hold a meaningful conversation with is a big freaking deal, far superior to any conventional resource when you're trying to learn how to do something new.
My current working hypothesis is that the way to get the best out of an LLM (and any AI which uses them as the human interface layer) is the same way to get the best out of a human — because it's trained on humans interacting with other humans.
If you yell and swear at the chatbot, you'll get the response most similar to how a human would respond to yelling and swearing. I know the stereotype about drill instructors, but does that even work for marines, or is it just an exercise in learning to cope with stress?
I wonder how many of these groups went the opposite direction — creating the structure of those web pages by using an LLM?
I've (obviously, like almost everyone) experimented with creating stuff with ChatGPT, and… hmm. I was going to write "it made web pages like that", but: Clever Hans. I don't know if I might have subconsciously primed it to, because that's also something I like.
Presumably all the structure and section headings they recommend don't have to be rendered by a browser as visible to humans. LLMs should be smart enough to understand HTML markup that doesn't add a lot of unnecessary visual structure.
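For instance (a hypothetical snippet, not from the article), the standard visually-hidden pattern from accessibility work keeps a heading in the document structure for LLMs and screen readers without cluttering the page:

    <style>
      .visually-hidden {
        position: absolute;
        width: 1px; height: 1px;
        overflow: hidden;
        clip: rect(0 0 0 0);
        white-space: nowrap;
      }
    </style>
    <h2 class="visually-hidden">Installation prerequisites</h2>
    <p>Make sure the prerequisites are installed before continuing.</p>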
It can go too far. Too many section headings and it becomes unreadable, like an undergraduate textbook where you're constantly being distracted by sections and boxes.