Speaking of skill, learning a new language is always daunting, but I found that LLMs do a pretty good job of generating Typst code. I relied on that a lot for generating snippets and learning about the language, which would've taken me more time otherwise, though the Typst docs are pretty good regardless.
What LLMs? In my experience they do a terrible job with Typst. Very frequently ChatGPT and Gemini will produce code that doesn't work. I'm not sure if it's using an older syntax or just hallucinating. Additionally, it's rarely able to fix it after I provide the error and even copy-paste docs.
Maybe I was just unlucky, or you had better luck with another model. But I was very surprised to hear this, because Typst is my chief example of a language that LLMs are bad at.
This was a few months ago, but mainly Claude Sonnet 3.5 IIRC.
You can't escape hallucinations, of course, but they can be mitigated somewhat. Don't ask it to generate a bunch of code at once. Feed it a snippet of code, and tell it precisely what you want it to do. These are general LLM rules applicable to any language and project, and I've had success with them with Typst. I also used it just to get explanations, or general ideas about how to do something, and then figure out the actual syntax and working code myself. It's strange that you haven't had luck with pasting docs. That has usually worked well for me.
I also think that LLMs don't struggle as much with Typst because the language is relatively simple, and there's little bad or outdated content about it online for them to train on. I assume the API hasn't changed much either, and there haven't been many compatibility issues, so which version the LLM was trained on matters less.
I just tried the same prompt in ChatGPT and it gave 10 errors. Mostly they were because it was using `#` as a comment character, which suggests that it has not been given very much Typst code to examine.
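For anyone unfamiliar: in Typst, `//` starts a comment, while `#` switches from markup into code mode, so a `#`-prefixed line is evaluated rather than ignored. A minimal illustration:
------------------
// This is a comment in Typst and is ignored.
#set text(size: 11pt) // `#` enters code mode; this set rule actually runs.
------------------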
I just tried the same prompt with Claude Sonnet 4.5. (Using a generic "programming" system prompt. It's a bit long, so I won't paste it here, but I can share it if you're interested.) It produced this:
------------------
#set page(
  paper: "us-letter",
  columns: 2,
)

#lorem(100)
This creates a Typst document with:
- US Letter page size (8.5" × 11")
- 2-column layout
- Sample text (replace `#lorem(100)` with your actual content)
------------------
Which compiled to PDF without errors using Typst CLI v0.13.1. The two-column layout isn't visible with `lorem(100)`, but `lorem(1000)` shows it clearly.
I'm surprised you got 10 compile errors for what should be a very simple document.
So I would call this a success. I doubt Claude is that much better than ChatGPT (GPT-5?).
I'm not saying that this invalidates your experience, but I would suggest experimenting with different strategies and LLMs. Try some prompt variations, feeding it more (or less) context, higher-quality context, documentation, etc. These tools are highly sensitive and unreliable, but can be coerced into producing helpful results with some care and patience.
EDIT: I was curious, so I tried the same with GPT-4o using OpenRouter's default system prompt. It produced this:
------------------
// Set US letter page size
#set page(width: 8.5in, height: 11in)
// Create two-column layout using grid
#layout(grid: (
columns: 2,
gutter: 0.5in, // space between columns
))
// Example content
= My Two-Column Document
This is some text in the first paragraph. It will flow into two columns across the page.
== Section One
Here is more content that will automatically be distributed between the two columns. Typst handles the balancing and splitting of content across columns for you.
== Section Two
You can continue to add as much content as you'd like, and it will remain in the two-column format automatically.
------------------
This failed to compile because of a single error with `#layout`. Still, Typst errors are clear, and I could've probably fixed it easily myself by referencing the documentation.
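For reference, my understanding from the docs is that the idiomatic route is the `columns` function rather than a `#layout` call, something like:
------------------
// Flow the body into two columns; the 0.5in gutter mirrors what the model attempted.
#columns(2, gutter: 0.5in)[
  = My Two-Column Document
  #lorem(200)
]
------------------
(`#lorem(200)` is just placeholder text to make the columns visible.)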
Then I tried the same model with my custom generic programming system prompt, and it produced this:
------------------
#set page(
width: 8.5in,
height: 11in,
margin: 1in,
columns: 2,
column-gap: 0.5in,
)
= Title
Here is some example text in the first paragraph. It will flow into two columns automatically.
== Section
More text continues here, and Typst will manage the column layout accordingly.
------------------
Which is better, but still failed to compile because `column-gap` is not a valid argument. Simply removing it compiled without errors.
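In other words, this version compiles as-is, and if you do want to control the gap, I believe the gutter is configured on the `columns` function rather than on `page`:
------------------
#set page(
  width: 8.5in,
  height: 11in,
  margin: 1in,
  columns: 2,
)

// Assumption based on my reading of the docs: page-level columns
// pick up their gutter from the `columns` set rule.
#set columns(gutter: 0.5in)
------------------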
I still would've consulted the official documentation to determine whether this is idiomatic code, but these are not terrible results. As with all LLMs, you'll get the most out of them if you use them as assistants rather than vibe-coding tools.
Yep, this is how I started my Typst journey. I was intimidated by Typst at first and wanted to make some mildly complicated documents that really aren't covered by the tutorial, so I had ChatGPT generate the elements of the document I needed. Now I'm more self-sufficient, able to write my own functions and use more complicated features of Typst, and better at making use of the docs.
> but I found that LLMs do a pretty good job of generating Typst code.
Interestingly, I've had the opposite experience. ChatGPT and Claude repeatedly gave me errors, apologized profusely, and then said, "ah, I had the wrong keyword. It's actually <blahblah>"--and that would simply give me another error and a subsequent apology.
At least Gemini had the good taste to tell me that it didn't know how to do what I wanted with Typst.
It's certainly possible that I was trying to do something a little too unusual (who knows), but I chalked it up to the LLMs not having a large enough corpus of training text.
On the bright side, the Typst documentation is quite good, and it was just a matter of adjusting example code to get me back on track.
Well, that just goes to show that these tools are wildly unpredictable. I've had bad experiences generating Go, whereas I've read many experiences of the opposite.
> I chalked it up to the LLMs not having a large enough corpus of training text.
I'm inclined to believe the opposite, actually. It's not so much about the size of the training data, but the quality of it. Garbage in, garbage out. Typst is still very young, and there's not much bad code in the wild.
And the language itself plays a large role. A simple language with less esoteric syntax and features should be easier to train on and generate than something more complex. This is why I think LLMs are notoriously bad at generating Rust code: there's plenty of it to train on, but Rust is a deep pit of complexity and unusual syntax. It does help when the language is strict and statically typed, so the compiler can catch issues early. I would dread relying on generated Python code, despite how popular and simple it is on the surface.