Reverse typesetting: reflowing page layouts where you don't have knowledge of th...

perihelions · 2025-04-28T20:59:26 1745873966

Err, here's a visual explanation of what I mean by this, from my REPL:

https://ibb.co/album/MDw79y?sort=name_asc

(The source example is from David Tong's physics lecture notes, which were featured on HN last week: https://news.ycombinator.com/item?id=43763223 )

Foreignborn · 2025-04-28T08:08:49 1745827729

i don’t quite understand, what makes it reverse typesetting?

my understanding is you're typesetting books for responsive e-ink readers.

perihelions · 2025-04-28T08:23:27 1745828607

You're inferring the structure of the document from the printed result. If typesetting takes a set of layout directives and outputs a page, this is taking a finished page and guessing what layout directives could create it. Then you can take that inferred structure and reflow the page in a new layout.
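
To make the "guessing" step concrete, here's a minimal sketch of one possible inference: recovering text lines from bounding boxes found on the finished page. This is an illustration only, not the actual code from my REPL; the `Box` type, `group_into_lines`, and the vertical-overlap rule are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box of an ink blob on the page (origin at top-left)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def group_into_lines(boxes):
    """Guess the line structure: a box joins the current line if it overlaps it vertically."""
    lines = []
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        if lines:
            top = min(b.y for b in lines[-1])
            bottom = max(b.y + b.h for b in lines[-1])
            # Vertical overlap with the current line means "same line".
            if box.y < bottom and box.y + box.h > top:
                lines[-1].append(box)
                continue
        lines.append([box])
    # Within each inferred line, restore left-to-right reading order.
    return [sorted(line, key=lambda b: b.x) for line in lines]


if __name__ == "__main__":
    page = [Box(12, 1, 10, 10), Box(0, 0, 10, 10), Box(0, 20, 10, 10)]
    for line in group_into_lines(page):
        print([(b.x, b.y) for b in line])
```

A real page needs more than this (columns, indentation, paragraph gaps), but the shape is the same: start from geometry, end with an inferred structure you can lay out again.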

froh · 2025-04-28T08:39:12 1745829552

so like OCR, except instead of recognizing characters and words you recognize the layout structure and transform it into content markup and layout markup?

perihelions · 2025-04-28T08:47:07 1745830027

That's a way to view it!

The reason I'm not falling back on OCR is that the general case is full of things, like math equations and inset graphics/diagrams, that can't be OCR'd. The only robust way to deal with those is to treat them as graphical atoms: "this bounding box can be moved around, but should not be split up into pieces".
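
As a rough sketch of what "graphical atoms" buys you: once every word, equation, or diagram is just a box with a width and height, reflowing to a new page width can be as simple as a greedy fill. The `Atom` type, `reflow`, and the `gap`/`leading` parameters below are hypothetical names for illustration, and greedy placement is only the simplest choice, not necessarily what a real implementation would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical indivisible unit: a word, an equation, or an inset figure."""
    name: str
    w: float  # width
    h: float  # height


def reflow(atoms, page_width, gap=4.0, leading=2.0):
    """Greedy placement into rows; atoms are moved whole and never split.

    Returns a dict mapping atom name -> (x, y) of its top-left corner.
    """
    placed = {}
    x = y = 0.0
    row_h = 0.0
    for a in atoms:
        # Start a new row if this atom would overflow the current one.
        # An atom wider than the page still gets a row to itself.
        if x > 0 and x + a.w > page_width:
            x = 0.0
            y += row_h + leading
            row_h = 0.0
        placed[a.name] = (x, y)
        x += a.w + gap
        row_h = max(row_h, a.h)
    return placed


if __name__ == "__main__":
    atoms = [Atom("word", 40, 10), Atom("equation", 50, 20), Atom("next", 30, 10)]
    print(reflow(atoms, page_width=100))
```

Note the row height follows the tallest atom in the row, so a tall equation pushes the next row down instead of being cut.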