Ask HN: Static site generator that can cope with octopus juggling geysers?

jerf · on Sept 6, 2022

I don't really understand the question. Blasting text values into files wholesale is what static site generators do. The web server is operating on the textual values on the disk. You shouldn't need "support" for this, you just use the static site generator to output the desired text into the mass of files it produces. You seem to already know what text you want on the disk, and that seems like 80% of the problem.

I mean, if you really want to add a custom function or two to Jekyll you can, but you hardly need to wait around for "support" for this.

You may be asking for a template that has this built in, but you're just as well off taking an existing template you like and modifying the text it generates as waiting around for someone else to do it.

If it seems like the static site generator doesn't "support" this it's because it's just so "what it does" that there isn't any documentation for how you'd specifically put SSI tags into a site, because that'd be just like expecting specific documentation for how you put <h2> tags into a website. It's just what it does already.

iam-TJ · on Sept 6, 2022

The point here is NOT to have the site-generator (SSG) create complete index/overview pages that would then include all the titles+summaries, but instead to understand a <!--#include ...>, parse the document it points to, and be able to use the metadata for generating its indexes, tag clouds, or whatever BUT not include the document itself.

So the SSG will understand and use the per-page metadata to generate index-type pages but actual inclusion of content is delayed until it is requested from the web-server.

Because I have covered a huge number of categories over the years and many pages will be included in multiple indexes/overviews/tag-clouds, I do not want multiple index pages with copies of the same text.
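
A minimal sketch of that idea in Python, offered as an illustration rather than a feature of any existing SSG: scan a page for include directives, read only the title and summary of each target, and emit an index while leaving the directives in place for the web server to resolve at request time. The names `build_index` and `read_metadata`, the in-memory `pages` mapping, and the convention that the summary lives in a `<meta name="description">` tag are all assumptions made for this example.

```python
import re
from html import escape
from typing import Mapping

# Matches directives such as <!--#include virtual="issue-0001.html" -->
INCLUDE_RE = re.compile(r'<!--#include\s+(?:virtual|file)="([^"]+)"\s*-->')
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
# Assumed convention: the summary is stored in a description meta tag.
SUMMARY_RE = re.compile(
    r'<meta\s+name="description"\s+content="([^"]*)"', re.IGNORECASE
)


def read_metadata(html_text: str, fallback_title: str) -> dict:
    """Pull title and summary out of a page; the body is never copied."""
    title = TITLE_RE.search(html_text)
    summary = SUMMARY_RE.search(html_text)
    return {
        "title": title.group(1).strip() if title else fallback_title,
        "summary": summary.group(1).strip() if summary else "",
    }


def build_index(source: str, pages: Mapping[str, str]) -> str:
    """Prepend a metadata-only list of links to `source`.

    `pages` maps include targets to their HTML text. The include
    directives in `source` are passed through untouched.
    """
    entries = []
    for target in INCLUDE_RE.findall(source):
        meta = read_metadata(pages[target], fallback_title=target)
        entries.append(
            f'<li><a href="{escape(target)}">{escape(meta["title"])}</a>: '
            f'{escape(meta["summary"])}</li>'
        )
    return "<ul>\n" + "\n".join(entries) + "\n</ul>\n" + source
```

In practice `pages` would be filled by reading files from the repository. Titles and summaries are treated as plain text here, so markup inside them would be escaped.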

athenot · on Sept 6, 2022

One way is to maintain 2 sets of files: a normalized one with no redundancies and just references, and another one with everything fully rendered and expanded. That second set is what you serve (bonus: it will be fully static); the first set is what you edit/maintain/search.

This is essentially a handmade caching system for your content.

Once you approach the problem that way, you can also consider other forms of caching, like having a dynamic, server-side generated site and a cache store sitting in front of it. That cache store can be some fancy caching system or it can be the output of a crawler that runs once a day against your dynamic site and outputs to a set of static html files that get served by your webserver.
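
A rough Python sketch of the two-sets approach, assuming the normalized set uses SSI-style include directives as its references: the normalized pages stay as they are, and a second, fully expanded set is derived from them. The function names, the `pages` mapping, and the `max_depth` guard are made up for this illustration, and include targets are treated as plain dictionary keys rather than resolved the way a web server would resolve them.

```python
import re
from typing import Mapping

# Matches directives such as <!--#include virtual="issue-0001.html" -->
INCLUDE_RE = re.compile(r'<!--#include\s+(?:virtual|file)="([^"]+)"\s*-->')


def expand(name: str, pages: Mapping[str, str], max_depth: int = 8) -> str:
    """Return pages[name] with each include replaced by its expanded target."""
    if max_depth < 0:
        # Crude guard against runaway or circular includes.
        raise ValueError(f"includes nested too deeply at {name!r}")
    return INCLUDE_RE.sub(
        lambda match: expand(match.group(1), pages, max_depth - 1),
        pages[name],
    )


def render_all(pages: Mapping[str, str]) -> dict:
    """Build the served set; the normalized `pages` mapping is not modified."""
    return {name: expand(name, pages) for name in pages}
```

The normalized mapping remains what you edit and search, and the result of `render_all` is what would be written out and served.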

iam-TJ · on Sept 6, 2022

Thanks - that may be a way to deal with my hybrid requirements; I have to make the solution work with a 30 year legacy of existing pages that make use of apache functionality in various ways.

Whatever I do has to apply incrementally as I (slowly) transfer everything into the new SSG controlled repository.

Normalised is definitely where I want to be for the canonical representation.

I guess there'd be two canonical representations actually; the SSG 'source' and the normalised hybrid HTML+SSI.

linux_is_nice · on Sept 6, 2022

That really sounds like you just want SSR instead.

iam-TJ · on Sept 6, 2022

Yes; initially that was what I was, and still am, considering.

I hesitate because some of my existing content uses mod-include and other apache httpd functionality in different ways for different purposes.

I considered segmenting the site so some parts are cached/proxied using SSR (Server Side Rendering for those that wonder) but then I'm introducing artificial barriers that in the future I may forget are there and trip over whilst trying to throw some quick PoC or demo up.

onion2k · on Sept 6, 2022

I honestly can't think of a reason why you'd go to all the effort of using a static site generator and not generate complete pages. What benefit is there in using mod_include?

iam-TJ · on Sept 6, 2022

De-duplication, or rather, not duplicating in the first place. Perfectionist requirement :)

onion2k · on Sept 6, 2022

> De-duplication, or rather, not duplicating in the first place. Perfectionist requirement :)

I refuse to accept that a perfectionist would use Apache.

delfinom · on Sept 6, 2022

Or, you know, spend the extra $0.001 on the storage to let it be duplicated and move on with your life. De-duplication here has zero net benefit unless your static site generator is somehow producing tens of billions of pages, enough to make the storage cost prohibitive.

You are also killing the environment by pissing power away on each request that could have been preprocessed and served indefinitely.

pdpi · on Sept 6, 2022

The problem with perfectionism is that it makes you optimise for one metric to the exclusion of all else.

Here, the price you pay for avoiding duplication is having to do extra work at run time. The fact that you need to do any work at all in user space also blocks you from using zero-copy IO.

Sohcahtoa82 · on Sept 6, 2022

You're talking about de-duplicating HTML.

Unless you've got literally millions of pages, you're probably talking about wasting a few megabytes at the most. You're adding the complication of using SSI for no real reason beyond "I just want it that way."

vinaypai · on Sept 6, 2022

Seems like you are using a dynamic server-side language, but it's just in mod-include's SSI language instead of something more conventional.

iam-TJ · on Sept 6, 2022

Yes and No. The very limited functionality of mod-include and apache httpd expr-essions isn't Turing-complete and cannot be manipulated by client requests.

Secondly, as these are tightly integrated into httpd it turns out to be extremely efficient. I've been using mod-include in many ways for about 20 years.

Only files marked with execute attribute (+x) are handled by mod-include in my case.
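
This sounds like mod_include's XBitHack behaviour, where HTML files with the owner-execute bit set are parsed for SSI; the comment does not name the directive, so that is an assumption. Under that assumption, a small Python sketch can list which files in a tree would be parsed, which might be useful for an SSG that needs to know which outputs must keep the bit. The function name `ssi_candidates` and the `*.html` filter are hypothetical choices for this example.

```python
import stat
from pathlib import Path


def ssi_candidates(root: Path) -> list[Path]:
    """Return .html files under `root` whose owner-execute bit is set."""
    return sorted(
        path
        for path in root.rglob("*.html")
        if path.is_file() and path.stat().st_mode & stat.S_IXUSR
    )
```

The check relies on POSIX permission bits, so it is only meaningful on filesystems that store them.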

pentestercrab · on Sept 6, 2022

One idea: using file instead of virtual might avoid the bug and not require your added code.

    <!--#include file="issue-0001.html" -->

Also, be careful of any moves to supporting dynamic content as Server Side Includes can (sometimes?) lead to code exec even with IncludesNoExec.

iam-TJ · on Sept 6, 2022

How does this avoid the nesting bug? There are actually two bugs currently for my use-case:

1. The existing one where the code intended to detect nested includes fails to detect it

2. No way to know if nesting is happening, what level, or how many (that's the code I've added to mod_include)
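
To make the second point concrete, here is an offline sketch in Python of the information in question: for every include reachable from a page, which file pulled it in and at what nesting level. This is not the mod_include patch and says nothing about how httpd tracks nesting internally; `walk_includes` and the `pages` mapping are hypothetical names for this example.

```python
import re
from typing import Iterator, Mapping

# Matches directives such as <!--#include virtual="issue-0001.html" -->
INCLUDE_RE = re.compile(r'<!--#include\s+(?:virtual|file)="([^"]+)"\s*-->')


def walk_includes(
    name: str,
    pages: Mapping[str, str],
    depth: int = 1,
    _stack: tuple = (),
) -> Iterator[tuple]:
    """Yield (depth, parent, target) for every include reachable from `name`."""
    stack = _stack + (name,)
    for target in INCLUDE_RE.findall(pages[name]):
        if target in stack:
            # A page that (indirectly) includes itself would never terminate.
            raise ValueError("include cycle: " + " -> ".join(stack + (target,)))
        yield depth, name, target
        yield from walk_includes(target, pages, depth + 1, stack)
```

The deepest level and the total number of includes then fall out of the yielded tuples, for example with `max(d for d, _, _ in walk_includes("index.html", pages))`.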

alanbernstein · on Sept 6, 2022

I don't really understand what you're looking for, but the comments lead me to think of one thing that might be a little relevant, if not directly useful for you.

In mediawiki it's called transclusion. There is a markdown extension with similar functionality called multimarkdown. I looked into it briefly for a project once, but ended up going with basic markdown instead to keep maintenance and content writing as simple as possible.

https://en.m.wikipedia.org/wiki/Help:Transclusion

https://fletcherpenney.net/multimarkdown/

iam-TJ · on Sept 6, 2022

Thanks; transclusion is a good descriptive word for it although hard to grok until you read the explanation!

Multimarkdown sounds rather like pandoc which I use in other contexts.

xani_ · on Sept 6, 2022

Making static content that only works if you have a specific web server with a specific plugin seems... counter-productive at the very least

> I've already added code to mod-include to detect and conditionally react to nested includes after dealing with a bug in its existing handling[1].

and even more if you need to patch the web server to even make it work.

That being said jekyll include syntax is quite rich https://jekyllrb.com/docs/includes/ so you'd probably just do fine with it and minimal legwork

corobo · on Sept 6, 2022

I think this is how PHP was invented

iam-TJ · on Sept 6, 2022

I dumped PHP around 2005 :)

colanderman · on Sept 6, 2022

You can do this 100% statically using XSLT, which is processed client-side.

Honestly, I'm surprised web browsers continue to support XSLT, as I've never seen it used, but I've used it here [1] for a while seemingly without issue.

[1] https://www.bostonesperanto.org/

encryptluks2 · on Sept 6, 2022

Just use Hugo or some normal tool and move on.