Ask HN: Static site generator that can cope with octopus juggling geysers?
15 points by iam-TJ on Sept 6, 2022 | 23 comments
Intriguing title, huh? It is the only way I could think of to succinctly describe what I need, and I suspect it could be useful for other multi-disciplinary, generalist hackers who can turn their hand to any language or project and who generate a lot of miscellaneous documentation, code, patches, and shell scripts.

Summary: Wanting a static site generator that (possibly via plug-ins) understands (or can be made to understand) Apache httpd mod-include conditional server-side-include (SSI) directives as an integral part of page content generation.

Over 20+ years I've generated thousands of small (and large) hacking-focused documents: bug hunts, notes, transcripts, exposés, instructions, shell scripts and more. Most are either in Markdown, hand-edited HTML, or currently on someone else's computer in the form of posts and comments across many sites (including HN). I intend to extract those to Markdown locally so that I have a single source for everything and it cannot go AWOL.

I'm aiming to integrate them into a single static web site hosted by Apache httpd utilising mod-include[0] to do some clever server-side include abstract inclusion in index pages without any text duplication across multiple index pages, and with no use of dynamic server-side or client-side languages (so no PHP or Javascript).

The end result I want is each 'issue' in a single well-formed HTML5 page with semantic elements, embedded meta tags for classifying into one or more categories to enable generation of indexes, overview lists, tag clouds, or whatever.

The aim is to have abstracts (summaries) of issue pages included in the index/overview/category/tag-cloud pages via mod-include by only taking the title and first paragraph. In an index page e.g:

  <body class="index">
  ...
  <!--#include virtual="issue-0001.html" -->
  ...
  <!--#include virtual="issue-0002.html" -->
  ...
  <!--#include virtual="issue-0003.html" -->
  ...
and in the issue pages simply have some conditional server-side include code that determines if the page is being included in another - and if so only include the title and abstract text and not the entire HTML document.

The bonus for the index pages here is to use HTML5 semantic element "<details>" in order to have the browser automatically collapse the details text and show a 'reveal' icon next to it:

  <!--#if expr='v("SSL_LEVEL") -eq 0' -->
  <!DOCTYPE html>
  ...
    <style>
     details:
    </style>
    </head>
    <body>
  <!--#endif -->
     <details <!--#if expr='v("SSL_LEVEL") -eq 0' -->open<!--#endif --> >
      <summary><h1>Issue title</h1></summary>
      <p>First paragraph of article containing the summary</p>
     </details>
  <!--#if expr='v("SSL_LEVEL") -eq 0' -->
   ...
   </body>
  </html>
  <!--#endif -->
This fragment adds the "open" attribute to "<details>" when the entire page is requested so the title and first para are shown by default. The index page has CSS to reduce the impact of the "<h1>" in the "<summary>"; e.g:

  .index details summary h1 {font-size: 14pt; font-weight: normal;}
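For anyone who wants to preview offline what an included issue page reduces to, here is a minimal Python sketch. It assumes every issue page follows the fragment shown earlier: one "<details>" block holding the title and first paragraph, with SSI conditionals that never nest. The function name `abstract_of` is made up for illustration, and the regexes are a rough approximation of what mod_include would emit for a nested request, not a reimplementation of it.

```python
import re

# Assumption: each issue page contains exactly one <details>...</details>
# block, and any <!--#if ... --> ... <!--#endif --> pair inside it is not nested.
DETAILS = re.compile(r"<details\b.*?</details>", re.DOTALL)
CONDITIONAL = re.compile(r"<!--#if\b.*?-->.*?<!--#endif\s*-->", re.DOTALL)


def abstract_of(page):
    """Return the <details> block as it would look when the page is included.

    Conditional sections are dropped entirely, which mirrors the case where
    the "top-level request" test in the fragment is false. Returns None when
    the page has no <details> block.
    """
    match = DETAILS.search(page)
    if match is None:
        return None
    return CONDITIONAL.sub("", match.group(0))
```

Calling `abstract_of(open("issue-0001.html").read())` would give just the title and summary paragraph, without the "open" attribute or the surrounding document.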
I've already added code to mod-include to detect and conditionally react to nested includes after dealing with a bug in its existing handling[1].

I've recently chosen Jekyll for a variety of reasons; mostly because it focuses on generating HTML+CSS without any Javascript overload and has a large number of plug-ins and themes of various qualities to hand which may be a basis for me adding the feature I want. I'm open to changing if there's something that can (more easily) do what I require.

[0] https://httpd.apache.org/docs/current/mod/mod_include.html

[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=66243




I don't really understand the question. Blasting text values into files wholesale is what static site generators do. The web server is operating on the textual values on the disk. You shouldn't need "support" for this, you just use the static site generator to output the desired text into the mass of files it produces. You seem to already know what text you want on the disk, and that seems like 80% of the problem.

I mean, if you really want to add a custom function or two to Jekyll you can, but you hardly need to wait around for "support" for this.

You may be asking for a template that has this built in, but you're just as well off to take an existing template you like and modify the text it is generating than to wait around for someone else to do it.

If it seems like the static site generator doesn't "support" this, it's because it's just so "what it does" that there isn't any documentation for how you'd specifically put SSI tags into a site. That'd be just like expecting specific documentation for how you put <h2> tags into a website. It's just what it does already.


The point here is NOT to have the site-generator (SSG) create complete index/overview pages that would then include all the titles+summaries, but instead to understand a <!--#include ...>, parse the document it points to, and be able to use the metadata for generating its indexes, tag clouds, or whatever BUT not include the document itself.

So the SSG will understand and use the per-page metadata to generate index-type pages but actual inclusion of content is delayed until it is requested from the web-server.

Because I have covered a huge number of categories over the years and many pages will be included in multiple indexes/overviews/tag-clouds, I do not want multiple index pages with copies of the same text.
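To make that concrete, here is a rough Python sketch of the index-generation step, not a Jekyll plug-in. It assumes categories live in a `<meta name="keywords" content="a, b">` tag on each issue page (my convention for the example, the real pages may classify differently), and `build_indexes` is a hypothetical name. The point is that the generator reads only metadata and writes only include directives, so no summary text is ever copied into an index.

```python
from collections import defaultdict
from html.parser import HTMLParser


class KeywordsParser(HTMLParser):
    """Collect categories from <meta name="keywords" content="a, b"> tags.

    The "keywords" meta tag is an assumed convention for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.categories = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "keywords":
            content = attributes.get("content") or ""
            self.categories += [c.strip() for c in content.split(",") if c.strip()]


def categories_of(html_text):
    parser = KeywordsParser()
    parser.feed(html_text)
    return parser.categories


def build_indexes(pages):
    """Map each category to index-page text holding only include directives.

    `pages` maps a file name to its HTML source. The page bodies are read
    for metadata only; their content is never copied into an index.
    """
    by_category = defaultdict(list)
    for name in sorted(pages):
        for category in categories_of(pages[name]):
            by_category[category].append(name)

    indexes = {}
    for category, names in by_category.items():
        lines = ['<body class="index">']
        lines += [f'<!--#include virtual="{name}" -->' for name in names]
        lines.append("</body>")
        indexes[category] = "\n".join(lines)
    return indexes
```

A page tagged with three categories then shows up as one include line in each of three index files, and the web server pulls in the abstract at request time.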


One way is to maintain 2 sets of files: a normalized one with no redundancies and just references, and another one with everything fully rendered and expanded. That second set is what you serve (bonus: it will be fully static); the first set is what you edit/maintain/search.

This is essentially a handmade caching system for your content.
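A minimal Python sketch of that expansion step, under some loud assumptions: sources are held in a dict of file name to text, only `<!--#include virtual="..." -->` directives are handled, and `#if` conditionals are ignored entirely (so an included page is inlined whole unless it has been reduced to its abstract beforehand). The names `expand` and `render_all` are invented for the example.

```python
import re

# Only the simple virtual form is recognised; other SSI directives pass through.
INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def expand(name, sources, depth=0, max_depth=10):
    """Return sources[name] with every include directive replaced inline."""
    if depth > max_depth:
        # Guards against a page that (directly or indirectly) includes itself.
        raise RecursionError(f"include nesting too deep at {name}")

    def replace(match):
        return expand(match.group(1), sources, depth + 1, max_depth)

    return INCLUDE.sub(replace, sources[name])


def render_all(sources):
    """Build the fully expanded set from the normalized set."""
    return {name: expand(name, sources) for name in sources}
```

You keep editing the normalized set and rerun `render_all` whenever something changes; the output is what the web server serves.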

Once you approach the problem that way, you can also consider other forms of caching, like having a dynamic, server-side generated site and a cache store sitting in front of it. That cache store can be some fancy caching system or it can be the output of a crawler that runs once a day against your dynamic site and outputs to a set of static HTML files that get served by your webserver.


Thanks - that may be a way to deal with my hybrid requirements; I have to make the solution work with a 30 year legacy of existing pages that make use of apache functionality in various ways.

Whatever I do has to apply incrementally as I (slowly) transfer everything into the new SSG controlled repository.

Normalised is definitely where I want to be for the canonical representation.

I guess there'd be two canonical representations actually; the SSG 'source' and the normalised hybrid HTML+SSI.


That really sounds like you just want SSR instead.


Yes; initially that was what I was, and still am, considering.

I hesitate because some of my existing content uses mod-include and other apache httpd functionality in different ways for different purposes.

I considered segmenting the site so some parts are cached/proxied using SSR (Server Side Rendering for those that wonder) but then I'm introducing artificial barriers that in the future I may forget are there and trip over whilst trying to throw some quick PoC or demo up.


I honestly can't think of a reason why you'd go to all the effort of using a static site generator and not generate complete pages. What benefit is there in using mod_include?


De-duplication, or rather, not duplicating in the first place. Perfectionist requirement :)


> De-duplication, or rather, not duplicating in the first place. Perfectionist requirement :)

I refuse to accept that a perfectionist would use Apache.


Or, you know, spend the extra $0.001 on storage, let it be duplicated, and move on with your life. De-duplication has zero net benefit here unless your static site generator somehow produces tens of billions of pages and the storage cost becomes prohibitive.

You are also killing the environment by pissing power away on each request that could have been preprocessed and served indefinitely.


The problem with perfectionism is that it makes you optimise for one metric to the exclusion of all else.

Here, the price you pay for avoiding duplication is having to do extra work at run time. The fact that you need to do any work at all in user space also blocks you from using zero-copy IO.


You're talking about de-duplicating HTML.

Unless you've got literally millions of pages, you're probably talking about wasting a few megabytes at the most. You're adding the complication of using SSI for no real reason beyond "I just want it that way."


Seems like you are using a dynamic server-side language, but it's just in mod-include's SSI language instead of something more conventional.


Yes and No. The very limited functionality of mod-include and apache httpd expr-essions isn't Turing-complete and cannot be manipulated by client requests.

Secondly, as these are tightly integrated into httpd it turns out to be extremely efficient. I've been using mod-include in many ways for about 20 years.

Only files marked with execute attribute (+x) are handled by mod-include in my case.


One idea, maybe using file instead of virtual will avoid the bug and not require your added code.

    <!--#include file="issue-0001.html" -->
Also, be careful of any moves to supporting dynamic content as Server Side Includes can (sometimes?) lead to code exec even with IncludesNoExec.


How does this avoid the nesting bug? There are actually two bugs currently for my use-case:

1. The existing one where the code intended to detect nested includes fails to detect it

2. No way to know if nesting is happening, what level, or how many (that's the code I've added to mod_include)


I don't really understand what you're looking for, but the comments lead me to think of one thing that might be a little relevant, if not directly useful for you.

In MediaWiki it's called transclusion. There is a Markdown extension with similar functionality called MultiMarkdown. I looked into it briefly for a project once, but ended up going with basic Markdown instead to keep maintenance and content writing as simple as possible.

https://en.m.wikipedia.org/wiki/Help:Transclusion

https://fletcherpenney.net/multimarkdown/


Thanks; transclusion is a good descriptive word for it although hard to grok until you read the explanation!

Multimarkdown sounds rather like pandoc which I use in other contexts.


Making static content that only works if you have a specific web server with a specific plugin seems... counter-productive at the very least

> I've already added code to mod-include to detect and conditionally react to nested includes after dealing with a bug in its existing handling[1].

and even more if you need to patch the web server to even make it work.

That being said, Jekyll's include syntax is quite rich (https://jekyllrb.com/docs/includes/), so you'd probably do just fine with it and minimal legwork.


I think this is how PHP was invented


I dumped PHP around 2005 :)


You can do this 100% statically using XSLT, which is processed client-side.

Honestly, I'm surprised web browsers continue to support XSLT, as I've never seen it used, but I've used it here [1] for a while seemingly without issue.

[1] https://www.bostonesperanto.org/


Just use Hugo or some normal tool and move on.



