I have a rule of keeping every page on wordsandbuttons.online under 64KB. And no...

Nyandalized · on Aug 1, 2019

With the help of transport compression like gzip, the total size can still be reduced by almost the same amount even if you don't minimize it.

vardump · on Aug 1, 2019

I once wrote (maybe 15-20 years ago?) an html output processor that tried to make it more compressible while still producing the exact same output. It did things like removed comments, transformed all tag names to lower case, sorted tag attributes and canonicalized values, collapsed whitespace (including line feeds).

And some more tricks I've forgotten (some DOM tree tricks, I think), mainly to introduce more repeated strings for LZ and unbalanced distribution (=less output bits) for Huffman. In other words, things that help gzip to compress even further.

Output was really small, most pages were transformed from gzipped sizes of 10-15 kB to 2-5 kB without graphics.

The pages loaded fast, pretty much instantly, because they could fit in the TCP initial window, avoiding extra roundtrips. Browser sent request and server sent all HTML in the initial window even before the first ACK arrived! I might have tweaked initial window to 10 packets or something (= enough for 14 kB or so), I don't remember these TCP details by heart anymore.

I wonder if anyone else is making this kind of HTML/CSS compressability optimizers anymore. Other than Javascript minimizers.

barryvan · on Aug 1, 2019

They are! Around five years ago I wrote a CSS minifier (creatively called CSSMin, available on GitHub, and still in use at the company I work for) which rewrote the CSS to optimise gzip compression. Although it never really took off, I think that some of the lessons from it have been rolled into some of the more modern CSS optimisation tools.

vardump · on Aug 1, 2019

It's important to understand minimizing does not necessarily produce the most compressible result. You need to give LZ repeating strings as much as possible while using as few different ASCII characters as possible with as unbalanced frequency distribution as possible.

hinkley · on Aug 1, 2019

I wrote (well, expanded) a similar tool for compressing Java Class files. I had a theory that suffix sorting would work slightly better because of the separators between fields, and it turned out to be worth another 1% final size versus prefix sorting.

vbezhenar · on Aug 1, 2019

I've found a cheap trick to compress Java software: extract every .jar file (those are zip archives) and compress the whole thing with a proper archiver (e.g. 7-zip). One example from my current project: original jar files: 18 MB expanded jar files: 37 MB compressed with WinRar: 10 MB

And that's just a little project. For big projects there could be hundreds of megabytes of dependencies. Nobody really cares about that...

Cthulhu_ · on Aug 1, 2019

It's a tradeoff; in a lot of cases, the size of a .jar doesn't really matter because it ends up on big web containers.

It does matter for e.g. Android apps though. But at the same time, the size of the eventual .jar is something that can be optimized by Google / the Android store as well, using what you just described for starters.

I know Apple's app store will optimize an app and its resources for the device that downloads it. As a developer you have to provide all image resources in three sizes / pixel densities for their classes of devices. They also support modular apps now, that download (and offload) resources on demand (e.g. level 2 and beyond of a game, have people get past level 1 first before downloading the rest).

hinkley · on Aug 1, 2019

It's true, but this was brought up as an anecdote/parallel.

Attributes in html have no fixed order, and neither do constants in a class file. There are multiple ways to reorder them that help out or hinder DEFLATE.

And also I was compressing the hell out of JAR files because they were going onto an embedded device, so 2k actually meant I could squeeze a few more features in.

hinkley · on Aug 1, 2019

There’s a lot of redundancy between class files in Java and zlib only has one feature for that and nobody uses it. It would require coordination that doesn’t really exist.

For transport, Sun built a dense archive format that can compress a whole tree of files at once. It normalizes the constant pool (a class file is nearly 50% constants).

Many Java applications run from the Jar file directly. You never decompress them. But you also only see something like 5:1 compression ratios.

Hitton · on Aug 1, 2019

That's extremely interesting. Would you happen to still have the code lying around? Or would you recommend some itroductory materials on this topic?

vardump · on Aug 1, 2019

I might still have it on some hard disk that's been unplugged in storage for ages. But probably long since lost. I wrote it by trying out different things and seeing how it affected gzipped size.

Just use some HTML parser and prune html comment nodes and empty elements when safe (for example removing even empty div is not!), collapse whitespace, etc. If majority of text nodes is in lower case, ensure also tags, attribute names etc. is as well. Ensure all attribute values are same way, say attr=5, but not attr='5' or attr="5". Etc. That's all there is to it.

It saved a lot already as a result of whitespace collapsing, which also removes high frequency chars like linefeeds, etc. leaving shorter huffman table entries for the data that actually matters.

Study how LZ77 and Huffman works.

okaleniuk · on Aug 1, 2019

Wow! That sounds fascinating.

masklinn · on Aug 1, 2019

If your page is static, it's even worth trying something like zopfli or advancecomp to maximise compression ratio in ways too expensive to do "online".

mr__y · on Aug 1, 2019

That's obviously true, however a minimized version will require less memory and slightly less cpu-cycles* to compress and, on the client side, it requires slightly less resources as well

I do realize how insignificant difference that would be * then again not much of a difference since the DOM tree itself would consume orders o magnitude more mem.

hinkley · on Aug 1, 2019

Probably not less memory. zlib is based on a design that dates back to an era where you might only have 250-350 kilobytes (not a typo) of RAM to work with, and it was never really extended beyond that. It has a window it keeps in memory and if your file is longer than that window, you hit peak memory and stay there (you might actually hit that window immediately. I've forgotten how that part works, but some chunks of memory are pre-allocated).

masklinn · on Aug 1, 2019

That's really DEFLATE, the sliding window of standard deflate is 32KB. Both compression and decompression have some overhead (compression more so as you might want to have index tables and whatnot to make finding matcher faster) but even with the worst possible intention there's only so much overhead you can add.

hinkley · on Aug 1, 2019

Level 9 uses 128k, if memory serves.

We’re talking about HTTP here, and gzip is the only reliably available compressed transport encoding.

On the plus side, because it is so resource constrained you have had it on your phone for ages, and might even see it on IoT devices.

masklinn · on Aug 2, 2019

> Level 9 uses 128k, if memory serves.

That's probably a misunderstanding or misremembering: the DEFLATE format can only encode distances of 32K (the proprietary DEFLATE64 allows 64K distances but not everything supports it).

hinkley · on Aug 1, 2019

Have we provided any tools that managers are capable of using to see page weight and explore on their own? Or are we making graphs and showing them charts?

Maybe we are missing a plugin with a mileage gauge.

pflenker · on Aug 1, 2019

As an Engineering Manager I had Engineers showing me graphs and charts, but I also know how to look up things on my own. But I don't think either case is wide spread.

hinkley · on Aug 1, 2019

The fact that you are on hacker news suggests you are the Engineer turned Manager instead of tech enthusiast who went to business school. Yes?

People are a little more reliable when they have the option to figure things out for themselves. I’m not sure entirely why that is. But if pressed, I’d conjecture it’s something to do with not wanting to be seen asking subordinates stupid questions, to the point of preferring to be ignorant or half blind instead. “Keep silent and let them suspect, or speak and remove all doubt.”