Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I remember my first LZW implementation. It was so amusing to me the the one case where something didn't exist in the dictionary yet, you knew what it was and could add it and keep going!


The famous k<w>k<w>k case... indeed.

I think, though, that most instruction texts do the reader a disservice by starting with LZW (and thus having to deal with the kwkwk case from the get go). They should start with LZ78, which is much more simple and elegant; then show how -- although unimportant asymptotically -- we can save some bits by using Welch's trick (W in LZW), and show how that gives rise to the special case which needs to be handled.

Somewhat unexpectedly (for me anyway), if you write the decompressor as a memory stream decompressor rather than as a dictionary decompressor -- that is, instead of putting <w>,k pairs into your decoder dictionary while decoding, you note the output <offset> of where that <w> first appeared - there is no special case, and the implementation goes back to being LZ78 elegant and uniform.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: