Hi! A few years ago I found myself wanting an equivalent of `column` that didn’t strip color codes. After I implemented it in Haskell, I found it was useful to use Nix to force statically linking against libraries like gmp to reduce startup time. Perhaps what I ended up doing might be helpful for you too: https://github.com/bts/columnate/blob/master/default.nix
Thank you for the suggestion, I'll give this a whirl! I've fussed around with `--ghc-options '-optl-static -fPIC'` and the like in years past without success.
I imagine for streaming tools like these it's pretty convenient. You don't have to manage buffers etc.; you just write code against a massive string and Haskell takes care of streaming it for you and pulling in more data when needed.
There are libraries that handle it, but they probably have weird types. You can write a lot of these basic utilities with just the functions in the Prelude.
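To make the "one massive string" style concrete, here is a minimal sketch of a `cat -n`-like filter using only the Prelude. The name `numberLines` is made up for illustration; `interact` reads stdin lazily, so output starts appearing before the input ends.

```haskell
module Main where

-- Prefix each line with its 1-based line number, roughly like `cat -n`.
-- A pure function over the whole input; lazy evaluation means it is
-- consumed and produced incrementally rather than all at once.
numberLines :: String -> String
numberLines = unlines . zipWith label [1 :: Int ..] . lines
  where
    label n line = show n ++ "\t" ++ line

main :: IO ()
main = interact numberLines
```

No buffer management appears anywhere, which is the convenience being described. It is still operating on a linked list of characters, though, so don't expect it to be fast.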
For one thing, a "string" in Haskell by default is a linked list of Unicode characters, so right out of the gate you've got big performance problems if you want to use strings. The exact way laziness is done also has serious performance consequences; when dealing with things as small as individual characters, all the overhead looms large on a percentage basis. One of the major purposes of any of the several variants of ByteString is to bundle the bytes together, but that means you're back to dealing with chunks. Haskell does end up with a nice API that can abstract over the chunks, but it still means you sometimes have to deal with chunks as chunks; if you turn them back into a normal Haskell "string" you lose all the performance advantages.
It can still come out fairly nice, but if you want performance it is definitely not just a matter of opening a file and pretending you've just got one big lazy string and you can just ignore all the details; some of the details still poke out.
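As a rough illustration of the chunks poking out, here is a sketch of a `wc -l`-style counter over a lazy ByteString. `countNewlines` is a hypothetical name; `foldlChunks` and `count` are from the `bytestring` package. The fold works on one strict chunk at a time and never converts anything back to `String`.

```haskell
module Main where

import qualified Data.ByteString.Char8 as BS
import qualified Data.ByteString.Lazy as BL

-- Count newline bytes by folding over the strict chunks that make up
-- a lazy ByteString. Each chunk is scanned as a packed byte array.
countNewlines :: BL.ByteString -> Int
countNewlines = BL.foldlChunks (\acc chunk -> acc + BS.count '\n' chunk) 0

main :: IO ()
main = BL.getContents >>= print . countNewlines
```

Counting is the easy case because a newline can't straddle a chunk boundary. Anything that looks at multi-byte patterns or whole lines has to think about what happens when one is split across two chunks.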
It definitely has some sharp edges. One advantage is skipping computations (and the IO they'd need) that don't end up getting used, which lets you do some clever-looking things and ignore some details. That's hard to take advantage of in practice, I think.
The other advantage is just deferring IO. For instance in split or tee, you could decide that you need 500 output files and open all the handles together in order to pass them to another function that will consume them. I'd squint at someone who wrote `void process_fds(int fds[500]);`, but here it doesn't matter.
I think the scope of lazy constructs should usually be far less than that of strict constructs, so it's only in the cases where the librarified lazy abstractions don't fit that you need a rewrite. Lazy to strict isn't hard, but I don't want the performance and cognitive overhead of lazy-by-default.
Performance (execution time, memory) is generally in the same ballpark as the BSD versions, with some caveats specific to utils that do lots of in-place data manipulation.
cut comes to mind as an example: slicing and dicing lines into fields quickly without a ton of copies isn't easy. Using Streaming.ByteString generally makes a huge difference, but it's extremely difficult to use unless you can get your mind to meld with the types it wants. Picking it up again months later takes some serious effort.
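For a sense of the field-slicing part, here is a simplified sketch using plain strict `Data.ByteString.Char8` rather than Streaming.ByteString, so it is not the streaming approach described above. `cutFields` is a made-up name, and the field list `[1, 3]` is hard-coded purely for illustration. The pieces returned by `split` are slices of the original buffer; the copy happens only when the selected fields are joined for output.

```haskell
module Main where

import qualified Data.ByteString.Char8 as BS

-- Keep the requested 1-based fields of one tab-delimited line,
-- in the order they appear in the line (as `cut -f` does).
cutFields :: [Int] -> BS.ByteString -> BS.ByteString
cutFields wanted line =
  BS.intercalate (BS.singleton '\t')
    [field | (i, field) <- zip [1 ..] fields, i `elem` wanted]
  where
    -- split yields substrings that share the input's buffer.
    fields = BS.split '\t' line

main :: IO ()
main = BS.interact (BS.unlines . map (cutFields [1, 3]) . BS.lines)
```

Note that strict `BS.interact` reads all of stdin before producing output, so memory grows with input size. Getting constant memory without giving up the cheap slicing is where the streaming types come in.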