Wow, hello! This is my repository. I'm happy to answer any questions.

faragon · 2024-11-03T13:53:24 1730642004

Very beautiful implementation of the awk interpreter in less than 600 lines!

https://github.com/Gandalf-/coreutils/blob/master/Coreutils/...

anardil · 2024-11-03T15:53:01 1730649181

Thank you! This is one of my favorites. User declared variables are next on the todo list, when I get back to it.

OskarS · 2024-11-03T18:17:02 1730657822

It is really gorgeously written Haskell. I’ve only dabbled in Haskell, but you’re really whetting my appetite for digging in deeper.

bts · 2024-11-03T14:13:35 1730643215

Hi! A few years ago I found myself wanting an equivalent of `column` that didn’t strip color codes. After I implemented it in Haskell, I found it was useful to use Nix to force statically linking against libraries like gmp to reduce startup time. Perhaps what I ended up doing might be helpful for you too: https://github.com/bts/columnate/blob/master/default.nix

anardil · 2024-11-03T16:46:54 1730652414

Thank you for the suggestion, I'll give this a whirl! I've fussed around with `--ghc-options '-optl-static -fPIC'` and the like in years past without success.

cosmic_quanta · 2024-11-03T13:56:26 1730642186

Could you speak to the advantages of Haskell's lazy IO? I usually only hear about its disadvantages.

habitue · 2024-11-03T15:06:32 1730646392

I imagine for streaming tools like these it's pretty convenient. You don't have to manage buffers etc, just write code against a massive string and haskell takes care of streaming it for you and pulling in more data when needed.

There are libraries that handle it, but they probably have weird types; you can just use functions in the Prelude to write a lot of these basic utilities.
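
A minimal sketch of that style, using only Prelude functions. This is a toy illustration, not code from the repository, and `numberLines` is a made-up name:

```haskell
-- Hypothetical example: a `cat -n` style filter over one big lazy String.

-- | Prefix every line with its 1-based line number and a tab.
numberLines :: String -> String
numberLines = unlines . zipWith number [1 :: Int ..] . lines
  where
    number n line = show n ++ "\t" ++ line

-- To run it as a filter, `interact` hands the function stdin as a lazy
-- String and writes the result to stdout as it is produced:
--
--   main :: IO ()
--   main = interact numberLines
```

Because `lines`, `zipWith`, and `unlines` are all lazy, output can start before the input ends, and the function never has to mention a buffer.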

jerf · 2024-11-04T02:55:19 1730688919

Unfortunately, while that may be the dream, it doesn't work out that way if you want good performance. If you look at the source you'll see that it uses things like https://hackage.haskell.org/package/streaming-bytestring-0.3... a lot.

For one thing, a "string" in Haskell by default is a linked list of Unicode characters, so right out of the gate you've got big performance problems if you want to use strings. The exact way laziness is done also has serious performance consequences; when dealing with things as small as individual characters, all the overhead looms large on a percentage basis. One of the major purposes of any of the several variants of ByteString is to bundle the bytes together, but that means you're back to dealing with chunks. Haskell does end up with a nice API that can abstract over the chunks, but it still means you sometimes have to deal with chunks as chunks; if you turn them back into a normal Haskell "string" you lose all the performance advantages.

It can still come out fairly nice, but if you want performance it is definitely not just a matter of opening a file and pretending you've just got one big lazy string and you can just ignore all the details; some of the details still poke out.
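
To make the "chunks as chunks" point concrete, here is a rough sketch using lazy `ByteString` from the `bytestring` package. Note that this is a simpler stand-in for streaming-bytestring, not what the repository uses, and both function names are made up:

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

-- | Count newlines one strict chunk at a time. Each chunk is a packed
-- buffer, so `B.count` scans contiguous bytes.
countNewlines :: BL.ByteString -> Int
countNewlines = BL.foldlChunks (\acc chunk -> acc + B.count '\n' chunk) 0

-- | Same answer, but `BLC.unpack` converts back to an ordinary String
-- (a linked list of Char), which gives up the packed representation.
countNewlinesViaString :: BL.ByteString -> Int
countNewlinesViaString = length . filter (== '\n') . BLC.unpack
```

Both functions return the same count; the difference is only in how much per-character overhead they pay along the way.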

habitue · 2024-11-04T15:37:14 1730734634

I mean, I'm aware of the downsides, the OP asked why someone might use it. Ease of use seems like a reasonable upside

anardil · 2024-11-03T16:00:11 1730649611

It definitely has some sharp edges. One advantage is skipping computations (and the IO they'd need) that don't end up getting used, which lets you do some clever-looking things and ignore some details. That's hard to take advantage of in practice, I think.

The other advantage is just deferring IO. For instance in split or tee, you could decide that you need 500 output files and open all the handles together in order to pass them to another function that will consume them. I'd squint at someone who wrote `void process_fds(int fds[500]);`, but here it doesn't matter.
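
One way to picture that in plain Haskell is to treat the opens as ordinary values and let the consumer decide which ones actually run. This is a sketch under that assumption, not how split or tee are implemented; `describeOpens`, `consumeFirst`, and the `open` parameter are hypothetical names:

```haskell
-- | Describe n "open" steps without running any of them.
-- `open` is a hypothetical stand-in for something like opening output file i.
describeOpens :: (Int -> IO a) -> Int -> [IO a]
describeOpens open n = map open [1 .. n]

-- | A consumer that only needs the first k resources runs only k opens;
-- the remaining actions in the list are never executed.
consumeFirst :: Int -> [IO a] -> IO [a]
consumeFirst k = sequence . take k
```

Passing around a list of 500 of these costs nothing until something runs them, e.g. `consumeFirst 3 (describeOpens open 500)` performs three opens.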

mrkeen · 2024-11-03T16:07:28 1730650048

If your language doesn't give you laziness, you're reinventing it yourself with strict primitives each time.

vacuity · 2024-11-03T17:16:45 1730654205

On the other hand, when you don't want laziness, you really won't like it if it's present anyway.

weebull · 2024-11-03T17:29:02 1730654942

Lazy to strict is reasonably easy to do though. The problem is normally that once one bit goes strict, most other things implicitly do too.

Strict to lazy is normally a rewrite.
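
For a small illustration of the lazy-to-strict direction (a generic textbook case with made-up names, not taken from the repository), the change is often just swapping the fold and forcing the accumulator:

```haskell
import Data.List (foldl')

-- | Lazy version: builds a chain of unevaluated (+) thunks that is only
-- collapsed when the result is finally demanded.
sumCountLazy :: [Double] -> (Double, Int)
sumCountLazy = foldl step (0, 0)
  where
    step (s, n) x = (s + x, n + 1)

-- | Strict version: foldl' forces the pair at each step, and `seq`
-- forces both components before the pair is built.
sumCountStrict :: [Double] -> (Double, Int)
sumCountStrict = foldl' step (0, 0)
  where
    step (s, n) x =
      let s' = s + x
          n' = n + 1
      in s' `seq` n' `seq` (s', n')
```

Both return the same pair; only the evaluation order and memory behavior differ.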

vacuity · 2024-11-03T23:19:24 1730675964

I think the scope of lazy constructs should usually be far less than that of strict constructs, so it's only in the cases where the librarified lazy abstractions don't fit that you need a rewrite. Lazy to strict isn't hard, but I don't want the performance and cognitive overhead of lazy-by-default.

aeonik · 2024-11-03T12:47:47 1730638067

You specify "fast"; can you elaborate on the performance of the collection? How does it compare to the standard core utils?

Great work, looks amazing.

anardil · 2024-11-03T15:52:23 1730649143

Performance (execution, memory) is generally in the same ballpark as the BSD versions, with some caveats specific to utils that do lots of in place data manipulation.

cut comes to mind as an example: slicing and dicing lines into fields quickly without a ton of copies isn't easy. Using Streaming.ByteString generally makes a huge difference, but it's extremely difficult to use unless you can get your mind to meld with the types it wants. Picking it up again months later takes some serious effort.
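
As a much simpler sketch of the field-slicing part, here is the idea with strict `ByteString` from the `bytestring` package rather than Streaming.ByteString. It is not the repository's implementation, and `field` is a made-up name:

```haskell
import qualified Data.ByteString.Char8 as B

-- | Select the 1-based field n of a line split on delimiter d.
-- `B.split` returns slices that share the original buffer, so picking
-- a field does not copy the bytes of the line.
field :: Char -> Int -> B.ByteString -> Maybe B.ByteString
field d n line
  | n < 1 = Nothing
  | otherwise =
      case drop (n - 1) (B.split d line) of
        (f : _) -> Just f
        []      -> Nothing
```

For example, `field ',' 2 (B.pack "a,b,c")` picks out the `b` field, and asking for a field past the end gives `Nothing`.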

Vosporos · 2024-11-03T19:19:24 1730661564

Fantastic work, thank you so much!

anacrolix · 2024-11-04T13:11:35 1730725895

LOTR fan detected