Hacker News | aktau's comments

(I agree with other commenters' assessment of the importance of the author's complaints, and recommend others check out the Go memory regions proposal.)

For those interested, here's an article where Miguel Young implements a Go arena: https://mcyoung.xyz/2025/04/21/go-arenas/. I couldn't find references to Go's own experimental arena API in this article, which is a shame, since it would have been nice to see this knowledgeable author weigh them against each other. IIUC, Miguel's version and the Go experimental version do have some important differences even apart from the API. IIRC, the Go experimental version doesn't avoid garbage collection. Its main performance benefit is that the Go runtime's view of allocated memory shrinks as soon as `arena.Free` is called. This delays triggering the garbage collector (meaning it runs less frequently, saving cycles).
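
For reference, the experimental API (behind GOEXPERIMENT=arenas, and IIRC on hold indefinitely) looks roughly like this. This is a sketch from memory, so don't take it as gospel:

  //go:build goexperiment.arenas

  package main

  import "arena"

  type record struct {
      id   int
      name string
  }

  func main() {
      a := arena.NewArena()
      defer a.Free() // hands the arena's memory back to the runtime without waiting for a GC cycle

      r := arena.New[record](a)                // a single value backed by the arena
      buf := arena.MakeSlice[byte](a, 0, 4096) // a slice backed by the arena
      _, _ = r, buf
  }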


It's not the original source, but https://github.com/dpjudas/SurrealEngine is an active reimplementation of UE1.


I have a bunch, but one that I rarely see mentioned but use all the time is memo(1) (https://github.com/aktau/dotfiles/blob/master/bin/memo).

It memoizes the command passed to it.

  $ memo curl https://some-expensive.com/api/call | jq . | awk '...'
Manually clearing it (for example, if I know the underlying data has changed):

  $ memo -c curl https://some-expensive.com/api/call
In-pipeline memoization (includes the input in the hash of the lookup):

  $ cat input.txt | memo -s expensive-processor | awk '...'
This allows me to rapidly iterate on shell pipelines. The main goal is to minimize my development latency, but it also has positive effects on dependencies (avoiding redundant RPC calls). The classic way of doing this is storing something in temporary files:

  $ curl https://some-expensive.com/api/call > tmpfile
  $ cat tmpfile | jq . | awk '...'
But I find this awkward, and it makes it harder than necessary to experiment with the expensive command itself.

  $ memo curl https://some-expensive.com/api/call | jq . | awk '...'
  $ memo curl --data "param1=value1" https://some-expensive.com/api/call | jq . | awk '...'
Both of those will run curl only once each: they hash differently, so they're memoized independently.

NOTE: Currently environment variables are not taken into account when hashing.


You're gonna absolutely love up (https://github.com/akavel/up).

If you pipe curl's output to it, you'll get a live playground where you can finesse the rest of your pipeline.

  $ curl https://some-expensive.com/api/call | up


up(1) looks really cool, I think I'll add it to my toolbox.

It looks like up(1) and memo(1) have similar use cases (or goals). I'll give it a try to see if I can appreciate its ergonomics. I suspect memo(1) will remain my mainstay:

  1. After executing a pipeline, I like to press the up arrow (heh) and edit. Surprisingly often I need to edit something that's *not* the last part, but somewhere in the middle. I find this cumbersome in default line editing mode, so I will often drop into my editor (^X^E) to edit the command.
  2. up seems to write the finished pipeline out as a shell script when you're done. Avoiding the creation of extra files was one of my goals for memo(1). I'm sure some smart zsh/bash integration could be made that injects the completed command back into the prompt instead.


Another thing I built into memo(1), which I forgot to mention: automatic compression. memo(1) uses whichever (de)compressor is available (in order of preference: zstd, lz4, xz, gzip) to (de)compress stored contents. It's surprising how much disk space and how many IOPS this saves, because command output tends to be highly redundant.

I currently only have two memoized commands:

  $ for f in /tmp/memo/aktau/* ; do 
      ls -lh "$f" =(zstd -d < $f) 
    done
  -rw-r----- 1 aktau aktau  33K /tmp/memo/aktau/0742a9d8a34c37c0b5659f7a876833b6dad9ec689f8f5c6065d05f8a27d993c7bbcbfdc3a7337c3dba17886d6f6002e95a434e4629.zst
  -rw------- 1 aktau aktau 335K /tmp/zshSQRwR9

  -rw-r----- 1 aktau aktau  827 /tmp/memo/aktau/8373b3af893222f928447acd410779182882087c6f4e7a19605f5308174f523f8b3feecbc14e1295447f45b49d3f06da5da7e8d7a6.zst
  -rw------- 1 aktau aktau 7.4K /tmp/zshlpMMdo
That's roughly a 10x compression ratio.


This is terrific! I curl to files and then pipe them all the time. This will be a great help.

I wonder if we have gotten to the point where we can feed an LLM our bash history and it could suggest improvements to our workflow.


Interesting idea. And pretty easy to try.

If you do it, I'd love to hear your results.

In general, I wonder if we're at the point where an LLM watching you interact with your computer for twenty minutes can improve your workflow, suggest tools, etc. I imagine so: when I do think to ask how to do something, I often get a very useful answer, and as a result I've automated/fixed far more things than I used to.


.

   #!/usr/bin/env bash
   #
   # memo(1), memoizes the output of your command-line, so you can do:
   #
   #  $ memo <some long running command> | ...
   #
   # Instead of
   #
   #  $ <some long running command> > tmpfile
   #  $ cat tmpfile | ...
   #  $ rm tmpfile
   
   To save output, sed can be used in the pipeline instead of tee.
   For example:
   
   x=$(mktemp -u);                 # path for a named pipe
   test -p $x||mkfifo $x;
   zstd -19 < $x > tmpfile.zst &   # background compressor reads from the pipe
   <long running command>|sed w$x|<rest of pipeline>;   # sed's w command tees into the pipe
   
   # You can even use it in the middle of a pipe if you know that the input is not
   # extremely long. Just supply the -s switch:
   #
   #  $ cat sitelist | memo -s parallel curl | grep "server:"
   
   grep can be replaced with sed and the search results sent to stderr:
   
   < sitelist curl ...|sed '/server:/w/dev/stderr'|zstd -19 >tmpfile.zst;
   
   Or send the search results to stderr and to some other file;
   sed can write output to multiple files at a time:
   
   < sitelist curl ...|sed -e '/server:/w/dev/stderr' -e "/server:/wresults.txt"|zstd -19 >tmpfile.zst;


Those commands are (1) harder to grok, and (2) they do not actually use the memoized result (tmpfile.zst) to speed up a subsequent run.

Can you give a more complete example of how you would use this to speed up developing a pipeline?


If you provide a sample showing (a) the input text format and (b) the desired output text format, then perhaps I can provide an example of how to do the text processing.


15 years of Linux and I learn something new all the time...


It's why I keep coming back. Now, how do I remember to use this and not go back to using tmpfiles :)


I've been using the Warp terminal for a couple of years, and recently they embedded AI into it. At first I was irritated and disabled it, but the AI agent is built in as an optional mode (Cmd-I to toggle). And I found myself using it more and more often for commands that I have no capacity or will to remember or dig through the man pages for (from "figure out my IP address on the wifi interface" to "make ffmpeg do this or that"). It's fast and can iterate on its own errors, and now I can't resist using it regularly. It removes the need for "tools to memorize commands" entirely.


I've been using bkt (https://github.com/dimo414/bkt) for subprocess caching. It has some nice features, like providing a TTL for cache expiration. In-pipeline memoization looks nice; I'm not sure it supports that.


I was not aware of bkt. Thanks for the link. It seems very similar to memo, and has more features:

  - Explicit TTL
  - Ability to include working directory et al. as context for the cache key.
There do appear to be downsides (from my PoV) as well:

  - It's a Rust program, so it needs to be compiled (memo is a bash/zsh script and runs as-is).
  - There's no mention of transparent compression, either in the README or via a quick source search. I did find https://github.com/dimo414/bkt/issues/62, which mentions swappable backends. The fact that it uses some kind of database instead of just the filesystem is not a positive for me; I prefer state that's easy to introspect with common tools. I will often memo commands that output gigabytes of highly compressible data, and transparent compression fixes that up. One could argue this could be handled by a filesystem-level feature like ZFS transparent compression, but I don't know how to detect that in a cross-filesystem fashion.
I opened https://github.com/dimo414/bkt/discussions/63 so the author of bkt can perhaps also participate.


Caching some API call because it's expensive, and then using that cached data many months later because of a bash history suggestion :(


The default storage location for memo(1) output is /tmp/memo/${USER}. Most distributions have some automatic periodic cleanup of /tmp, wipe it on restart, or both.

Separately from that:

  - The invocation contains *memo* right in there, so you (the user) know that it might memoize.
  - One uses memo(1) for commands that are generally slow. Rerunning a command with a slow part and having it return in a millisecond when you weren't expecting it should make your spider-sense tingle.
In practice, this has never been a problem for me, and I've used this hacked together command for years.


I see no way to name the memos in your examples, so how do you refer to them later?

Also, this seems a lot like an automated way to write shell scripts that you can pipe to and from. So why not use a shell script, which won't surprise anyone, instead of this, which might?


The name of the memo is the command that comes after it:

  $ memo my-complex-command --some-flag my-positional-arg-1
In this invocation, a hash (SHA-512) is taken of "my-complex-command --some-flag my-positional-arg-1", and the output is stored in /tmp/memo/${USER}/{sha512hash}.zst (if you've got zstd installed; other compression extensions otherwise).


Dude, this is _awesome_. Thank you for sharing!


Glad you like it. Hope you get as much use out of it as I do.


> `curl ... | jq . | awk '...'`

Uhm, jq _is_ as powerful as awk (more, even). You can use jq directly and skip awk.

(I know, old habits die hard, and learning functional programming languages is not easy.)


Yes, I know. I should've picked a different example, but it's also realistic in a way. When I'm doing one-offs, I will sometimes take shortcuts like this. I know awk fairly well, and I know enough jq to know that invoking jq . pretty-prints the inbound JSON across multiple lines. While I could write a proper jq expression, the combo gets me there quicker. Similarly, I'll sometimes do:

  $ awk '...' | grep | ...
Because I'm too lazy to go back to the start of the awk invocation and add a match condition there. If I'm going to save it to a script, I'll clean it up. (And for jq, I have to be honest: these days I would probably just show my contraption to an LLM and use its answer as a starting point; I don't use jq nearly enough to know its language by heart.)


But there is a grain of truth to their commentary. "Modern" gets old fast; it probably shouldn't be used, just like "new" shouldn't be used in project/library names.

IMHO, removing "modern, flexible and scalable" would improve that line.

I've skimmed through the README, and for my taste it overuses "soft" adjectives (simple, easy, ...), which makes me (an engineer) dismiss it as marketing-driven.


Thanks for that reference. Do you know if the JVM still relies on futexes, or did they migrate to thin locks? I ask because I found a 9-year-old reference of a JVM calling futex: https://stackoverflow.com/questions/32262946/java-periodical....


OpenJDK has used thin locks since back when it was called HotSpot.

The fact that the JVM is hanging in futexes doesn't mean anything for the purpose of this discussion. Futexes are _the_ OS locking primitive on Linux, so I would expect modern JVM thin locks to bottom out in a futex syscall. That doesn't mean that the JVM is "using" futexes in the sense of the original post.


> In theory compare-and-swap or the equivalent instruction pair load-exclusive/store-conditional are more universal, but in practice they should be avoided whenever high contention is expected. The high performance algorithms for accessing shared resources are all based on using only fetch-and-add, atomic exchange, atomic bit operations and load-acquire/store-release instructions.

> This fact has forced ... there were no atomic read-modify-write operations, so they have added all such operations in the first revision of the ISA, i.e. Armv8.1-A.

I'm not sure if you meant for these two paragraphs to be related, but I'm asking to make sure:

  - Isn't compare-and-swap (CMPXCHG on x86) also a read-modify-write, which the first quoted paragraph says is slow?
  - I think I've benchmarked LOCK CMPXCHG vs LOCK OR before, with various configurations of reading/writing threads. I was almost sure it was going to be an optimization, and it ended up being unobservable. IIRC, some StackOverflow posts led me to the notion that LOCK OR still needs to acquire ownership of the target address in memory (RMW). Do you have any more insights? Cases where LOCK OR is better? Or should I have used a different instruction to set a single bit atomically?


In terms of the relative cycle cost for instructions, the answer definitely has changed a lot over time.

As CAS has become more and more important as the world has scaled out, hardware companies have been more willing to favor "performance" in the cost/performance tradeoff. Meaning, it shouldn't surprise you if an uncontended CAS is as fast as a fetch-and-or, even though the latter is obviously a much simpler operation logically.

But hardware platforms are a very diverse place.

Generally, if you can design your algorithm with a load-and-store, there's a very good chance you're going to deal with contention much better than w/ CAS. But, if the best you can do is use load-and-store but then have a retry loop if the value isn't right, that probably isn't going to be better.

For instance, I have an in-memory debugging "ring buffer" that keeps an "epoch"; threads logging to the ring buffer fetch-and-add themselves an epoch, then mod by the buffer size to find their slot.
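
A minimal sketch of that indexing scheme (illustrative only, not my actual code):

  package ringbuf

  import "sync/atomic"

  type entry struct {
      epoch uint64
      msg   string
  }

  type ring struct {
      epoch atomic.Uint64
      slots [4096]entry
  }

  // log claims a unique epoch with a fetch-and-add (no CAS, no retry loop),
  // then mods it by the buffer size to find the slot to write to.
  func (r *ring) log(msg string) {
      e := r.epoch.Add(1) - 1
      r.slots[e%uint64(len(r.slots))] = entry{epoch: e, msg: msg}
  }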

Typically, the best performance will happen when I'm keeping one ring buffer per thread-- not too surprising, as there's no contention (but impact of page faults can potentially slow this down).

If the ring buffer is big enough that there's never a collision (a slow writer still writing when the next cycle through the ring buffer comes around), then the only contention is around the counter, and everything still tends to be pretty good, but the work the hardware has to do to sync the value will 100% slow it down, despite there being no retries. If you don't use a big buffer, you have to do something different to get a true ring buffer, or you can lock each record and send the fast writer back to get a new index if it sees a lock. The contention still has the effect of slowing things down either way.

The worst performance will come with the CAS operation though, because when lots of threads are debugging lots of things, there will be lots of retries.


Reminds me a bit of Steve Yegge's latest [^1]. He gives an LLM full control over his editor (Emacs) by allowing it to call eval (as I understand it). He doesn't talk about which guardrails (if any) he put on this.

[^1]: https://x.com/Steve_Yegge/status/1942336357650817235


I just wish he'd put that stuff on his blog rather than on Twitter.

I love his insights, but I'm not creating an account to see them.


I'm not sure there's consensus yet on what a solution looks like for adding labels to non-point-in-time[^1] profiles, like the heap profile, without leaking: https://go.dev/issue/23458.

[^1]: As opposed to profiles that collect data only when activated, like the CPU profile. The heap profile is active from the beginning if `MemProfileRate` is set.
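
For context, this is how labels get attached today; they show up in the CPU (and, IIRC, goroutine) profiles, but not in the heap profile, which is what the issue is about. The label keys/values here are made up:

  package main

  import (
      "context"
      "runtime/pprof"
  )

  func main() {
      ctx := context.Background()
      // Everything executed inside pprof.Do carries these labels.
      pprof.Do(ctx, pprof.Labels("subsystem", "indexer", "tenant", "demo"), func(ctx context.Context) {
          doWork(ctx)
      })
  }

  func doWork(ctx context.Context) { /* allocate, compute, ... */ }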


This goes straight into my reference list. Sandboxing a process is confusing on Linux.

I appreciate that the article focuses on approaches that drop privileges without having root oneself. I've seen Landlock referenced at times (https://lwn.net/Articles/859908/), but never so clearly illustrated (the verbosity feels like Vulkan).

Out of curiosity, I wish even more approaches were compared, even ones that require root. I was about to mention seccomp-bpf as an approach that requires root, but skimming the LWN article I posted above, I find: "Like seccomp(), Landlock is an unprivileged sandboxing mechanism; it allows a process to confine itself". It seems I was wrong, and seccomp could be compared/contrasted as well.


Absolutely, seccomp is also an unprivileged sandboxing mechanism on Linux. It does have the drawback, however, that policies are defined in terms of system call numbers and their (register-value) arguments, which complicates things, as those are a moving target.

The problem was also recently discussed at https://lssna2025.sched.com/event/1zam9/handling-new-syscall...


This is one area where Gerrit Code Review is (was? I don't know if it has changed) superior. It stores everything it knows about in Git repositories (preferences in a separate meta Git repository, comments, patches). With the right refspec, you can pull it all down and have a full backup.

