I think they don't want PR_SET_PDEATHSIG but rather PR_SET_CHILD_SUBREAPER, whic...

timhh · 2025-02-24T09:32:28 1740389548

> PR_SET_CHILD_SUBREAPER

I wrote a tool that does just this: https://github.com/timmmm/anakin

If you run `anakin <some command>` it will kill any orphan processes that <some command> makes.

However it still isn't the true "orphans of this process must automatically die" option that everyone writing job control software wants - if `anakin` itself somehow crashes then the orphans can live again.

Still it was the best I could come up with that didn't need root.

badmintonbaseba · 2025-02-24T11:26:26 1740396386

The name of the tool is on point.

skissane · 2025-02-24T03:30:12 1740367812

> I think they don't want PR_SET_PDEATHSIG but rather PR_SET_CHILD_SUBREAPER, which I think would be both more correct than PDEATHSIG for letting them wait on grand-children / preventing grand-child-zombies, while also avoiding the issue they ran into here entirely.

PR_SET_PDEATHSIG automatically kills your children if you die, but unfortunately doesn’t extend to their descendants

As far as I’m aware, PR_SET_CHILD_SUBREAPER doesn’t do anything if you die. Assuming you yourself don’t crash, it can be used to help clean up orphaned descendant processes, by ensuring they reparent to you instead of init; but in the event you do crash, it doesn’t do anything to help.
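For anyone who wants to see the PR_SET_PDEATHSIG half of this in action, here is a minimal sketch, assuming Linux and Python's ctypes. The names `prctl` (the wrapper) and `signal_that_killed_worker` are made up for the illustration, and the subreaper call is only there so the top-level process can collect the orphaned worker's exit status; it does not do the killing.

```python
import ctypes
import os
import signal

# Option numbers from <linux/prctl.h>.
PR_SET_PDEATHSIG = 1
PR_SET_CHILD_SUBREAPER = 36

libc = ctypes.CDLL(None, use_errno=True)
libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4


def prctl(option: int, arg: int) -> None:
    """Hypothetical thin wrapper: raise OSError if prctl(2) fails."""
    if libc.prctl(option, arg, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def signal_that_killed_worker() -> int:
    """Return the signal that ended the worker, or 0 if it exited normally."""
    # Only so that this process can wait() on the orphaned worker below.
    prctl(PR_SET_CHILD_SUBREAPER, 1)
    pid_r, pid_w = os.pipe()
    manager = os.fork()
    if manager == 0:
        try:
            ready_r, ready_w = os.pipe()
            worker = os.fork()
            if worker == 0:
                try:
                    # Ask the kernel to SIGKILL us when our parent dies.
                    prctl(PR_SET_PDEATHSIG, int(signal.SIGKILL))
                    os.write(ready_w, b"x")
                    while True:
                        signal.pause()
                finally:
                    os._exit(1)
            os.close(ready_w)
            os.read(ready_r, 1)  # wait until the worker has armed PDEATHSIG
            os.write(pid_w, str(worker).encode())
        finally:
            os._exit(0)  # the manager "dies" here
    os.close(pid_w)
    worker = int(os.read(pid_r, 32).decode())
    os.close(pid_r)
    os.waitpid(manager, 0)
    _, status = os.waitpid(worker, 0)
    prctl(PR_SET_CHILD_SUBREAPER, 0)  # undo the demo-only setting
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0
```

The worker signals readiness over a pipe before the manager exits, because a parent that dies before the prctl call would otherwise go unnoticed.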

PID namespaces do exactly what you want - if their init process dies it automatically kills all its descendants. However, they require privilege - unless you use an unprivileged user namespace - but those are frequently disabled, and even when enabled, using them potentially introduces a whole host of other issues

> Alternatively, if they want they could integrate with systemd

The problem is a lot of code runs in environments without systemd, e.g. code running in containers (Docker, K8S, etc.); most containers don’t contain systemd. So any systemd-centric solution is only going to work for some people

Really, it would be great if Linux added some new process grouping construct which included the “kill all members of this group if its leader dies” semantic of PID namespaces without any of its other semantics. It is those other semantics (especially the new PID number semantics) which are the primary source of the security concerns, so a construct which offered only the “kill-if-leader-dies” semantic should be safe to allow for unprivileged access. (The one complexity is setuid/setgid/file capabilities - allowing an unprivileged process to effectively kill a privileged process at an arbitrary point in its execution is a security risk. Plausible solutions include refusing to execute any setuid/setgid/caps executable, or else allowing them to run but removing the process from this grouping when it executes one.)

eqvinox · 2025-02-24T03:34:30 1740368070

> PR_SET_PDEATHSIG automatically kills your children if you die, but unfortunately doesn’t extend to their descendants

It indirectly does: unless you unset it, the child dying will trigger another run of PDEATHSIG on the grandchildren, and so on. (The setting is retained across forks, as shown in the original article.)

kobzol · 2025-02-24T07:12:22 1740381142

It is sadly not propagated to grandchildren.

I tried the subreaper approach, but it doesn't help. The children are reparented to the worker, but when the worker dies, they are then just reparented to init, as usual.

TheDong · 2025-02-24T09:35:49 1740389749

You also need to specifically have the subreaper process call the "wait" syscall, and wait for all children, otherwise of course they'll end up reparented to init.

If you want to write a process manager, one of the process manager's responsibilities is waiting on its children.

ComputerGuru · 2025-02-24T14:19:28 1740406768

Just a nitpick: They don’t get reparented to init regardless of whether you call wait or not, so long as the parent process exists. They’ll be in a zombie state waiting to be reaped via a parent call to wait. Only if the parent dies/exits without reaping will they be reparented to init.
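A small sketch of that, assuming Linux with /proc mounted: the child exits, the parent deliberately does not reap it yet, and /proc still lists it as a zombie whose parent is the original process. `zombie_snapshot` is just an illustrative name.

```python
import os


def zombie_snapshot() -> tuple[str, int]:
    """Return (state, ppid) of an exited-but-unreaped child, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child exits immediately
    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    # /proc/<pid>/stat looks like "pid (comm) state ppid ..."; split after
    # the last ")" because comm itself may contain spaces or parentheses.
    with open(f"/proc/{pid}/stat") as stat_file:
        fields = stat_file.read().rsplit(")", 1)[1].split()
    state, ppid = fields[0], int(fields[1])
    os.waitpid(pid, 0)  # now actually reap the zombie
    return state, ppid
```

The state comes back as "Z" and the parent PID is still the caller's own PID; reparenting only enters the picture once the parent itself is gone.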

skissane · 2025-02-24T03:47:39 1740368859

> The setting is retained across forks, as shown in the original article

That’s not what the man page says:

> The parent-death signal setting is cleared for the child of a fork(2).

https://man7.org/linux/man-pages/man2/pr_set_pdeathsig.2cons...

Unless the man page is wrong?
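This is easy enough to check empirically. A rough sketch, assuming Linux and Python's ctypes (the helper names `prctl`, `get_pdeathsig` and `pdeathsig_across_fork` are invented for the example): set the parent-death signal, fork, and have the child report what PR_GET_PDEATHSIG says.

```python
import ctypes
import os
import signal

# Option numbers from <linux/prctl.h>.
PR_SET_PDEATHSIG = 1
PR_GET_PDEATHSIG = 2

libc = ctypes.CDLL(None, use_errno=True)
libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4


def prctl(option: int, arg: int) -> None:
    """Hypothetical thin wrapper: raise OSError if prctl(2) fails."""
    if libc.prctl(option, arg, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def get_pdeathsig() -> int:
    """Return the caller's parent-death signal (0 means none is set)."""
    value = ctypes.c_int(0)
    prctl(PR_GET_PDEATHSIG, ctypes.addressof(value))
    return value.value


def pdeathsig_across_fork() -> tuple[int, int]:
    """Return (setting seen in the parent, setting seen in a forked child)."""
    prctl(PR_SET_PDEATHSIG, int(signal.SIGTERM))
    in_parent = get_pdeathsig()
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.write(write_fd, str(get_pdeathsig()).encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    in_child = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    prctl(PR_SET_PDEATHSIG, 0)  # clear the setting again in this process
    return in_parent, in_child
```

The parent sees SIGTERM and the forked child sees 0, which agrees with the man page.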

zokier · 2025-02-24T06:46:26 1740379586

I wonder if this is a difference between libc fork (which calls the clone syscall) and the kernel fork syscall.

skissane · 2025-02-24T08:01:38 1740384098

No, it isn’t. Neither glibc fork nor the kernel fork syscall provides any special handling for PDEATHSIG beyond what the clone syscall does.

eqvinox · 2025-02-24T14:45:08 1740408308

Yeah, I misremembered/misread and didn't check. Bleh.

(The article sets it after forking.)

vlovich123 · 2025-02-24T02:11:01 1740363061

> when the orphan terminates, it is the subreaper process that will receive a SIGCHLD signal and will be able to wait(2) on the process to discover its termination status

Seems like you don’t need a dedicated “always alive” thread if it’s being delivered to the process and tokio automatically does masking for threads so that you register for listening to signals using its asynchronous mechanisms & don’t have issues around signal safety which it abstracts away for you (i.e. as long as you’re handling the SIGCHLD signal somewhere or even just ignoring it as I don’t think they actually care?).

That being said, it’s not clear PR_SET_CHILD_SUBREAPER actually causes grand children to be killed when the reaper process dies which is the effect they’re looking for here (not the reverse where you reap forked children as they die). So you may need to spawn a dedicated reaper process rather than thread to manage the lifetime of children which is much more complicated.

eqvinox · 2025-02-24T03:37:15 1740368235

> That being said, it’s not clear PR_SET_CHILD_SUBREAPER actually causes grand children to be killed when the reaper process dies

CHILD_SUBREAPER kills neither children nor grandchildren. Its effect is in the other direction, intended for sub-service-managers that want to keep track of all children. If the subreaper dies, children are reparented to the next subreaper up (or init).
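To make the direction concrete, here is a minimal sketch, assuming Linux and Python's ctypes, with illustrative names (`prctl`, `adopted_orphan_parent`): the process marks itself as a subreaper, an intermediate child forks a grandchild and exits, and the grandchild reports who its parent is afterwards.

```python
import ctypes
import os
import time

PR_SET_CHILD_SUBREAPER = 36  # from <linux/prctl.h>

libc = ctypes.CDLL(None, use_errno=True)
libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4


def prctl(option: int, arg: int) -> None:
    """Hypothetical thin wrapper: raise OSError if prctl(2) fails."""
    if libc.prctl(option, arg, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def adopted_orphan_parent() -> int:
    """Return the parent PID an orphaned grandchild sees after adoption."""
    prctl(PR_SET_CHILD_SUBREAPER, 1)
    read_fd, write_fd = os.pipe()
    middle = os.fork()
    if middle == 0:
        try:
            middle_pid = os.getpid()
            if os.fork() == 0:
                try:
                    # Grandchild: wait until the intermediate parent is gone.
                    while os.getppid() == middle_pid:
                        time.sleep(0.01)
                    report = f"{os.getpid()} {os.getppid()}"
                    os.write(write_fd, report.encode())
                finally:
                    os._exit(0)
        finally:
            os._exit(0)  # intermediate process exits, orphaning the grandchild
    os.close(write_fd)
    os.waitpid(middle, 0)
    orphan_pid, new_parent = map(int, os.read(read_fd, 64).decode().split())
    os.close(read_fd)
    os.waitpid(orphan_pid, 0)  # works only because we adopted the orphan
    prctl(PR_SET_CHILD_SUBREAPER, 0)  # undo the setting for this process
    return new_parent
```

The grandchild ends up with the subreaper as its parent instead of init, and the subreaper can wait on it. Nothing in this kills the grandchild if the subreaper itself dies.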

TheDong · 2025-02-24T02:18:49 1740363529

Yeah, I was assuming they have something calling `wait` somewhere since they say "HyperQueue is essentially a process manager", and to me "process manager" implies pretty strongly "spawns and waits for processes".