Runsit – A process manager in Go

chubot · on June 21, 2014

FWIW I also started writing an init-like server in Go. One thing I ran into was that Go's APIs sort of coerce you into having an extra thread per process. runsit also has this issue. See line 489 of runsit.go, pasted below.

You can do it non-portably in Go by using syscall.ForkExec and Wait4(-1). The portable os/exec package assumes you will call Wait(pid), and not Wait(-1), which basically implies using a thread per process. Go's runtime isn't magic: if you call libc/syscall wait(), an entire OS thread is blocked, and the runtime can't use it for anything else. Here that thread stays blocked for the lifetime of the child process, which for server processes is forever.
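A minimal Linux-only sketch of the Wait4(-1) approach, under stated assumptions: children are started with os/exec's cmd.Start (rather than a raw fork/exec) and cmd.Wait is never called, the program has no other children, and `true` is on PATH. The helper names startChildren and reapAll are hypothetical. One goroutine blocks in wait4(-1), so a single OS thread covers every child instead of one thread per child.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// startChildren launches n copies of `true` via os/exec but never calls
// cmd.Wait; reaping is left entirely to reapAll.
func startChildren(n int) (map[int]bool, error) {
	pending := make(map[int]bool)
	for i := 0; i < n; i++ {
		cmd := exec.Command("true")
		if err := cmd.Start(); err != nil {
			return pending, err
		}
		pending[cmd.Process.Pid] = true
	}
	return pending, nil
}

// reapAll blocks in wait4(-1) until every PID in pending has been reaped,
// deleting PIDs from pending as it goes. However many children there are,
// only the calling goroutine's OS thread is tied up.
func reapAll(pending map[int]bool) (map[int]syscall.WaitStatus, error) {
	reaped := make(map[int]syscall.WaitStatus)
	for len(pending) > 0 {
		var ws syscall.WaitStatus
		pid, err := syscall.Wait4(-1, &ws, 0, nil)
		if err == syscall.EINTR {
			continue // interrupted by a signal; retry
		}
		if err != nil {
			return reaped, err // e.g. ECHILD: no children left to wait for
		}
		delete(pending, pid)
		reaped[pid] = ws
	}
	return reaped, nil
}

func main() {
	pending, err := startChildren(3)
	if err != nil {
		panic(err)
	}
	reaped, err := reapAll(pending)
	if err != nil {
		panic(err)
	}
	for pid, ws := range reaped {
		fmt.Printf("pid %d exited with status %d\n", pid, ws.ExitStatus())
	}
}
```

Once the reaper has collected a child this way, cmd.Wait on that child returns an error instead of its status, so the reaper has to be the single place where waiting happens.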

I'm pretty sure nobody would use a real PID 1 that burned a thread per process (systemd, upstart, etc.). But yes, for most use cases, in the grand scheme of things, it's probably not a big deal. I suppose Linux has an O(1) scheduler, although I'm not quite sure how this affects scheduling (interested in any comments).

But this goes to show that portable APIs are awkward and obscure for low level code. Better to use raw Unix APIs for something like an init server. Python and Java have similar problems.

IMO all interesting code nowadays is POSIX-like, so we should drop the pretension of portability and simplify our lives. Unix works.

    // run in its own goroutine
    func (in *TaskInstance) awaitDeath() {
      in.waitErr = in.cmd.Wait()  // ties up an OS thread for the lifetime of a process
      // ... (rest of the function elided)
    }

pedrocr · on June 21, 2014

> I suppose Linux has an O(1) scheduler

Actually, not anymore. The current scheduler, CFS, is O(log N) rather than O(1):

https://en.wikipedia.org/wiki/Completely_Fair_Scheduler

robryk · on June 21, 2014

I wonder if it would make sense to make a "child poller" akin to the net poller:

Have one goroutine loop forever:

  * wait for SIGCHLD
  * take the childlock
  * do nonblocking Wait() until no unwaited child remains
  * release the childlock

The childlock would need to be taken for reading by os.Process.Kill and by any other syscall that takes a child's PID (after taking it, we'd need to verify that the process we intend to touch isn't already dead).

chubot · on June 21, 2014

Yes, you can do that. As mentioned, it's not portable (which is fine with me).

However, you don't need to use goroutines (or threads). You do it as you would in C (and how all real PID 1 systems are written) -- with a single thread that starts processes, receives signals, and reaps children in a non-blocking fashion.

This style of program -- a program that needs to simultaneously wait for child processes/signals and fd events -- is quite awkward in Unix, but it definitely works when you get the idea.

To wait on an fd and a signal in a single-threaded program, you would use the "self-pipe trick" in classic Unix. In Linux, you can ask for signals to be delivered over a file descriptor with signalfd(). But AFAICT there is no real reason to use it, and portability across Unix IS a good thing IMO (but not portability to completely different OSes like Windows; in that case I would write a completely separate program using their native APIs).
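Go can't pin all of this to one OS thread, but the closest analogue to the single-threaded design is one goroutine that owns all child state. Because os/signal delivers signals on a channel, that goroutine can select over signals and other events without a self-pipe. A Linux-only sketch: supervise and its channel protocol are hypothetical, `true` is assumed to be on PATH, and this loop must be the only code in the process that waits on children.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// supervise starts each argv received on start, reaps children on SIGCHLD,
// and reports reaped PIDs on exited. It closes exited and returns once start
// is closed and every child it started has been reaped.
func supervise(start <-chan []string, exited chan<- int) {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGCHLD)
	defer signal.Stop(sigs)
	defer close(exited)

	running := make(map[int]bool) // touched only by this goroutine, so no lock
	for start != nil || len(running) > 0 {
		select {
		case argv, ok := <-start:
			if !ok {
				start = nil // a nil channel is never ready, so select skips it
				continue
			}
			cmd := exec.Command(argv[0], argv[1:]...)
			if err := cmd.Start(); err != nil {
				fmt.Fprintln(os.Stderr, "start failed:", err)
				continue
			}
			running[cmd.Process.Pid] = true
		case <-sigs:
			for {
				var ws syscall.WaitStatus
				pid, err := syscall.Wait4(-1, &ws, syscall.WNOHANG, nil)
				if err == syscall.EINTR {
					continue
				}
				if err != nil || pid <= 0 {
					break // nothing more to reap right now
				}
				delete(running, pid)
				exited <- pid
			}
		}
	}
}

func main() {
	start := make(chan []string)
	exited := make(chan int)
	go supervise(start, exited)
	go func() {
		for i := 0; i < 3; i++ {
			start <- []string{"true"}
		}
		close(start)
	}()
	n := 0
	for pid := range exited {
		fmt.Println("reaped", pid)
		n++
	}
	fmt.Println(n, "children reaped")
}
```

Compared with a childlock, the trade-off is that every operation on a child has to be routed through this goroutine's channels.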

node.js actually does a great job making this API easy and efficient. It is probably the only runtime (compared with Python/Ruby/JVM/etc.) that doesn't suffer from this problem when doing "async processes" (i.e. a complement to async networking).

burke · on June 21, 2014

We have an init process that runs inside Docker containers, and we took a slightly different route: we listen for SIGCHLD and then do non-blocking Wait4(0, ...) calls until there are no more children to reap:

https://gist.github.com/burke/1c105378ac0629b39485

chubot · on June 21, 2014

I think that is basically the same thing. I haven't used Wait4(0), but it looks like it is the same as Wait4(-1), as long as you don't change the process group ID of any of the children?

In any case, you are not calling Wait4(<specific PID>), which is what implies the thread per process.

zemo · on June 21, 2014

A goroutine is not a thread. That only ties up that one goroutine.

chubot · on June 21, 2014

Your first statement is true; the second isn't.

Re-read what I wrote. If that doesn't convince you, then download and run the code. Run "pstree" on it and observe how many child processes and threads there are. You'll learn something useful about the relationship of the Go runtime to the OS.

zimbatm · on June 21, 2014

A thread is created for blocking syscalls. I don't know if it's a syscall under the hood here but he might be talking about that.

armenb · on June 21, 2014

By way of introduction: I'm currently using runsit as an alternative to supervisord and I'm happy with it so far. The stdout and stderr of each process can be queried through a very simple HTTP interface. Runsit watches a config directory for changes and applies them immediately. Config files are in JSON format.

elithrar · on June 21, 2014

Mind sharing a (lightly commented) config? I've been using Supervisor to run my Go services for a long while (the built-in log rotation is one of the big attractions), but I'm keen to try alternatives. I never found mmonit and other alternatives to be as comprehensive as Supervisor.

armenb · on June 21, 2014

For instance, I use the following config (nginx.json) for nginx:

  {
    "user": ["_env", "${USER}"],
    "cwd": "/var/www",
    "standardEnv": true,
    "numFiles": 1024,
    "binary": "/usr/sbin/nginx"
  }

And php-fpm.json:

  {
    "user": ["_env", "${USER}"],
    "cwd": "/usr/sbin",
    "standardEnv": true,
    "numFiles": 1024,
    "binary": "php-fpm",
    "args": ["-F"]
  }

Hope that helps
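For anyone who wants to sanity-check such files from their own Go tooling, a sketch that decodes the two examples with encoding/json. The taskConfig struct is hypothetical: it covers only the keys shown above, is not runsit's own config type, and makes no attempt to interpret the `_env` form of `user`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taskConfig is a hypothetical struct mirroring only the keys in the
// nginx.json and php-fpm.json examples; it is not runsit's internal type.
type taskConfig struct {
	User        []string `json:"user"` // e.g. ["_env", "${USER}"], kept uninterpreted
	Cwd         string   `json:"cwd"`
	StandardEnv bool     `json:"standardEnv"`
	NumFiles    int      `json:"numFiles"`
	Binary      string   `json:"binary"`
	Args        []string `json:"args"` // optional; nil when absent
}

func parseTask(data []byte) (taskConfig, error) {
	var c taskConfig
	err := json.Unmarshal(data, &c)
	return c, err
}

func main() {
	c, err := parseTask([]byte(`{ "user": ["_env", "${USER}"], "cwd": "/usr/sbin", "standardEnv": true, "numFiles": 1024, "binary": "php-fpm", "args": [ "-F" ] }`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %v (cwd %s, %d fds)\n", c.Binary, c.Args, c.Cwd, c.NumFiles)
}
```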

armenb · on June 21, 2014

Forgot to tell you that nginx shouldn't be daemonized:

  echo "daemon off;" >> /etc/nginx/nginx.conf

fcoury · on June 21, 2014

And the JSON parser is awesome; it gives user-friendly error messages when the file has the wrong format. Very cool.

armenb · on June 21, 2014

Yes it is.

gwoo · on June 21, 2014

I built https://github.com/gwoo/goforever which has similar goals. I definitely like some of the ideas in runsit like automatic config watching. I still need to handle log rotation with something like https://github.com/natefinch/lumberjack

Thanks for putting this out there.

aktau · on June 21, 2014

I'm a big fan of Go and have a few Go binaries running on hundreds of clients (and a few servers) right now. They are being managed by Runit though. Since the names are so similar, did you get inspiration from Runit? And if so, what would be the main differences/advantages besides being portable to more platforms (Windows I presume)?

@chubot mentions the thread-per-process overhead. Runit does process-per-process so in that aspect Runsit should be a bit lighter. Then again, Runit is extremely tiny, its statically compiled binaries taking up next to nothing. The wait(-1) trick sounds good, but there must be a reason why for example Runit doesn't use it, since afaik Runit only runs on POSIX systems.

Keep up the good work!

EDIT: a web interface, that's pretty spiffy! (though I've made something similar work for Runit by querying the status of a service and serving up a dashboard, with a Go webserver of course).

kylered · on June 21, 2014

In case anyone is interested, we open sourced a process manager and web interface a few months ago.

https://github.com/VividCortex/pm
https://github.com/VividCortex/pm-web

akerl_ · on June 21, 2014

Is there a site for this or some sort of docs?

I'm still searching for a process manager that I can love for use with Docker (I've played with runit and am currently playing with s6), but the total lack of readme or docs makes this link fairly unhelpful.

armenb · on June 21, 2014

Unfortunately there isn't much documentation, but it's not that hard to set up. Take a look at the run.sh file and the config directory. Andrew Gerrand has an init script[1] in his fork which might be useful.

[1] https://github.com/nf/runsit/tree/master/doc/initd

stormbrew · on June 21, 2014

Huh. I have somehow never heard of s6 but it looks very promising. What did you find when using it for this purpose?

akerl_ · on June 21, 2014

My initial thoughts were less than positive, just because building it is a less-than-stellar process. Part of that is because it has a couple of deps that must also be compiled (skalibs and execline), and part is because it follows the slashpackage conventions (http://cr.yp.to/slashpackage.html).

Now that I've got it built, the next hurdle is making init scripts in execline. It's a fairly simple language, and I enjoy the premise behind it, where scripts should be clear and deterministic.

Overall, s6 provides a lot more helper tools for daemon management than runit did, so it looks like it's gonna be great for my use case. I've got an automated build set up to handle making a Docker image with s6 prepared:

https://registry.hub.docker.com/u/dock0/service/

skarnet · on June 25, 2014

Note that:

  * You do not have to follow the slashpackage convention to use or build s6. You can configure it out.
  * You do not have to write your init scripts in execline. Any scripting language, including the shell, will do; I use execline because it allows me to do things /bin/sh does not, or not easily.
  * There is a mailing-list, supervision@list.skarnet.org, dedicated to software like s6 and runit. If you have trouble building, installing or using such software, please subscribe and ask for help. I'm absolutely willing to make s6 easier to use, but I need the feedback.

Thanks for giving s6 a chance! You won't be disappointed - and if you are, make sure to let me know why.

srean · on June 21, 2014

You seem knowledgeable about these matters; I am not. So could you put in perspective why runsit would be interesting? I have run into several names that claim to handle this problem: you have mentioned s6 and runit, then there are monit, launchd, supervisor... Are there glaring gaps that need to be filled? Do they all do the same thing, or do they have specialized niches?

I guess I am asking for a lot, no need to indulge, but if you do, much appreciated.

@akerl_19 Thanks. I have but two upvotes to give.

akerl_ · on June 21, 2014

Unfortunately, I can't really say whether or not runsit is an interesting entry in the group of process management tools. While I'm very willing to dive into the code if I run into issues or am looking to customize things, the lack of any documentation or overview has prevented me from trying out runsit yet. Partially, this is for the obvious reason: it's difficult to get started with a tool that doesn't document how to use it. Additionally, the lack of documentation is a warning sign that this project won't be maintained in the long term as a community project. Which isn't to say that projects that begin without a strong focus on documentation or community cannot change in the future, but in my experience the vast majority of these projects end up being "a thing the author wrote for themselves, which becomes abandoned when they no longer use it".

As far as the variety of other tools:

launchd is an interesting concept but hasn't picked up a ton of adoption outside of OS X.

I've used supervisord for some other projects, but never as the main init system, and the Python dependency has thus far prevented me from doing so: I like my init process to be statically compiled so that the initial system startup has as few deps as possible.

I really enjoyed runit; I'm only trying out s6 now because it has a few more helper tools for controlling process startup and management, and appears to have better support for log handling. Runit was very powerful as an ultra-light-weight init system, and both runit and s6 have very thorough docs, with runit having a large catalog of community-provided initscript examples.

sleepydog · on June 21, 2014

I used s6 quite a bit in the past, it's got some pretty good improvements on daemontools. Back then I wrote some RPM specs to build s6 against musl: https://github.com/droyo/rpmbuild . I haven't tried to build them recently.