We have an init process that runs inside Docker containers, and we took a slightly different route. We listen for SIGCHLD and then call non-blocking Wait4(0, ...) until there are no more children to reap:
I think that is basically the same thing. I haven't used Wait4(0), but it looks equivalent to Wait4(-1) as long as you don't change the process group ID of any of the children: pid 0 waits only for children in the caller's process group, while -1 waits for any child.
In any case, you are not calling Wait4(<specific PID>), which is what implies a thread per process, since each such blocking call waits on exactly one child.
https://gist.github.com/burke/1c105378ac0629b39485