> This is not written anywhere explicitly in the docs.
It absolutely is. I'll use the Elixir docs as my source:
> A non-temporary child process may later be restarted by the supervisor.
And, further up in the docs when talking about the circumstances under which a supervisor will restart a child that has terminated: [0]
> Restart values (:restart)
>
> The :restart option controls what the supervisor should consider to be a successful termination or not. If the termination is successful, the supervisor won't restart the child. If the child process crashed, the supervisor will start a new one.
>
> The following restart values are supported in the :restart option:
>
> :permanent - the child process is always restarted.
> :temporary - the child process is never restarted, regardless of the supervision strategy: any termination (even abnormal) is considered successful.
> :transient - the child process is restarted only if it terminates abnormally, i.e., with an exit reason other than :normal, :shutdown, or {:shutdown, term}.
>
> For a more complete understanding of the exit reasons and their impact, see the "Exit reasons and restarts" section.
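To make the quoted option concrete, here is a small sketch of where the :restart value actually lives: in the child spec. `DemoWorker` is a hypothetical module made up for this illustration. `use GenServer, restart: ...` sets the default, and `Supervisor.child_spec/2` can override it per child.

```elixir
# Hypothetical worker used only to illustrate the :restart option.
defmodule DemoWorker do
  # `use GenServer, restart: ...` sets the :restart value that
  # DemoWorker.child_spec/1 reports to the supervisor.
  use GenServer, restart: :transient

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

# The generated child spec carries restart: :transient.
transient_spec = DemoWorker.child_spec(:some_arg)

# Supervisor.child_spec/2 overrides the value for a single child, e.g.
# :permanent (always restarted) or :temporary (never restarted).
permanent_spec = Supervisor.child_spec({DemoWorker, :some_arg}, restart: :permanent)
temporary_spec = Supervisor.child_spec({DemoWorker, :some_arg}, restart: :temporary)

IO.inspect({transient_spec.restart, permanent_spec.restart, temporary_spec.restart})
```

The supervisor only ever consults this value; nothing about the child's own code decides whether it comes back.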
And the "Exit reasons and restarts" section says: [1]
> A supervisor restarts a child process depending on its :restart configuration. For example, when :restart is set to :transient, the supervisor does not restart the child in case it exits with reason :normal, :shutdown or {:shutdown, term}.
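A minimal sketch of that rule for a :transient child, assuming a throwaway supervisor and a hypothetical `TransientWorker` module. The polling helper exists only because the supervisor processes the child's exit signal asynchronously.

```elixir
defmodule TransientWorker do
  # Hypothetical worker declared :transient via its child spec.
  use GenServer, restart: :transient

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

defmodule ExitReasonDemo do
  # The supervisor's view of a child: a pid, :restarting, or :undefined.
  def child_pid(sup, id) do
    {^id, pid, _type, _mods} = List.keyfind(Supervisor.which_children(sup), id, 0)
    pid
  end

  # Poll until `fun` returns a truthy value, or raise after ~1 second.
  def wait_until(fun, attempts \\ 100) do
    case fun.() do
      result when result not in [nil, false] -> result
      _ when attempts > 0 ->
        Process.sleep(10)
        wait_until(fun, attempts - 1)
      _ -> raise "condition not met in time"
    end
  end
end

{:ok, sup} = Supervisor.start_link([{TransientWorker, :state}], strategy: :one_for_one)

# A :normal exit counts as successful for a :transient child: the spec is
# kept, but no new process is started.
pid1 = ExitReasonDemo.child_pid(sup, TransientWorker)
:ok = GenServer.stop(pid1, :normal)
ExitReasonDemo.wait_until(fn -> ExitReasonDemo.child_pid(sup, TransientWorker) == :undefined end)

# Start it again by hand, then make it exit abnormally (:killed): this
# time the supervisor replaces it with a new process.
{:ok, pid2} = Supervisor.restart_child(sup, TransientWorker)
Process.exit(pid2, :kill)

pid3 =
  ExitReasonDemo.wait_until(fn ->
    pid = ExitReasonDemo.child_pid(sup, TransientWorker)
    if is_pid(pid) and pid != pid2, do: pid
  end)
```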
You go on to say:
> But here I have synchronous function [to affect the state of a supervisor] with no indication or warnings that it’s a message.
Before I get into that, I have two questions for you:
1) How do you affect an Erlang or Elixir process without sending it a message? The docs for Processes [2] don't indicate any other way.
2) Have you never seen or written a function that does not return until it receives the response to an async operation?
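For the second question, here is a sketch of a blocking function built purely on asynchronous message passing. `CounterSketch` is a hypothetical module; the pattern (send a tagged request, then `receive` the matching reply) is the same basic shape `GenServer.call/3` uses, simplified: the real implementation also monitors the server so it notices if the server dies.

```elixir
defmodule CounterSketch do
  # A tiny process that only reacts to messages in its mailbox.
  def start, do: spawn(fn -> loop(0) end)

  defp loop(count) do
    receive do
      {:increment, from, ref} ->
        # The reply is just another asynchronous message.
        send(from, {ref, count + 1})
        loop(count + 1)
    end
  end

  # Synchronous-looking API: send a request, then block until the reply
  # tagged with our unique ref arrives, or give up after `timeout` ms.
  def increment(pid, timeout \\ 5_000) do
    ref = make_ref()
    send(pid, {:increment, self(), ref})

    receive do
      {^ref, value} -> value
    after
      timeout -> exit(:timeout)
    end
  end
end

pid = CounterSketch.start()
IO.inspect(CounterSketch.increment(pid))
IO.inspect(CounterSketch.increment(pid))
```

The caller sees an ordinary function that returns a value, even though every step underneath is a message.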
Continuing on... from the top of the Supervisor docs, we see:
> A supervisor is a process which supervises other processes, which we refer to as child processes.
"A supervisor is a process...", straight off the bat. That's super clear and explicit, but I'll keep walking through the docs to show you how else this information is communicated to the reader.
If we read on, we see that the first argument to the 'terminate_child/2' and 'delete_child/2' functions is of type 'supervisor()', which is defined as '@type supervisor() :: pid() | name() | {atom(), node()}'. What are these? Well, check the docs for how you start a Supervisor. [3] They say three interesting things:
1) The second argument to 'start_link/2' is a list of options that includes values of type 'option()', which is defined as '{:name, name()}', and 'name()' is defined as 'atom() | {:global, term()} | {:via, module(), term()}'. Keep those types in mind.
2) "If the supervisor and all child processes are successfully spawned (if the start function of each child process returns {:ok, child}, {:ok, child, info}, or :ignore), this function returns {:ok, pid}, where pid is the PID of the supervisor. If the supervisor is given a name and a process with the specified name already exists, the function returns {:error, {:already_started, pid}}, where pid is the PID of that process."
Notice how often it talks about "spawning" the supervisor and returning a PID, and saying that that PID is the PID of the supervisor you just created, or of a named supervisor that already exists.
3) "The options can also be used to register a supervisor name. The supported values are described under the "Name registration" section in the GenServer module docs."
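Points 2) and 3) can be checked in a few lines. This sketch uses a hypothetical registered name, `DemoSup`: the supervisor comes back as a PID, and a second start under the same name reports the already-running process instead of spawning a new one.

```elixir
# Start an (empty) supervisor registered under a hypothetical atom name.
{:ok, sup_pid} = Supervisor.start_link([], strategy: :one_for_one, name: DemoSup)

# The supervisor is an ordinary process with its own PID.
true = Process.alive?(sup_pid)

# Starting a second supervisor under the same name does not spawn a new
# one: it reports the PID of the process already registered under that name.
{:error, {:already_started, ^sup_pid}} =
  Supervisor.start_link([], strategy: :one_for_one, name: DemoSup)
```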
Let's look at the "Name registration" section. [4] I'm not going to quote the whole thing because it'd be a nightmare to reformat sensibly, but the two key passages are:
> Both start_link/3 and start/3 support the GenServer to register a name on start via the :name option. Registered names are also automatically cleaned up on termination. The supported values are: an atom ... {:global, term} ... {:via, module, term}...
and the last four items in the bulleted list in the section beginning with
> Once the server is started, the remaining functions in this module (call/3, cast/2, and friends) will also accept an atom, or any {:global, ...} or {:via, ...} tuples. In general, the following formats are supported:
Notice how those bullets match up to the 'name()' type that is passed to 'Supervisor.start_link/2', and connect that with the fact that the docs for that function direct you here to learn how to register a name for your supervisor. Combine that with the facts that the first argument to the "tell the supervisor to do something" functions is of type 'supervisor()' and that 'start_link' returns a PID, and it's really, really clear that a supervisor is another process that you can (optionally) name and refer to by name rather than by PID.
Once we understand that a supervisor is a process, and that the functions that instruct a supervisor to do things take the information needed to contact a process, what other conclusion can we draw than "Communication with a supervisor is async, because communication with all processes is async."?
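Here is a sketch of that connection, using the {:via, module, term} form with Elixir's `Registry`. `DemoRegistry` and the `:handler_sup` key are hypothetical names. The same 'supervisor()' argument works whether you pass the PID or the registered name, because both resolve to the same process.

```elixir
# Hypothetical registry used only to demonstrate {:via, module, term} names.
{:ok, _} = Registry.start_link(keys: :unique, name: DemoRegistry)

via_name = {:via, Registry, {DemoRegistry, :handler_sup}}

# One placeholder child so there is something to count.
children = [
  %{id: :placeholder, start: {Agent, :start_link, [fn -> :idle end]}}
]

{:ok, sup_pid} = Supervisor.start_link(children, strategy: :one_for_one, name: via_name)

# The registered name resolves to the supervisor's PID...
^sup_pid = GenServer.whereis(via_name)

# ...so the supervisor functions accept either form interchangeably.
counts_by_name = Supervisor.count_children(via_name)
counts_by_pid = Supervisor.count_children(sup_pid)
IO.inspect(counts_by_name == counts_by_pid)
```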
```elixir
def start_new(name, config) do
  # Logging set up
  Supervisor.start_child(
    name,
    {HandlerModule, config}
  )
end

def replace_supervisor(name, config) do
  Supervisor.terminate_child(name, HandlerModule) # Success
  Supervisor.delete_child(name, HandlerModule)    # Failure
  start_new(name, config)
end
```
That is the exact code; the success and the failure were both logged. Also, from Erlang's documentation:
> one_for_one - If one child process terminates and is to be restarted, only that child process is affected. This is the default restart strategy.
And in the docs for terminate_child you can read this (once again, Erlang's):
> If the supervisor is not simple_one_for_one, Id must be the child specification identifier. The process, if any, is terminated and, [[unless it is a temporary child, the child specification is kept by the supervisor]]. The child process can later be restarted by the supervisor.
Sorry, what happened after or during the call to delete_child/2 that caused you to consider it to have failed?
> So yeah, Elixir documentation is wrong.
I don't see what's wrong with the Elixir documentation. Walk me through it, please? Do remember that the default ':restart' value for a supervised child is ':permanent', and that the 'one_for_one' strategy only ensures that the supervisor-initiated restart of one supervised child doesn't cause the supervisor to restart any other supervised children.
After tracing the code, this is exactly what happened (in exactly this code):
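Both points can be seen in one sketch, assuming a hypothetical `OneForOneWorker` whose child spec sets no :restart value (so the supervisor falls back to its default). Killing child :a gets it restarted, while child :b keeps its original PID under 'one_for_one'.

```elixir
defmodule OneForOneWorker do
  # No :restart option given, so the supervisor applies its default.
  use GenServer

  def start_link(id), do: GenServer.start_link(__MODULE__, id)

  @impl true
  def init(id), do: {:ok, id}
end

defmodule OneForOneDemo do
  # Map of child id => current pid, as reported by the supervisor.
  def pids(sup) do
    Map.new(Supervisor.which_children(sup), fn {id, pid, _type, _mods} -> {id, pid} end)
  end

  # Poll until `fun` returns a truthy value; restarts happen asynchronously.
  def wait_until(fun, attempts \\ 100) do
    case fun.() do
      result when result not in [nil, false] -> result
      _ when attempts > 0 ->
        Process.sleep(10)
        wait_until(fun, attempts - 1)
      _ -> raise "condition not met in time"
    end
  end
end

children = [
  Supervisor.child_spec({OneForOneWorker, :a}, id: :a),
  Supervisor.child_spec({OneForOneWorker, :b}, id: :b)
]

{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)
original = OneForOneDemo.pids(sup)

# Crash child :a. It is restarted; under :one_for_one, child :b is untouched.
Process.exit(original.a, :kill)

restarted =
  OneForOneDemo.wait_until(fn ->
    current = OneForOneDemo.pids(sup)
    if is_pid(current.a) and current.a != original.a, do: current
  end)
```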
1. Terminate child X
2. /Supervisor restarts X/
3. Delete child X {:error, :running}
4. Supervisor.start_child Y {:ok, PID}
5. /X and Y are both running/
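When tracing a sequence like this, it can help to ask the supervisor for the child's state after every step instead of inferring it from logs. A minimal sketch of such a check; `ChildStatus.child_status/2` is a hypothetical helper built on `Supervisor.which_children/1`, and the `:handler` child is a placeholder.

```elixir
defmodule ChildStatus do
  # Hypothetical helper: look a child up by id in which_children/1 so each
  # step of a terminate/delete/start sequence can be checked directly.
  def child_status(sup, id) do
    case List.keyfind(Supervisor.which_children(sup), id, 0) do
      nil -> :not_found
      {^id, pid, _type, _modules} when is_pid(pid) -> {:running, pid}
      # Otherwise the supervisor reports :undefined or :restarting.
      {^id, other, _type, _modules} -> other
    end
  end
end

{:ok, sup} =
  Supervisor.start_link(
    [%{id: :handler, start: {Agent, :start_link, [fn -> nil end]}}],
    strategy: :one_for_one
  )

IO.inspect(ChildStatus.child_status(sup, :handler))
IO.inspect(ChildStatus.child_status(sup, :missing))
```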
As for incorrectness:
> the supervisor does not restart the child in case it exits with reason :normal, :shutdown or {:shutdown, term}.
`terminate_child` is sending shutdown and yet it's being restarted.
And to emphasise the use case: the child is a connection handler, and the service node changed. It NEEDS to be restarted on crash, but has to be replaced during handoff.
I believe you start to get into "huh?" mode with me. I have a treasure trove of those. (By the way, the Erlang repository has plenty of notes mentioning THIS exact behavior and, unless I skimmed too quickly, even some bugs caused by it; you can search for terminate_child.)
> It NEEDS to be restarted on crash, but has to be replaced during handoff.
I question why you're handing off things between supervisors. If this is something you actually need to do, then call 'delete_child/2' so the supervisor doesn't restart the child, terminate the child yourself, and restart the child on the new supervisor.
EDIT: Actually, no, you can't 'delete_child/2'. You need to change the child's ':restart' value from ':permanent' to the value that does exactly what you say you need. I'll leave it to you to read the docs. /EDIT
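Changing the value doesn't require touching the handler module itself; the child spec can be overridden at the point where it is added. In this sketch `HandlerModule` is a stand-in (the real module isn't shown in the thread), the config map is made up, and :transient is only an example value; which value fits is exactly what the docs above describe.

```elixir
defmodule HandlerModule do
  # Stand-in for the handler from the thread; the real module is not shown.
  use GenServer

  def start_link(config), do: GenServer.start_link(__MODULE__, config)

  @impl true
  def init(config), do: {:ok, config}
end

{:ok, sup} = Supervisor.start_link([], strategy: :one_for_one)

# Override the :restart value while building the spec passed to start_child.
# :transient here is an illustrative choice, not a recommendation.
spec = Supervisor.child_spec({HandlerModule, %{node: :hypothetical}}, restart: :transient)
{:ok, _pid} = Supervisor.start_child(sup, spec)

IO.inspect(spec.restart)
```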
> `terminate_child` is sending shutdown and yet it's being restarted.
Here's the context for that partial quote, which you pulled from [1]:
> A supervisor restarts a child process depending on its :restart configuration. For example, when :restart is set to :transient, the supervisor does not restart the child in case it exits with reason :normal, :shutdown or {:shutdown, term}.
Re-read that first sentence, which you chose not to quote. Then read about the ':restart' supervisor configuration and how it describes when a supervised child is and is not restarted. [0]
> I believe you start to get into "huh?" mode with me.
Yep. Selective quoting when it's trivial for your conversation partner to find the lies by omission definitely put me into "huh?" mode with you.
[0] <https://hexdocs.pm/elixir/1.18.3/Supervisor.html#module-rest...>
[1] <https://hexdocs.pm/elixir/1.18.3/Supervisor.html#module-exit...>
[2] <https://hexdocs.pm/elixir/1.18.3/processes.html>
[3] <https://hexdocs.pm/elixir/1.18.3/Supervisor.html#start_link/...>
[4] <https://hexdocs.pm/elixir/1.18.3/GenServer.html#module-name-...>