Erlang: Making reliable distributed systems in the presence of software errors [pdf] (2003)

andrelaszlo · 2024-04-27T11:55:05

While reading The Server Chose Violence (https://news.ycombinator.com/item?id=40178652), Cliff Biffle's post about Hubris, I kept drawing parallels to Erlang, and Erlang even gets a mention in a footnote. Anyway, I thought I'd share Joe Armstrong's thesis on Erlang; it's a great read!

Fault isolation:

Biffle: "Hubris uses a small, application-independent kernel, and puts most of the code — drivers, application logic, network stack, etc. — in separately compiled isolated tasks. These tasks can communicate with each other using a cross-task messaging system (inter-process communication, or IPC)."

Armstrong (2.3 Philosophy): "We need to isolate all the code that runs in order to achieve a goal in such a way that we can detect if any errors occurred when trying to achieve a goal. Also, when we are trying to simultaneously achieve multiple goals we do not want a software error occurring in one part of the system to propagate to another part of the system. [...] Our applications are structured using large numbers of communicating parallel processes."
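To make the shared idea concrete, here is a rough Python sketch (my own, not from either source; `TASK_SOURCE` and `run_task` are made-up names). Each "task" runs in its own OS process, talks to the parent only through stdin/stdout pipes, and does no input checking. A fault kills only that process; the parent just observes a non-zero exit code. Erlang processes and Hubris tasks are much lighter-weight than OS processes, so treat this as an analogy only.

```python
import subprocess
import sys

# Hypothetical task program: reads one integer per line and replies with
# 100 // n. No validation at all: a 0 (or non-integer) makes it raise,
# and the whole task process dies.
TASK_SOURCE = """
import sys
for line in sys.stdin:
    n = int(line)
    print(100 // n, flush=True)
"""


def run_task(inputs):
    """Run one isolated task in its own interpreter process.

    Messages go in over stdin and replies come back over stdout (the IPC
    channel). Returns (replies, exit_code); non-zero means the task crashed.
    """
    proc = subprocess.run(
        [sys.executable, "-c", TASK_SOURCE],
        input="".join(f"{x}\n" for x in inputs),
        capture_output=True,  # the crashed task's traceback lands in proc.stderr
        text=True,
    )
    replies = [int(tok) for tok in proc.stdout.split()]
    return replies, proc.returncode


if __name__ == "__main__":
    print(run_task([1, 5, 20]))  # ([100, 20, 5], 0)
    print(run_task([4, 0, 2]))   # ([25], 1)
```

The second task still delivers its first reply (25) before it faults on 0, and neither the parent nor the first task is affected by the crash.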

Error handling:

Armstrong (4.4 Let it crash): "The defensive code detracts from the pure case and confuses the reader—the diagnostic is often no better than the diagnostic which the compiler supplies automatically"
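A small Python analogue of Armstrong's point (my example, not his; `HANDLERS`, `handle_defensive`, and `handle_let_it_crash` are hypothetical). The defensive version adds a branch and a hand-written error message; the let-it-crash version writes only the pure case and lets the runtime's own `KeyError` report the bad input.

```python
# Hypothetical command table for a tiny message handler.
HANDLERS = {"ping": "pong", "status": "ok"}


def handle_defensive(cmd):
    # Defensive style: guard the bad case by hand and invent a diagnostic.
    # The extra branch clutters the pure case, and the message tells us
    # little more than the KeyError Python would raise anyway.
    if cmd not in HANDLERS:
        raise ValueError(f"unknown command: {cmd!r}")
    return HANDLERS[cmd]


def handle_let_it_crash(cmd):
    # Let-it-crash style: write only the pure case. An unknown command
    # raises KeyError naming the bad key; whoever supervises this code
    # decides what to do about the failure.
    return HANDLERS[cmd]


if __name__ == "__main__":
    print(handle_let_it_crash("ping"))  # pong
    print(handle_defensive("status"))   # ok
```

Calling `handle_let_it_crash("reboot")` raises `KeyError: 'reboot'`, which already names the offending value.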

Biffle: "Early in the system’s design, I decided not to permit recoverable/resumable faults. That is, when a program takes a fault — whether it’s hardware or synthetic — the task is dead. It can run no further instructions. There is no way to “fix” the problem and resume the task. This was a conscious choice to avoid some subtle failure modes and simplify reasoning about the system."
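Biffle's rule (a faulted task is dead and never resumed) pairs naturally with Erlang's answer to what happens next: a supervisor starts a fresh instance. Below is a toy, single-threaded Python sketch of that restart loop; `worker`, `supervise`, and the restart limit are my own hypothetical choices, and real OTP supervisors are considerably more involved.

```python
from collections import deque


def worker(mailbox, trace):
    """Hypothetical task: consume messages and keep a running total.

    `total` is local, so every (re)start begins from a clean state.
    A message of 0 raises ZeroDivisionError, i.e. the task faults.
    """
    total = 0
    while mailbox:
        msg = mailbox.popleft()  # message is consumed before it is processed
        total += 10 // msg
        trace.append(total)
    return total


def supervise(task, mailbox, max_restarts=3):
    """Run `task`; on any exception, discard it and start a fresh one.

    Loosely modeled on an Erlang supervisor with a restart limit:
    after `max_restarts` restarts, the next crash is re-raised.
    Returns (result, restarts_used, trace).
    """
    restarts = 0
    trace = []
    while True:
        try:
            return task(mailbox, trace), restarts, trace
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up and escalate to whoever supervises us


if __name__ == "__main__":
    print(supervise(worker, deque([5, 0, 2, 1])))  # (15, 1, [2, 5, 15])
```

With the mailbox `[5, 0, 2, 1]`, the first run handles 5 and then dies on 0. The restarted worker begins again from `total = 0`, so the result is 15 rather than 17: the crashed run's state is simply gone, not patched up and resumed.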