Why Erlang/Elixir? A Fanboy's Argument for Scalable, Distributed, Concurrent Programming While Staying Sane

April 23, 2021 · 11 min read

I don't think it's a well-kept secret that I'm a fanboy of Elixir. Almost half of my blog posts (at the time of writing) are related to Elixir and its ecosystem. But how exactly did I get to this point? And why? In this post, let's take a deep dive into what makes Erlang/Elixir so special and why the latest innovations out of this ecosystem keep me coming back for more.

The What

What is Erlang?

Erlang is a functional programming language that made its debut in 1986 in the telecoms industry, born of the need for distributed, fault-tolerant, and highly available telecoms systems at Ericsson. As a result of this initial use case, the base suite of libraries that make up the core of Erlang was named the Open Telecom Platform (OTP), and the name persists to this day, even though Erlang and its sibling languages are used well beyond the telecoms arena.

The BEAM

When we run Erlang code, we run it on the BEAM, a virtual machine. Erlang source is first compiled into bytecode, which the BEAM then loads and executes. At runtime, the interpreter maps each instruction to executable C code, making instruction dispatch fast.

Functional Programming

Erlang code is functional: data is immutable, processes share no state, and side effects are kept explicit rather than hidden. Instead of creating classes to encapsulate our logic and state, we compose functions that take data in and return new data out.

The design choice to prevent shared state allows Erlang processes to fail without affecting other processes. No shared memory means that one failing process cannot corrupt the state of another.

As Joe Armstrong, one of the creators of Erlang, states:

Object Oriented Programming, this is the art of hiding side effects. If you can do it successfully you can write this in program. If you do it unsuccessfully, you get in a sucking mess.

from 26 Years with Erlang, September 22, 2014, Joe Armstrong

It is safe to say that he definitely wasn't a fan of Object Oriented Programming. By preventing side effects, we essentially isolate our state and control the way it can be updated to a new state.
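To make the idea of immutable data and composed functions concrete, here is a minimal sketch in Elixir (which runs on the same VM as Erlang). The `Cart` module and its functions are hypothetical names made up for illustration.

```elixir
defmodule Cart do
  # Returns a new list with the item prepended; the input list is left untouched.
  def add_item(cart, item), do: [item | cart]

  # The result depends only on the argument, never on outside state.
  def total(cart) do
    cart
    |> Enum.map(fn {_name, price} -> price end)
    |> Enum.sum()
  end
end

empty = []
with_book = Cart.add_item(empty, {"book", 30})
with_both = Cart.add_item(with_book, {"pen", 5})

IO.inspect(empty)                  # still [], nothing was mutated
IO.inspect(Cart.total(with_both))  # 35
```

Every "update" produces a new value, so `empty` and `with_book` remain valid and unchanged for anyone still holding on to them.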

Stateful Processes

Given that Erlang is functional, how exactly do we hold state in our programs? Through processes. A process holds a piece of the program's state in memory, and that state can only be updated by the process itself. Immutability means that each update produces a new version of the state instead of modifying the old one in place, so the state is always predictable.
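As a rough sketch of how a process holds state, the example below keeps a counter in the argument of a recursive function. The `Counter` module and its message shapes are made up for illustration; in real applications you would normally reach for OTP abstractions such as GenServer or Agent instead of a hand-rolled loop.

```elixir
defmodule Counter do
  # Start a process whose only job is to hold an integer.
  def start(initial \\ 0), do: spawn(fn -> loop(initial) end)

  # Ask the process to update its own state.
  def increment(pid), do: send(pid, :increment)

  # Ask the process for its current state and wait for the reply.
  def get(pid) do
    send(pid, {:get, self()})

    receive do
      {:count, value} -> value
    after
      1_000 -> {:error, :timeout}
    end
  end

  # The state lives in the argument of this recursive function.
  defp loop(count) do
    receive do
      :increment ->
        loop(count + 1)

      {:get, caller} ->
        send(caller, {:count, count})
        loop(count)
    end
  end
end

pid = Counter.start()
Counter.increment(pid)
Counter.increment(pid)
IO.inspect(Counter.get(pid))  # 2
```

Nothing outside the process can touch `count` directly. The only way to change it is to send the process a message and let it decide what its next state is.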

Side note: processes are the way to encapsulate state updates in Erlang, while the parallel construct in Haskell is the monad.

Concurrent Programming and the Actor Model

Each lightweight process holds its own state, running its own code instructions. This intuitively means that we are able to write concurrent programs, with processes working in isolation of each other.

However, this gives rise to the problem of how these isolated processes communicate. Sharing some sort of state between two processes is not feasible, as it would go against the idea of immutable state and no hidden side effects. The answer to this problem is the Actor Model, which states that "everything is an actor". In this case, each process is considered an actor.

When an actor receives a message, it can concurrently do any of the following:

send messages to other actors
create more actors
designate message handling behaviour

This means that in order to communicate between different processes, we send messages between them. Each process receives these messages in its mailbox and handles them one at a time, possibly sending more messages, either to itself or to others.
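The sketch below shows two of these abilities in Elixir: replying to another actor, and designating new behaviour for the next message. The `Greeter` module, its message tuples, and the `:relax` command are hypothetical and exist only for this example.

```elixir
defmodule Greeter do
  # Creating an actor is just spawning a process that waits on its mailbox.
  def start, do: spawn(fn -> formal() end)

  # Helper: send a request, then wait for the reply in our own mailbox.
  def greet(pid, name) do
    send(pid, {:greet, name, self()})

    receive do
      {:reply, text} -> text
    after
      1_000 -> {:error, :timeout}
    end
  end

  # Initial behaviour.
  defp formal do
    receive do
      {:greet, name, from} ->
        send(from, {:reply, "Good day, #{name}."})
        formal()

      :relax ->
        # Designate a different behaviour for all following messages.
        casual()
    end
  end

  # Behaviour after receiving :relax.
  defp casual do
    receive do
      {:greet, name, from} ->
        send(from, {:reply, "Hey #{name}!"})
        casual()

      :relax ->
        casual()
    end
  end
end

greeter = Greeter.start()
IO.puts(Greeter.greet(greeter, "Joe"))  # Good day, Joe.
send(greeter, :relax)
IO.puts(Greeter.greet(greeter, "Joe"))  # Hey Joe!
```

Messages from one sender to one receiver arrive in the order they were sent, which is why the `:relax` message is guaranteed to be handled before the second greeting.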

The Why

Now that we know the mechanics of Erlang and what it offers, this section deals with why exactly we would reach for Erlang when building applications.

Fault Tolerance

The cornerstones of fault tolerant programming are to isolate errors to make sure if one thing crash it does not crash anything else. That's what the processes in operating system do.

from Faults, Scaling and Erlang concurrency, September 24, 2014, Joe Armstrong

When writing our program, we want to ensure that if something fails, it does not bring down the entire system. Not only that, but we want it to be able to gracefully recover from unexpected failures automatically, by recovering to a known working state.

However, in order to properly recover from a failure, something must watch for that failure and then perform error recovery. In Erlang, any process that crashes from an unhandled error will notify the processes monitoring it, allowing us to restart the crashed process from a known working state if needed. This way, our applications can keep running indefinitely, because crashed processes are replaced with fresh ones.

This error recovery mechanism gave birth to Erlang's "let it crash" philosophy, where we would rather allow the process to crash than to program defensively. Note that whether processes crash does not actually affect the stability of the BEAM. Processes that crash will not affect the underlying Erlang virtual machine. The BEAM will simply trudge along and continue running and scheduling all remaining processes for computation.
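Here is a minimal sketch of this recovery loop using an OTP supervisor from Elixir. The `Worker` module and its `await_restart/2` polling helper are made up for this example; the crash is simulated by killing the worker from the outside.

```elixir
defmodule Worker do
  use GenServer

  # Registered under the module name so we can look up the current pid later.
  def start_link(_arg), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: {:ok, %{started_at: System.monotonic_time()}}

  # Poll until the supervisor has registered a replacement process.
  def await_restart(old_pid, attempts \\ 100) do
    case Process.whereis(__MODULE__) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ when attempts > 0 ->
        Process.sleep(10)
        await_restart(old_pid, attempts - 1)

      _ ->
        nil
    end
  end
end

# The supervisor watches its children and restarts any that crash.
{:ok, _supervisor} = Supervisor.start_link([Worker], strategy: :one_for_one)

old_pid = Process.whereis(Worker)
Process.exit(old_pid, :kill)  # simulate an unexpected crash
new_pid = Worker.await_restart(old_pid)

IO.inspect({old_pid, new_pid}, label: "before/after restart")
IO.inspect(Process.alive?(new_pid), label: "replacement alive?")
```

The replacement starts from the state returned by `init/1`, which is the "known working state" mentioned above. The script itself keeps running throughout; only the worker died.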

Scalability

Process Scaling with a Single Node

At a micro level, we are able to scale computations easily by spawning multiple processes to run code concurrently. As processes are lightweight and a single VM can be configured to run millions of them, we can shift from a sequential mindset (where incoming requests/messages are handled one after the other) to a concurrent mindset (where a new process is spawned for each new incoming request).

This means that to handle an increase in volume, all we need to do is to create more processes to match that volume. The Erlang runtime will handle the tricky part of scheduling tasks for computation in the most efficient manner possible.
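A small sketch of the process-per-request idea in Elixir follows. `RequestHandler.handle/1` is a hypothetical stand-in that sleeps for 100 ms to imitate real work; the request count and timeout are arbitrary.

```elixir
defmodule RequestHandler do
  # Hypothetical handler: pretend each request needs 100 ms of work.
  def handle(request_id) do
    Process.sleep(100)
    {:done, request_id}
  end
end

requests = 1..1_000

{micros, results} =
  :timer.tc(fn ->
    requests
    # One lightweight process (wrapped in a Task) per request.
    |> Enum.map(fn id -> Task.async(fn -> RequestHandler.handle(id) end) end)
    # Collect the results in the original order, waiting up to 5 seconds.
    |> Task.await_many(5_000)
  end)

IO.puts("handled #{length(results)} requests in #{div(micros, 1000)} ms")
```

Handled sequentially, these requests would need about 100 seconds. Because each one runs in its own process and spends its time waiting, the whole batch should finish in a small fraction of that.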

Node Scaling with Distributed Erlang

Of course, a single machine is usually insufficient for humongous workloads, and we will often need to scale horizontally to tackle monumental volumes of requests. In such a case, the answer is still exactly the same: for each request received, we spawn a new process to handle it. However, we instead spawn this process on a separate machine that currently has less computational workload. This is achieved through distributed Erlang, where each Erlang Run-Time System is a node capable of communicating with other nodes. As each node and each process is given a unique identifier, we can spawn processes and send messages between nodes seamlessly, distributing the workload across multiple machines.
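As a rough illustration, the sketch below spawns work on another node and gets the result back as an ordinary message. The `Remote` module is hypothetical. To stay runnable on a single machine, it falls back to the local node when no other nodes are connected; on a real cluster the code being run must also be available on the target node.

```elixir
defmodule Remote do
  # Runs `fun` on the node `target` and sends the result back to the caller.
  def run_on(target, fun) do
    caller = self()
    ref = make_ref()

    # Same call whether `target` is this node or a machine across the network.
    Node.spawn(target, fn -> send(caller, {ref, node(), fun.()}) end)

    receive do
      {^ref, ran_on, result} -> {ran_on, result}
    after
      5_000 -> {:error, :timeout}
    end
  end
end

# Node.list/0 returns the other connected nodes (empty when not clustered).
target = List.first(Node.list()) || Node.self()
{ran_on, result} = Remote.run_on(target, fn -> Enum.sum(1..100) end)

IO.inspect(ran_on, label: "ran on")
IO.inspect(result, label: "result")
```

The point is that the calling code does not change between the local and the remote case: a pid is a pid, wherever the process happens to live.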

As alluded to in the section on fault tolerance, we can use these process monitoring capabilities to handle the situation where an Erlang node dies (whether through OS-level failure or otherwise), and reactively "move" its processes by spawning new ones on a different node. This functionality is provided through community libraries like swarm and horde. This extends the fault tolerance of the Erlang cluster to the multi-node level, on top of the multi-process level described above.

Other Cool Erlang Features

Besides the above selling points, there are also other features that are of note:

Hot code replacement, where old code is replaced with newer code at runtime with zero downtime. This contributes to the Erlang VM's high-availability capabilities.
Ports, which allow the Erlang VM to communicate with external applications and other programming languages.
Native Implemented Functions (NIFs), which allow for performance optimizations in areas that Erlang is not optimized for. One great example is the community library rustler, which uses Rust to expose memory-safe functions. Discord recently used this library to scale to 11 million concurrent users.

Relationship with Kubernetes

Kubernetes is a technology that allows for the provision and orchestration of containerized applications.

Although some may think that using an Erlang cluster is mutually exclusive with using a Kubernetes cluster, that is far from the truth. This by itself is a large topic beyond the scope of this post, and it has already been covered by the great José Valim himself on the Plataformatec blog.

The key points are:

Kubernetes provides self-healing at the node level, while the Erlang VM provides self-healing at the application level.
You can use Kubernetes' deployment mechanisms instead of Erlang's hot code swapping.
We can leverage Kubernetes' registry to provide auto-clustering capabilities to our Erlang cluster, a feature provided by the community library libcluster.
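For the last point, a libcluster topology is declared in the application's configuration. The fragment below is a sketch based on the options of libcluster's Kubernetes strategy as I understand them; the topology name `k8s`, the app name `myapp`, the label selector, and the namespace are all placeholder assumptions, and it requires libcluster as a dependency, so check the library's documentation before relying on it.

```elixir
# config/runtime.exs (sketch; requires the libcluster dependency)
import Config

config :libcluster,
  topologies: [
    # Placeholder topology name
    k8s: [
      strategy: Cluster.Strategy.Kubernetes,
      config: [
        # Build node names from pod IPs
        mode: :ip,
        # Placeholder: the name part of the Erlang node name
        kubernetes_node_basename: "myapp",
        # Placeholder: label selector matching the pods to cluster
        kubernetes_selector: "app=myapp",
        kubernetes_namespace: "default",
        # How often to ask Kubernetes for the current pods, in milliseconds
        polling_interval: 10_000
      ]
    ]
  ]
```

With something like this in place, pods that come and go are connected to and removed from the Erlang cluster without any manual node bookkeeping.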

The Elixir Renaissance

Okay, so we have established that Erlang, the Erlang VM, and distributed Erlang are all great pieces of tech, but we also need to remember that the language is extremely old. Like, over 30 years old (35 years at time of writing). This means that much of the syntax, paradigms and language design has not really changed over the years. Then, in 2011, the Elixir programming language was created to build upon the solid foundation that Erlang provides, compiling to BEAM bytecode and interoperating with Erlang as well. This means that Elixir is capable of using ALL of the Erlang ecosystem and features.

Not only that, but the language draws much of its syntax inspiration from Ruby, resulting in a syntax that is simpler and more readable than Erlang's.

Elixir provides many improvements, such as meta-programming capabilities through abstract syntax tree manipulation (resulting in zero-cost code generation capabilities and macros), and built-in tooling for tests, documentation, releases, formatting, debugging, and lots more.

The meta-programming capabilities have allowed many libraries to implement their own domain-specific languages within the language itself through the use of macros.
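As a small taste of what this looks like, the sketch below defines a toy macro. `MyMacros.unless_zero/2` and the `Demo` module are invented for this example; the macro receives its arguments as syntax trees and returns a new syntax tree that is expanded at compile time.

```elixir
defmodule MyMacros do
  # A macro receives code as AST and returns AST.
  defmacro unless_zero(value, do: block) do
    quote do
      if unquote(value) != 0 do
        unquote(block)
      end
    end
  end
end

defmodule Demo do
  # Macros must be required before use so they are available at compile time.
  require MyMacros

  def describe(n) do
    MyMacros.unless_zero n do
      "non-zero: #{n}"
    end
  end
end

# quote/1 shows the syntax tree that macros operate on.
IO.inspect(quote(do: 1 + 2))

IO.inspect(Demo.describe(5))  # "non-zero: 5"
IO.inspect(Demo.describe(0))  # nil
```

Because the expansion happens during compilation, the compiled `describe/1` contains a plain `if` expression, with no macro machinery left at runtime.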

Not only that, but because the language stands on the shoulders of giants, the Elixir language is actually considered feature complete as of version 1.9. This is a huge boon for backwards compatibility and code maintenance, and it keeps library churn low.

But the language and its awesome tooling isn't the only thing that is drawing people to Elixir...

IoT Clustering and Communication

The Nerves Project allows for the deployment of embedded software to Internet-of-Things devices, giving these devices all the benefits of Erlang clustering and inter-cluster communication.

Phoenix LiveView

Fresh out of the oven, the latest cutting edge feature of the Phoenix web framework for Elixir allows for server-controlled DOM manipulation. In essence, it provides HTML diffing to send only the minimum required DOM updates over websockets, reducing the need for front-end client side code. Client events are also sent over the wire, resulting in seamless navigation that simulates a Single Page Application.

Internally, each connected LiveView is backed by an Erlang process, which allows us to hold and update state based on the user's interactions (much like how Redux does it on frontend-only clients). This hybrid approach allows for more advanced server abilities and vastly reduces the required client state-related JavaScript code.

Furthermore, the API is completely interoperable with modern JavaScript libraries, which means that you don't have to throw away all of your React/Vue/frontend-framework-du-jour code. You can interact with the server completely through JavaScript as well.

This idea is not new, and there are many competing libraries and frameworks, such as Hotwire by Basecamp, intercooler.js and its successor htmx, and Laravel's Livewire for PHP. However, Phoenix LiveView leverages and builds upon many of the existing Phoenix concepts and capabilities, which focus on developer productivity and speed of shipping.

Numerical Elixir

This one is hot off the press, but Numerical Elixir provides multi-dimensional arrays (tensors) to Elixir, providing a gateway for future data-oriented libraries in Elixir and making deep learning within the Elixir ecosystem possible. Notable libraries are Torchx, a LibTorch client, and EXLA, a client for Google's XLA (Accelerated Linear Algebra).

How to Adopt Erlang/Elixir?

If this article has swayed you in any way or form, you should understand that you don't have to start from scratch, especially if you have lots of code in different languages. What Elixir excels at is acting as a glue between languages. This is accomplished through Ports and community libraries such as:

ErlPort, which provides an interface for calling Python and Ruby code (and there are forks that have achieved the calling of Java code)
Pyrlang, which is a project that implements the Erlang protocol in Python and allows Python code to interact with an Erlang cluster.

Not only that, but the Elixir language guide is extremely clear and well written, which makes picking up the language a breeze.

I could write more and more and more about Elixir, but I'll save that for other blog posts. Otherwise, this article would transform into a 10k word monstrosity.
