Mixing Python with Elixir with Export (Erlport)

Last updated on May 17, 2020

Adding Python to your Elixir web application is a tantalizing proposition. Elixir isn't that great at crunching numbers, but by leveraging Python's data science ecosystem, we can add some machine learning magic to our applications.

There are two main ways to do this:

  1. Use Erlang's port protocol (and Elixir's equivalent) to interface with the external system
  2. Spin up an Erlang node that is implemented in Python (a la the Pyrlang project)
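
To make the first option a little more concrete: a port talks to the external program over its standard input and output. When the port is opened with a length-prefixed packet option such as {packet, 4}, every message is preceded by a 4-byte big-endian length. The sketch below shows only that framing on the Python side. The helper names are hypothetical, and a library like erlport also encodes the payload as Erlang terms, which is left out here.

```python
import struct


def write_packet(stream, payload):
    """Write one message: a 4-byte big-endian length, then the payload bytes."""
    stream.write(struct.pack(">I", len(payload)) + payload)


def read_packet(stream):
    """Read one length-prefixed message, or return None at end of stream."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack(">I", header)
    return stream.read(length)
```

In a real port program the streams would be the binary stdin and stdout of the Python process. We won't write any of this ourselves, which is the point of reaching for a library.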

In this guide, we'll focus on the first method. Before embarking on a project, always check for prior art. In our case, there is the Erlport project, which attempts to provide a nice interface for spinning up Python and Ruby processes. However, a look at the project's GitHub repository shows that it is on life support: few pull requests have been merged for quite a long time (at least 4 years). That doesn't totally discount the project, though. The fork network shows that many people still use it, and some have even added support for other languages like Java.

Digging a little deeper, we also discover that some community members have taken it upon themselves to create a separate erlport organization (mentioned here) to merge in the pull requests.

Ok, we've got that settled. We can still use erlport as our solution for integrating Python. However, erlport is still not very Elixir-friendly. A search on hex.pm turns up the Export project, which provides a nice wrapper around erlport. Note that installing erlport by hand is quite a hassle (it requires building binaries using make), and Export wraps the dependency so that we don't have to build the project from source ourselves. It also includes a macro for syntactic sugar when calling Python code from Elixir.

I will be focusing on calling Python code synchronously. For long-running tasks, you are actually able to call code asynchronously, but that is definitely more advanced and involves message passing between the Python process and the Elixir process, hence we will leave that for another day.

Let's go!

(Optional) Create a Test Suite

We must obey the testing goat. Create a unit test file in your test directory like so, to pin down the behaviour we expect from the PyWorker GenServer that we will implement next.

defmodule MyApp.PyWorkerTest do
  use ExUnit.Case, async: true
  alias MyApp.PyWorker

  test "starts up a python process" do
    assert {:ok, %{py: pid}} = PyWorker.init(%{})
    assert pid
  end

  test "duplicate/1 performs duplication of text" do
    # Requires MyApp.PyWorker to be running already (e.g. under the application's supervision tree)
    # Python code always returns charlists instead of strings
    assert 'texttext' = PyWorker.duplicate("text")
  end
end

This defines two tests. First, we test that a Python process is started up correctly when the PyWorker is initialized. Second, we test that the duplicate/1 function correctly duplicates the text.

Running our tests now with mix test should cause both tests to fail.

Create a GenServer to Wrap the Python Process

The way erlport (and consequently Export) interfaces with the Python interpreter is through a Python process. This process is started with Export.Python.start/1 or Export.Python.start_link/1, both of which return an Elixir process identifier (PID). This PID needs to be stored somewhere, so we will use a GenServer to hold it in its state.

Create the following file in lib/py_worker.ex:

# lib/py_worker.ex
defmodule MyApp.PyWorker do
  use GenServer
  use Export.Python

  # client
  # start_link/1 is what a supervision tree calls; omit it only if this process is not supervised
  def start_link(_) do
    GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  end

  def duplicate(text) do
    GenServer.call(__MODULE__, {:duplicate, text})
  end

  # server
  def init(state) do
    priv_path = Path.join(:code.priv_dir(:my_app), "python")
    {:ok, py} = Python.start_link(python_path: priv_path)
    {:ok, Map.put(state, :py, py)}
  end

  def handle_call({:duplicate, text}, _from, %{py: py} = state) do
    # Calls duplicate(text) in priv/python/my_module.py
    result = Python.call(py, "my_module", "duplicate", [text])
    {:reply, result, state}
  end

  def terminate(_reason, %{py: py}) do
    # Stop the Python process so it is not left orphaned
    Python.stop(py)
    :ok
  end
end

Wow, that's a lot of code. Let's go through it piece by piece:

  • We use the Export.Python macro. This will alias and require the Export.Python module.
  • The PyWorker.start_link/1 function will be called when we add this GenServer to a supervision tree. We also register the process under the module name, so that we can reference it later without passing process IDs around. This assumes that you run only one PyWorker process in your supervision tree. In the unlikely event that you do not need this process to be supervised, you can omit this function.
  • We define a duplicate/1 function that makes a synchronous call to the process. The call is handled by the handle_call/3 callback defined further below.
  • The init callback starts the Python process and stores its process ID under the :py key in the GenServer's state. Note that we have to use :code.priv_dir/1 to locate the priv directory in our release. We cannot simply use Path.expand("priv/python"), as that path would not exist in our application's release distribution. The :python_path option is the path that Python will search for modules. We will create this priv/python directory later on.
    • Also note that we use Export.Python.start_link/1, which links the created Python process to the current PyWorker process. This is desirable: if the Python process crashes, the PyWorker process goes down with it, and a supervisor can then restart PyWorker, which starts a fresh Python process in init. As the saying goes, let it crash.
  • We implement the handle_call/3 callback, which handles the call made by the duplicate/1 function declared above. We pattern match on the state to obtain the Python process ID, and then use it to call the duplicate function in the my_module Python module. The list holds the arguments for that function call, with the text variable passed as the first positional argument.
  • We implement the terminate/2 callback to gracefully stop the Python process (so that we don't have orphaned processes running around our machine).

Implement the Python Interface Module

We will create an interface module that will declare or import the relevant python functions that we want. As you may have inferred from above, you can have multiple namespace modules for different Python contexts. That is up to you to decide, and there are no hard rules for this.

Create the following file in your priv directory:

# priv/python/my_module.py
def duplicate(text):
    return text + text

We define a simple function to duplicate our text through string concatenation. Of course, you can do more complex stuff like text cleaning, etc. I suggest using a simple function for wiring things up first.
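
One thing to watch while wiring this up: depending on your Python version and erlport's data type mapping, an Elixir string (a binary) may arrive in Python 3 as bytes rather than str, and it is a Python str that comes back to Elixir as a charlist. If the test gets a binary back instead of 'texttext', a variant along these lines may help. This is a sketch rather than the canonical version, and the UTF-8 encoding is an assumption.

```python
# priv/python/my_module.py (alternative sketch)
def duplicate(text):
    # Elixir binaries may arrive as bytes under Python 3; decode them first.
    # UTF-8 is an assumption here.
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    # Concatenating two str values yields a str.
    return text + text
```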

Checking Our Tests

Now, since our python code is implemented, let's try running the tests again with mix test. You should have two passing tests now!

Installing Python Dependencies

In production use, you will likely rely on some Python dependencies, and you'll need to make them available to your distribution.

Simply ensure that your Python virtual environment is activated before running your test suite or production code. When deploying your distribution, ensure that your dependencies have been installed from requirements.txt beforehand (for example with pip install -r requirements.txt).

If you need to customize the Python executable path, use the :python option for Export.Python.
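
A forgotten virtual environment usually only shows up as an import error on the first real call. One way to catch it earlier is a small check that reports which modules cannot be found. The module and function names below are hypothetical, and the sketch only handles top-level package names.

```python
# priv/python/check_deps.py (hypothetical helper module)
import importlib.util


def missing_dependencies(names):
    """Return the top-level module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        # Names sent from Elixir may arrive as bytes; UTF-8 is assumed.
        if isinstance(name, bytes):
            name = name.decode("utf-8")
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing
```

You could call this once from the GenServer's init, in the same way duplicate is called, and refuse to start if the returned list is not empty.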

Wrapping Up

You should now have working python code inside your Elixir application!

Some things to note:

Strings returned by the Python process arrive in Elixir as charlists. This includes map keys. For example:

iex> duplicate_return_as_map("text")
%{'result' => 'texttext'}

I recommend manually converting map keys into atom keys using a for/into/do comprehension, to standardize your interfaces.
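
The Python function behind the iex example above is not shown in this guide. One possible shape for it, as a hypothetical addition to my_module.py, is sketched below; the str key and the str value are what show up as charlists on the Elixir side.

```python
# priv/python/my_module.py (hypothetical addition)
def duplicate_return_as_map(text):
    # Decode bytes input first; UTF-8 is an assumption.
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    # Returns a dict with a single str key.
    return {"result": text + text}
```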

You can send async messages between the two processes. This can be done through the erlport Python library, which implements the message passing functionality for the Python process. This is ideal for long-running work, such as running data through a cleaning pipeline or training a model. You can listen for messages from the Python process by implementing the handle_info/2 callback.
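
As a rough sketch of what the Python side of such a long-running job could look like, the function below reports progress through a notify callable. Everything here is hypothetical. In a real worker, notify would be expected to wrap erlport's message sending (for example erlport.erlang.cast with the PID of the Elixir process), but that wiring is left out so the example runs without erlport installed.

```python
# priv/python/pipeline.py (hypothetical module)
def clean_rows(rows, notify):
    """Clean each row and report progress through the `notify` callable."""
    cleaned = []
    total = len(rows)
    for index, row in enumerate(rows, start=1):
        # Stand-in for real cleaning work.
        cleaned.append(row.strip().lower())
        # One message per processed row.
        notify(("progress", index, total))
    # Final message once all rows are processed.
    notify(("done", len(cleaned)))
    return cleaned
```

Each message would then show up in the GenServer's handle_info/2, where you can pattern match on the tuple. Keep in mind that the str tags would arrive as charlists unless you convert them.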

For large BEAM clusters, opt for Pyrlang. As mentioned above, Pyrlang is an Erlang node that executes Python. This project is currently being actively developed, unlike erlport, which is on life support. However, depending on your application architecture, you may not need or want to manage multiple nodes. In that case, opt for the more lightweight erlport (Export). The project may very well pick up steam again as more developers add different languages.

That's all for now, hope this helps!