
16 posts tagged with "elixir"


· 3 min read
👋 I'm a dev at Supabase

I work on logging and analytics, and manage Logflare, the underlying service that powers Supabase Logs. The service handles over 7 billion requests each day with traffic constantly growing, and these devlog posts talk a bit about my day-to-day open source dev work.

It serves as some insight into what one can expect when working on high availability software, with real code snippets and PRs too. Enjoy! 😊

When working with distributed Erlang applications at Logflare/Supabase, we encountered an interesting issue where the :global_name_server would become overwhelmed with messages, leading to a boot loop situation. This issue is particularly relevant when dealing with the prevent_overlapping_partitions feature.

Understanding the Boot Loop

The boot loop behaviour comes about when the global name server becomes overwhelmed with messages in scenarios involving network partitions, where many nodes are connecting or disconnecting simultaneously. This can create a cascade effect where:

  1. The global name server receives too many messages
  2. Message processing delays lead to timeouts
  3. Node reconnection attempts trigger more messages
  4. GOTO 1

This behaviour is closely related to OTP issue #9117, and within the issue, I highlighted several potential factors that could be contributing to the problem despite the throw fix that Rickard Green had implemented.

We also observed that this behaviour occurs even when not using :global at all. For Logflare, we had migrated our distributed name registration workloads to use the wonderful :syn library. Hence, this bug is more related to the core syncing protocol of :global.

The throw in restart_connect()

When the :global server attempts to connect to a new node, it will perform a lock to sync the registered names between each node. In the syncing protocol, the :global server will perform a check to verify that the node is not already attempting to perform a sync (indicated by the pending state) within the server. If it is already attempting a sync, it will instead cancel the connection attempt and retry the connection.

Without the throw, this would result in a deadlock, where the :global server waits forever for the node to complete the sync.

prevent_overlapping_partitions to the rescue

As documented in :global:

As of OTP 25, global will by default prevent overlapping partitions due to network issues by actively disconnecting from nodes that report that they have lost connections to other nodes. This will cause fully connected partitions to form instead of leaving the network in a state with overlapping partitions.

This means that :global by default will actively disconnect from nodes that report that they have lost connections to other nodes. For small clusters, this is generally a good feature to have so that the cluster can quickly recover from network issues. However, for large clusters, this can cause a lot of unnecessary disconnections and can lead to the above boot loop issue.

At the time of writing, disabling the prevent_overlapping_partitions feature has allowed our cluster to overcome this boot loop issue by preventing a flood of disconnection messages across the cluster. However, this flag needs to be used with caution when using the :global server for name registration, as it may result in inconsistencies if there are overlapping partitions and multiple instances of the same name are registered. Application code needs to be able to handle this case.
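For reference, prevent_overlapping_partitions is a :kernel parameter, so one way to disable it is via the emulator flags, e.g. in a release's vm.args (a minimal sketch; the exact file location depends on how your release is set up):

## vm.args -- disable automatic disconnects on reported lost connections
-kernel prevent_overlapping_partitions false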

Monitoring strategies

When dealing with large clusters, I would recommend implementing monitoring for:

  • global name server message queue length -- the main indicator of the issue
  • memory usage of the global name server -- a secondary indicator of long message queues

Tracing the :global server callbacks at runtime is also a good way to debug the issue, though it is usually not easy, as the time window before the node runs out of memory is very short.
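As a rough illustration, both indicators above can be read off the locally registered :global_name_server process with Process.info/2; a minimal sketch of the kind of check one would feed into a metrics pipeline:

# message queue length and memory of the local :global name server
global_pid = Process.whereis(:global_name_server)

[message_queue_len: queue_len, memory: memory_bytes] =
  Process.info(global_pid, [:message_queue_len, :memory])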

I explain this in more detail in my post on understanding Erlang's :global prevent_overlapping_partitions Option.

· 5 min read
👋 I'm a dev at Supabase

I work on logging and analytics, and manage Logflare, the underlying service that powers Supabase Logs. The service handles over 7 billion requests each day with traffic constantly growing, and these devlog posts talk a bit about my day-to-day open source dev work.

It serves as some insight into what one can expect when working on high availability software, with real code snippets and PRs too. Enjoy! 😊

The :syn library provides a distributed process registry for Elixir applications, offering an alternative to :global for name registration across clusters. It allows you to define custom event handler callbacks to handle process conflicts and registration scenarios.

The out-of-the-box features will largely suit the majority of use cases, but there are a few important behaviours to consider:

  1. :syn will always default to keeping the most recently registered process. This may result in older state being lost due to the conflict resolution.
  2. :syn by default has millisecond precision when comparing process recency. In clustered setups with a high number of nodes, this may result in conflicts being resolved incorrectly without a deterministic resolution strategy.
  3. The moment a custom event handler callback is implemented, it overrides the default behaviour of :syn and all process conflicts MUST be resolved and handled within the callback. :syn will not perform any cleanup of processes post-callback, hence it is very important to terminate all unwanted processes within the callback to prevent memory leaks or other unexpected behaviour.

Understanding Syn Event Handlers

When multiple processes attempt to register with the same name across a distributed cluster, :syn provides custom event handlers to resolve these conflicts. These handlers are useful for process migration between nodes, network partition recovery, supervisor restart scenarios, and cases where high-precision timestamp-based conflict resolution is needed.

Let's explore a few scenarios where custom event handlers can be useful.

Killing Processes and Supervisors

In scenarios where you want to ensure only one process exists for a given name, you might want to terminate conflicting processes or their supervisors.

defmodule MyApp.SynEventHandler do
  @behaviour :syn_event_handler

  @impl true
  def on_process_registered(_scope, _name, _pid, _meta, _reason) do
    # Process successfully registered
    :ok
  end

  @impl true
  def on_process_unregistered(_scope, _name, _pid, _meta, _reason) do
    # Process unregistered
    :ok
  end

  @impl true
  def resolve_registry_conflict(_scope, _name, {pid1, meta1, _time1}, {pid2, meta2, _time2}) do
    # Terminate the losing process (and its supervisor), then return the pid to keep
    case compare_registration_priority(meta1, meta2) do
      :keep_first ->
        terminate_process_and_supervisor(pid2)
        pid1

      :keep_second ->
        terminate_process_and_supervisor(pid1)
        pid2
    end
  end

  defp terminate_process_and_supervisor(pid) do
    # Find and terminate via the supervisor when possible, otherwise stop the process directly
    case find_supervisor(pid) do
      {:ok, supervisor_pid} ->
        Supervisor.terminate_child(supervisor_pid, pid)

      :error ->
        try_to_stop_process(pid)
    end
  end

  # Tries to stop a process gracefully. If it fails, it sends a kill signal to the process.
  @spec try_to_stop_process(pid(), atom(), atom()) :: :ok | :noop
  defp try_to_stop_process(pid, signal \\ :shutdown, force_signal \\ :kill) do
    GenServer.stop(pid, signal, 5_000)
    :ok
  rescue
    _ ->
      Process.exit(pid, force_signal)
      :ok
  catch
    :exit, _ ->
      :noop
  end

  defp find_supervisor(_pid) do
    # Implementation to find the supervisor of a given process
    # This could involve walking the supervision tree
    :error
  end

  defp compare_registration_priority(_meta1, _meta2) do
    # Custom logic to determine which process should be kept
    # Could be based on node priority, timestamps, etc.
    :keep_first
  end
end

Keeping the Original Process

Sometimes you want to preserve the original process and reject new registration attempts:

defmodule MyApp.KeepOriginalHandler do
  @behaviour :syn_event_handler

  require Logger

  @impl true
  def resolve_registry_conflict(_scope, name, {pid1, _meta1, timestamp1}, {pid2, _meta2, timestamp2}) do
    # Always keep the first registered process.
    # syn's built-in conflict timestamps have millisecond precision.
    if timestamp1 < timestamp2 do
      Logger.info("Keeping original process #{inspect(pid1)} for #{inspect(name)}")
      pid1
    else
      Logger.info("Keeping original process #{inspect(pid2)} for #{inspect(name)}")
      pid2
    end
  end
end

However, what if we somehow have a situation where the timestamps are exactly the same (no matter how unlikely it is)? We can use nanosecond timestamps stored in process metadata to resolve the conflict with higher precision.

Nanosecond Timestamp Resolution

First, register processes with nanosecond timestamp metadata:

defmodule MyApp.MyProcess do
  use GenServer

  @doc """
  Registers a process with nanosecond timestamp metadata for high-precision conflict resolution.
  """
  def start_link(some_arg) do
    nanosecond_timestamp = System.os_time(:nanosecond)

    GenServer.start_link(__MODULE__, some_arg,
      name: {:via, :syn, {:my_scope, __MODULE__, %{timestamp: nanosecond_timestamp}}}
    )
  end

  @impl true
  def init(some_arg), do: {:ok, some_arg}
end

Then implement the event handler with fallback to syn's built-in millisecond timestamp when metadata isn't available:

defmodule MyApp.SynEventHandler do
  @moduledoc """
  Event handler for syn. Always keeps the oldest process.
  """
  @behaviour :syn_event_handler

  require Logger

  @impl true
  def resolve_registry_conflict(_scope, _name, pid_meta1, pid_meta2) do
    {original, to_stop} = keep_original(pid_meta1, pid_meta2)

    # Only stop the process if we're the local node responsible for it
    if node() == node(to_stop) do
      try_to_stop_process(to_stop, :shutdown, :kill)
    end

    original
  end

  # Use the nanosecond-precision timestamp from metadata when available
  defp keep_original(
         {pid1, %{timestamp: timestamp1}, _syn_timestamp1},
         {pid2, %{timestamp: timestamp2}, _syn_timestamp2}
       ) do
    if timestamp1 < timestamp2, do: {pid1, pid2}, else: {pid2, pid1}
  end

  # Fall back to syn's built-in millisecond timestamp when metadata isn't present
  defp keep_original(
         {pid1, _meta1, syn_timestamp1},
         {pid2, _meta2, syn_timestamp2}
       ) do
    if syn_timestamp1 < syn_timestamp2, do: {pid1, pid2}, else: {pid2, pid1}
  end

  defp try_to_stop_process(pid, signal, force_signal) do
    GenServer.stop(pid, signal, 5_000)
  rescue
    _ -> Process.exit(pid, force_signal)
  catch
    :exit, _ -> :noop
  end
end

Configuration and Usage of a Custom Event Handler

Configure your syn event handler in your application:

# config/config.exs
config :syn,
  event_handler: MyApp.SynEventHandler

# Alternatively, set it at runtime before any conflicts can occur:
:syn.set_event_handler(MyApp.SynEventHandler)

Register processes with metadata for conflict resolution:

# Register with timestamp metadata
# (OS time is comparable across nodes, unlike monotonic time)
:syn.register(:my_scope, "unique_name", self(), %{
  timestamp: System.os_time(:nanosecond),
  node: Node.self(),
  priority: 1
})
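The registration and its metadata can then be looked up from any node with :syn.lookup/2, which returns :undefined when the name is not registered:

case :syn.lookup(:my_scope, "unique_name") do
  {pid, meta} -> {:ok, pid, meta}
  :undefined -> {:error, :not_registered}
end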

Best Practices

  1. Always include timestamps in metadata for conflict resolution
  2. Handle supervisor relationships carefully when terminating processes
  3. Use OS time (e.g. System.os_time/1) for ordering across nodes, since monotonic time is node-local and not comparable between nodes
  4. Log conflict resolutions for debugging and monitoring
  5. Test partition scenarios thoroughly

Monitoring and Observability

Monitor syn registry conflicts and resolutions:

# Add telemetry events in your event handler
def resolve_registry_conflict(scope, name, proc1, proc2) do
  :telemetry.execute(
    [:syn, :conflict, :resolved],
    %{count: 1},
    %{scope: scope, name: name}
  )

  # ... conflict resolution logic, returning the pid to keep
end

The :syn library's event handler system enables you to manage distributed process registration conflicts, resulting in robust and predictable behavior in complex distributed systems.

· 3 min read
👋 I'm a dev at Supabase

I work on logging and analytics, and manage Logflare, the underlying service that powers Supabase Logs. The service handles over 7 billion requests each day with traffic constantly growing, and these devlog posts talk a bit about my day-to-day open source dev work.

It serves as some insight into what one can expect when working on high availability software, with real code snippets and PRs too. Enjoy! 😊

When working with large distributed Elixir clusters at Logflare/Supabase, we encountered situations where the syn_gen_scope gen_server process would become overwhelmed with messages during cross-cluster synchronization. This happens particularly when thousands of processes register under a single scope, causing the message queue to grow significantly and impacting cluster discovery and synchronization performance.

Under the hood, each :syn scope runs as a single gen_server process (see syn_gen_scope.erl). This process handles node discovery, state synchronization when nodes join or rejoin, and broadcasting updates across the cluster. All of these operations funnel through one process per scope. When thousands of processes register under a single scope, this gen_server has to handle every discovery request, every sync acknowledgment, and every broadcast. Its message queue grows, and cluster synchronization slows down.

One way to deal with this is to partition scopes by creating multiple scope processes and using phash2 to consistently hash a term (such as a resource identifier) to determine which partition scope to use. This splits the synchronization load across multiple syn_gen_scope processes and helps increase processing throughput by allowing all cores to process the messages, reducing the message queue length on any single scope process and improving overall :syn stability.

The phash2 function provides a consistent hash that will always map the same term to the same partition, ensuring that registration and lookup operations use the same scope across all nodes in the cluster. This consistency is critical for distributed systems where processes on different nodes need to agree on which scope contains a particular registration.
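As a quick illustration of that property (the terms here are made up), hashing the same identifier with the same partition count always yields the same partition index on every node:

# :erlang.phash2/2 maps any term to an integer in 0..(range - 1)
partition = :erlang.phash2({"tenant_a", :endpoint_1}, 8)
scope = :"my_scope_#{partition}"
# the exact index doesn't matter; what matters is that every node computes the same one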

Here's how we configure partitioned scopes based on the number of schedulers available on the system:

# runtime.exs
# explicitly set the atom scopes during application startup
syn_my_scope_partitions =
  for n <- 0..System.schedulers_online(), do: "my_scope_#{n}" |> String.to_atom()

config :syn,
  scopes: [:other_scopes] ++ syn_my_scope_partitions

This creates scopes like :my_scope_0, :my_scope_1, up to :my_scope_N where N matches the number of schedulers. Matching the partition count to schedulers helps ensure good distribution across available CPU cores, and splits up messages across multiple syn_gen_scope processes.

To use these partitioned scopes, we use :via tuples with GenServer.start_link/3. The format {:via, :syn, {scope, name, meta}} lets Syn handle registration automatically: the process registers on start and unregisters on termination.

defmodule Logflare.Endpoint do
  use GenServer

  # Evaluated at compile time; assumes the scheduler count at compile time matches
  # the runtime value used to generate the partition scopes in runtime.exs
  @partition_count System.schedulers_online() + 1

  def start_link(identifier) do
    GenServer.start_link(__MODULE__, identifier, name: via(identifier))
  end

  def get_info(identifier) do
    GenServer.call(via(identifier), :get_info)
  end

  defp via(identifier) do
    scope = :"my_scope_#{:erlang.phash2(identifier, @partition_count)}"
    {:via, :syn, {scope, identifier}}
  end

  @impl true
  def init(identifier), do: {:ok, %{identifier: identifier}}

  @impl true
  def handle_call(:get_info, _from, state), do: {:reply, state, state}
end

Since phash2 is deterministic, lookups from any node in the cluster resolve to the correct partition scope.
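One trade-off worth noting: cluster-wide aggregates now have to iterate over every partition scope. A hypothetical count of all registrations across the partitions, assuming the same 0..schedulers_online() partitions as above, might look like this:

0..System.schedulers_online()
|> Enum.map(fn n -> :syn.registry_count(:"my_scope_#{n}") end)
|> Enum.sum()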

The benefits of this partitioning approach are significant. By distributing registrations across multiple scope processes, each syn_gen_scope gen_server handles a fraction of the total synchronization load. This reduces message queue buildup and improves the responsiveness of cluster synchronization operations, especially when dealing with large numbers of processes in the process registry.

Conclusion

Partitioning Syn scopes with phash2 provides a straightforward way to scale distributed process registration across large clusters, preventing message queue buildup and ensuring that cluster synchronization remains responsive even as the number of registered processes grows into the thousands or millions.

· 3 min read
Ziinc
👋 I'm a dev at Supabase

I work on logging and analytics, and manage Logflare, the underlying service that powers Supabase Logs. The service handles over 7 billion requests each day with traffic constantly growing, and these devlog posts talk a bit about my day-to-day open source dev work.

It serves as some insight into what one can expect when working on high availability software, with real code snippets and PRs too. Enjoy! 😊

This week, I'm hunting down a few things that are plaguing Logflare, namely:

  1. High memory usage over time
  2. Sporadic memory spikes and system slowdowns

For the first one, the root cause was quite straightforward: garbage building up in long-lived processes without a major collection ever being triggered.

There were a few culprits:

  1. RecentLogsServer - This module is tasked with periodically updating a counter for total events ingested into the table. However, due to the small change in state on each update, very few minor GCs were triggered, resulting in a major GC never getting triggered.
  2. SearchQueryExecutor - This module is tasked with performing search queries as well as live tailing in the Logflare dashboard, and due to the amount of state that was kept and constantly updated, fullsweeps were not getting triggered, resulting in large buildups of garbage over time.

How Erlang garbage collection works is beyond the scope of this discussion, but a very detailed explanation is available in the official docs.

For the second issue, where the system would sporadically spike, the culprit was a sudden buildup in the scheduler run queue.

This run queue spike would cause the VM to "lock up", causing a few downstream effects:

  1. GenServer calls would start timing out and failing, as processes lock up and message queues build up, resulting in sudden spikes in memory.
  2. Incoming requests would be served slowly, resulting in a large slowdown and high latency. Incoming request payloads would also consume memory.
  3. Downstream API calls would be affected and slow down, even non-ingest API calls.

However, run queue buildup is just a symptom of the true problem, which required further diagnosis and analysis.
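For diagnosing this kind of buildup, the scheduler run queue lengths can be sampled directly from the VM; a minimal sketch of the checks one would wire into metrics:

# total length across all normal run queues (cheap, good for dashboards)
total = :erlang.statistics(:total_run_queue_lengths)

# per-scheduler breakdown, useful for spotting a single saturated scheduler
per_scheduler = :erlang.statistics(:run_queue_lengths)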

Thankfully, we were able to narrow down the root cause of this run queue spike to the Cachex Courier.

The courier is responsible for handling much of the value retrieval behind the main Cachex.fetch/4 function, and ensures deduplication of value retrieval. However, it was possible for an error in the value retrieval to cause the process to lock up and stop responding to caller processes. This would then result in a flood of GenServer.call/3 failures, as calling processes would time out. Given the throughput of requests and data that Logflare handles (multiple billions of events a day), sudden large slowdowns in the ingestion pipeline would snowball. This could be felt in more obvious downstream services, such as the Supabase dashboard, where certain heavily used API endpoints would fail sporadically.

It just so happened that this exact issue was patched in the latest Cachex v4.0.0 release, so upgrading to the latest version was sufficient.

The fix specifically involved adjusting the way that the value retrieval was performed such that it would spawn a linked process to perform the work instead of doing it within the process, while also ensuring that exits for the process were trapped. By trapping the exit, it could notify all callers that an error had occurred and let the errors propagate upwards instead of blocking the caller until a timeout occurred.

The final Logflare adjustments can be found in these changes, which resulted in a 3.5x memory reduction and a 5-7% CPU improvement at production workloads.

Impact on memory after tweaking

Impact on scheduler utilization

· 2 min read
Ziinc

Quite surprisingly, Supervisors do not have an exposed option for taking spawn_opt. spawn_opt is a list of process options used to control process behaviour when it comes to memory management, and it can be incredibly useful when hunting down garbage build-up in processes.

The Backstory

This week in life at Supabase, we have some fun garbage collection optimization, and it mostly involves tweaking culprit process behaviours into clearing out their garbage in a timely manner.

Sometimes, garbage might build up as shown for a myriad of reasons, and we gotta take our massive major GC hammer to knock some sense into these processes that are stuck in a minor GC loop!

The Problem

So, Supervisors don't actually take a spawn_opt. After digging around, the only real option was to use the :erlang.process_flag/2 function, which is wrapped by Process.flag/2.

We can achieve the :fullsweep_after tweak like so:


def init(_arg) do
  # trigger major GC after 5,000 minor GCs
  Process.flag(:fullsweep_after, 5_000)
  ...
end

One would think that it would be accepted by Supervisor.start_link/2, but it seems like it isn't at all, and I had to dig into the Elixir source code to find that out.
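For comparison, a plain GenServer does accept :spawn_opt in its start options, so for worker processes the flag can be set at spawn time instead; a minimal sketch (MyApp.Worker is a hypothetical module):

# :spawn_opt is forwarded to the spawn of the GenServer process
GenServer.start_link(MyApp.Worker, :ok,
  name: MyApp.Worker,
  spawn_opt: [fullsweep_after: 5_000]
)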

A Word on Task.Supervisor

Although the base Supervisor module doesn't accept the :spawn_opt option in its start_link/2 function, the built-in Task.Supervisor module does accept it.

This can be seen here where there is an explicit test case for this option passing.
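Going by that test case, passing it would look roughly like the sketch below (the supervisor name is hypothetical); double-check the option against the Elixir version you're on:

Task.Supervisor.start_link(
  name: MyApp.TaskSupervisor,
  spawn_opt: [fullsweep_after: 5_000]
)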

Quite an interesting tidbit 😄