
Optimizing Performance, Stability, and Edge Cases of Elixir‑gRPC Services in Production

This article shares Tubi’s experience using Elixir‑gRPC in production, covering performance optimizations, stability measures, HTTP/2 edge‑case handling, and practical code examples for efficient Protobuf processing; it also discusses Envoy sidecar integration, interceptor usage, and lessons learned from real‑world deployments.


Tubi provides millions of movies and TV shows to users for free and leverages technology to improve the viewing experience. In recent years the team has combined Elixir¹ and gRPC² to build many critical services in production.

gRPC is a high‑performance RPC framework originating from Google. It uses Protobuf³ to define interfaces and transports encoded data over HTTP/2, allowing teams to maintain stable API contracts.

Elixir is a modern functional language built on the Erlang VM, leveraging the Actor model and OTP⁴ to enable highly scalable and maintainable applications with a small engineering team.

As a primary author of Elixir‑gRPC⁵, I share below several practical lessons learned from using it in production.

Performance

A Protobuf message can contain dozens of fields, and a single request may carry many such messages, so encoding and decoding speed became a bottleneck. Optimizations to the codec improved decode performance by 120% and encode performance by 30%.

Erlang/Elixir handle binary data at a higher level than systems languages, so careless code can trigger unnecessary memory allocations that slow down Protobuf processing. Following Erlang's binary‑handling best practices⁶ helps avoid this.

Example: parsing a binary file and summing the last 7 bits of each byte.

# For binary 1000 0001 1100 0000 0000 1000
# the result is 0b1 + 0b100_0000 + 0b1000 = 73
defmodule BinaryParseFast do
  def parse(bin), do: parse(bin, 0)

  # A byte beginning with 0 marks the end; add its last 7 bits and stop.
  def parse(<<0::1, x::7, _::bits>>, acc), do: acc + x

  # A byte beginning with 1 means more data follows; add its last 7 bits
  # and continue. The compiler can reuse the match context here.
  def parse(<<1::1, x::7, rest::bits>>, acc), do: parse(rest, acc + x)
end

defmodule BinaryParseSlow do
  def parse(bin), do: parse(bin, 0)

  def parse(bin, acc) do
    case do_parse(bin) do
      {:nofin, x, rest} -> parse(rest, acc + x)
      {:fin, x} -> acc + x
    end
  end

  # Returning `rest` inside a tuple forces the compiler to materialize a
  # new sub-binary on every step instead of reusing the match context.
  def do_parse(<<0::1, x::7, _::bits>>), do: {:fin, x}
  def do_parse(<<1::1, x::7, rest::bits>>), do: {:nofin, x, rest}
end

The fast version is quicker because the Erlang compiler optimizes consecutive binary pattern matches by reusing the match context rather than creating a sub‑binary at every step. A benchmark⁷ shows roughly a two‑fold speedup. Setting the bin_opt_info compiler option makes the compiler report whether the optimization was applied:

$ export ERL_COMPILER_OPTIONS=bin_opt_info
$ mix run binary_parse_slow.exs
warning: BINARY CREATED: binary is used in a term that is returned from the function
  binary_parse_slow.exs:15

$ mix run binary_parse_fast.exs
warning: OPTIMIZED: match context reused   # this is good
  binary_parse_fast.exs:18

An OPTIMIZED hint means the match context is being reused; a BINARY CREATED warning means the code should be restructured as above. Applying these optimizations throughout Protobuf‑Elixir is what produced the performance gains mentioned earlier.
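Protobuf's wire format is built on exactly this kind of byte‑by‑byte matching: variable‑length integers (varints) use a leading continuation bit and 7 payload bits per byte. The decoder below is a minimal sketch of the technique, not Protobuf‑Elixir's actual code; the module name is made up for illustration.

```elixir
defmodule VarintSketch do
  # Decode one base-128 varint: each byte carries 7 payload bits,
  # least-significant group first; a leading 1 bit means "more bytes follow".
  def decode(bin), do: decode(bin, 0, 0)

  # Leading 0 bit: final byte. Add the last group and return the remainder.
  defp decode(<<0::1, x::7, rest::binary>>, shift, acc),
    do: {acc + Bitwise.bsl(x, shift), rest}

  # Leading 1 bit: accumulate and keep matching; written this way, the
  # compiler reuses the match context instead of allocating sub-binaries.
  defp decode(<<1::1, x::7, rest::binary>>, shift, acc),
    do: decode(rest, shift + 7, acc + Bitwise.bsl(x, shift))
end
```

For example, `VarintSketch.decode(<<0b10101100, 0b00000010>>)` returns `{300, ""}`, matching Protobuf's canonical varint example.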

Elixir‑gRPC now also supports data compression. Although Protobuf messages are already smaller than JSON, compressing large string payloads (e.g., with gzip) can further reduce network overhead.
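As a quick illustration using Erlang's built‑in :zlib (gzip is also one of the codings gRPC negotiates via the grpc-encoding header), a repetitive string payload shrinks substantially and round‑trips losslessly; the payload here is invented for the example:

```elixir
# A repetitive JSON-ish string payload, typical of large text fields.
payload = String.duplicate(~s({"title": "some movie metadata"}), 200)

compressed = :zlib.gzip(payload)
IO.puts("#{byte_size(payload)} bytes -> #{byte_size(compressed)} bytes gzipped")

# The round trip is lossless.
^payload = :zlib.gunzip(compressed)
```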

Stability

A service's stability is hard to gauge before it is exercised by massive user traffic, yet any outage in these Elixir services would severely impact the viewing experience. Several things give us confidence:

- Erlang/OTP together with the cowboy HTTP server provides a solid foundation.

- Elixir‑gRPC’s interoperability tests cover large responses, streaming, error handling, and more.

- Extensive data‑set simulations have been run against the services.

- Multiple critical services already run Elixir‑gRPC in production.

Envoy and Interceptors

Envoy¹³ is often used as a sidecar to manage service‑to‑service communication, offering service discovery, load balancing, retries, and more. Tubi uses Envoy extensively, which avoids reimplementing these features in each service.

Envoy provides metrics such as request rate and latency, but lacks per‑gRPC‑method metrics. Elixir‑gRPC includes built‑in interceptors like the statsd and prometheus interceptors, and custom interceptors can add platform tags (e.g., FireTV, Web, iOS, Android).
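An interceptor is essentially a function that wraps the handler, so adding a platform tag comes down to reading a request header and attaching it to the emitted metric. The sketch below is framework‑free pseudo‑middleware under invented names (PlatformTagsSketch, emit_metric); Elixir‑gRPC's interceptor behaviour follows the same wrap‑and‑continue shape.

```elixir
defmodule PlatformTagsSketch do
  # Hypothetical middleware: read the client platform from request headers,
  # time the wrapped handler, and tag the emitted metric with the platform.
  def call(req, headers, next) do
    platform = Map.get(headers, "platform", "unknown")
    {micros, result} = :timer.tc(fn -> next.(req) end)
    emit_metric("rpc.duration_us", micros, platform: platform)
    result
  end

  # Stand-in for a real statsd/prometheus client call.
  defp emit_metric(name, value, tags),
    do: IO.inspect({name, value, tags}, label: "metric")
end
```

A handler wrapped this way emits one tagged timing metric per call, letting dashboards break latency down by FireTV, Web, iOS, or Android.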

Edge‑Case Handling

While cowboy and gun are excellent, their HTTP/2 support still has quirks. One issue is flow‑control handling: HTTP/2 maintains both per‑stream and per‑connection flow‑control windows, and if a stream ends early, the bytes it consumed may never be credited back to the connection window. The window can thus drop to zero, blocking all other streams on that connection until a new connection is created.
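The interaction reduces to two counters: a per‑stream window and a shared per‑connection window, both 65,535 bytes by default. In the toy model below (module and function names are illustrative, not cowboy/gun internals), a stream that consumes the whole connection window and then dies without a connection‑level WINDOW_UPDATE leaves every other stream blocked:

```elixir
defmodule FlowWindowSketch do
  @initial 65_535

  def new, do: %{conn: @initial, streams: %{}}

  def open(state, id), do: put_in(state, [:streams, id], @initial)

  # A DATA frame consumes both the stream's window and the shared one.
  def recv_data(state, id, bytes) do
    state
    |> update_in([:conn], &(&1 - bytes))
    |> update_in([:streams, id], &(&1 - bytes))
  end

  # A stream that ends early without a connection-level WINDOW_UPDATE
  # never returns its consumed bytes to the shared window.
  def abort_stream(state, id), do: update_in(state, [:streams], &Map.delete(&1, id))

  def conn_blocked?(state), do: state.conn <= 0
end
```

After `new() |> open(1) |> recv_data(1, 65_535) |> abort_stream(1)`, `conn_blocked?/1` is true even though no stream remains open: exactly the stall described above, which persists until a fresh connection resets the window.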

Another problem is gun’s handling of the GOAWAY frame. GOAWAY is used to close a connection gracefully, with in‑flight streams allowed to finish, but gun returns an error instead of continuing to process existing streams, causing spurious client‑side failures.
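Per RFC 7540, GOAWAY carries a last_stream_id: streams at or below it were accepted by the server and should run to completion, while higher ones were never processed and are safe to retry on a new connection. A spec‑following client partitions its open streams rather than failing them all; the sketch below uses invented names and is not gun's code:

```elixir
defmodule GoawaySketch do
  # On GOAWAY, streams the server accepted (id <= last_stream_id) keep
  # running; streams above it were never processed and can be retried.
  def on_goaway(open_stream_ids, last_stream_id) do
    {keep, retry} = Enum.split_with(open_stream_ids, &(&1 <= last_stream_id))
    %{continue: keep, retry_on_new_conn: retry}
  end
end
```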

These issues are illustrated with diagrams (omitted here) showing how windows are updated and how GOAWAY affects stream processing.

Conclusion

Combining gRPC with Elixir has enabled Tubi to build more services efficiently. While getting a project off the ground is straightforward, achieving production‑grade stability and performance is where the real learning happens.

As Tubi’s business grows, new challenges arise, but the team welcomes anyone interested in tackling them.

Tags: Backend, performance, gRPC, Protobuf, HTTP/2, Envoy, Elixir
Written by

Bitu Technology

Bitu Technology is the registered company of Tubi's China team. We are engineers passionate about leveraging advanced technology to improve lives, and we hope to use this channel to connect and advance together.
