How Wix Scales 1400+ Microservices with Event‑Driven Kafka Patterns
This article explains how Wix engineers built an event-driven messaging infrastructure on Kafka that serves more than 1,400 microservices. It covers six patterns (consumption and projection, end-to-end event flows, in-memory KV stores, schedule-and-forget, transactional events, and event aggregation), each aimed at improving scalability, resilience, and operational simplicity.
1. Consumption and Projection
To relieve heavily used services that become bottlenecks, Wix built a materialized view for the MetaSite service. MetaSite stores site metadata (versions, owners, installed apps) that many other services, such as Wix Stores, Bookings, and Restaurants, depend on. It receives more than 1 million requests per minute (RPM), which makes it a performance hotspot.
To offload read traffic, the write side streams every site metadata change to a Kafka topic. A consumer of that topic projects only the "installed apps" context into a read-optimized view, and a separate read-only service serves queries directly from this view. Reads and writes are thereby handled by different services.
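A minimal sketch of the projection step, assuming a simplified event shape in which each change event carries the site ID, a version number, and the full metadata snapshot. The names SiteMetadataChanged, InstalledAppsProjection, and installed_apps are hypothetical, and a Python list stands in for the Kafka topic. Skipping stale versions is a design choice of this sketch, so that redelivered events cannot move the view backward.

```python
from dataclasses import dataclass, field


@dataclass
class SiteMetadataChanged:
    """Hypothetical change event streamed by the write side."""
    site_id: str
    version: int
    owner: str
    installed_apps: list[str] = field(default_factory=list)


class InstalledAppsProjection:
    """Read-only view that keeps only the installed-apps context per site."""

    def __init__(self) -> None:
        # site_id -> (latest version applied, installed apps)
        self._apps: dict[str, tuple[int, frozenset[str]]] = {}

    def apply(self, event: SiteMetadataChanged) -> None:
        # Ignore stale or replayed events so the view only moves forward.
        current = self._apps.get(event.site_id)
        if current is not None and current[0] >= event.version:
            return
        self._apps[event.site_id] = (event.version, frozenset(event.installed_apps))

    def installed_apps(self, site_id: str) -> frozenset[str]:
        # Served by the read-only service; MetaSite is never called.
        entry = self._apps.get(site_id)
        return entry[1] if entry else frozenset()


# Stand-in for the Kafka topic: an ordered list of change events.
topic = [
    SiteMetadataChanged("site-1", 1, "alice", ["stores"]),
    SiteMetadataChanged("site-1", 2, "alice", ["stores", "bookings"]),
    SiteMetadataChanged("site-1", 1, "alice", ["stores"]),  # duplicate redelivery
]

view = InstalledAppsProjection()
for event in topic:
    view.apply(event)

print(sorted(view.installed_apps("site-1")))  # ['bookings', 'stores']
```

The view holds far less data than MetaSite itself, which is what lets the read-only service answer cheaply.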
Effect
Streaming changes to Kafka decouples MetaSite from its readers and substantially reduces load on the service and its database. The materialized view is an eventually consistent projection: it may lag slightly behind MetaSite, but it answers client queries while putting far less pressure on the original service.
2. End‑to‑End Event‑Driven Flow
For simple business processes, Wix replaces the traditional request-response model with a Kafka-backed event stream combined with WebSocket notifications. Because messages are persisted in Kafka, processing can resume after failures, and clients no longer need to poll for results.
Example: importing contacts. The Contacts Jobs service receives a CSV import request, records a job, publishes the job request to Kafka, and immediately returns an HTTP response. The Contacts Importer consumes the event, performs the import, and notifies a WebSocket service, which pushes progress updates to the browser.
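The import flow can be sketched with in-process structures standing in for the Kafka topic and the WebSocket push channel. This is a minimal sketch under assumptions: submit_import, run_importer, push_to_browser, and the message fields are hypothetical, and queue.Queue only mimics the hand-off (unlike Kafka, it does not survive restarts).

```python
import queue
import uuid

job_requests: "queue.Queue[dict]" = queue.Queue()  # stands in for the Kafka topic
browser_updates: list[str] = []                     # stands in for WebSocket pushes


def submit_import(csv_rows: list[str]) -> dict:
    """Contacts Jobs side: record the job, publish it, return immediately."""
    job_id = str(uuid.uuid4())
    job_requests.put({"job_id": job_id, "rows": csv_rows})
    return {"status": 202, "job_id": job_id}  # does not wait for the import


def push_to_browser(job_id: str, message: str) -> None:
    """WebSocket service side: forward progress to the user's browser."""
    browser_updates.append(f"{job_id}: {message}")


def run_importer() -> None:
    """Contacts Importer side: consume job requests and report progress."""
    while not job_requests.empty():
        job = job_requests.get()
        total = len(job["rows"])
        for done, _row in enumerate(job["rows"], start=1):
            # A real importer would persist the contact here.
            push_to_browser(job["job_id"], f"imported {done}/{total}")
        push_to_browser(job["job_id"], "done")


response = submit_import(["alice@example.com", "bob@example.com"])
print(response["status"])    # 202
run_importer()
print(len(browser_updates))  # 3
```

The browser gets its 202 before any import work happens; progress arrives later as pushes, so nothing polls.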
Effect
The approach removes client polling, reduces the request state the backend must track, and scales across data centers more easily because each service consumes and produces events independently.
3. In‑Memory KV Store
When low-latency access to configuration data is needed without a full relational table, Wix uses compacted Kafka topics as a persistent key/value store that consuming services load into memory. Redis with AOF persistence offers similar capabilities, but Kafka's replicated log provides durability and lets multiple consumers read the same updates.
Two services, Wix Business Manager and Wix Bookings, share a compacted topic of country data. Bookings consumes the updates and automatically adds the corresponding time-zone entries. Because each consumer reads from its local in-memory copy, lookups need no network round trip.
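The property that makes a compacted topic usable as a KV store is that replaying it yields the latest value per key, and a record with a null value (a tombstone) deletes the key. A minimal sketch of that replay into an in-memory map, plus a Bookings-style consumer reacting to country updates; the record layout, BookingsTimeZones, and the TIME_ZONES lookup are hypothetical placeholders, not Wix's data.

```python
from typing import Optional

# (key, value) records in log order; None is a tombstone, as in Kafka.
CountryRecord = tuple[str, Optional[dict]]


def compact(log: list[CountryRecord]) -> dict[str, dict]:
    """Replay a compacted topic: last value per key wins, tombstones delete."""
    store: dict[str, dict] = {}
    for key, value in log:
        if value is None:
            store.pop(key, None)
        else:
            store[key] = value
    return store


# Hypothetical lookup used only for illustration.
TIME_ZONES = {"IL": ["Asia/Jerusalem"], "FR": ["Europe/Paris"]}


class BookingsTimeZones:
    """Consumer that derives time-zone entries from country updates."""

    def __init__(self) -> None:
        self.zones: dict[str, list[str]] = {}

    def on_country(self, key: str, value: Optional[dict]) -> None:
        if value is None:
            self.zones.pop(key, None)  # country removed upstream
        else:
            self.zones[key] = TIME_ZONES.get(key, [])


log: list[CountryRecord] = [
    ("IL", {"name": "Israel"}),
    ("FR", {"name": "France"}),
    ("FR", None),  # tombstone: France removed
    ("IL", {"name": "Israel", "supported": True}),
]

countries = compact(log)  # in-memory KV store, read without network calls
bookings = BookingsTimeZones()
for key, value in log:
    bookings.on_country(key, value)

print(countries)       # {'IL': {'name': 'Israel', 'supported': True}}
print(bookings.zones)  # {'IL': ['Asia/Jerusalem']}
```

Kafka performs compaction on the broker in the background, so a consumer may still see superseded records; applying them in order, as here, gives the same final state.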
4. Schedule‑and‑Forget
For recurring jobs such as subscription renewals, Wix uses a custom Job Scheduler that invokes the Payments Subscriptions service through a preconfigured REST endpoint. Rather than having the scheduler poll the subscription status until the renewal completes, the more efficient pattern has the endpoint produce a renewal request to Kafka and return at once, leaving ordering and retries to the consuming side.
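The schedule-and-forget hand-off can be sketched as an endpoint that only produces renewal requests and returns. Keying by subscription ID sends all requests for one subscription to the same partition, which is what preserves their order. The handler name, the partition count, and the crc32-based partitioner are assumptions for illustration; Kafka's default partitioner hashes keys with murmur2, not zlib.crc32.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count
partitions: list[list[dict]] = [[] for _ in range(NUM_PARTITIONS)]


def partition_for(key: str) -> int:
    """Deterministic key -> partition mapping (stand-in for Kafka's partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def renew_subscriptions_endpoint(subscription_ids: list[str]) -> dict:
    """Called by the Job Scheduler; produces requests and returns without polling."""
    for sub_id in subscription_ids:
        partitions[partition_for(sub_id)].append(
            {"type": "RenewSubscription", "subscription_id": sub_id}
        )
    return {"status": 202, "queued": len(subscription_ids)}


print(renew_subscriptions_endpoint(["sub-1", "sub-2", "sub-1"]))
# {'status': 202, 'queued': 3}
```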
Consumers built on Greyhound, Wix's open-source Kafka client library, support configurable blocking retry policies with exponential-backoff intervals, and a dead-letter queue captures messages that exhaust all retries for manual inspection.
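A minimal sketch of such a policy: a blocking retry loop with exponentially growing intervals that hands the message to a dead-letter queue once retries are exhausted. This is not Greyhound's API; handle_with_retries, backoff_intervals, the interval values, and the list-based DLQ are assumptions, and the sleep function is injected so the example runs instantly.

```python
from typing import Callable, Optional


def backoff_intervals(initial: float, multiplier: float, max_retries: int) -> list[float]:
    """Exponential backoff schedule, e.g. 1s, 2s, 4s, ..."""
    return [initial * multiplier**attempt for attempt in range(max_retries)]


def handle_with_retries(
    message: dict,
    handler: Callable[[dict], None],
    dead_letters: list[dict],
    intervals: list[float],
    sleep: Callable[[float], None],
) -> bool:
    """Blocking retry: later messages wait while this one is retried."""
    last_error: Optional[Exception] = None
    for delay in [0.0, *intervals]:  # first attempt, then one attempt per interval
        if delay:
            sleep(delay)
        try:
            handler(message)
            return True
        except Exception as exc:  # a real policy would filter retriable errors
            last_error = exc
    dead_letters.append({**message, "error": repr(last_error)})
    return False


waits: list[float] = []
dlq: list[dict] = []


def always_fails(msg: dict) -> None:
    raise RuntimeError("payment provider unavailable")


ok = handle_with_retries(
    {"subscription_id": "sub-1"},
    always_fails,
    dlq,
    backoff_intervals(1.0, 2.0, 3),
    sleep=waits.append,  # record delays instead of sleeping
)
print(ok, waits, len(dlq))  # False [1.0, 2.0, 4.0] 1
```

Blocking retries keep per-partition ordering at the cost of stalling the partition; the dead-letter queue bounds that stall.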
5. Events in Transactions
When idempotent processing is hard to guarantee, Wix produces downstream events inside a Kafka transaction. For example, the Payments service emits an Order Purchase Completed event; the Checkout service consumes it and, in a single transaction, publishes an Order Checkout Completed event and commits its consumer offset. Downstream services (Delivery, Inventory, Invoices) that read with read_committed isolation see the event only after the transaction commits, so a crash midway never exposes a half-processed result.
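Kafka transactions make the produced event and the consumed offset commit atomically, and consumers configured with isolation.level=read_committed skip records from transactions that have not committed. The toy model below mimics only those visibility semantics in memory; TransactionalLog, checkout_step, and the event fields are hypothetical. A real service would use a transactional producer (initTransactions, beginTransaction, sendOffsetsToTransaction, commitTransaction, abortTransaction).

```python
from dataclasses import dataclass, field


@dataclass
class TransactionalLog:
    """In-memory stand-in for an output topic plus a consumer-offset store."""
    committed: list[dict] = field(default_factory=list)
    offsets: dict[str, int] = field(default_factory=dict)
    _pending: list[dict] = field(default_factory=list)
    _pending_offsets: dict[str, int] = field(default_factory=dict)

    def send(self, event: dict) -> None:
        self._pending.append(event)  # invisible to read_committed readers

    def send_offset(self, group: str, offset: int) -> None:
        self._pending_offsets[group] = offset  # committed together with the events

    def commit(self) -> None:
        # Events and offsets become visible together.
        self.committed.extend(self._pending)
        self.offsets.update(self._pending_offsets)
        self._pending.clear()
        self._pending_offsets.clear()

    def abort(self) -> None:
        self._pending.clear()
        self._pending_offsets.clear()

    def read_committed(self) -> list[dict]:
        return list(self.committed)


def checkout_step(event: dict, offset: int, log: TransactionalLog, fail: bool = False) -> None:
    """Consume Order Purchase Completed, emit Order Checkout Completed atomically."""
    try:
        log.send({"type": "OrderCheckoutCompleted", "order_id": event["order_id"]})
        if fail:
            raise RuntimeError("crash before commit")
        log.send_offset("checkout", offset + 1)
        log.commit()
    except RuntimeError:
        log.abort()  # nothing becomes visible; the input will be reprocessed


log = TransactionalLog()
purchase = {"type": "OrderPurchaseCompleted", "order_id": "o-42"}

checkout_step(purchase, offset=0, log=log, fail=True)
print(log.read_committed(), log.offsets)       # [] {}
checkout_step(purchase, offset=0, log=log)     # retry succeeds
print(len(log.read_committed()), log.offsets)  # 1 {'checkout': 1}
```

Even though the event was produced twice, downstream readers see it once, because the first attempt's output was aborted along with its offset.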
The Payments producer is configured as an idempotent producer, so if it retries a send that the broker has already written, the broker discards the duplicate instead of appending it a second time.
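With enable.idempotence=true, the broker identifies the producer by an ID and tracks a per-partition sequence number, so a retried batch that already landed is discarded rather than written twice. The simplified model below illustrates only that bookkeeping; real brokers remember a small window of recent batches rather than a single number, and PartitionLog.append is a hypothetical name.

```python
class PartitionLog:
    """One partition that deduplicates writes by (producer_id, sequence)."""

    def __init__(self) -> None:
        self.records: list[str] = []
        self._last_seq: dict[int, int] = {}  # producer_id -> last appended sequence

    def append(self, producer_id: int, sequence: int, value: str) -> bool:
        last = self._last_seq.get(producer_id, -1)
        if sequence <= last:
            return False  # duplicate from a retry: acknowledge, don't append
        if sequence != last + 1:
            raise ValueError("out-of-order sequence")  # a gap means data was lost
        self.records.append(value)
        self._last_seq[producer_id] = sequence
        return True


partition = PartitionLog()
partition.append(producer_id=7, sequence=0, value="PurchaseCompleted:o-42")
# The ack was lost, so the producer resends the same batch with the same sequence.
partition.append(producer_id=7, sequence=0, value="PurchaseCompleted:o-42")
print(partition.records)  # ['PurchaseCompleted:o-42']
```

This only removes duplicates caused by producer retries; an application that calls send twice with new data still gets two records.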
6. Event Aggregation
To know when a batch of jobs has finished, Wix uses a compacted Kafka topic as an atomic KV store. Each job completion writes a Job Completed event to the store, keyed by the import request ID, so all events for one request land on the same partition. A consumer-producer pair processes these events in order, incrementing a counter for that request ID. When the counter matches the total number of jobs, a WebSocket notification is sent to the user.
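The aggregation step can be sketched as a consumer that folds Job Completed events into a counter per import request ID and fires a notification when the count reaches the expected total. The event fields, the totals map, and the notify callback are hypothetical, and a dict stands in for the compacted state topic.

```python
from typing import Callable


def aggregate_completions(
    events: list[dict],
    totals: dict[str, int],
    notify: Callable[[str], None],
) -> dict[str, int]:
    """Count Job Completed events per import request; notify when all are done."""
    counts: dict[str, int] = {}  # stands in for the compacted state topic
    for event in events:
        request_id = event["request_id"]
        counts[request_id] = counts.get(request_id, 0) + 1
        if counts[request_id] == totals[request_id]:
            notify(request_id)  # e.g. push a WebSocket message to the user
    return counts


notified: list[str] = []
events = [
    {"request_id": "import-1", "job_id": "j1"},
    {"request_id": "import-2", "job_id": "j1"},
    {"request_id": "import-1", "job_id": "j2"},
]
counts = aggregate_completions(events, {"import-1": 2, "import-2": 3}, notified.append)
print(counts, notified)  # {'import-1': 2, 'import-2': 1} ['import-1']
```

The counter is only correct if each event is applied exactly once; a redelivered event would increment it twice, which is why the pattern relies on exactly-once processing.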
Exactly‑once processing
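Kafka's exactly-once processing ensures that each Job Completed event increments the counter once, even when a consumer restarts mid-batch. The Kafka Streams API (or the existing producer/consumer infrastructure) provides the grouping, reduction, and filtering needed to implement this pattern without race conditions.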
Summary
The six patterns (consumption and projection, end-to-end event flow, in-memory KV store, schedule-and-forget, transactional events, and event aggregation) share one principle: event-driven design reduces boilerplate code, eliminates polling and lock contention, improves resilience, and lets microservices scale largely by adding Kafka partitions and service instances.
This article has been distilled and summarized from source material, then republished for learning and reference.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"