How to Build an NSQ Multi‑Data‑Center Deployment with Lookup‑Migrate

This article explains the design and implementation of NSQ dual‑ and multi‑data‑center architectures using a lookup‑migrate proxy, covering deployment scenarios, routing strategies, migration phases, JSON response transformations, and practical lessons learned for reliable message publishing and consumption across data centers.

Youzan Coder

Overview

This article explains how Youzan extended NSQ to support dual‑ and multi‑data‑center deployments. The solution adds a proxy called lookup-migrate in front of the existing nsqlookupd service‑discovery component, allowing producers and consumers to operate transparently across multiple rooms (data centers) without any code changes in the business layer.

Scenario and Requirements

In a single‑room deployment, producers publish to an NSQ cluster and consumers subscribe via nsqlookupd. In a dual‑room scenario both production and consumption may be spread across two rooms, requiring:

Transparent failover when a room becomes unavailable.

Minimal latency impact for the business side.

Ability to switch topics between rooms without data loss.

Dual‑Data‑Center Design

The design keeps the existing nsqlookupd API but inserts lookup-migrate as a forward proxy. The proxy merges the lookup results from the two rooms and returns a combined list of nsqd nodes. Topics are mirrored in both rooms; messages are persisted only in the local room, so a failure of the local room does not lose already stored messages.

Phase 1 – Basic Dual‑Room Routing

All read/write traffic is routed through lookup-migrate to a single (primary) room, while the other room acts as a cold standby. This provides a simple failover path at the cost of cross‑room latency.

Phase 2 – Local Production with Dual‑Room Consumption

After Phase 1 stabilises, write requests are sent directly to the local nsqd nodes. lookup-migrate still returns nsqd nodes from both rooms for consumers, which removes the cross‑room hop from the write path while still letting consumers read messages from both rooms.
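The two phases can be read as a small routing rule. The following is a minimal sketch, not the actual lookup-migrate code: the `Phase` enum, the `rooms_for` function, the room names, and the `"r"`/`"w"` access values are hypothetical names chosen for illustration.

```python
from enum import Enum


class Phase(Enum):
    PRIMARY_ONLY = 1  # Phase 1: reads and writes both go to the primary room
    LOCAL_WRITE = 2   # Phase 2: write locally, consume from every room


def rooms_for(phase, access, local_room, primary_room, all_rooms):
    """Return the rooms whose nsqd nodes should serve a request.

    access is "r" for consumers and "w" for producers (assumed values).
    """
    if phase is Phase.PRIMARY_ONLY:
        # The standby room receives no traffic in Phase 1.
        return [primary_room]
    if access == "w":
        # Phase 2 producers stay inside their own room.
        return [local_room]
    # Phase 2 consumers see every room, in a stable order.
    return sorted(all_rooms)


rooms = ["room-b", "room-a"]
print(rooms_for(Phase.PRIMARY_ONLY, "w", "room-b", "room-a", rooms))
print(rooms_for(Phase.LOCAL_WRITE, "w", "room-b", "room-a", rooms))
print(rooms_for(Phase.LOCAL_WRITE, "r", "room-b", "room-a", rooms))
```

Keeping the phase as an explicit value makes the switch from Phase 1 to Phase 2 a configuration change rather than a change in producer or consumer code.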

Lookup‑Migrate Routing and Migration Strategy

The proxy distinguishes read and write operations by an access flag added to the lookup request. It then aggregates the JSON responses from the rooms, merging the partitions map (local room) and the producers array (remote room) into a single response. Example merged response:

{
  "channels": ["default"],
  "meta": {"extend_support": true, "ordered": false, "partition_num": 1, "replica": 2},
  "partitions": {
    "0": {"broadcast_address": "11.0.0.1", "hostname": "nsq1", "http_port": 4151, "tcp_port": 4150, "version": "0.3.7-HA.1.9.5"},
    "1": {"broadcast_address": "21.0.0.2", "hostname": "nsq2", "http_port": 4151, "tcp_port": 4150, "version": "0.3.7-HA.1.9.5"}
  },
  "producers": [
    {"broadcast_address": "11.0.0.1", "hostname": "nsq1", "http_port": 4151, "tcp_port": 4150, "version": "0.3.7-HA.1.9.5"},
    {"broadcast_address": "21.0.0.2", "hostname": "nsq2", "http_port": 4151, "tcp_port": 4150, "version": "0.3.7-HA.1.9.5"}
  ]
}
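A merge that produces a response of this shape might look like the sketch below. It is an illustration under stated assumptions, not the proxy's real implementation: `merge_lookup_responses` is a hypothetical name, topic metadata is assumed to be identical in every room, and partitions are renumbered in arrival order because the article does not say how the real proxy keys them.

```python
def merge_lookup_responses(responses):
    """Merge per-room lookup responses into one response of the same shape."""
    merged = {"channels": [], "meta": {}, "partitions": {}, "producers": []}
    seen_nodes = set()
    for resp in responses:
        for channel in resp.get("channels", []):
            if channel not in merged["channels"]:
                merged["channels"].append(channel)
        if not merged["meta"]:
            # Assumption: the first room's metadata is representative.
            merged["meta"] = dict(resp.get("meta", {}))
        # Assumption: renumber partitions so that partition "0" of a later
        # room does not overwrite partition "0" of an earlier one.
        partitions = resp.get("partitions", {})
        for pid in sorted(partitions, key=int):
            merged["partitions"][str(len(merged["partitions"]))] = partitions[pid]
        for node in resp.get("producers", []):
            key = (node["broadcast_address"], node["tcp_port"])
            if key not in seen_nodes:  # drop nodes reported by several rooms
                seen_nodes.add(key)
                merged["producers"].append(node)
    return merged


def node(addr, host):
    return {"broadcast_address": addr, "hostname": host,
            "http_port": 4151, "tcp_port": 4150}


room_a = {"channels": ["default"], "meta": {"partition_num": 1},
          "partitions": {"0": node("11.0.0.1", "nsq1")},
          "producers": [node("11.0.0.1", "nsq1")]}
room_b = {"channels": ["default"], "meta": {"partition_num": 1},
          "partitions": {"0": node("21.0.0.2", "nsq2")},
          "producers": [node("21.0.0.2", "nsq2")]}

merged = merge_lookup_responses([room_a, room_b])
print(sorted(merged["partitions"]), len(merged["producers"]))
```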

Migration proceeds in three steps:

Proxy consumer lookup requests so that consumers receive the nsqd nodes of both rooms.

After the consumer establishes connections, proxy producer requests to the target nsqd (the destination room).

Detach the source‑room connections once the migration succeeds.

For ordered topics the production side is switched first; consumption is moved only after confirming that the source channel has no backlog.
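The migration steps and the ordered-topic rule can be sketched as two small functions that decide which rooms are returned to consumers (`"r"`) and producers (`"w"`). The function names, the step numbering (0 meaning "before migration"), and the `source_backlog` parameter are hypothetical; how the backlog of the source channel is actually measured is not covered here.

```python
def unordered_routes(step, source, target):
    """Rooms served to consumers ("r") and producers ("w") at each step.

    Step 0 is the state before migration; steps 1 to 3 are the three
    migration steps described above.
    """
    if step == 0:
        return {"r": [source], "w": [source]}
    if step == 1:  # consumers connect to both rooms first
        return {"r": [source, target], "w": [source]}
    if step == 2:  # then producers are moved to the destination room
        return {"r": [source, target], "w": [target]}
    if step == 3:  # finally the source room is detached
        return {"r": [target], "w": [target]}
    raise ValueError(f"unknown migration step: {step}")


def ordered_routes(source, target, source_backlog):
    """Ordered topics: producers move first, consumers follow once drained."""
    if source_backlog > 0:
        # Consumers stay on the source room until its channel is empty,
        # so messages are not consumed out of order.
        return {"r": [source], "w": [target]}
    return {"r": [target], "w": [target]}


print(unordered_routes(2, "room-a", "room-b"))
print(ordered_routes("room-a", "room-b", source_backlog=12))
```

For unordered topics consumers are widened before producers move, so no message is published to a room that nobody reads. For ordered topics the order is reversed, trading a short consumption delay for ordering.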

Extension to Multi‑Data‑Center

When more than two rooms are required, the lookup‑migrate configuration uses three fields per topic:

#C: an array of lookup addresses for consumers.

#P: a single lookup address for producers.

#D: a default mapping used when a topic does not have an explicit entry.

Example configuration:

{
  "topics": [
    {
      "topicA": {
        "#C": ["lookup_addr1", "lookup_addr2", "lookup_addr3"],
        "#P": "lookup_addr1",
        "#D": "default lookup_addr"
      }
    }
  ]
}
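Reading such a configuration could be sketched as follows. The helper `lookups_for` and the `"r"`/`"w"` access values are hypothetical. The article does not spell out exactly when `#D` applies, so this sketch assumes it is used when the requested field is missing from a topic's entry, and it returns an empty list for topics that are not configured at all.

```python
import json

CONFIG = json.loads("""
{
  "topics": [
    {"topicA": {"#C": ["lookup_addr1", "lookup_addr2", "lookup_addr3"],
                "#P": "lookup_addr1",
                "#D": "default lookup_addr"}}
  ]
}
""")


def lookups_for(config, topic, access):
    """Return the lookup addresses to query for a topic and access type."""
    for entry in config.get("topics", []):
        if topic not in entry:
            continue
        fields = entry[topic]
        key = "#C" if access == "r" else "#P"
        # Assumption: fall back to "#D" when the specific field is absent.
        value = fields.get(key, fields.get("#D"))
        if value is None:
            return []
        # "#C" is a list while "#P" and "#D" are single strings.
        return list(value) if isinstance(value, list) else [value]
    return []  # topic has no entry in this configuration


print(lookups_for(CONFIG, "topicA", "r"))
print(lookups_for(CONFIG, "topicA", "w"))
```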

A lookup‑schema maps short names to full URLs, reducing configuration size:

{
  "lookupSchema": {
    "nsq1": "http://nsq1.example.com:4161",
    "nsq2": "http://nsq2.example.com:4161",
    "nsq3": "http://nsq3.example.com:4161"
  }
}
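Expanding short names through the schema is a plain dictionary lookup. In this sketch `expand_lookup_names` is a hypothetical helper, and passing unknown names through unchanged is an assumption rather than documented behaviour.

```python
LOOKUP_SCHEMA = {
    "nsq1": "http://nsq1.example.com:4161",
    "nsq2": "http://nsq2.example.com:4161",
    "nsq3": "http://nsq3.example.com:4161",
}


def expand_lookup_names(schema, names):
    """Replace short names with full URLs; unknown names pass through."""
    return [schema.get(name, name) for name in names]


# A topic entry can then list "nsq1" and "nsq3" instead of repeating URLs.
print(expand_lookup_names(LOOKUP_SCHEMA, ["nsq1", "nsq3"]))
```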

Practical Experience and Lessons Learned

Proxy deployment: Choose between transparent request forwarding (single port) and a dedicated reverse proxy that exposes the lookup-migrate port.

Deterministic ordering: Sort the list of lookup addresses before merging so that partition selection is stable and duplicate reconnects are avoided.

Partial failures: If a lookup request to one room fails, merge the successful results and still return a usable list to the client.

Discovery optimisation: Use listlookup to obtain all nsqlookupd instances in a cluster, enabling load‑balanced proxying.
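The ordering and partial-failure points above can be combined in one query loop. This is a sketch under assumptions: `query_rooms` is a hypothetical name, and `fetch` stands in for whatever function performs the HTTP request to one lookup address and raises `OSError` (or a subclass such as `ConnectionError`) on failure.

```python
def query_rooms(lookup_addrs, fetch):
    """Query every lookup address in a stable order and keep what succeeds.

    Returns (results, errors). Raises only when every address fails.
    """
    results, errors = [], {}
    # Sorting makes the merged output independent of configuration order,
    # so clients do not reconnect just because the list was shuffled.
    for addr in sorted(set(lookup_addrs)):
        try:
            results.append(fetch(addr))
        except OSError as exc:
            errors[addr] = exc  # remember the failure, keep going
    if errors and not results:
        raise RuntimeError(f"all lookups failed: {sorted(errors)}")
    return results, errors


def fake_fetch(addr):
    # Stand-in for a real HTTP call; room "b" is simulated as down.
    if addr == "b":
        raise ConnectionError("room b unreachable")
    return {"room": addr}


results, errors = query_rooms(["c", "b", "a"], fake_fetch)
print(results, sorted(errors))
```

Returning the errors alongside the results lets the proxy log or alert on the failed room while the client still receives a usable node list.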

Conclusion

The lookup-migrate approach provides a transparent, low‑cost way to run NSQ across dual‑ or multi‑data‑center topologies. It supports seamless failover, smooth migration of producers and consumers, and works for both ordered and unordered topics while keeping existing applications unchanged.
