How to Throttle Read and Write Traffic in an Elasticsearch Cluster
The article explains why native Elasticsearch throttling is insufficient, introduces node‑level traffic control provided by Infinilabs Gateway, shows detailed configuration examples, parameter meanings, FAQ solutions, advanced tuning tips, and performance comparisons to protect clusters from overload.
Why node‑level traffic control?
Limitations of native throttling
Elasticsearch’s built‑in throttling is limited. Developers usually tune bulk size and the number of concurrent threads, which limits traffic only indirectly. This approach cannot precisely control bandwidth, because a batch of large documents can saturate the network instantly. It also lacks a global view, so overload on a single node can cause cluster‑wide performance swings.
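A rough back-of-envelope sketch of why bulk size and thread count only bound traffic indirectly. The function name and the numbers below are illustrative assumptions, not measurements from the article; the model simply assumes each thread sends bulk requests back to back.

```python
def implied_write_bandwidth(docs_per_bulk: int, avg_doc_bytes: int,
                            threads: int, bulk_latency_s: float) -> float:
    """Approximate bytes/s sent when every thread issues bulks back to back."""
    if bulk_latency_s <= 0:
        raise ValueError("bulk_latency_s must be positive")
    bytes_per_bulk = docs_per_bulk * avg_doc_bytes
    bulks_per_second = threads / bulk_latency_s
    return bytes_per_bulk * bulks_per_second


# Same bulk size and thread count, very different network load
# (hypothetical document sizes of 1 KB and 100 KB).
small = implied_write_bandwidth(1000, 1_000, threads=8, bulk_latency_s=0.5)
large = implied_write_bandwidth(1000, 100_000, threads=8, bulk_latency_s=0.5)
print(f"1 KB docs:   {small / 1e6:.0f} MB/s")
print(f"100 KB docs: {large / 1e6:.0f} MB/s")
```

With identical client settings, the traffic differs by the ratio of document sizes, which is why a byte-based limit is needed in addition to a request-count limit.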
Node‑level traffic control model
Infinilabs Gateway introduces a node‑level throttling model built on four core controls:
Traffic rate (QPS or bandwidth)
Concurrent connections
Circuit breaking on abnormal conditions
Wait‑queue management
Official documentation: https://docs.infinilabs.com/gateway/main/zh/docs/references/elasticsearch/
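To make two of these controls concrete (the rate cap and the bounded wait), here is a toy model in Python. It is a conceptual sketch only and not the gateway's implementation: the class name NodeRateLimiter, the fixed one-second window, and the injectable clock are all assumptions made for illustration. Connection limits and circuit breaking are not modeled.

```python
import time


class NodeRateLimiter:
    """Toy per-node limiter: a fixed-window QPS cap plus a bounded wait."""

    def __init__(self, max_qps: int, max_wait_ms: int,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_qps = max_qps
        self.max_wait_s = max_wait_ms / 1000.0
        self._clock = clock
        self._sleep = sleep
        self._window_start = clock()
        self._count = 0

    def acquire(self) -> bool:
        """Return True if the request may proceed, False if the wait timed out."""
        deadline = self._clock() + self.max_wait_s
        while True:
            now = self._clock()
            if now - self._window_start >= 1.0:
                # A new one-second window has started: reset the counter.
                self._window_start = now
                self._count = 0
            if self._count < self.max_qps:
                self._count += 1
                return True
            next_window = self._window_start + 1.0
            if next_window > deadline:
                # Waiting for the next window would exceed the allowed wait.
                return False
            self._sleep(next_window - now)


# With no wait allowed, the third request in the same window is rejected.
limiter = NodeRateLimiter(max_qps=2, max_wait_ms=0)
print([limiter.acquire() for _ in range(3)])
```

A rejected acquire() corresponds to the point where a real proxy would answer the client with an error instead of forwarding the request.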
Configuration example (validated)
elasticsearch:
  - name: prod
    enabled: true
    endpoint: https://127.0.0.1:9200
    basic_auth:
      username: elastic
      password: changeme
    traffic_control:
      enabled: true
      max_qps_per_node: 100            # per‑node max requests per second (test)
      max_bytes_per_node: 104857600    # 100 MB/s per node
      max_connection_per_node: 500     # max connections per node
      max_wait_time_in_ms: 5000        # queue timeout 5 s
entry:
  - name: my_es_entry
    enabled: true
    router: my_router
    max_concurrency: 10000
    network:
      binding: 0.0.0.0:8000
    tls:
      enabled: true
  - name: my_unsecure_es_entry
    enabled: true
    router: my_router
    max_concurrency: 10000
    network:
      binding: 0.0.0.0:8001
    tls:
      enabled: false
router:
  - name: my_router
    enabled: true
Parameter reference
max_qps_per_node (int): node‑level request‑rate ceiling; recommended value calculated based on node specifications.
max_bytes_per_node (int): node‑level bandwidth limit in bytes; typical setting 70‑80 % of the network capacity.
max_connection_per_node (int): limits concurrent connections to avoid connection storms; suggested CPU‑cores × 200.
max_wait_time_in_ms (int): maximum time a request may stay in the buffer queue; default 10000 ms, can be reduced (e.g., 5000 ms) according to business tolerance.
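The sizing rules of thumb above can be turned into a small calculation. This is a sketch under stated assumptions: the helper name suggest_limits is hypothetical, the NIC speed is taken in decimal gigabits per second, and the default headroom of 0.75 is simply the midpoint of the 70‑80 % range. The request-rate ceiling is left out because the article gives no formula for it.

```python
def suggest_limits(nic_gbps: float, cpu_cores: int, headroom: float = 0.75) -> dict:
    """Derive starting values from NIC speed and core count (rules of thumb only)."""
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    nic_bytes_per_s = nic_gbps * 1_000_000_000 / 8  # bits/s -> bytes/s
    return {
        "max_bytes_per_node": int(nic_bytes_per_s * headroom),
        "max_connection_per_node": cpu_cores * 200,
    }


# Hypothetical node: 1 Gbps NIC, 16 cores.
print(suggest_limits(nic_gbps=1, cpu_cores=16))
```

These are starting points for pressure testing, not values to deploy unverified.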
Frequently asked questions
How to precisely control write rate?
traffic_control:
  max_qps_per_node: 500          # limit bulk requests per second
  max_bytes_per_node: 52428800   # limit write bandwidth to 50 MB/s
The request‑rate cap absorbs spikes of high‑frequency small‑document writes, while the byte cap absorbs sudden bursts of large documents.
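A short calculation shows which of the two limits binds for a given request size. It assumes, as a simplification, that a request is admitted only while both the request-rate and byte-rate limits have room; the function name and the request sizes are illustrative.

```python
def effective_request_rate(max_qps: int, max_bytes: int,
                           avg_request_bytes: int) -> float:
    """Requests/s admitted when both a QPS cap and a byte cap apply."""
    if avg_request_bytes <= 0:
        raise ValueError("avg_request_bytes must be positive")
    return min(float(max_qps), max_bytes / avg_request_bytes)


MAX_QPS = 500
MAX_BYTES = 52_428_800  # 50 MB/s, as in the configuration above

# Small 10 KiB bulks: the QPS cap is the binding limit.
print(effective_request_rate(MAX_QPS, MAX_BYTES, 10 * 1024))
# Large 1 MiB bulks: the byte cap is the binding limit.
print(effective_request_rate(MAX_QPS, MAX_BYTES, 1024 * 1024))
```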
What happens after throttling is triggered?
Requests enter a buffer queue (default maximum wait 10 s).
The traffic_control.max_wait_time_in_ms parameter defines the longest wait; setting it to 5000 yields a 5 s timeout.
Requests that exceed the timeout receive 429 Too Many Requests (customizable).
Circuit breaking on abnormal conditions can be combined with allow_access_when_master_not_found for node‑level failover.
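On the client side, a 429 response is usually handled by retrying with backoff. The sketch below assumes the caller supplies a send callable that returns an HTTP status code; the function name, retry count, and delay values are illustrative choices and are not part of the gateway.

```python
import random
import time
from typing import Callable


def send_with_backoff(send: Callable[[], int], max_attempts: int = 5,
                      base_delay_s: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it returns a status other than 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 429
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Exponential backoff with jitter so clients do not retry in lockstep.
            delay = base_delay_s * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    return status


# Simulated server: throttled twice, then accepted (no real sleeping here).
responses = iter([429, 429, 200])
print(send_with_backoff(lambda: next(responses), sleep=lambda s: None))
```

If every attempt is throttled, the function returns 429 so the caller can decide whether to drop, queue, or surface the failure.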
Advanced tuning techniques
Dynamic weight allocation
Different workloads can be assigned differentiated quotas (example, not verified):
# Log cluster – focus on throughput
- name: logs-cluster
  traffic_control:
    max_bytes_per_node: 209715200
# Transaction cluster – focus on low latency
- name: order-cluster
  traffic_control:
    max_qps_per_node: 2000
    max_wait_time_in_ms: 2000
Monitoring integration
Metrics exposed by the gateway can be visualized in external dashboards to observe queue length, QPS, and bandwidth usage.
Effect comparison
Test scenario: a Python script generates high‑concurrency search requests (50 threads, 60 s) against index test_index_0618, targeting 1000 QPS. The script records successful and failed request counts, actual QPS, and achievement rate.
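The original script is not included in the article. A load generator of the kind described might look like the following sketch, which assumes the caller provides a do_request callable that performs one search and returns True on success; the function name run_load_test and the pacing strategy are assumptions. The demo at the bottom uses a stub and a very short run instead of a real cluster.

```python
import threading
import time
from typing import Callable


def run_load_test(do_request: Callable[[], bool], threads: int = 50,
                  duration_s: float = 60.0, target_qps: float = 1000.0) -> dict:
    """Issue paced requests from several threads and report the achieved rate."""
    interval = threads / target_qps  # seconds between requests on each thread
    lock = threading.Lock()
    counts = {"success": 0, "failed": 0}
    start = time.monotonic()
    deadline = start + duration_s

    def worker() -> None:
        next_at = time.monotonic()
        while next_at < deadline:
            delay = next_at - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            try:
                ok = do_request()
            except Exception:
                ok = False  # count any error as a failed request
            with lock:
                counts["success" if ok else "failed"] += 1
            next_at += interval

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()

    elapsed = time.monotonic() - start
    actual_qps = counts["success"] / elapsed
    return {**counts, "actual_qps": actual_qps,
            "achievement_rate": actual_qps / target_qps}


# Short demo against a stub that always succeeds.
print(run_load_test(lambda: True, threads=4, duration_s=0.3, target_qps=40))
```

For a real run, do_request would send a search to test_index_0618 through the gateway entry and return whether the response was successful, with the defaults matching the 50 threads, 60 s, and 1000 QPS of the scenario.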
Without throttling the cluster exhibits spikes and occasional timeouts; after enabling node‑level throttling the QPS stabilizes, queue lengths stay within the configured limit, and latency improves.
Recommendation: perform baseline pressure testing in production, then iteratively fine‑tune the four parameters to match the observed capacity.
Mingyi World Elasticsearch