High-Availability Cluster Rate Limiting Platform Based on Sentinel
This article introduces an online practice of cluster rate limiting based on Sentinel, providing a high-availability solution for Sentinel cluster rate limiting and implementing automatic allocation and failover of Token servers.
Why is cluster rate limiting needed? To borrow the official documentation's example: suppose we want to limit the total QPS of a certain API to 50 for a user, but the service may run on many machines (e.g., 100). The natural approach is to designate one server to count the total number of calls, with the other instances asking this server whether a call is allowed. This is the most basic form of cluster flow control.
Additionally, cluster flow control solves the problem of uneven traffic leading to poor overall flow control. Suppose there are 10 machines in the cluster and we set a single-machine flow control threshold of 10 QPS on each. Ideally, the total threshold for the entire cluster would be 100 QPS. In practice, however, traffic may be distributed unevenly across machines, causing some machines to start limiting before the cluster total is reached. Relying solely on single-machine limits therefore cannot accurately cap total traffic. Cluster flow control can accurately control the total number of calls for the entire cluster and, combined with single-machine flow control as a backstop, achieves better traffic control.
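The under-admission effect described above can be made concrete with a small simulation. This is an illustrative sketch, not Sentinel code: 10 nodes each enforce a local 10 QPS limit, and 100 requests arrive in one second. With even routing all 100 pass; with skewed routing some are rejected even though the cluster total never exceeds the intended 100 QPS.

```java
// Illustrative simulation (not Sentinel code) of why per-node limits
// under-admit when traffic is skewed.
public class SkewDemo {
    static final int NODES = 10;
    static final int PER_NODE_LIMIT = 10;

    // Returns how many requests are admitted when request i is routed
    // to node route[i], each node enforcing its local limit.
    public static int admitted(int[] route) {
        int[] count = new int[NODES];
        int ok = 0;
        for (int node : route) {
            if (count[node] < PER_NODE_LIMIT) {
                count[node]++;
                ok++;
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        // Perfectly even routing: all 100 requests pass.
        int[] even = new int[100];
        for (int i = 0; i < 100; i++) even[i] = i % NODES;

        // Skewed routing: half the traffic hits two hot nodes.
        int[] skewed = new int[100];
        for (int i = 0; i < 100; i++) skewed[i] = (i < 50) ? i % 2 : i % NODES;

        System.out.println("even admitted:   " + admitted(even));   // 100
        System.out.println("skewed admitted: " + admitted(skewed)); // 60
    }
}
```

With skewed routing, the two hot nodes reject 40 requests that a cluster-wide counter would have admitted.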
In the process of traffic governance, we also face these problems. The shortcomings of single-machine rate limiting bring some additional troubles, and some flow control scenarios cannot be completed through single-machine rate limiting, so the cluster rate limiting function has also become an urgent capability needed in traffic governance.
Sentinel cluster flow control is essentially the same as single-machine flow control, both requiring statistics on user-concerned indicators. The difference is that single-machine flow control is performed in each application instance, while cluster rate limiting has a dedicated instance for statistics.
There are two identities in cluster flow control: Token Client and Token Server. Token Client is the cluster flow control client, used to request tokens from the Token Server it belongs to. The cluster flow control server will return the result to the client, deciding whether to limit the flow. Token Server is the cluster flow control server, which processes requests from Token Clients and determines whether to issue tokens based on the configured cluster rules.
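The client/server contract can be sketched as follows. This is a deliberately simplified illustration of the idea, not Sentinel's actual implementation (the real Token Server tracks sliding windows, priorities, and more); the class and method names here are invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Simplified sketch of the Token Server side of the contract: a client
// requests tokens for a cluster rule (identified by flowId); the server
// answers OK or BLOCKED based on the rule's configured threshold.
public class SimpleTokenServer {
    public enum Status { OK, BLOCKED }

    private final Map<Long, Long> thresholds = new ConcurrentHashMap<>();
    private final Map<Long, AtomicLong> used = new ConcurrentHashMap<>();

    public void loadRule(long flowId, long threshold) {
        thresholds.put(flowId, threshold);
        used.put(flowId, new AtomicLong());
    }

    // In the real system the Token Client calls this over TCP.
    public Status requestToken(long flowId, int acquireCount) {
        Long limit = thresholds.get(flowId);
        if (limit == null) return Status.BLOCKED; // unknown rule
        AtomicLong counter = used.get(flowId);
        // Optimistically take tokens; roll back if over the limit.
        long after = counter.addAndGet(acquireCount);
        if (after > limit) {
            counter.addAndGet(-acquireCount);
            return Status.OK == null ? null : Status.BLOCKED;
        }
        return Status.OK;
    }
}
```

The optimistic add-then-rollback keeps the hot path to a single atomic operation in the common (non-limited) case.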
For cluster flow control, Sentinel officially provides two modes: standalone mode and embedded mode. In standalone (alone) mode, a separate machine is deployed as the Token Server. In embedded mode, one node in the microservice cluster is selected to double as the Token Server.
Comparing the two deployment methods: the advantage of standalone mode is that the rate limiting server is deployed independently and does not affect business nodes. Its disadvantages are that the Token Server runs as a single machine by default with no high-availability solution, and that independent deployment incurs additional resource overhead and maintenance costs.
Based on Sentinel's secondary development, we have formed the platform architecture shown in the figure above, which is mainly divided into console, event center, Token Client, and Token Server.
The console is a visual interface where users can configure rules and view monitoring on the console, and it also implements traffic balancing of Token Servers and related alarm functions.
The event center was introduced to keep the interaction among the console, Token Client, and Token Server lightweight. Token Server and Token Client act on event notifications; events include configuration changes, node online/offline, and so on.
Token Client is a Java jar package. It encapsulates Sentinel's basic functions, responses to console events, automatic failover across the Token Server cluster, Token Server status management, reconnection after disconnection, and more.
Token Server is an independently deployed Java program that externally provides rate limiting and hotspot parameter rate limiting calculation. It is stateless; multiple Token Servers can be deployed as needed to form a Token Server cluster, in which the Token Servers are independent and do not affect each other.
High availability implementation of cluster rate limiting: The official only provides a single-machine version implementation for Token Server. As the core link of traffic governance, a single-machine service cannot meet the needs at all. So, how to achieve cluster deployment of the single-machine version of Token Server has become the first core issue of high availability.
Secondly, in the default implementation, whether creating or modifying cluster rate limiting rules, users must manually set or modify the Token Server on the console to complete the cluster rate limiting configuration. Even if a Token Server fails, a manual change is required to complete the switch. How to allocate Token Servers automatically and achieve operator-free automatic failover became the second core issue of cluster rate limiting high availability.
Token Server cluster deployment solution: The core function of Token Server is to provide traffic calculation capabilities. Each Token Server needs to provide a sliding window for each requested resource to calculate whether the current resource request reaches the upper limit within the window and return the result to the requester. Therefore, it is a real-time computing scenario, and each resource is independent and does not affect each other.
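The per-resource sliding window described above can be sketched as follows. This is an illustrative simplification (Sentinel internally uses its LeapArray structure; this code is not that implementation): the window is split into fixed buckets, and the current rate is the sum over buckets still inside the window.

```java
// Illustrative per-resource sliding window of the kind a Token Server
// keeps for each resource. Buckets age out as time advances, so the
// count reflects only the most recent window.
public class SlidingWindow {
    private final long bucketMillis;
    private final long[] bucketStart; // start timestamp of each bucket
    private final long[] counts;      // admitted requests per bucket

    public SlidingWindow(int buckets, long windowMillis) {
        this.bucketMillis = windowMillis / buckets;
        this.bucketStart = new long[buckets];
        this.counts = new long[buckets];
    }

    // Returns true if the request is admitted under `limit` per window.
    public synchronized boolean tryAcquire(long nowMillis, long limit) {
        int idx = (int) ((nowMillis / bucketMillis) % counts.length);
        long start = nowMillis - nowMillis % bucketMillis;
        if (bucketStart[idx] != start) { // bucket has expired: reset it
            bucketStart[idx] = start;
            counts[idx] = 0;
        }
        long total = 0;
        long windowLow = nowMillis - bucketMillis * counts.length;
        for (int i = 0; i < counts.length; i++) {
            if (bucketStart[i] > windowLow) total += counts[i];
        }
        if (total >= limit) return false;
        counts[idx]++;
        return true;
    }
}
```

Because each resource has its own window and resources are independent, these structures can be partitioned freely across stateless Token Server instances.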
For the above characteristics, we adopted a stateless, multi-node deployment solution. Each Token Server is an independent service that can provide traffic calculation capabilities externally and can achieve horizontal automatic scaling of cluster services as needed.
Automatic allocation and automatic failover of Token Server: This part of the function is mainly implemented in Token Client. When the application where Token Client is located starts, it will process the startup event, pull flow control rule information and available Token Server information from the Sentinel console.
There is a Tcp Channel Proxy layer in the SDK, which is responsible for establishing TCP connections with all Token Servers and maintaining an available TCP connection list internally. It then maintains and manages the status of each TCP connection based on TCP heartbeat.
When initiating a cluster rate limiting request, Token Client will use the hash algorithm to fix the rate limiting request of a resource to a certain Token Server by combining the resource flow control rule identifier with the length of the available TCP connection list, achieving automatic fixed allocation of Token Server. If a channel is abnormal, Tcp Channel Proxy will remove the abnormal channel from the available list. When the list changes, the corresponding hash algorithm result will also change, thereby achieving automatic failover of Token Server failures. At the same time, the background will try to reconnect the abnormal channel. If the reconnection is successful, the channel will be added back to the available list. If all Token Server nodes are abnormal, Token Client will automatically degrade to single-machine rate limiting until an event notification indicates that a Token Server is online, and then re-establish a TCP connection and restore to cluster rate limiting.
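The allocation-and-failover mechanism above can be sketched as a small selector. This is an illustrative sketch, not the production SDK: class and method names are invented, and the `null` return standing in for single-machine degradation is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of how a Token Client pins a rule to one Token
// Server: hash the rule identifier over the *currently available*
// connection list. Removing a failed server changes the list, so the
// same rule automatically remaps to a healthy server.
public class TokenServerSelector {
    private final List<String> available = new ArrayList<>();

    public TokenServerSelector(List<String> servers) {
        available.addAll(servers);
    }

    // Deterministic: the same ruleId always lands on the same server
    // as long as the available list is unchanged.
    public String select(String ruleId) {
        if (available.isEmpty()) return null; // degrade to local limiting
        int idx = Math.floorMod(ruleId.hashCode(), available.size());
        return available.get(idx);
    }

    // Called when the channel proxy detects a broken/restored channel.
    public void markDown(String server) { available.remove(server); }
    public void markUp(String server) {
        if (!available.contains(server)) available.add(server);
    }
}
```

A plain modulo hash remaps many rules when the list changes; that is acceptable here because each Token Server is stateless, so a remapped rule simply starts counting on its new server.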
Through this lightweight design, we achieved high availability for the cluster rate limiting service while also enabling horizontal automatic scaling of the Token Server cluster.
Pressure test: Token Server is the most important part of the entire cluster rate limiting. It bears the pressure of cluster rate limiting calculation, so this part is also the focus of adjustment and optimization. The following are some data in the testing process.
Test scenario: a single 8C16G Token Server, load-tested through our internal pressure test platform.
At 200,000 QPS, the Token Server machine's CPU utilization is around 68%, and the peak 5-minute load exceeds 7. When we increase the QPS to 240,000, CPU utilization reaches 75% and the 5-minute load approaches 10. At this QPS we ran a continuous half-hour pressure test, and the system performed well with no abnormalities.
In summary, on an 8C16G machine, a single Token Server can handle roughly 200,000 cluster flow control calculations per second. For workloads beyond that level, capacity can be extended by adding Token Server machines (horizontal scaling) or by upgrading the machine configuration.
Through secondary development based on Sentinel, we have built an internal cluster rate limiting platform. At present, most of Soul's core services have introduced rate limiting components and are conducting traffic governance in various scenarios based on the rate limiting platform.
At the same time, while promoting adoption of the rate limiting platform, we have improved the core functions of cluster rate limiting and carried out a series of extensions and governance work, for example Token Server's self-governance of its own traffic and visualized configuration of flow control exception handling. While completing traffic governance, we are also exploring more application scenarios for this platform.
Soul Technical Team
Technical practice sharing from Soul