Design and Architecture of a Scalable Host‑Based Intrusion Detection System (HIDS)
The article presents a scalable, low‑overhead Host‑based Intrusion Detection System (HIDS) architecture designed for hundreds of thousands of servers. It emphasizes cluster high availability, strong consistency through a CP‑oriented etcd backend, Go‑based agents with careful resource management, modular sandboxing, and robust process monitoring.
The design targets deployment on hundreds of thousands to millions of servers in large IDC (Internet data center) environments. The article outlines the security motivations, the need for low‑overhead agents, and the functional requirements: high availability, distributed configuration, user‑space detection, vulnerability scanning, secure configuration channels, authentication, change history, and self‑update capabilities.
Key architectural goals include cluster high availability, decentralised design, strong configuration consistency, partition tolerance, and the ability to handle massive scale while preserving CPU and memory resources.
Technical challenges are analysed, covering resource constraints, massive agent control latency, data consistency across partitions, network traffic impact, and log‑processing pressure.
The discussion of the CAP theorem leads to the selection of a CP‑oriented solution, one that favours consistency and partition tolerance over availability. After evaluating etcd, ZooKeeper, and Consul, the authors choose etcd for its consistency guarantees, rich client SDK, and proven use in projects such as Kubernetes.
Etcd key design is presented, e.g., /hids/server/config/{hostname}/master, /hids/agent/master/{hostname}, and /hids/agent/config/{hostname}/plugin/ID/conf_name, enabling fine‑grained configuration distribution and lease‑based agent health monitoring.
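The key layout above can be sketched as a few small Go helpers. This is a minimal illustration that assumes `ID` and `conf_name` in the third pattern are placeholders for a plugin identifier and a configuration name. The function names and the sample hostname are hypothetical, and the etcd client calls and lease handling are left out.

```go
package main

import (
	"fmt"
	"path"
)

// Key prefixes copied from the layout described in the text.
const (
	serverConfigPrefix = "/hids/server/config"
	agentMasterPrefix  = "/hids/agent/master"
	agentConfigPrefix  = "/hids/agent/config"
)

// ServerMasterKey builds /hids/server/config/{hostname}/master.
func ServerMasterKey(hostname string) string {
	return path.Join(serverConfigPrefix, hostname, "master")
}

// AgentMasterKey builds /hids/agent/master/{hostname}.
func AgentMasterKey(hostname string) string {
	return path.Join(agentMasterPrefix, hostname)
}

// PluginConfigKey builds /hids/agent/config/{hostname}/plugin/{pluginID}/{confName}.
// Treating ID and conf_name as parameters is an assumption of this sketch.
func PluginConfigKey(hostname, pluginID, confName string) string {
	return path.Join(agentConfigPrefix, hostname, "plugin", pluginID, confName)
}

func main() {
	host := "web-01" // hypothetical hostname
	fmt.Println(ServerMasterKey(host))
	fmt.Println(AgentMasterKey(host))
	fmt.Println(PluginConfigKey(host, "proc-monitor", "interval"))
}
```

Because every key embeds the hostname, a per-host prefix such as /hids/agent/config/{hostname}/ could be watched to pick up only that host's configuration changes.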
Go is selected as the implementation language because of its static compilation, low runtime overhead, strong concurrency model, and native compatibility with etcd client libraries.
The framework abstracts common functionality through sandboxed modules, an IConfig interface for configuration validation, optimized timer/clock mechanisms to reduce system calls, a Catcher for panic recovery, and standardized lifecycle interfaces (Init, Run, Shutdown).
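One way the lifecycle interfaces and panic recovery could fit together is sketched below. The article names only Init, Run, Shutdown, IConfig, and a Catcher. The method signatures, the `Check` method, `Catch`, `RunModule`, and the demo types are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Module is a hypothetical lifecycle contract using the three phases named in the text.
type Module interface {
	Init() error
	Run() error
	Shutdown() error
}

// IConfig is a hypothetical shape for configuration validation.
type IConfig interface {
	Check() error
}

// pluginConfig is a demo configuration with a single validated field.
type pluginConfig struct {
	IntervalSeconds int
}

// Check rejects non-positive intervals.
func (c pluginConfig) Check() error {
	if c.IntervalSeconds <= 0 {
		return errors.New("interval must be positive")
	}
	return nil
}

// Catch runs fn and converts a panic into an ordinary error,
// so one misbehaving module cannot crash the whole agent.
func Catch(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return fn()
}

// RunModule drives a module through its lifecycle.
// Shutdown is attempted whenever Init succeeded, even if Run failed or panicked.
func RunModule(m Module) error {
	if err := Catch(m.Init); err != nil {
		return fmt.Errorf("init: %w", err)
	}
	runErr := Catch(m.Run)
	if err := Catch(m.Shutdown); err != nil && runErr == nil {
		return fmt.Errorf("shutdown: %w", err)
	}
	if runErr != nil {
		return fmt.Errorf("run: %w", runErr)
	}
	return nil
}

// flakyModule is a demo module whose Run always panics.
type flakyModule struct{ shutdownCalled bool }

func (m *flakyModule) Init() error     { return nil }
func (m *flakyModule) Run() error      { panic("boom") }
func (m *flakyModule) Shutdown() error { m.shutdownCalled = true; return nil }

func main() {
	var cfg IConfig = pluginConfig{IntervalSeconds: 0}
	fmt.Println("config check:", cfg.Check())

	m := &flakyModule{}
	fmt.Println("module result:", RunModule(m))
	fmt.Println("shutdown called:", m.shutdownCalled)
}
```

Wrapping each phase separately keeps a panic in one module contained and still lets the module release its resources.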
Rate‑limiting, disk I/O management, and logging level strategies are described to control data flow and storage consumption. A retry mechanism with exponential back‑off prevents reconnection attempts from snowballing when many agents lose their connection at once.
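Exponential back‑off can be sketched as follows. The base delay, the cap, and the added jitter are assumptions chosen for illustration. The article does not state the parameters the agents actually use, nor whether jitter is applied.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Demo parameters; both values are assumptions of this sketch.
const (
	demoBase = time.Second
	demoMax  = 30 * time.Second
)

// BackoffDelay returns base * 2^attempt, capped at max.
// The cap is checked before doubling so the duration cannot overflow.
func BackoffDelay(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		if d >= max/2 {
			return max
		}
		d *= 2
	}
	if d > max {
		return max
	}
	return d
}

// WithJitter picks a random delay in [d/2, d] so that many agents
// retrying at once do not reconnect at the same instant.
func WithJitter(d time.Duration, r *rand.Rand) time.Duration {
	half := d / 2
	return half + time.Duration(r.Int63n(int64(half)+1))
}

// newRand returns a seeded random source for the jitter calculation.
func newRand(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

func main() {
	r := newRand(time.Now().UnixNano())
	for attempt := 0; attempt < 7; attempt++ {
		d := BackoffDelay(attempt, demoBase, demoMax)
		fmt.Printf("attempt %d: nominal %v, with jitter %v\n", attempt, d, WithJitter(d, r))
	}
}
```

With these demo values the nominal delay doubles from 1 s up to 16 s and then stays at the 30 s cap.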
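For process monitoring, three approaches are compared: kernel connector, hook‑based, and netlink. The low‑invasiveness cn_proc solution is chosen; cn_proc is the kernel's process events connector, which delivers process events to user space over a netlink socket. User‑space collection serves as the fallback for Docker environments.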
Operational issues such as kernel netlink back‑pressure, memory growth due to frequent object allocation, and GC overhead are addressed by introducing a bounded queue and using sync.Pool for object reuse, reducing memory usage from >200 MB to ~15 MB in tests.
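The combination of a bounded queue and `sync.Pool` might look like the sketch below. The `ProcEvent` fields, the queue API, and the drop‑when‑full policy are assumptions of this example. The article says only that a bounded queue and object reuse were introduced.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// ProcEvent is a hypothetical process event record.
type ProcEvent struct {
	PID  int
	PPID int
	Exe  string
}

// eventPool recycles ProcEvent objects to reduce allocations and GC pressure.
var eventPool = sync.Pool{
	New: func() interface{} { return new(ProcEvent) },
}

// acquire takes an event from the pool and fills it in.
func acquire(pid, ppid int, exe string) *ProcEvent {
	ev := eventPool.Get().(*ProcEvent)
	ev.PID, ev.PPID, ev.Exe = pid, ppid, exe
	return ev
}

// release clears an event and returns it to the pool.
func release(ev *ProcEvent) {
	*ev = ProcEvent{}
	eventPool.Put(ev)
}

// BoundedQueue is a fixed-capacity queue backed by a buffered channel.
type BoundedQueue struct {
	ch      chan *ProcEvent
	dropped uint64
}

// NewBoundedQueue creates a queue that holds at most capacity events.
func NewBoundedQueue(capacity int) *BoundedQueue {
	return &BoundedQueue{ch: make(chan *ProcEvent, capacity)}
}

// Offer enqueues without blocking. When the queue is full the event is
// counted as dropped and recycled, so a slow consumer cannot grow memory.
func (q *BoundedQueue) Offer(ev *ProcEvent) bool {
	select {
	case q.ch <- ev:
		return true
	default:
		atomic.AddUint64(&q.dropped, 1)
		release(ev)
		return false
	}
}

// Poll dequeues without blocking; ok is false when the queue is empty.
func (q *BoundedQueue) Poll() (ev *ProcEvent, ok bool) {
	select {
	case ev = <-q.ch:
		return ev, true
	default:
		return nil, false
	}
}

// Dropped reports how many events were discarded because the queue was full.
func (q *BoundedQueue) Dropped() uint64 {
	return atomic.LoadUint64(&q.dropped)
}

func main() {
	q := NewBoundedQueue(2)
	for pid := 1; pid <= 3; pid++ {
		accepted := q.Offer(acquire(pid, 0, "/bin/demo"))
		fmt.Printf("pid %d accepted: %v\n", pid, accepted)
	}
	for {
		ev, ok := q.Poll()
		if !ok {
			break
		}
		fmt.Printf("consumed pid %d\n", ev.PID)
		release(ev) // hand the object back once processing is done
	}
	fmt.Println("dropped:", q.Dropped())
}
```

Dropping on overflow trades completeness for a hard memory ceiling. A blocking enqueue would keep every event but pushes the pressure back toward the netlink receive path.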
After a year of development, the system has been deployed on tens of thousands of servers, demonstrating stability while acknowledging remaining work on data completeness and fine‑grained operational controls.
Overall, the article provides a comprehensive view of large‑scale security agent architecture, emphasizing consistency, low invasiveness, and robust monitoring/alerting mechanisms.
Meituan Technology Team
