Performance Optimization and Monitoring of Kerberos KDC Service

This article examines Kerberos KDC performance issues on Meituan‑Dianping’s data platform. Load tests show that PREAUTH roughly halves single-process AS throughput, that RAID10 matters mainly when PREAUTH is enabled under multi-process load, and that a single KDC process is limited by one CPU core. Deploying 40 processes and disabling PREAUTH raises throughput more than tenfold, and a lock-free shared-memory monitoring module with the kstat tool provides real-time metrics for troubleshooting.

Meituan Technology Team

This article analyzes the performance bottlenecks of the Kerberos Key Distribution Center (KDC) used in Meituan‑Dianping’s data platform and proposes optimization and monitoring solutions.

Background: Kerberos provides strong authentication via shared-key cryptography. The platform’s KDC is a core service for Hive, HDFS, YARN, and other components. As traffic grew, it suffered from insufficient QPS and unreliable monitoring.

KDC Principle: The authentication flow consists of three stages (Client ↔ AS, Client ↔ TGS, and Client ↔ Service), each involving ticket exchange and session-key encryption.
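The three stages can be sketched as a toy simulation. This is only an illustration of who can open which message: the `Sealed` class stands in for real encryption, all names are hypothetical, and timestamps, lifetimes, and realms from the actual protocol are omitted.

```python
import os


class Sealed:
    """Toy stand-in for ciphertext: only the matching key can open it."""

    def __init__(self, key: bytes, payload: dict):
        self._key = key
        self._payload = dict(payload)

    def open(self, key: bytes) -> dict:
        if key != self._key:
            raise PermissionError("wrong key")
        return dict(self._payload)


# Long-term keys the KDC shares with each party (random, for illustration).
CLIENT_KEY, TGS_KEY, SERVICE_KEY = (os.urandom(16) for _ in range(3))


def as_exchange(client: str):
    """Stage 1, Client <-> AS: return a session key and a TGT."""
    session = os.urandom(16)
    tgt = Sealed(TGS_KEY, {"client": client, "session": session})
    reply = Sealed(CLIENT_KEY, {"session": session})
    return reply, tgt


def tgs_exchange(tgt: Sealed, authenticator: Sealed, service: str):
    """Stage 2, Client <-> TGS: trade the TGT for a service ticket."""
    ticket = tgt.open(TGS_KEY)
    proof = authenticator.open(ticket["session"])
    if proof["client"] != ticket["client"]:
        raise PermissionError("authenticator does not match ticket")
    session = os.urandom(16)
    service_ticket = Sealed(
        SERVICE_KEY,
        {"client": ticket["client"], "service": service, "session": session},
    )
    reply = Sealed(ticket["session"], {"session": session})
    return reply, service_ticket


def service_accepts(service_ticket: Sealed, authenticator: Sealed) -> bool:
    """Stage 3, Client <-> Service: the service checks ticket and proof."""
    ticket = service_ticket.open(SERVICE_KEY)
    proof = authenticator.open(ticket["session"])
    return proof["client"] == ticket["client"]


def run_flow(client: str = "alice", service: str = "hdfs") -> bool:
    reply, tgt = as_exchange(client)
    tgt_session = reply.open(CLIENT_KEY)["session"]
    reply, service_ticket = tgs_exchange(
        tgt, Sealed(tgt_session, {"client": client}), service
    )
    service_session = reply.open(tgt_session)["session"]
    return service_accepts(
        service_ticket, Sealed(service_session, {"client": client})
    )
```

The client never opens the TGT or the service ticket itself; it only forwards them and proves knowledge of the session key, which is why the AS and TGS stages are the ones that load the KDC.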

Main Optimization Work: Experiments focused on the AS and TGS stages, using the Grinder load-testing tool. Three variables were tested: process count (single process vs. 40 processes), the PREAUTH attribute (enabled vs. disabled), and storage (with vs. without RAID10).

AS Load Test Results (requests per second):
• Single‑process, PREAUTH, no RAID: 49
• Single‑process, PREAUTH, RAID10: 53
• Single‑process, no PREAUTH, no RAID: 100
• Single‑process, no PREAUTH, RAID10: 104
• 40 processes, PREAUTH, no RAID: 115
• 40 processes, PREAUTH, RAID10: 990
• 40 processes, no PREAUTH, no RAID: 2000
• 40 processes, no PREAUTH, RAID10: 1985

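Analysis shows that PREAUTH roughly halves AS throughput (49 vs. 100 requests per second in single-process mode) and that a single KDC process is bound by one CPU core. RAID10 makes little difference in single-process mode or when PREAUTH is disabled, but with 40 processes and PREAUTH enabled it lifts throughput from 115 to 990 requests per second, which points to disk writes as the limit in that configuration. Multi-process deployment uses the available CPU cores and, with PREAUTH disabled, raises throughput from about 100 to about 2000 requests per second (roughly 20×).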
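The ratios behind this analysis can be recomputed from the AS figures listed above. The snippet below is only a worked-arithmetic sketch; the dictionary layout and variable names are illustrative, not part of the original tooling.

```python
# AS results in requests per second.
# Key: (process count, PREAUTH enabled, RAID10 used)
AS_QPS = {
    (1, True, False): 49,
    (1, True, True): 53,
    (1, False, False): 100,
    (1, False, True): 104,
    (40, True, False): 115,
    (40, True, True): 990,
    (40, False, False): 2000,
    (40, False, True): 1985,
}


def ratio(numerator_cfg, denominator_cfg):
    """Throughput of one configuration relative to another."""
    return AS_QPS[numerator_cfg] / AS_QPS[denominator_cfg]


# Cost of PREAUTH in single-process mode: 49 / 100
preauth_cost_single = ratio((1, True, False), (1, False, False))

# Effect of RAID10 with PREAUTH, single process: 53 / 49
raid_gain_single = ratio((1, True, True), (1, True, False))

# Effect of RAID10 with PREAUTH, 40 processes: 990 / 115
raid_gain_multi = ratio((40, True, True), (40, True, False))

# Gain from 40 processes with PREAUTH disabled: 2000 / 100
process_gain = ratio((40, False, False), (1, False, False))

print(f"PREAUTH keeps {preauth_cost_single:.0%} of single-process QPS")
print(f"RAID10 gain, 1 process with PREAUTH: {raid_gain_single:.2f}x")
print(f"RAID10 gain, 40 processes with PREAUTH: {raid_gain_multi:.1f}x")
print(f"40-process gain without PREAUTH: {process_gain:.0f}x")
```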

TGS Load Test Results (requests per second):
• Single‑process, PREAUTH, no RAID: 63
• Single‑process, PREAUTH, RAID10: 58
• Single‑process, no PREAUTH, no RAID: 66
• Single‑process, no PREAUTH, RAID10: 61
• 40 processes, PREAUTH, no RAID: 1303
• 40 processes, PREAUTH, RAID10: 1342
• 40 processes, no PREAUTH, no RAID: 1347
• 40 processes, no PREAUTH, RAID10: 1339

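These results indicate that TGS processing is CPU-bound: throughput scales from about 60 to more than 1300 requests per second with 40 processes, while neither the PREAUTH setting nor RAID10 changes it materially. Disk I/O is not a factor, thanks to high BDB cache hit rates.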
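As a rough check on the TGS figures, the sketch below compares the spread within each process-count group against the gain from adding processes. The grouping and names are illustrative assumptions; only the numbers come from the results above.

```python
# TGS results in requests per second, grouped by process count.
# Order within each group: PREAUTH/no RAID, PREAUTH/RAID10,
# no PREAUTH/no RAID, no PREAUTH/RAID10.
TGS_SINGLE = [63, 58, 66, 61]
TGS_MULTI = [1303, 1342, 1347, 1339]


def mean(values):
    return sum(values) / len(values)


def spread(values):
    """Largest relative difference inside a group: (max - min) / min."""
    return (max(values) - min(values)) / min(values)


single_mean = mean(TGS_SINGLE)       # 62.0
multi_mean = mean(TGS_MULTI)         # 1332.75
speedup = multi_mean / single_mean   # effect of 40 processes

print(f"single-process spread: {spread(TGS_SINGLE):.0%}")
print(f"40-process spread:     {spread(TGS_MULTI):.0%}")
print(f"speedup from 40 processes: {speedup:.1f}x")
```

The variation caused by PREAUTH and RAID10 stays in the range of a few percent to low teens, while the process count changes throughput by more than an order of magnitude.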

Monitoring Design: Since Kerberos implementations do not expose metrics externally, a shared-memory-based monitoring module was added. Each KDC process writes its metrics into its own slot, so no locking is needed, which enables real-time per-process observation with negligible overhead. The design is extensible; adding a new metric requires only three code changes.
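The per-process slot idea can be sketched as follows. This is a minimal illustration using Python's `multiprocessing.shared_memory`, not the module described in the article: the metric names, slot layout, and class name are all assumptions, and a production version would live in the KDC's own C code.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical metric names; the article does not list the real ones.
METRICS = ("as_req", "tgs_req", "errors")
SLOT_FMT = f"<{len(METRICS)}Q"          # one unsigned 64-bit counter each
SLOT_SIZE = struct.calcsize(SLOT_FMT)


class MetricsSegment:
    """Shared segment with one slot per process.

    Each slot has exactly one writer, so writers never contend and
    no lock is taken; readers may see slightly stale counters.
    """

    def __init__(self, slots, name=None, create=False):
        self.slots = slots
        size = slots * SLOT_SIZE
        self.shm = shared_memory.SharedMemory(
            name=name, create=create, size=size if create else 0
        )
        if create:
            self.shm.buf[:size] = bytes(size)  # start all counters at zero

    def add(self, slot, metric, delta=1):
        """Called only by the process that owns `slot`."""
        offset = slot * SLOT_SIZE + METRICS.index(metric) * 8
        (current,) = struct.unpack_from("<Q", self.shm.buf, offset)
        struct.pack_into("<Q", self.shm.buf, offset, current + delta)

    def read_slot(self, slot):
        values = struct.unpack_from(SLOT_FMT, self.shm.buf, slot * SLOT_SIZE)
        return dict(zip(METRICS, values))

    def totals(self):
        """Sum every slot, as an external reader would."""
        result = dict.fromkeys(METRICS, 0)
        for slot in range(self.slots):
            for metric, value in self.read_slot(slot).items():
                result[metric] += value
        return result

    def close(self, unlink=False):
        self.shm.close()
        if unlink:
            self.shm.unlink()
```

In this layout, adding a metric means extending `METRICS` and calling `add` at the new instrumentation point; a reader attaches by name with `MetricsSegment(slots, name=...)` and never blocks the writers.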

kstat Tool: Provides two interfaces. One returns cumulative metric values for integration with Falcon monitoring; the other returns per-process instantaneous rates at one-second granularity for on-site troubleshooting.
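The relationship between the two views is simple: an instantaneous rate is the difference between two cumulative snapshots divided by the sampling interval. The function below is a hedged sketch of that calculation; its name, the snapshot format, and the metric names are hypothetical and do not reflect kstat's actual interface.

```python
def per_second_rates(previous, current, interval_seconds):
    """Turn two cumulative counter snapshots into per-second rates.

    `previous` and `current` map metric name -> cumulative count for
    one process; a metric missing from `previous` is treated as 0.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {
        name: (value - previous.get(name, 0)) / interval_seconds
        for name, value in current.items()
    }


# Two snapshots of one process taken one second apart (made-up numbers).
before = {"as_req": 1200, "tgs_req": 5300}
after = {"as_req": 1250, "tgs_req": 5420}
rates = per_second_rates(before, after, interval_seconds=1.0)
```

Cumulative counters suit a collector that polls at its own interval, since no samples are lost between polls; the rate view is what an operator watches while reproducing a problem.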

Conclusion: Running the KDC as multiple processes with PREAUTH disabled and sufficient memory for the BDB cache yields a more than 10× performance improvement. When PREAUTH is enabled, RAID10 is recommended to avoid the disk-write bottleneck. Future work will explore tuning the TCP half-open and full-connection queues.

