
Kylin4 Deployment and Performance Optimizations at Youzan

Youzan has migrated all of its online services to Kylin4. Along the way the team addressed long cube rebuilds, a single‑point query cache, CPU spikes, and missing cube‑level throttling by adding batch segment builds, low‑priority concurrency controls, Redis‑based query caching, Parquet skew mitigation, range‑query acceleration, and class‑loader optimizations. Together these raised query‑per‑second capacity from 70 to 150, cut latency by up to 50 %, and reduced CPU usage.

Youzan Coder

Youzan has used Kylin for its data‑analytics platform since 2018 and gradually migrated all online services to Kylin4, completing the move by 2021. The engine now supports almost every business module, including merchant back‑office, financial, traffic, inventory, fulfillment, supply‑chain and marketing reports, all of which require low latency and high stability.

Key pain points identified in the current production environment are:

Long‑running cube rebuilds or refreshes for large time ranges, lacking batch segment construction and automatic segment validity checks.

Single‑point query cache causing inconsistent performance.

Frequent CPU spikes caused by high‑concurrency queries, especially when many cubes share the same query workload.

Insufficient query throttling at the cube level.

Functional and stability optimizations:

Added a batch segment build/refresh feature with configurable granularity (by month or by cube merge interval) and three operation types: BUILD, REFRESH, and BUILD_OR_REFRESH. The logic automatically splits a large time range into multiple segments and checks each one for overlap with existing segments to decide whether to build or refresh it.
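The splitting and build‑or‑refresh decision can be sketched as follows. This is a minimal illustration, not Youzan's implementation: the names `Op`, `month_ranges`, and `plan_jobs` are hypothetical, ranges are half‑open `[start, end)`, and existing segments are assumed to be aligned to the same monthly boundaries.

```python
from datetime import date
from enum import Enum


class Op(Enum):
    BUILD = "BUILD"
    REFRESH = "REFRESH"
    BUILD_OR_REFRESH = "BUILD_OR_REFRESH"


def month_ranges(start, end):
    """Split the half-open range [start, end) at calendar-month boundaries."""
    ranges = []
    cur = start
    while cur < end:
        # First day of the month following `cur` (December rolls into next year).
        nxt = date(cur.year + cur.month // 12, cur.month % 12 + 1, 1)
        stop = min(nxt, end)
        ranges.append((cur, stop))
        cur = stop
    return ranges


def plan_jobs(start, end, existing, op=Op.BUILD_OR_REFRESH):
    """Return (action, seg_start, seg_end) for each monthly slice.

    `existing` lists the (start, end) ranges of segments already in the cube.
    A slice that overlaps an existing segment is refreshed, otherwise built;
    slices that do not fit the requested operation type are skipped.
    """
    jobs = []
    for s, e in month_ranges(start, end):
        overlaps = any(s < seg_end and seg_start < e for seg_start, seg_end in existing)
        if overlaps and op in (Op.REFRESH, Op.BUILD_OR_REFRESH):
            jobs.append(("REFRESH", s, e))
        elif not overlaps and op in (Op.BUILD, Op.BUILD_OR_REFRESH):
            jobs.append(("BUILD", s, e))
    return jobs
```

Calling `plan_jobs(date(2021, 11, 15), date(2022, 2, 1), existing)` with one existing December segment yields a BUILD for the partial November slice, a REFRESH for December, and a BUILD for January.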

Implemented low‑priority concurrency control for batch jobs, limiting Spark executor counts and parallelism to protect high‑priority tasks.

Provided an API to detect missing ready segments after batch operations.
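One way such a check could work is to sweep the requested time range and report every gap not covered by a ready segment. The sketch below assumes half‑open `[start, end)` ranges; `find_missing_ranges` is a hypothetical helper, not a Kylin API.

```python
from datetime import date


def find_missing_ranges(start, end, ready_segments):
    """Return the sub-ranges of [start, end) not covered by any ready segment."""
    missing = []
    cursor = start  # everything before `cursor` is known to be covered
    for seg_start, seg_end in sorted(ready_segments):
        if seg_end <= cursor:
            continue  # segment lies entirely in the covered part
        if seg_start >= end:
            break     # remaining segments start after the requested range
        if seg_start > cursor:
            missing.append((cursor, seg_start))
        cursor = max(cursor, seg_end)
    if cursor < end:
        missing.append((cursor, end))
    return missing


ready = [(date(2022, 1, 1), date(2022, 2, 1)), (date(2022, 3, 1), date(2022, 4, 1))]
gaps = find_missing_ranges(date(2022, 1, 1), date(2022, 5, 1), ready)
# `gaps` now holds the February range and the April range.
```

Running this after a batch operation gives the caller an explicit list of ranges to resubmit instead of a pass/fail flag.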

Introduced cube‑level query throttling with dynamic SQL‑regex rules, offering administrator APIs for one‑click enable/disable of limits.
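A rough sketch of cube‑level throttling with regex rules is shown below. Everything here is an assumption made for illustration: the class `CubeThrottle`, its methods, the choice to limit concurrent queries (rather than, say, a rate), and the single `enabled` flag standing in for the one‑click administrator switch.

```python
import re
import threading
from contextlib import contextmanager


class CubeThrottle:
    """Per-cube cap on concurrent queries whose SQL matches a regex rule."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rules = {}     # cube name -> (compiled regex, max concurrent queries)
        self._running = {}   # cube name -> matching queries currently admitted
        self.enabled = True  # global switch: False disables every limit

    def set_rule(self, cube, pattern, max_concurrent):
        # Rules can be added or replaced at runtime.
        with self._lock:
            self._rules[cube] = (re.compile(pattern, re.IGNORECASE), max_concurrent)

    def remove_rule(self, cube):
        with self._lock:
            self._rules.pop(cube, None)

    @contextmanager
    def admit(self, cube, sql):
        """Yield True if the query may run, False if it is throttled."""
        counted = False
        allowed = True
        with self._lock:
            rule = self._rules.get(cube)
            if self.enabled and rule is not None and rule[0].search(sql):
                if self._running.get(cube, 0) >= rule[1]:
                    allowed = False
                else:
                    self._running[cube] = self._running.get(cube, 0) + 1
                    counted = True
        try:
            yield allowed
        finally:
            # Only queries that were counted on entry give their slot back.
            if counted:
                with self._lock:
                    self._running[cube] -= 1
```

A caller would wrap query execution in `with throttle.admit(cube, sql) as ok:` and return a "too many requests" style error when `ok` is false. Queries that do not match the cube's rule pass through without being counted.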

Security and resource‑usage tweaks:

Reduced the BCryptPasswordEncoder strength (the base‑2 logarithm of the number of hashing rounds) from the default 10 to 4, that is, from 2^10 = 1024 rounds to 2^4 = 16, cutting hash‑generation CPU cost.

Extended the authentication cache expiration (property kylin.server.auth-user-cache.expire-seconds) from 300 s to a larger value to reduce cache‑miss spikes.

Added monitoring for discarded segment merges to prevent small segments from accumulating.

Query cache redesign:

Moved from a local, per‑node cache to a Redis‑based distributed cache with health‑check‑driven fallback. Cache hit rate increased from ~20 % (single node) to ~41 %, CPU usage dropped ~25 %, and query response time (RT) fell ~50 %.
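The shape of such a cache can be sketched as below. The source does not say what the fallback target is, so falling back to a per‑node dictionary is an assumption, as are the class name `FallbackQueryCache`, the `retry_after` window, and the key scheme. The `remote` argument is any client exposing `get(key)` and `set(key, value)`.

```python
import hashlib
import time


class FallbackQueryCache:
    """Query-result cache that prefers a shared store and degrades to a local dict."""

    def __init__(self, remote, retry_after=30.0, clock=time.monotonic):
        self._remote = remote            # hypothetical client with get(key) / set(key, value)
        self._local = {}                 # per-node fallback; unbounded here, a real cache would evict
        self._retry_after = retry_after  # seconds to wait before trying the shared store again
        self._clock = clock
        self._down_until = float("-inf")

    @staticmethod
    def make_key(project, sql):
        # Hash project + SQL text so that every node derives the same key.
        return hashlib.sha256(f"{project}\n{sql.strip()}".encode("utf-8")).hexdigest()

    def _remote_usable(self):
        return self._clock() >= self._down_until

    def _mark_down(self):
        self._down_until = self._clock() + self._retry_after

    def get(self, key):
        if self._remote_usable():
            try:
                return self._remote.get(key)
            except Exception:
                self._mark_down()  # treat any client error as an unhealthy store
        return self._local.get(key)

    def put(self, key, value):
        if self._remote_usable():
            try:
                self._remote.set(key, value)
                return
            except Exception:
                self._mark_down()
        self._local[key] = value
```

Because all nodes share one store, a result cached by one query server can be served by any other, which is consistent with the higher hit rate reported above. The failure handling here is passive (mark the store down on error and retry later); an active periodic health check would serve the same purpose.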

Parquet storage skew mitigation:

Traced task‑level data skew to misconfigured Parquet row‑group sizes. Introduced configurable minimum validation passes and rebuilt the skewed row groups, reducing RT from 33 s to 1.2 s and I/O from 879 MB to 77 MB.

Range‑query acceleration:

Implemented automatic matching of date predicates against segment metadata, allowing queries to use coarser‑grained cuboids. Benchmarks on a one‑year range showed QPS up 40 %, RT down 20 % (up to 50 % in hot spots), and I/O down 70 %.
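One plausible reading of this optimization: when a segment's whole time range falls inside the query's date predicate, the predicate is always true for that segment, so the date dimension is not needed and a coarser cuboid without it can answer the query. The sketch below illustrates that classification under this assumption; `classify_segments` and the labels "skip", "coarse", and "fine" are hypothetical, and ranges are half‑open `[start, end)`.

```python
from datetime import date


def classify_segments(q_start, q_end, segments):
    """Decide, per segment, how a query over [q_start, q_end) can be served.

    `segments` maps a segment name to its (start, end) range from metadata.
    """
    plan = {}
    for name, (seg_start, seg_end) in segments.items():
        if seg_end <= q_start or seg_start >= q_end:
            plan[name] = "skip"    # no overlap: prune the segment entirely
        elif q_start <= seg_start and seg_end <= q_end:
            plan[name] = "coarse"  # fully covered: date filter is redundant
        else:
            plan[name] = "fine"    # partial overlap: date dimension still needed
    return plan


segments = {
    "2021-12": (date(2021, 12, 1), date(2022, 1, 1)),
    "2022-01": (date(2022, 1, 1), date(2022, 2, 1)),
    "2022-02": (date(2022, 2, 1), date(2022, 3, 1)),
    "2022-03": (date(2022, 3, 1), date(2022, 4, 1)),
}
plan = classify_segments(date(2022, 1, 1), date(2022, 2, 15), segments)
```

For a one‑year range over monthly segments, most segments would fall into the "coarse" group, which fits the I/O reduction reported in the benchmarks.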

Classloader performance improvements:

Enabled parallel class‑loading locks in the custom TomcatClassLoader.

Cached negative class‑load results to skip parent‑loader lookups.

Singleton‑ized frequently thrown exceptions to avoid stack‑trace construction.

Replaced Class.forName with ClassLoader.loadClass in Janino‑based dynamic compilation, reducing global lock contention.

Post‑optimization measurements show QPS rising from 70 to 150 while keeping RT stable, effectively eliminating the high‑concurrency class‑loading bottleneck.
