Design and Evolution of NetEase Advertising Engine Platform
NetEase’s advertising engine platform evolved from a monolithic, high‑concurrency system handling over a billion daily requests into a layered, distributed architecture. The platform unifies indexing, billing, user‑tagging, and monitoring services, leverages Elasticsearch with custom extensions for fast retrieval, and plans further upgrades such as a custom retrieval kernel and Go‑based services.
1 Platform Background
Internet advertising is the most common commercial model, covering almost all online products. NetEase's programmatic advertising team is responsible for monetizing NetEase's own traffic (news, mail, open courses, PC & WAP, etc.) and for integrating external ADX traffic such as Guangdiantong, Chuan Shan Jia, and Kuaishou. The advertising delivery engine, as the core system for traffic handling and ad decision‑making, has been continuously refactored to provide unified capabilities such as retrieval, billing, ranking, caching, logging, and distributed communication.
1.1 From 0 to 1 – System Construction
The initial phase focused on rapid rollout, development, and launch, resulting in a high‑concurrency (over 1 billion requests per day), highly available platform with real‑time computation and massive storage. Two main business lines emerged: brand advertising (CPM‑based) and performance advertising (click/conversion‑based).
High concurrency: >1 billion ad requests daily.
High availability: essential for revenue stability.
Real‑time calculation: instant recall, billing, multi‑dimensional targeting, and algorithmic ranking.
Massive storage: user‑level behavior logs and ad process logs.
1.2 Platform‑wide System Refactoring
To avoid duplicating the monolithic engine's capabilities across business lines, the team abstracted common functionality into a distributed architecture of four layers: business, service, capability, and data.
Unified service layer offering indexing, billing, user‑tag, frequency control, ranking, etc.
Business lines plug in only the services they need, reducing duplicate development.
Full‑chain monitoring and alarm capabilities for the upper‑level business.
Technical accumulation enables parallel evolution of technology and business.
The platform was built on the “Easy Effect” engine, completing the construction, service provisioning, and migration within six months.
2.1 Index Service
2.1.1 Index Service v1
The index service provides real‑time ad recall via both inverted and forward lookup. Version 1 (v1) introduced a queryNode (stateless, bitset‑based) and an updateNode (data synchronization, index building). The architecture allowed horizontal scaling and smooth capacity expansion.
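A bitset‑backed inverted index of the kind v1's queryNode used can be sketched as follows. This is a minimal illustration, not the actual NetEase implementation: the class name, key format, and recall semantics (intersecting the postings of every targeting key) are assumptions for the sake of the example.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a bitset-based inverted index for ad recall. */
public class BitsetIndex {
    // targeting key (e.g. "city:bj") -> bitset of ad IDs carrying that key
    private final Map<String, BitSet> inverted = new HashMap<>();
    private final int capacity;

    public BitsetIndex(int capacity) {
        this.capacity = capacity;
    }

    /** Index an ad under each of its targeting keys. */
    public void add(int adId, String... targetingKeys) {
        for (String key : targetingKeys) {
            inverted.computeIfAbsent(key, k -> new BitSet(capacity)).set(adId);
        }
    }

    /** Recall: intersect the postings of every targeting key in the request. */
    public BitSet recall(String... requestKeys) {
        BitSet result = null;
        for (String key : requestKeys) {
            BitSet postings = inverted.getOrDefault(key, new BitSet());
            if (result == null) {
                result = (BitSet) postings.clone();
            } else {
                result.and(postings);
            }
        }
        return result == null ? new BitSet() : result;
    }
}
```

Because each lookup is a handful of bitwise AND operations over in‑memory bitsets, queryNodes stay stateless and cheap to replicate, which is what made horizontal scaling straightforward.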
Limitations of v1 (tight coupling with business, inflexible queries, performance bottlenecks for massive data) led to the development of version 2 (v2).
2.1.2 Index Service v2
v2 added an Elasticsearch cluster and introduced three key improvements:
Technical introduction: ES provides distributed storage, sharding, and persistence, solving the memory bottleneck of v1.
Open‑source extensions: Customized ES modules to meet specific ad‑retrieval requirements.
Business isolation: Business logic moved to a separate sync service, enhancing reusability.
Performance enhancements include parallel full‑sync (reducing sync time from 2 min to <30 s), hierarchical indexing (parent‑child document relationships), and query‑node‑side forward lookup that disables ES source storage and caches full data in memory, cutting average retrieval latency to ~10 ms.
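The query‑node‑side forward lookup can be sketched roughly as below: Elasticsearch returns only ad IDs (since its source storage is disabled), and the query node hydrates full ad records from a local in‑memory cache kept fresh by the sync path. The class and method names are illustrative assumptions, not the actual service API.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of a query-node forward-lookup cache: full ad records
 * live in local memory, so the retrieval path only needs IDs from ES.
 */
public class ForwardLookup {
    private final Map<Long, String> adCache = new ConcurrentHashMap<>();

    /** The sync service would call this on every index update. */
    public void put(long adId, String adRecord) {
        adCache.put(adId, adRecord);
    }

    /** Hydrate IDs recalled from ES into full ad records, skipping cache misses. */
    public List<String> hydrate(List<Long> recalledIds) {
        return recalledIds.stream()
                .map(adCache::get)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }
}
```

Keeping the forward data off Elasticsearch trades memory on the query node for a network round trip avoided per request, which is consistent with the ~10 ms average retrieval latency cited above.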
2.2 User Tag Service (UTS)
UTS delivers real‑time user interest tags for precise ad targeting. It consolidates three data sources—DMP, crowd packs, and internal ad tags—into a unified service, decoupling business lines from data‑source management. Tags are stored in protobuf format, saving ~30 % space. The service supports configurable strategies (regular and recommendation) and integrates with a configuration center and AB‑testing platform.
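A protobuf tag schema along these lines would realize the storage saving: numeric tag IDs and varint encoding pack far smaller than JSON strings. The message and field names below are illustrative assumptions, not the actual UTS definition.

```protobuf
// Hypothetical schema; field names are illustrative, not the real UTS layout.
syntax = "proto3";

message UserTags {
  string user_id = 1;
  // One entry per consolidated data source: DMP, crowd packs, internal ad tags.
  repeated TagGroup groups = 2;
}

message TagGroup {
  string source = 1;           // e.g. "dmp", "crowd_pack", "ad_tag"
  repeated int64 tag_ids = 2;  // packed varints: numeric tags beat strings on size
  int64 updated_at = 3;        // unix timestamp of the last refresh
}
```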
2.3 Billing Service
The billing service handles budget filtering and effect recovery to control overspend and smooth consumption. It processes billions of requests daily, logs user actions to HDFS and Druid, and employs lock‑free designs and multi‑level caching to keep average latency around 1 ms.
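One standard lock‑free pattern for the budget‑filtering step is a compare‑and‑swap loop over an atomic counter, so concurrent requests deduct budget without locks or overspend. This is a minimal sketch of the general technique, not NetEase's actual billing code; the class and method names are assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of lock-free budget filtering: a CAS loop deducts the
 * bid cost only if enough budget remains, so no request ever blocks and the
 * campaign can never go below zero.
 */
public class BudgetCounter {
    private final AtomicLong remaining;

    public BudgetCounter(long initialBudget) {
        remaining = new AtomicLong(initialBudget);
    }

    /** Try to reserve `cost`; returns false (filter the ad out) when exhausted. */
    public boolean tryDeduct(long cost) {
        while (true) {
            long current = remaining.get();
            if (current < cost) {
                return false; // budget filter: drop this candidate
            }
            if (remaining.compareAndSet(current, current - cost)) {
                return true;  // reservation succeeded atomically
            }
        }
    }

    public long remaining() {
        return remaining.get();
    }
}
```

Because the hot path is a single atomic read plus one CAS, this style of counter is compatible with the ~1 ms average latency the service targets.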
3 Service Assurance and Monitoring
Given the revenue‑critical nature of ad delivery, the platform implements three monitoring categories via the Scout monitoring system:
System monitoring: module latency, CPU, memory.
Metric monitoring: core business KPIs (requests, wins, revenue) with real‑time alerts.
Business monitoring: detailed process‑level logs for troubleshooting and optimization.
Data is collected via Kafka, aggregated, stored, and alarmed based on importance levels.
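The metric‑monitoring idea can be sketched as a windowed aggregator with per‑metric floors: events arrive (in practice from Kafka; the consumer is omitted here), counters are summed per window, and any metric falling below its expected floor raises an alarm. All names and thresholds below are illustrative, not Scout's actual configuration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of windowed metric aggregation with floor-based alarms. */
public class MetricAlarm {
    private final Map<String, Long> window = new HashMap<>();
    private final Map<String, Long> minExpected = new HashMap<>();

    /** Configure the minimum count a metric must reach in each window. */
    public void expectAtLeast(String metric, long threshold) {
        minExpected.put(metric, threshold);
    }

    /** Accumulate an event count for the current window. */
    public void record(String metric, long count) {
        window.merge(metric, count, Long::sum);
    }

    /** Close the window: return the metrics that fell below their floor. */
    public List<String> closeWindowAndAlarm() {
        List<String> alarms = new ArrayList<>();
        for (Map.Entry<String, Long> e : minExpected.entrySet()) {
            if (window.getOrDefault(e.getKey(), 0L) < e.getValue()) {
                alarms.add(e.getKey());
            }
        }
        window.clear();
        return alarms;
    }
}
```

A sudden drop in requests, wins, or revenue then surfaces within one window, matching the real‑time alerting described for core business KPIs.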
4 Future Development
Planned evolutions include:
Index Service 3.0 – replace ES with a custom distributed retrieval kernel.
Billing Service 2.0 – introduce “competition rate” parameter for adaptive budget release.
Language upgrades – migrate performance‑critical services from Java to Go while retaining Java for business‑sensitive components.
The team emphasizes continuous technical upgrades, service‑level specialization, and a culture of pragmatic, cumulative engineering.
NetEase Media Technology Team