Choosing the Right Open‑Source Big Data Stack for Advertising: Expert Insights

This article records a WeChat Q&A where industry experts discuss selecting open‑source big data solutions, advertising‑specific data scenarios, and share a practical lambda‑style platform architecture featuring Hadoop, Spark, Storm, Elasticsearch, Redis and MySQL.

Efficient Ops

Aug 27, 2015

Key Questions and Answers

The article records a Q&A session from the "Efficient Operations" WeChat group that focused on big‑data practices in the advertising industry.

Q1: How to choose among many open‑source big‑data solutions?

In advertising, offline processing typically runs on Hadoop, real‑time tasks rely on Storm, and Spark handles workloads such as graph computation. Common resource‑management frameworks include:

Mesos

YARN

Corona

Torca

Omega

Mesos

Mesos originated as a UC Berkeley research project and is now an Apache project used by companies such as Twitter. It follows a master/slave architecture in which the master keeps only lightweight state about frameworks and slaves, so that state is easy to rebuild after a failure, with ZooKeeper coordinating master failover.

Advantages: supports both short‑lived tasks and long‑running services, and its coarse‑grained resource allocation fits environments with multiple coexisting computation frameworks.

Drawback: its DRF (Dominant Resource Fairness) scheduling algorithm focuses heavily on fairness across frameworks and may ignore the specific needs of individual applications.
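To make the fairness trade‑off concrete, here is a minimal sketch of the DRF idea in plain Python. It is not Mesos code: the function name `drf_allocate`, the cluster size, and the per‑task demands are illustrative assumptions. At each step the framework with the smallest dominant share (its largest fraction of any single resource) receives the next task.

```python
def drf_allocate(capacity, demands):
    """Sketch of Dominant Resource Fairness (hypothetical helper, not the Mesos API).

    capacity: dict mapping resource name -> total amount in the cluster
    demands:  dict mapping framework name -> per-task demand per resource
    Returns a dict mapping framework name -> number of tasks launched.
    """
    used = {r: 0.0 for r in capacity}
    alloc = {f: {r: 0.0 for r in capacity} for f in demands}
    tasks = {f: 0 for f in demands}
    active = set(demands)

    def dominant_share(f):
        # Largest fraction of any one resource currently held by framework f.
        return max(alloc[f][r] / capacity[r] for r in capacity)

    while active:
        # Offer the next task to the framework with the lowest dominant share.
        # Sorting first makes ties deterministic (alphabetical order).
        f = min(sorted(active), key=dominant_share)
        demand = demands[f]
        fits = all(used[r] + demand.get(r, 0) <= capacity[r] for r in capacity)
        if fits:
            for r in capacity:
                used[r] += demand.get(r, 0)
                alloc[f][r] += demand.get(r, 0)
            tasks[f] += 1
        else:
            # This framework's next task no longer fits; stop offering to it.
            active.discard(f)
    return tasks


# Illustrative cluster: 9 CPUs and 18 GB of memory shared by two frameworks.
cluster = {"cpu": 9, "mem_gb": 18}
per_task = {
    "A": {"cpu": 1, "mem_gb": 4},  # memory-heavy tasks
    "B": {"cpu": 3, "mem_gb": 1},  # CPU-heavy tasks
}
result = drf_allocate(cluster, per_task)
```

Nothing in the loop looks at priority, deadlines, or data locality. Only the dominant share matters, which is the drawback described above.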

YARN

YARN is the resource manager introduced in Hadoop 2.0. Hadoop components adopted it quickly, and it ships with several built‑in schedulers. Its ResourceManager allocates resources for every application on the cluster, but integrating traditional database workloads can be inefficient.

Corona

Corona, an open‑source next‑generation MapReduce framework from Facebook, shares design goals with YARN. In many Hadoop deployments, YARN and Mesos remain the primary choices.

Advertising‑specific big‑data stack

Advertising systems often combine:

Storm for billing and anti‑fraud real‑time calculations

Spark’s MLlib for machine‑learning tasks such as click‑through‑rate prediction, clustering, and collaborative filtering
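As a rough illustration of the kind of real‑time anti‑fraud calculation mentioned above, the sketch below flags a user whose clicks exceed a threshold inside a sliding time window. It is plain Python, not Storm code, and the class name `ClickRateMonitor`, the window length, and the threshold are all assumptions made for the example. In a Storm topology, similar per‑user state would typically live inside a bolt.

```python
from collections import defaultdict, deque


class ClickRateMonitor:
    """Hypothetical sliding-window click counter for anti-fraud checks."""

    def __init__(self, window_seconds=60, max_clicks=5):
        self.window_seconds = window_seconds
        self.max_clicks = max_clicks
        # One queue of click timestamps per user, oldest first.
        self._clicks = defaultdict(deque)

    def record(self, user_id, timestamp):
        """Record one click; return True if the user now looks suspicious."""
        window = self._clicks[user_id]
        window.append(timestamp)
        # Evict clicks that have fallen out of the window.
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        return len(window) > self.max_clicks
```

A billing path could consult the returned flag before charging for the click, so that suspicious traffic is held back for review rather than billed.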

The following diagram shows an internal DMP data‑processing architecture that incorporates Hadoop, Spark, Storm, Elasticsearch, Redis and MySQL.

Q2: Advertising industry big‑data use cases and challenges

Massive scale: millions of pages, billions of users, billions of ad‑transaction requests per day, with strict latency (e.g., 100 ms bid response).

Dynamic user targeting: user interests change rapidly, requiring timely profile updates to avoid irrelevant ad delivery.

Frequent context changes: varying user contexts and page content demand adaptive ad selection.
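The scale figures above translate into a concrete load estimate. The arithmetic below assumes exactly one billion requests per day as an illustrative number (the source only says "billions") and applies Little's law to estimate how many requests are in flight at once under a 100 ms response budget. Real traffic is bursty, so peak load would be higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_qps(requests_per_day):
    """Average requests per second for a given daily volume."""
    return requests_per_day / SECONDS_PER_DAY


def in_flight_requests(qps, latency_seconds):
    """Little's law: concurrent requests = arrival rate * time in system."""
    return qps * latency_seconds


# Assumed volume for illustration: one billion bid requests per day.
qps = average_qps(1_000_000_000)
concurrent = in_flight_requests(qps, 0.100)
print(f"average QPS: {qps:.0f}, in flight at 100 ms: {concurrent:.0f}")
```

Under these assumptions the platform averages roughly 11,600 requests per second, with about 1,160 requests being processed at any instant.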

Q3: Company’s big‑data platform architecture (lambda‑style)

The platform consists of the following open‑source components:

Hadoop for offline reporting and user‑profile generation

Storm for low‑latency real‑time billing and anti‑fraud

Spark (MLlib) for machine‑learning tasks such as click‑through‑rate prediction

Elasticsearch for near‑real‑time indexing and time‑series queries

HBase and MySQL for final result storage and front‑end queries
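The lambda‑style division of labor among these components can be sketched in a few lines of plain Python. This is a conceptual model only, not the company's implementation: the class `LambdaServingLayer` and its methods are hypothetical. A batch view stands in for the offline Hadoop results, a real‑time view stands in for Storm's incremental counts, and a query merges the two.

```python
from collections import Counter


class LambdaServingLayer:
    """Hypothetical serving layer that merges batch and real-time views."""

    def __init__(self):
        self.batch_view = Counter()     # recomputed offline (Hadoop's role)
        self.realtime_view = Counter()  # updated per event (Storm's role)

    def on_event(self, ad_id, clicks=1):
        """Speed layer: apply one incoming event immediately."""
        self.realtime_view[ad_id] += clicks

    def load_batch(self, batch_counts):
        """Batch layer: swap in a freshly computed view.

        Simplifying assumption: the new batch covers every event seen so far,
        so the real-time increments can be dropped.
        """
        self.batch_view = Counter(batch_counts)
        self.realtime_view.clear()

    def query(self, ad_id):
        """Serving layer: batch result plus anything that arrived since."""
        return self.batch_view[ad_id] + self.realtime_view[ad_id]
```

A production system would track which time range each batch covers and discard only the real‑time increments inside that range. The sketch skips that bookkeeping to keep the merge step visible.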
