
Fast Attribution Engine (FAE): A High‑Performance Distributed Computing Engine for User Behavior and Advertising Attribution

This article introduces Alibaba's Fast Attribution Engine (FAE). It covers the technical challenges of user behavior analysis and advertising attribution, FAE's data model (AFile), its system architecture, its performance advantages over traditional OLAP solutions, and a range of application scenarios: frequency analysis, crowd flow modeling, path analysis, retention analysis, funnel analysis, and behavior-based visitor selection.

DataFunTalk

Speaker: Pan Mingliang, Senior Technical Expert at Alibaba Advertising Data Platform. Platform: DataFunTalk.

1. Technical Challenges in User Behavior Analysis

Marketing attribution requires distinguishing cause and effect among ad touchpoints (exposure, click, etc.) and desired outcomes (registration, purchase, etc.). The analysis must handle massive data volumes (hundreds of billions of events) with flexible query dimensions and interactive, near‑real‑time response.

2. Marketing Attribution Challenges

The attribution process involves five steps: condition filtering, behavior association (e.g., same‑product, same‑brand), weight distribution, dimensional aggregation, and result filtering. Traditional OLAP systems cannot meet the sub‑second performance required for these complex operations.
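
The five steps can be sketched on a toy dataset. This is a minimal illustration, not FAE's implementation: the event tuple layout, the same-product association rule, the linear (equal-share) weighting, and the function name `attribute` are all assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical event records: (user_id, timestamp, kind, product, channel)
events = [
    ("u1", 1, "click", "p1", "search"),
    ("u1", 2, "click", "p1", "feed"),
    ("u1", 3, "purchase", "p1", None),
    ("u2", 1, "click", "p2", "search"),
    ("u2", 5, "purchase", "p3", None),  # different product: nothing to associate
]

def attribute(events, min_credit=0.0):
    # Step 1: condition filtering (keep only clicks and purchases).
    kept = [e for e in events if e[2] in ("click", "purchase")]

    # Group per user in time order so association never crosses users.
    by_user = defaultdict(list)
    for e in sorted(kept, key=lambda e: (e[0], e[1])):
        by_user[e[0]].append(e)

    credit = defaultdict(float)
    for user_events in by_user.values():
        for i, (_, _, kind, product, _) in enumerate(user_events):
            if kind != "purchase":
                continue
            # Step 2: behavior association (earlier clicks on the same product).
            touches = [t for t in user_events[:i]
                       if t[2] == "click" and t[3] == product]
            # Step 3: weight distribution (linear: equal share per touchpoint).
            for t in touches:
                # Step 4: dimensional aggregation (credit summed by channel).
                credit[t[4]] += 1.0 / len(touches)

    # Step 5: result filtering (drop channels at or below the threshold).
    return {channel: c for channel, c in credit.items() if c > min_credit}

result = attribute(events)
```

Here the purchase by `u1` is split evenly between the `search` and `feed` clicks, while the purchase by `u2` receives no credit because no click on the same product precedes it. Other weighting rules (first touch, last touch, time decay) would replace only step 3.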

3. User Growth Analysis Challenges

Growth analysis asks "when", "where", "who", and "what" about user events, using models such as funnel, retention, path, portrait, and distribution. These analyses support product design, growth operations, and high‑level business decisions.

4. Characteristics of User Behavior Analysis

Individual‑centric: each visitor is analyzed independently.

Temporal: all analyses require a time sequence.

Flexible query patterns.

Huge data volume (hundreds of billions of impressions, tens of billions of conversions).

Interactive: the space of possible condition combinations is too large to precompute, so queries must be answered on demand with millisecond-to-second latency.

FAE Overview

FAE (Fast Attribution Engine), later renamed Fast Analytic Engine, is an MPP distributed engine built by Alimama, Alibaba's advertising platform, for user behavior analysis and advertising attribution. It can complete attribution over hundreds of billions of events within seconds, ships with multiple built-in models, and allows user-defined extensions. It offers three execution modes: ad-hoc, offline, and near-real-time.

The talk contrasts FAE with Google's Mesa and Shasta (time-sliced incremental model, materialized views) and Baidu's Palo (Impala-based). FAE uses a patented AFile storage structure and, according to the speaker, handles far larger data volumes, provides richer models, and delivers higher QPS and better extensibility.

FAE Data Model (AFile)

The AFile model stores events in a pre‑ordered, user‑ID‑partitioned format. Processing steps:

Ingest raw events into MySQL or other DBs.

Extract user data.

Group events by user ID.

Sort each group by timestamp.

Apply UDAF/MapReduce to compute analysis models.

This design avoids mixing different users' data during computation and enables fast causal queries.
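
The grouping and sorting steps can be illustrated in a few lines. This is a sketch of the general idea only: the real AFile format is not described in the talk summary, so the in-memory dictionary, the tuple layout, and the example aggregate `first_click_to_purchase` are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical raw events in arrival order: (user_id, timestamp, kind)
raw_events = [
    ("u2", 30, "click"),
    ("u1", 20, "purchase"),
    ("u1", 10, "exposure"),
    ("u2", 40, "exposure"),
    ("u1", 15, "click"),
]

def build_user_sequences(events):
    """Group events by user ID and order each group by timestamp."""
    ordered = sorted(events, key=itemgetter(0, 1))
    return {user: [(ts, kind) for _, ts, kind in group]
            for user, group in groupby(ordered, key=itemgetter(0))}

def first_click_to_purchase(sequence):
    """Example per-user aggregate: time from the first click to the next purchase."""
    click_ts = None
    for ts, kind in sequence:
        if kind == "click" and click_ts is None:
            click_ts = ts
        elif kind == "purchase" and click_ts is not None:
            return ts - click_ts
    return None  # the user never purchased after clicking

sequences = build_user_sequences(raw_events)
delays = {user: first_click_to_purchase(seq) for user, seq in sequences.items()}
```

Because each user's events are already contiguous and time-ordered, the aggregate is a single forward pass per user with no join or shuffle at query time, which is the property that makes causal ("A before B") queries cheap.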

FAE System Architecture

FAE consists of four core modules:

Master: stateless entry point that routes queries.

Importer: imports data from external sources (e.g., ODPS) and can be parallelized.

Merger: shards imported data and distributes shards to nodes.

Worker: computes queries on local AFile, runs models, aggregates results in a tree structure, and returns answers within milliseconds to seconds.

Supporting services include Redis for metadata management and MySQL for logs, plus monitoring and operations modules; together they form a conventional MPP architecture. FAE can run in local-disk mode or in a distributed NAS (Pangu) mode, in which storage capacity is no longer bounded by the disks of individual nodes.
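
The division of labor between Merger and Worker can be sketched as user-based sharding followed by a tree-shaped merge of partial results. Everything below is a hypothetical illustration: the CRC32 hash, the fan-in of 2, and the function names are assumptions, not details of FAE.

```python
import zlib
from collections import Counter

def shard_for(user_id, num_shards):
    """Stable hash so that every event of a user lands on the same shard."""
    return zlib.crc32(user_id.encode("utf-8")) % num_shards

def partition(events, num_shards):
    """Merger role (sketch): split (user_id, kind) events into shards."""
    shards = [[] for _ in range(num_shards)]
    for event in events:
        shards[shard_for(event[0], num_shards)].append(event)
    return shards

def worker_query(shard):
    """Worker role (sketch): local partial result, event counts by kind."""
    return Counter(kind for _, kind in shard)

def tree_merge(partials, fan_in=2):
    """Combine partial results level by level; fan_in must be at least 2."""
    level = list(partials)
    while len(level) > 1:
        level = [sum(level[i:i + fan_in], Counter())
                 for i in range(0, len(level), fan_in)]
    return level[0] if level else Counter()

events = [("u1", "click"), ("u2", "click"), ("u1", "purchase"),
          ("u3", "exposure"), ("u4", "click")]
shards = partition(events, 4)
total = tree_merge(worker_query(shard) for shard in shards)
```

Sharding by user keeps per-user computation local to one worker, and merging in a tree spreads the aggregation cost across nodes instead of concentrating it at a single collector.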

Application Scenarios & Optimization

Behavior frequency analysis – counts per‑user actions, 10‑100× faster than conventional pipelines.

Crowd flow modeling – hierarchical user segmentation with compressed incremental storage, reducing data size by 10‑100×.

Path analysis – visualizes multi‑channel user journeys across ad placements.

Retention analysis – tracks post‑exposure actions over successive days.

Funnel analysis – measures conversion rates across stages (homepage → detail page → purchase).

Visitor selection based on behavior – filters visitors meeting specific event criteria for further analysis or targeting.
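
Two of the scenarios above, funnel analysis and behavior frequency analysis, reduce to a single pass over each user's time-ordered events. The sketch below assumes per-user sequences of `(timestamp, kind)` pairs already sorted by timestamp; the step names and helper functions are hypothetical.

```python
from collections import Counter

def funnel_depth(sequence, steps):
    """Number of funnel steps a user completed, in the required order."""
    depth = 0
    for _, kind in sequence:  # sequence is sorted by timestamp
        if depth < len(steps) and kind == steps[depth]:
            depth += 1
    return depth

def funnel_counts(sequences, steps):
    """Users reaching each step; counts[i] covers steps[0..i] in order."""
    counts = [0] * len(steps)
    for seq in sequences.values():
        for i in range(funnel_depth(seq, steps)):
            counts[i] += 1
    return counts

def frequency_distribution(sequences, kind):
    """Map 'times a user did `kind`' to the number of such users."""
    return Counter(sum(1 for _, k in seq if k == kind)
                   for seq in sequences.values())

steps = ["homepage", "detail", "purchase"]
sequences = {
    "u1": [(1, "homepage"), (2, "detail"), (3, "purchase")],
    "u2": [(1, "homepage"), (2, "detail")],
    "u3": [(1, "detail"), (2, "homepage")],  # detail before homepage does not count
    "u4": [(1, "purchase")],                 # never entered the funnel
}
counts = funnel_counts(sequences, steps)
homepage_freq = frequency_distribution(sequences, "homepage")
```

With this data, three users reach the homepage step, two reach the detail page, and one purchases. Conversion rates follow by dividing adjacent counts.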

Q&A Highlights

Data import is incremental: daily batch for offline data and continuous streaming for real‑time data.

AFile preserves event order at insertion; offline and real‑time partitions are kept distinct.

Query type determines resource profile: cold queries are I/O‑bound, hot queries are CPU‑bound.
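
Since each partition preserves event order and offline and real-time partitions are kept separate, one plausible way to serve a query that spans both is to merge a user's already-sorted sequences lazily. This is an assumption about how such a query could work, not a description of FAE; the partition contents and `merged_sequence` are hypothetical.

```python
import heapq
from operator import itemgetter

# Hypothetical per-user partitions, each already sorted by timestamp.
offline_partition = [(10, "exposure"), (20, "click"), (50, "purchase")]
realtime_partition = [(30, "exposure"), (40, "click")]

def merged_sequence(*partitions):
    """Merge sorted partitions into one time-ordered sequence without re-sorting."""
    return list(heapq.merge(*partitions, key=itemgetter(0)))

timeline = merged_sequence(offline_partition, realtime_partition)
```

`heapq.merge` relies on each input already being sorted, so the merge costs time linear in the number of events rather than a full sort.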

Thank you for attending.
