
Airbnb Data Privacy and Security Engineering – Data Protection Platform (DPP) Overview and Madoka Metadata System

Airbnb’s Data Protection Platform (DPP) combines automated discovery, classification, encryption and privacy‑orchestration services—Inspekt, Angmar, Cipher, Obliviate, Minister, and the Madoka metadata system—to continuously inventory petabyte‑scale MySQL, Hive and S3 assets, track ownership and security attributes, and enforce GDPR, PIPL and CCPA compliance.

Airbnb Technology Team

Welcome to the "Airbnb Data Privacy and Security Engineering" series, which is divided into three parts to explain how to build a powerful, automated, and scalable data security solution.

With increasing reports of data breaches and the emergence of international regulations, data governance and protection have become urgent and highly visible challenges.

Airbnb is committed to safeguarding user data while respecting privacy rights.

Data collection, storage, and transmission at Airbnb span multiple storage systems and infrastructure layers, which makes manual tracking of user and sensitive data flows impractical. This complexity calls for a data-security tool that supports every data store in the ecosystem and meets the needs of both data development and automated protection.

This series will discuss how to create and maintain a Data Protection Platform (DPP) to address these challenges.

This first article briefly reviews the background and technical architecture of the DPP and then takes an in-depth look at its data inventory component, Madoka.

To meet international legal and security requirements, Airbnb decided to build a data‑protection platform that first understands the security and privacy risks associated with its data.

Airbnb stores petabyte‑scale data in MySQL, Hive, and S3. A centralized inventory system continuously tracks assets and stores security and privacy metadata, enabling stakeholders to assess risk.

Understanding the type of data stored in each asset is crucial for determining protection levels, especially under regulations such as GDPR, China’s PIPL, and California’s CCPA. A scalable data‑classification system scans and classifies assets to locate personal data elements (e.g., email addresses, messages, location information).
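As a rough illustration of what "scanning and classifying" a column can mean, the sketch below samples values and applies pattern detectors. It is not how Inspekt works internally; the detector patterns, the `classify_column` name, and the 50% threshold are all assumptions made for the example.

```python
import re

# Hypothetical detectors. A production classifier would need far more robust
# rules (or trained models) than these two illustrative patterns.
DETECTORS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def classify_column(samples, threshold=0.5):
    """Return labels whose detector matches at least `threshold` of the non-empty samples."""
    values = [s for s in samples if s]
    if not values:
        return set()
    labels = set()
    for label, pattern in DETECTORS.items():
        hits = sum(1 for v in values if pattern.search(v))
        if hits / len(values) >= threshold:
            labels.add(label)
    return labels
```

Working on a sample of values rather than the whole column keeps a scan cheap, at the cost of missing personal data that appears only rarely.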

DPP aims to automate data protection across three key areas: data discovery, leakage prevention, and encryption.

Personal-data discovery is the first step toward privacy compliance. When new personal data is detected, DPP can automatically notify the responsible engineers and integrate with the privacy orchestration service, so that the data can be deleted or returned to the user on request.

Data leakage often occurs when secrets (API keys, credentials) are logged or committed to code. DPP identifies such risks, alerts engineers, and uses encryption tools to mask newly discovered secrets.
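A minimal sketch of the detect-and-mask idea follows, assuming simple regular-expression rules. The rule names and the `find_secrets` and `mask` helpers are hypothetical; the first pattern follows the commonly documented shape of an AWS access key ID, and the second is only a heuristic for assignments such as `password=...`.

```python
import re

# Illustrative rules only; real scanners combine many patterns with entropy checks.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_assignment": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|password)\s*[=:]\s*\S{8,}"
    ),
}


def find_secrets(text):
    """Return (line_number, rule_name, matched_text) for every suspected secret."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((line_no, name, match.group(0)))
    return findings


def mask(text):
    """Replace every suspected secret with a fixed placeholder."""
    for pattern in SECRET_PATTERNS.values():
        text = pattern.sub("[REDACTED]", text)
    return text
```

Masking with a placeholder is the simplest option; the article describes using encryption tools instead, which would let authorized engineers recover the original value.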

Encryption is a core protection method; DPP provides encryption services and client libraries, enabling engineers to encrypt/decrypt sensitive information without manual key management.

The DPP architecture integrates several services:

Inspekt : continuous data‑classification service.

Angmar : secret-detection pipeline for code repositories.

Cipher : data‑encryption service offering a simple framework for developers.

Obliviate : privacy‑request orchestration service.

Minister : third‑party risk and privacy‑compliance service.

Madoka : metadata service that aggregates security and privacy attributes from various sources.

Data Protection Service : presentation layer that leverages Madoka metadata to trigger automated protection actions and notifications.

Madoka is a metadata system that maintains security and privacy metadata for all Airbnb data assets, providing a centralized repository for engineers and other stakeholders.

Madoka consists of a crawler and a backend service. The crawler runs daily, gathering metadata from internal sources such as GitHub, MySQL, S3, and Inspekt, then publishes it to an AWS SQS queue. The backend consumes the queue, resolves conflicts, stores the metadata in a database, and exposes APIs for other services.
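The crawler-to-backend flow can be sketched as below. This is an in-memory approximation: `queue.Queue` stands in for the SQS queue, a dict stands in for the database, and the message fields (`source`, `asset_id`) and the last-writer-wins merge are assumptions, not Madoka's actual schema or conflict policy.

```python
import json
import queue


def crawl(sources):
    """Yield one JSON message per asset record from each metadata source."""
    for source_name, fetch in sources.items():
        for record in fetch():
            yield json.dumps({"source": source_name, **record})


class Backend:
    def __init__(self):
        self.store = {}  # asset_id -> merged metadata (stand-in for the database)

    def consume(self, q):
        """Drain the queue and merge each message into the store."""
        while True:
            try:
                body = q.get_nowait()
            except queue.Empty:
                break
            msg = json.loads(body)
            asset = self.store.setdefault(msg["asset_id"], {"sources": []})
            asset["sources"].append(msg["source"])
            for key, value in msg.items():
                if key not in ("asset_id", "source"):
                    asset[key] = value  # placeholder policy: last writer wins

    def get(self, asset_id):
        """Lookup that an API layer could expose to other services."""
        return self.store.get(asset_id)
```

Putting a queue between the two halves lets the crawler and the backend scale and fail independently, which is presumably part of the appeal of the design described above.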

The main metadata collected are the asset list, ownership information, and data classification, for both MySQL and S3 assets.

Asset List : For MySQL, the crawler enumerates clusters, databases, tables, columns, and data types via AWS APIs and JDBI. For S3, it gathers object listings using Terraform configuration and S3 inventory reports, capturing bucket names, object keys, and related metadata.
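One way to picture the resulting asset list is as a set of typed records with stable identifiers. The class names, the `asset_id` URI scheme, and the CSV column order below are assumptions for illustration; the MySQL rows are assumed to come from a query against `information_schema.columns`, and the S3 rows from a header-less inventory CSV whose field order the caller supplies.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class MySQLColumn:
    cluster: str
    database: str
    table: str
    column: str
    data_type: str

    @property
    def asset_id(self):
        return f"mysql://{self.cluster}/{self.database}/{self.table}/{self.column}"


@dataclass(frozen=True)
class S3Object:
    bucket: str
    key: str
    size: int

    @property
    def asset_id(self):
        return f"s3://{self.bucket}/{self.key}"


def columns_from_rows(cluster, rows):
    """rows: (table_schema, table_name, column_name, data_type) tuples."""
    return [MySQLColumn(cluster, *row) for row in rows]


def objects_from_inventory_csv(text, schema=("bucket", "key", "size")):
    """Parse header-less CSV rows whose field order is given by `schema`."""
    objects = []
    for row in csv.reader(io.StringIO(text)):
        record = dict(zip(schema, row))
        objects.append(S3Object(record["bucket"], record["key"], int(record["size"])))
    return objects
```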

Ownership : Ownership metadata links assets to the owning service or team, derived from connection statistics for MySQL clusters and Terraform tags for S3 buckets. Team ownership is defined in Git service repositories.
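The article does not say how connection statistics are turned into an owner. One plausible heuristic, shown here purely as an assumption, is to attribute a cluster to the service that opens the most connections and then look up that service's team. The function name and input shapes are hypothetical.

```python
from collections import Counter


def infer_cluster_owner(connections, service_to_team):
    """Pick the service with the most connections and map it to its team.

    connections: iterable of (service_name, connection_count) pairs.
    service_to_team: mapping such as one derived from service repositories.
    """
    totals = Counter()
    for service, count in connections:
        totals[service] += count
    if not totals:
        return None
    # Break ties by service name so the result is deterministic.
    service, _ = max(totals.items(), key=lambda kv: (kv[1], kv[0]))
    return {"service": service, "team": service_to_team.get(service)}
```

A heuristic like this can mislabel shared clusters, which is one reason an inventory system benefits from letting owners confirm or correct what was inferred.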

Data Classification : Classification describes the type of data stored in an asset (e.g., personal data). Classification is sourced from Git schema annotations and the automated Inspekt tool. Conflicts between manual and suggested classifications are resolved in Madoka, with workflows for owners to confirm or correct classifications.
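A sketch of how manual and suggested classifications might be reconciled is shown below. The bucket names and the conservative "union" policy are assumptions; the article only states that conflicts are resolved in Madoka and that owners can confirm or correct the result.

```python
def reconcile(manual, suggested):
    """Compare owner-annotated labels with automatically suggested labels."""
    manual, suggested = set(manual), set(suggested)
    return {
        # Both sources agree.
        "confirmed": sorted(manual & suggested),
        # Detected automatically but not annotated: ask the owner to review.
        "needs_review": sorted(suggested - manual),
        # Annotated by the owner but not detected by the scanner.
        "unverified": sorted(manual - suggested),
        # Conservative default: protect everything either source reports.
        "effective": sorted(manual | suggested),
    }
```

For example, `reconcile({"email"}, {"email", "phone"})` confirms `email` and flags `phone` for owner review while still treating the asset as containing both.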

Madoka also records additional security and privacy attributes, such as whether an asset is encrypted with Cipher or integrated with Obliviate.
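Attributes like these are what make automated follow-up possible. The sketch below shows how a consumer of the metadata could derive notifications from it; the field names (`classification`, `cipher_encrypted`, `obliviate_integrated`), the label set, and the action strings are hypothetical, not the real Madoka schema or DPP rules.

```python
# Labels treated as personal data in this example (an assumption).
PERSONAL = {"email", "phone", "location", "message"}


def protection_actions(asset):
    """Return follow-up actions implied by one asset's metadata record."""
    actions = []
    labels = set(asset.get("classification", []))
    if labels & PERSONAL and not asset.get("obliviate_integrated", False):
        actions.append("notify_owner:integrate_with_privacy_orchestration")
    if labels and not asset.get("cipher_encrypted", False):
        actions.append("notify_owner:consider_encryption")
    return actions
```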

In summary, this first part of the series outlines the motivation for building DPP, describes its architecture, and provides a deep dive into the Madoka metadata component. The second article will focus on large-scale classification of personal and sensitive data, and the final article will explore the security and privacy use cases that DPP enables.
