
Inside Airbnb’s AWS Cloud Architecture and Data Stack

Airbnb’s engineering VP Mike Curtis explains how the company uses Amazon Web Services, a Hadoop‑based big‑data platform, and custom tools such as Aerosolve, Airflow, and Airpal to power its global marketplace. Together, this cloud infrastructure and its machine‑learning pipelines support rapid scaling, dynamic pricing, and personalized search.

21CTO

Background

Mike Curtis, Vice President of Engineering at Airbnb, discussed the company’s rapid growth and the need to share its infrastructure insights with the tech community. Airbnb, founded in 2008 during the economic downturn, grew from a simple room‑rental service to a global platform operating in 190 countries and 40,000 cities with over 12 million listings.

Cloud Infrastructure

Airbnb runs entirely on Amazon Web Services (AWS). Because the company launched two years after AWS did, it never had to build its own data centers, which let engineers focus on product‑specific problems rather than hardware maintenance. Today Airbnb operates about 5,000 EC2 instances: roughly 1,500 are dedicated to application services, and the remaining 3,500 run analytics and machine‑learning workloads.

Data Platform and Big‑Data Stack

The core data platform is built on Hadoop. Airbnb initially used Amazon Elastic MapReduce, then migrated to a self‑managed Hadoop cluster after outgrowing EMR. The stack includes HDFS for storage, a Hive data warehouse, and the Presto SQL query engine for fast ad‑hoc analysis. For batch processing, MapReduce remains useful for long‑running queries.

Airbnb also developed internal tools such as Airpal (a UI for writing Presto queries) and Aerosolve, an open‑source machine‑learning engine that powers dynamic pricing recommendations. These tools are available on GitHub and integrate with Apache Spark for early‑stage experiments.

Machine Learning, Search, and Pricing

Machine learning drives search ranking, fraud detection, identity verification, and dynamic pricing. Airbnb’s recommendation engine evaluates hundreds of variables to present the top 5‑10 matches to users, dramatically reducing search time and improving conversion. Experiments show that a 5% price adjustment suggested by the ML model can increase the chance of a booking fourfold.
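The "score many variables, show the top few" pattern described above can be sketched in a few lines of Python. This is only an illustration: the feature names, the weights, and the `score` and `top_matches` helpers are hypothetical, since the article does not disclose Airbnb's actual model or the variables it uses.

```python
from heapq import nlargest

# Hypothetical feature weights; the real model and its variables are not public.
WEIGHTS = {"price_fit": 0.5, "review_score": 0.3, "location_match": 0.2}


def score(features):
    """Weighted sum of features, each assumed to be normalized to [0, 1]."""
    return sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)


def top_matches(listings, k=5):
    """Return the ids of the k highest-scoring listings, best first."""
    ranked = nlargest(k, listings.items(), key=lambda item: score(item[1]))
    return [listing_id for listing_id, _ in ranked]


# Made-up listings used only to exercise the sketch.
listings = {
    "A": {"price_fit": 0.9, "review_score": 0.8, "location_match": 0.5},
    "B": {"price_fit": 0.4, "review_score": 0.9, "location_match": 0.9},
    "C": {"price_fit": 0.2, "review_score": 0.3, "location_match": 0.1},
}

best = top_matches(listings, k=2)
```

A production ranker would learn its weights from booking data rather than hard-code them, but the final step is the same: score each candidate and keep only the handful with the highest scores.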

Workflow Automation and Operations

Airbnb uses Apache Airflow to orchestrate ETL pipelines that connect to HDFS, Hive, Presto, S3, MySQL, and PostgreSQL. Configuration management is handled with Chef. Mesos was evaluated but not adopted for large‑scale production because of the abstraction layer it adds. The company continuously compares the cost of running on AWS with that of an on‑premises data center and estimates that the latter would be 20‑30% more expensive.
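The core idea behind a workflow orchestrator, running each task only after the tasks it depends on have finished, can be sketched with Python's standard library. This sketch does not use the Airflow API. The task names and the `run_pipeline` helper are hypothetical, and `graphlib.TopologicalSorter` (Python 3.9+) stands in for a real scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_mysql": set(),
    "extract_s3": set(),
    "load_hive": {"extract_mysql", "extract_s3"},
    "presto_report": {"load_hive"},
}


def run_pipeline(pipeline, run_task):
    """Run every task after its dependencies; return the execution order."""
    order = []
    for task in TopologicalSorter(pipeline).static_order():
        run_task(task)  # a real orchestrator would also retry, log, and schedule
        order.append(task)
    return order


# The no-op callable stands in for real extract/load/query work.
executed = run_pipeline(PIPELINE, lambda task: None)
```

An orchestrator such as Airflow adds scheduling, retries, and monitoring on top of this dependency ordering, which is what makes it practical for pipelines that span many storage and query systems.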

Overall, Airbnb’s engineering strategy emphasizes leveraging cloud services for scalability, open‑source tools for flexibility, and machine‑learning models for personalized user experiences.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
