Inside Airbnb’s AWS Cloud Architecture and Data Stack
Airbnb’s engineering VP Mike Curtis explains how the company runs its global marketplace on Amazon Web Services, a Hadoop‑based big‑data platform, and custom tools such as Aerosolve, Airflow, and Airpal. Together, this cloud infrastructure and its machine‑learning pipelines enable rapid scaling, dynamic pricing, and personalized search.
Background
Mike Curtis, Vice President of Engineering at Airbnb, discussed the company’s rapid growth and the need to share its infrastructure insights with the tech community. Airbnb, founded in 2008 during the economic downturn, grew from a simple room‑rental service to a global platform operating in 190 countries and 40,000 cities with over 12 million listings.
Cloud Infrastructure
Airbnb runs entirely on Amazon Web Services (AWS). Because the company launched two years after AWS did, it never had to build its own data centers, which let engineers focus on product‑specific problems rather than hardware maintenance. Today Airbnb operates about 5,000 EC2 instances: roughly 1,500 serve application traffic, and the remaining 3,500 handle analytics and machine‑learning workloads.
Data Platform and Big‑Data Stack
The core data platform is built on Hadoop. Airbnb initially used Amazon Elastic MapReduce (EMR), then migrated to a self‑managed Hadoop cluster after outgrowing it. The stack includes HDFS for storage, a Hive data warehouse, and the Presto SQL query engine for fast ad‑hoc analysis. MapReduce remains useful for long‑running batch queries.
Airbnb also developed internal tools such as Airpal (a UI for writing Presto queries) and Aerosolve, an open‑source machine‑learning engine that powers dynamic pricing recommendations. These tools are available on GitHub and integrate with Apache Spark for early‑stage experiments.
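To make the idea of ad‑hoc SQL analysis over a warehouse table concrete, here is a minimal sketch. It uses Python's built‑in sqlite3 module purely as a stand‑in for Presto or Hive, and the bookings table, its columns, and the sample rows are all hypothetical, not Airbnb's schema or data.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; it is NOT Presto or Hive.
conn = sqlite3.connect(":memory:")

# Hypothetical table and sample rows, invented for illustration only.
conn.execute(
    "CREATE TABLE bookings (city TEXT, nights INTEGER, price_per_night REAL)"
)
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [
        ("Paris", 3, 120.0),
        ("Paris", 2, 100.0),
        ("Tokyo", 5, 80.0),
        ("Tokyo", 1, 90.0),
        ("Lisbon", 4, 60.0),
    ],
)


def revenue_by_city(connection):
    """Return (city, bookings, revenue) rows, highest revenue first."""
    query = """
        SELECT city,
               COUNT(*) AS bookings,
               SUM(nights * price_per_night) AS revenue
        FROM bookings
        GROUP BY city
        ORDER BY revenue DESC
    """
    return connection.execute(query).fetchall()


rows = revenue_by_city(conn)
for city, bookings, revenue in rows:
    print(f"{city}: {bookings} bookings, {revenue:.2f} revenue")
```

The same GROUP BY aggregation is the kind of query an analyst might type into a tool like Airpal; only the engine underneath would differ.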
Machine Learning, Search, and Pricing
Machine learning drives search ranking, fraud detection, identity verification, and dynamic pricing. Airbnb’s recommendation engine evaluates hundreds of variables to present the top 5‑10 matches to users, dramatically reducing search time and improving conversion. Experiments show that a 5 % price adjustment suggested by the ML model can increase booking chances fourfold.
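The idea of scoring candidates on several variables and surfacing only the best few can be sketched as follows. This is a toy weighted‑sum ranker, not Airbnb's model: the three features, the weights, and the sample listings are hypothetical, whereas the real system reportedly weighs hundreds of variables.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    name: str
    review_score: float   # 0.0 to 5.0 (hypothetical feature)
    price_fit: float      # 0.0 to 1.0, closeness to the guest's budget
    location_fit: float   # 0.0 to 1.0, closeness to the searched area


# Hypothetical weights; a production model would learn these from data.
WEIGHTS = {"review_score": 0.2, "price_fit": 0.5, "location_fit": 0.3}


def score(listing):
    """Combine the features into a single relevance score in [0, 1]."""
    return (
        WEIGHTS["review_score"] * listing.review_score / 5.0
        + WEIGHTS["price_fit"] * listing.price_fit
        + WEIGHTS["location_fit"] * listing.location_fit
    )


def top_matches(listings, k=5):
    """Return the k highest-scoring listings, best first."""
    return heapq.nlargest(k, listings, key=score)


listings = [
    Listing("Studio A", 4.8, 0.90, 0.70),
    Listing("Flat B", 4.0, 0.50, 0.90),
    Listing("House C", 5.0, 0.20, 0.40),
    Listing("Loft D", 3.5, 0.95, 0.95),
]

best = top_matches(listings, k=2)
for listing in best:
    print(listing.name, round(score(listing), 3))
```

Using heapq.nlargest avoids sorting the full candidate set when only a handful of results will be shown.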
Workflow Automation and Operations
Airbnb orchestrates its ETL pipelines with Apache Airflow, which connects to HDFS, Hive, Presto, S3, MySQL, and PostgreSQL. Configuration management is handled with Chef. Mesos was evaluated but not adopted for large‑scale production because of the extra abstraction layer it introduces. The company continuously compares the cost of running on AWS with that of an on‑premises data center and estimates that the latter would be 20‑30 % more expensive.
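At its core, a workflow orchestrator runs each task only after the tasks it depends on have finished. The sketch below illustrates that ordering with Python's standard graphlib module (Python 3.9+). It is not the Airflow API, and the task names and dependencies are hypothetical, chosen only to echo the systems listed above.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks, each mapped to the set of tasks it depends on.
PIPELINE = {
    "extract_mysql": set(),
    "extract_s3_logs": set(),
    "load_hive": {"extract_mysql", "extract_s3_logs"},
    "aggregate_presto": {"load_hive"},
    "export_postgres": {"aggregate_presto"},
}


def run_pipeline(dependencies, run_task):
    """Run every task after its upstream tasks; return the execution order.

    static_order() raises graphlib.CycleError if the dependencies form a
    cycle, so a misconfigured pipeline fails before any task runs.
    """
    order = []
    for task in TopologicalSorter(dependencies).static_order():
        run_task(task)
        order.append(task)
    return order


executed = run_pipeline(PIPELINE, lambda task: print(f"running {task}"))
```

A real orchestrator adds scheduling, retries, and parallel execution of independent tasks on top of this ordering; the sketch runs everything sequentially.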
Overall, Airbnb’s engineering strategy emphasizes leveraging cloud services for scalability, open‑source tools for flexibility, and machine‑learning models for personalized user experiences.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
