
Airbnb OpenAir Conference: Open‑Source Tools Airpal, Aerosolve, and Airflow

At Airbnb’s inaugural OpenAir conference, the company unveiled three open‑source big‑data tools—Airpal, a Presto‑based visual SQL query engine; Aerosolve, an interpretable machine‑learning engine for pricing recommendations; and Airflow, an internal platform for orchestrating and monitoring data pipelines.

Qunar Tech Salon

On July 5, Airbnb held its first OpenAir technology conference, focusing on data‑driven practices in its development process, and announced three open‑source big‑data tools.

Airpal

The first tool, Airpal, is Airbnb’s most popular internal data‑analysis platform and has gathered over 900 stars on GitHub. Airpal is a web‑based visual query interface built on Facebook’s Presto, a distributed SQL query engine. Airbnb now stores roughly 1.5 PB of data; traditionally this data was queried with Hive, which has two notable drawbacks.

First, for small queries the MapReduce overhead of Hive is excessive. For example, a simple SELECT * FROM table LIMIT 10 triggers a full MapReduce job that can take half a minute in the map phase alone. Presto, which Airpal uses, does not run queries as MapReduce jobs and so avoids this latency; Airpal also offers a preview of table data.

Second, Hive’s command‑line interface is unfriendly to non‑technical users. Airpal’s graphical interface requires only SQL knowledge and can export results directly to CSV, making it convenient for analysts in finance or other departments. In practice, Airbnb data scientists still prefer Hive’s CLI, while engineers and product managers performing simple queries rely on Airpal.
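The query-then-export workflow described above can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 as a stand-in for Presto, and the `bookings` table, its rows, and the `query_to_csv` helper are all hypothetical names that do not come from Airpal.

```python
import csv
import io
import sqlite3


def query_to_csv(conn, sql):
    """Run a SQL query and return the result as CSV text, header row first."""
    cursor = conn.execute(sql)
    header = [col[0] for col in cursor.description]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(cursor.fetchall())
    return buffer.getvalue()


# Hypothetical table and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (id INTEGER, city TEXT, nights INTEGER)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [(1, "Paris", 3), (2, "Tokyo", 5), (3, "Lima", 2)],
)

# A small preview-style query of the kind the article mentions.
csv_text = query_to_csv(conn, "SELECT * FROM bookings ORDER BY id LIMIT 2")
print(csv_text)
```

The point of the sketch is that the user supplies nothing but SQL; the header row and CSV formatting are derived from the result set.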

Airpal also integrates with the company’s LDAP system, allowing employees to log in with their corporate credentials and automatically receive appropriate data‑access permissions.

Aerosolve

Aerosolve is the machine‑learning engine that powers Airbnb’s pricing recommendation system. Unlike traditional black‑box ML engines, Aerosolve is designed to be interpretable, enabling users to see which features most influence the model’s output.

For instance, the system can show how a listing’s price is affected by its number of reviews and its count of three‑star reviews: after the first review, each additional review has diminishing impact, and too many three‑star reviews may even be detrimental.
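One way to picture this kind of interpretability is a model whose score is a sum of per-feature curves, so each feature's contribution can be read off directly. The sketch below assumes such an additive form, which the article does not state; the curve values, feature names, and the `explain` helper are made up for illustration and are not Airbnb's model.

```python
from bisect import bisect_right


def piecewise_linear(knots, x):
    """Interpolate linearly between (x, y) knots; clamp outside the range."""
    xs = [k[0] for k in knots]
    if x <= xs[0]:
        return knots[0][1]
    if x >= xs[-1]:
        return knots[-1][1]
    i = bisect_right(xs, x)
    (x0, y0), (x1, y1) = knots[i - 1], knots[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical per-feature curves (made-up numbers, not Airbnb's model).
FEATURE_CURVES = {
    # Most of the gain comes from the first review, then the curve flattens.
    "review_count": [(0, 0.0), (1, 0.6), (10, 0.8), (50, 0.9)],
    # Many three-star reviews push the score down.
    "three_star_reviews": [(0, 0.0), (5, -0.2), (20, -0.6)],
}


def explain(listing):
    """Return each feature's additive contribution and the total score."""
    contributions = {
        name: piecewise_linear(curve, listing[name])
        for name, curve in FEATURE_CURVES.items()
    }
    return contributions, sum(contributions.values())


contributions, score = explain({"review_count": 10, "three_star_reviews": 5})
for name, value in contributions.items():
    print(f"{name}: {value:+.2f}")
print(f"score: {score:+.2f}")
```

Because the total is just the sum of the per-feature values, plotting each curve on its own shows the diminishing return of extra reviews and the penalty from three-star reviews without inspecting the rest of the model.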

Airflow

The foundation of big‑data work remains the data pipeline. Airflow is Airbnb’s internal tool for launching, ordering, and monitoring these pipelines.
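The idea of ordering and monitoring pipeline steps can be sketched with the Python standard library alone. This is not Airflow's API; it is a minimal illustration in which the task names, the `run_pipeline` helper, and the status labels are all hypothetical. It assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run tasks in dependency order; return (name, status) pairs in run order.

    tasks: task name -> zero-argument callable.
    dependencies: task name -> set of task names that must finish first.
    """
    graph = {name: dependencies.get(name, set()) for name in tasks}
    log = []
    failed = set()
    for name in TopologicalSorter(graph).static_order():
        # Skip a task when any upstream task failed or was skipped.
        if graph[name] & failed:
            failed.add(name)
            log.append((name, "skipped"))
            continue
        try:
            tasks[name]()
            log.append((name, "success"))
        except Exception:
            failed.add(name)
            log.append((name, "failed"))
    return log


# Hypothetical three-step pipeline.
tasks = {
    "extract": lambda: None,
    "transform": lambda: None,
    "load": lambda: None,
}
dependencies = {"transform": {"extract"}, "load": {"transform"}}

for name, status in run_pipeline(tasks, dependencies):
    print(name, status)
```

Declaring dependencies as data rather than hard-coding the call order is what lets a tool decide what to run next and report which downstream steps were affected by a failure.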

