How Airbnb Builds Machine Learning Models to Detect Fraudulent Transactions
Airbnb’s trust and safety team uses a series of machine‑learning models to identify and mitigate fraud risks, such as chargebacks, across its global peer‑to‑peer rental platform. Building each model involves defining the prediction target, sampling the data, engineering features, and evaluating precision and recall.
Founded in 2008 in San Francisco, Airbnb provides a global online platform for users to list and book unique accommodations, operating a peer‑to‑peer rental model that now spans 34,000 cities in 190 countries with over one million customers.
Despite secure payment flows, the P2P model remains vulnerable to fraud, including stolen cards, inaccurate listings, and chargebacks; to safeguard the community, Airbnb’s trust and safety data team has built various machine‑learning models to detect transaction risk.
The first step in model building is defining the prediction goal, for example scoring whether a person in a fictional scenario is “bad.” Deciding whom to score (new entrants only or all users) and how often to update scores shapes every downstream decision.
Scoring people only when they first appear fails to track later changes, while rescoring after every minor event creates excessive workload. The compromise is to score on entry and then rescore after significant events, which in the fictional scenario means things like new alliances or asset changes.
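The event-driven rescoring policy described above can be sketched as follows. This is a minimal illustration, not Airbnb's implementation: the `Scorer` class, the event names, and the constant placeholder score are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical event types treated as "significant". The article names new
# alliances and asset changes; "signup" stands in for a new entrant.
SIGNIFICANT_EVENTS = {"signup", "new_alliance", "asset_change"}


@dataclass
class Scorer:
    scores: dict = field(default_factory=dict)
    rescoring_count: int = 0

    def score(self, person_id: str) -> float:
        # Placeholder for a real model call (assumption for illustration).
        return 0.5

    def on_event(self, person_id: str, event_type: str) -> bool:
        """Rescore only on significant events; return True if rescored."""
        if event_type not in SIGNIFICANT_EVENTS:
            return False  # minor event: keep the existing score
        self.scores[person_id] = self.score(person_id)
        self.rescoring_count += 1
        return True


scorer = Scorer()
scorer.on_event("person_1", "signup")        # scored on entry
scorer.on_event("person_1", "profile_view")  # ignored as a minor event
scorer.on_event("person_1", "asset_change")  # rescored
```

Keeping the set of significant events explicit makes the trade-off between freshness and workload a configuration choice rather than something buried in the scoring code.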
Constructing the training set requires sampling the data appropriately. Row‑based sampling works for generic tabular data, but when one person has many records it over‑represents that person and biases the model toward them. Person‑based sampling is therefore preferred, so that each individual contributes a representative share of the data.
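One plausible reading of person‑based sampling is to group records by person and draw a fixed number of rows from each group. The sketch below draws one row per person; the function name, the `person_id` key, and the one‑row‑per‑person choice are assumptions for illustration, not details from the source.

```python
import random
from collections import defaultdict


def sample_one_row_per_person(rows, person_key="person_id", seed=0):
    """Return one randomly chosen row for each distinct person."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    rows_by_person = defaultdict(list)
    for row in rows:
        rows_by_person[row[person_key]].append(row)
    # Each person contributes exactly one row, however many records they have.
    return [rng.choice(person_rows) for person_rows in rows_by_person.values()]


# Person "A" has five records and person "B" has one.
records = [{"person_id": "A", "event": i} for i in range(5)]
records.append({"person_id": "B", "event": 0})

training_rows = sample_one_row_per_person(records)
```

With row‑based sampling, person "A" would be about five times as likely to appear as person "B"; here both appear exactly once.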
Feature engineering follows. Numeric features are normalized so that they are comparable across individuals, e.g., converting absolute soldier counts to growth per year in order to compare how quickly leaders build their forces. Categorical attributes need to be encoded. One‑hot encoding vectorizes simple categories, while conditional‑probability coding (CP‑coding) better captures multi‑level hierarchical features: it maps each level to a numeric value, the conditional probability of the target given that level, and reduces noise for rarely seen levels through weighted averages.
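A minimal sketch of CP‑coding is shown below. The source says only that noise is reduced through weighted averages; the specific form used here, which blends each level's observed rate with the global rate using a hypothetical `prior_weight`, is an assumption, as are the function name and the sample data.

```python
from collections import defaultdict


def cp_encode(levels, labels, prior_weight=10.0):
    """Map each category level to a smoothed P(label == 1 | level).

    The smoothed value is a weighted average of the level's observed rate
    and the global rate, so rare levels are pulled toward the global rate.
    """
    global_rate = sum(labels) / len(labels)
    counts = defaultdict(int)
    positives = defaultdict(int)
    for level, label in zip(levels, labels):
        counts[level] += 1
        positives[level] += label
    return {
        level: (positives[level] + prior_weight * global_rate)
        / (counts[level] + prior_weight)
        for level in counts
    }


# Hypothetical data: region of each person and whether they were "bad".
regions = ["north", "north", "south"]
is_bad = [1, 1, 0]
encoding = cp_encode(regions, is_bad, prior_weight=3.0)
features = [encoding[region] for region in regions]
```

To avoid leaking the target into the features, the mapping would normally be computed on training data only and then applied to held‑out data.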
Finally, model performance is evaluated using precision (the proportion of people predicted “bad” who truly are bad) and recall (the proportion of truly “bad” people who are correctly identified). The two typically trade off against each other as the decision threshold moves, whereas adding richer features and tuning tree pruning can improve both.
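The two metrics can be computed directly from true and predicted labels, as in the sketch below. The function name and the sample labels are illustrative, with 1 standing for “bad.”

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 means "bad"."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against division by zero when nothing is predicted or present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Three truly "bad" people; the model flags three, two of them correctly.
actual = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0]
precision, recall = precision_recall(actual, predicted)
```

In this example two of the three flagged people are truly bad and two of the three bad people are caught, so both metrics equal 2/3.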
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
