
Applying End-to-End Deep Learning Models for Real Estate Agent Churn Prediction

This article reviews the evolution of end-to-end deep learning models, describes how they were adapted and optimized for a real‑estate broker churn‑warning scenario, and presents experimental results showing significant improvements in AUC, KS and lift over traditional classifiers.

Beike Product & Technology

In recent years, deep‑learning algorithms have become dominant across many fields, and end‑to‑end models are now the standard in recommendation systems because they embed feature engineering within the model and can jointly train heterogeneous network structures. This paper explores how such models can be applied to predict the churn of real‑estate agents at Beike and how model architecture optimizations improve ranking performance.

1. End‑to‑End Model Review

End‑to‑end learning jointly optimizes all parameters of a neural network, eliminating separate pre‑training or feature‑engineering steps. The model receives raw inputs and directly outputs predictions, with errors back‑propagated through every layer until convergence.

1.1 GBDT+LR

Facebook (2014) proposed using Gradient Boosted Decision Trees (GBDT) to generate discrete leaf‑node features, which are then fed into a Logistic Regression (LR) model. Although GBDT and LR are trained independently, this approach paved the way for true end‑to‑end solutions.
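The two-stage idea can be sketched with scikit-learn on synthetic data: the GBDT's leaf indices become one-hot features for the LR. All data and hyperparameters here are illustrative, not Facebook's original setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Stage 1: GBDT emits one leaf index per tree for every sample.
gbdt = GradientBoostingClassifier(n_estimators=20, max_depth=3, random_state=0)
gbdt.fit(X, y)
leaves = gbdt.apply(X)[:, :, 0]          # shape (n_samples, n_trees)

# Stage 2: one-hot encode the leaf indices and train LR on them.
leaf_features = OneHotEncoder().fit_transform(leaves)
lr = LogisticRegression(max_iter=1000).fit(leaf_features, y)
probs = lr.predict_proba(leaf_features)[:, 1]
```

Because the two stages are fit separately, the GBDT gradients never see the LR loss, which is exactly the gap that later end-to-end models close.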

1.2 Wide&Deep

Proposed by Google (2016), the Wide&Deep model combines a linear “wide” component for memorization with a deep neural network for generalization, enabling both fast handling of massive historical features and powerful representation learning.

The model’s wide part is essentially a linear model (often with feature crosses), while the deep part consists of an embedding layer followed by multiple hidden layers; the two parts are trained jointly.
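A minimal NumPy forward pass makes the joint structure concrete: a linear wide logit and an embedding-plus-MLP deep logit summed through one sigmoid. The dimensions and random weights below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def wide_deep_forward(x_wide, cat_ids, params):
    """One forward pass of a toy Wide&Deep scorer (illustrative sizes)."""
    # Wide part: plain linear model over raw / crossed features.
    wide_logit = x_wide @ params["w_wide"] + params["b_wide"]
    # Deep part: embedding lookup -> ReLU hidden layer -> scalar logit.
    emb = params["emb"][cat_ids].reshape(cat_ids.shape[0], -1)
    h = np.maximum(emb @ params["W1"], 0.0)
    deep_logit = h @ params["W2"]
    # Joint output: both logits feed a single sigmoid.
    return 1.0 / (1.0 + np.exp(-(wide_logit + deep_logit.ravel())))

params = {
    "w_wide": rng.normal(size=8), "b_wide": 0.0,
    "emb": rng.normal(size=(100, 4)),   # 100 categories, dim-4 embeddings
    "W1": rng.normal(size=(8, 16)),     # 2 categorical fields * dim 4
    "W2": rng.normal(size=(16, 1)),
}
p = wide_deep_forward(rng.normal(size=(5, 8)),
                      rng.integers(0, 100, size=(5, 2)), params)
```

In training, both parts receive gradients from the same loss, which is what makes the combination end-to-end rather than a stacked ensemble.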

1.3 DIEN (Deep Interest Evolution Network)

Building on Alibaba’s DIN model, DIEN (2019) introduces a behavior sequence layer, an interest extractor layer, and an interest evolution layer with attention‑augmented GRU (AUGRU) to capture temporal dependencies in user actions.

Because real‑estate agents generate clear time‑series activity data, DIEN’s sequence modeling is well‑suited, though the original architecture was simplified for production constraints.
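The AUGRU cell at DIEN's core can be sketched in a few lines: an attention score scales the GRU's update gate, so uninteresting steps barely change the hidden state. Everything below (dimensions, random weights, stand-in attention scores) is synthetic, not the production configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def augru_step(x, h, att, p):
    """One AUGRU step: the attention score scales the GRU update gate."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h)         # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h)         # reset gate
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h))
    z = att * z                                    # attention-augmented update
    return (1 - z) * h + z * h_tilde

d, hidden = 5, 8
p = {k: rng.normal(size=(hidden, d)) for k in ("Wz", "Wr", "Wh")}
p.update({k: rng.normal(size=(hidden, hidden)) for k in ("Uz", "Ur", "Uh")})

h = np.zeros(hidden)
seq = rng.normal(size=(30, d))           # 30 days of activity vectors
atts = sigmoid(rng.normal(size=30))      # stand-in attention scores
for x, att in zip(seq, atts):
    h = augru_step(x, h, att, p)
```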

2. Practice in the Churn‑Warning Scenario

Agent turnover is high, so predicting churn enables targeted retention actions. The prediction horizon is set to 30 days: agents who leave within 30 days after the observation date are labeled 1, otherwise 0. The positive-to-negative sample ratio is roughly 1:10 — skewed, but mild enough that no special resampling or reweighting was needed.
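The labeling rule can be sketched with pandas on toy records (the column names and dates are hypothetical): an agent is positive only if a departure date falls inside the 30-day window after the observation date.

```python
import pandas as pd

agents = pd.DataFrame({
    "agent_id": [1, 2, 3],
    "obs_date": pd.to_datetime(["2020-01-01"] * 3),
    "leave_date": pd.to_datetime(["2020-01-15", "2020-03-01", None]),
})

horizon = pd.Timedelta(days=30)
agents["label"] = (
    agents["leave_date"].notna()
    & (agents["leave_date"] > agents["obs_date"])
    & (agents["leave_date"] <= agents["obs_date"] + horizon)
).astype(int)
# Agent 1 leaves within 30 days (label 1); agent 2 leaves later and
# agent 3 never leaves (both label 0).
```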

2.1 Sample Construction

Only active agents (excluding managers and new hires within the 30‑day window) are considered. Samples are split 7:3 into training and validation sets; a later 30‑day period serves as the test set.

2.2 Feature Selection

Features are divided into static information (e.g., age, gender, city, credit score, rank) and daily time‑series activity (e.g., house viewings, listings, deals). Business categories further split features into groups such as basic info, viewing, listing, client, communication, performance, and negative signals.

2.3 Feature Engineering

Because the end‑to‑end model can ingest raw numeric, categorical, and sequential data directly, feature engineering focuses on cleaning: missing values are imputed (zero for activity counts, the minimum value for scores, an "other" bucket for categories); outliers are clipped at the 98th or 95th percentile; numeric features are normalized (StandardScaler preferred); low‑frequency categories are merged; and feature filtering by IV, chi‑square, and correlation leaves roughly 200 useful features.
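A compact pandas/scikit-learn sketch of that cleaning sequence, on invented columns (`viewings` as an activity count, `credit` as a score):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "viewings": [3.0, np.nan, 120.0, 5.0, 4.0],    # daily activity count
    "credit":   [90.0, 85.0, np.nan, 70.0, 88.0],  # credit score
})

# Imputation: zero for activity gaps, minimum value for scores.
df["viewings"] = df["viewings"].fillna(0)
df["credit"] = df["credit"].fillna(df["credit"].min())

# Outlier clipping at the 98th percentile.
df["viewings"] = df["viewings"].clip(upper=df["viewings"].quantile(0.98))

# Normalization with StandardScaler (zero mean, unit variance).
scaled = StandardScaler().fit_transform(df)
```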

For sequential data, each time slice inherits the overall label, turning the problem into a binary classification per slice.

2.4 Model Training and Optimization

The final architecture combines dense features for static numeric data, embedding‑plus‑GRU for static categorical data, and BatchNorm‑plus‑LSTM for time‑series data. Outputs are concatenated, passed through several fully‑connected ReLU layers, and finally a sigmoid for churn probability. Dropout is applied to mitigate over‑fitting; attention mechanisms and manual feature crosses were omitted after experiments.
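The three-branch concatenation can be sketched as a single NumPy forward pass under toy dimensions; the LSTM is hand-rolled only to keep the example self-contained, and every size and weight here is invented rather than taken from the production model.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lstm_last_state(seq, W, U, b):
    """Run a minimal LSTM over a (T, d) sequence; return the final hidden state."""
    hidden = U.shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in seq:
        gates = W @ x + U @ h + b                     # stacked i, f, o, g
        i, f, o = np.split(sigmoid(gates[:3 * hidden]), 3)
        g = np.tanh(gates[3 * hidden:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

# Toy dimensions: 6 static numeric features, one categorical field
# (vocab 50, embedding dim 4), and a 30-day series of 5 daily counts.
d_num, vocab, d_emb, d_seq, hidden = 6, 50, 4, 5, 8
W = rng.normal(size=(4 * hidden, d_seq))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
emb_table = rng.normal(size=(vocab, d_emb))

x_num = rng.normal(size=d_num)                        # dense static branch
x_cat = emb_table[rng.integers(vocab)]                # embedding lookup branch
x_seq = lstm_last_state(rng.normal(size=(30, d_seq)), W, U, b)  # sequence branch

z = np.concatenate([x_num, x_cat, x_seq])             # concatenate all branches
h1 = np.maximum(z @ rng.normal(size=(z.size, 16)), 0) # fully-connected ReLU
p_churn = float(sigmoid(h1 @ rng.normal(size=16)))    # sigmoid churn score
```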

Hyper‑parameter tuning (network depth, learning rate, L2 regularization, batch size, epochs) yielded noticeable gains in discrimination and generalization.

2.5 Model Performance

Online evaluation shows AUC = 0.84, KS = 0.52, and a clear lift over traditional classifiers, enabling operations to identify high‑risk agents for proactive retention.
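Both reported metrics are cheap to compute from scored samples; the sketch below uses synthetic labels and scores (not the production numbers) and reads KS off the ROC curve as the maximum TPR-FPR gap.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
# Hypothetical scores: churners score slightly higher on average.
scores = rng.normal(loc=y * 1.2, scale=1.0)

auc = roc_auc_score(y, scores)
fpr, tpr, _ = roc_curve(y, scores)
ks = float(np.max(tpr - fpr))   # KS = max separation of the two CDFs
```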

The precision–recall curve and other metrics tell the same story: the end‑to‑end model consistently outperforms the traditional baselines.

3. Summary and Future Work

The paper reviewed end‑to‑end model evolution, detailed its adaptation to the Beike agent churn‑warning use case, and demonstrated substantial ranking improvements. Future directions include discovering new data sources, segment‑wise modeling (by city, brand, tenure), incorporating attention and richer feature crosses, exploring newer network architectures, accelerating data collection, and improving model interpretability for business stakeholders.

Tags: feature engineering, deep learning, recommendation systems, time series, churn prediction, end-to-end models
Written by

Beike Product & Technology

As Beike's official product and technology account, we are committed to building a platform for sharing Beike's product and technology insights, targeting internet/O2O developers and product professionals. We share high-quality original articles, tech salon events, and recruitment information weekly. Welcome to follow us.
