Predictive Modeling of Student Renewal and Refund Intentions Using Logistic Regression in Online Education

This article describes how logistic regression models are built, iterated, and applied to predict student renewal and refund behavior in an online school, detailing data collection, feature engineering, model training, evaluation, and how the predictions are used to recommend interventions for teachers.

Xueersi Online School Tech Team

Mar 13, 2020

With the rise of big data and the deepening of educational informatization, learning analytics and prediction have become hot topics in the education sector. This article explores how to predict student renewal and refund intentions using logistic regression (LR) models, and how the resulting predictions are passed to tutors so that they can intervene early.

Prediction Process

The workflow for renewal and refund prediction is illustrated in the diagram below.

Data Investigation

For renewal prediction, samples are selected according to business performance in different semesters, and features are selected by comparing box plots of each feature across semesters. The original code (shown as an image in the source) plots these features, where train_df is the sample dataframe, title_name is the plot title, and column_min is the feature list.
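The plotting step can be sketched roughly as follows. This is not the team's original code: train_df, title_name, and column_min come from the description above, while the semester column, the helper name plot_feature_boxes, and the toy values are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_feature_boxes(train_df, title_name, column_min, group_col="semester"):
    """Draw one panel per feature, with one box per semester (group_col is assumed)."""
    fig, axes = plt.subplots(1, len(column_min),
                             figsize=(4 * len(column_min), 4), squeeze=False)
    for ax, col in zip(axes[0], column_min):
        labels, groups = [], []
        for name, group in train_df.groupby(group_col, sort=False):
            labels.append(name)
            groups.append(group[col].to_numpy())
        ax.boxplot(groups)           # boxes are placed at x = 1..n
        ax.set_xticklabels(labels)   # label each box with its semester
        ax.set_title(col)
    fig.suptitle(title_name)
    return fig


# Toy data standing in for two historical semesters (values are made up).
train_df = pd.DataFrame({
    "semester": ["2019_spring"] * 4 + ["2019_autumn"] * 4,
    "interact_answer_ratio": [0.2, 0.5, 0.7, 0.9, 0.3, 0.5, 0.6, 0.8],
    "late_ratio": [0.0, 0.1, 0.1, 0.3, 0.0, 0.1, 0.2, 0.2],
})
column_min = ["interact_answer_ratio", "late_ratio"]
fig = plot_feature_boxes(train_df, "Renewal features by semester", column_min)
```

Features whose boxes look alike across semesters are the ones with small distribution differences.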

Two historical semester samples are shown, highlighting features with small distribution differences such as interact_answer_ratio and late_ratio.

Model Iteration

Selected features are fed into the LR model for subject‑wise training, yielding the performance shown below.

Analysis reveals feature redundancy (e.g., attendance and replay rates) and abnormal feature weights (e.g., sign‑in rate, early‑leave rate). After cleaning, three major feature categories remain.
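The source does not say how the redundancy was detected. One common check, sketched here under that caveat, is to flag feature pairs with a very high absolute Pearson correlation. The 0.9 threshold, the function name redundant_pairs, the feature names, and the values are all assumptions.

```python
import numpy as np


def redundant_pairs(X, names, threshold=0.9):
    """Return feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are features
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(corr[i, j]), 3)))
    return pairs


# Made-up values: here replay_ratio is exactly 1 - attend_ratio,
# so the two features carry the same information.
names = ["attend_ratio", "replay_ratio", "late_ratio"]
X = np.array([
    [0.9, 0.1, 0.0],
    [0.7, 0.3, 0.2],
    [0.5, 0.5, 0.1],
    [0.2, 0.8, 0.3],
    [0.1, 0.9, 0.0],
])
pairs = redundant_pairs(X, names)
```

In a pair flagged this way, one of the two features can usually be dropped before retraining.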

Using these features for junior high mathematics, model iteration increased recall from 69.63% to 82.72% while keeping accuracy stable.
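To make the two metrics concrete, the sketch below computes accuracy and recall from label lists. The labels are invented and only illustrate how recall can rise while accuracy stays the same; which class the team treated as positive is an assumption.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Accuracy over all samples; recall over samples whose true label is positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Invented labels: 4 positive students, 6 negative students.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
before = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]  # earlier model: misses two positives
after = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]   # later model: finds one more positive

acc_before, rec_before = accuracy_and_recall(y_true, before)
acc_after, rec_after = accuracy_and_recall(y_true, after)
```

In this toy case accuracy is 0.7 for both prediction lists, while recall goes from 0.5 to 0.75.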

Model Prediction

Feature processing methods include:

Numeric features: scaling/normalization, statistics (max, min, mean, std), discretization, histogram distribution.

Categorical features: one‑hot encoding, hash encoding to embeddings, histogram mapping.

Temporal features: continuous values (duration, interval) and discrete values (time of day, day of week, weekday/weekend).

Text features: bag‑of‑words, n‑grams, TF‑IDF.

Composite features: concatenation, model fusion.
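Two of the methods listed above, temporal features and numeric scaling, can be sketched with the standard library alone. The function names and the sample session times are assumptions, not part of the original pipeline.

```python
from datetime import datetime


def temporal_features(start, end):
    """Derive continuous and discrete features from a session's start and end time."""
    return {
        "duration_min": (end - start).total_seconds() / 60,  # continuous value
        "hour_of_day": start.hour,                            # discrete value
        "day_of_week": start.weekday(),                       # 0 = Monday
        "is_weekend": int(start.weekday() >= 5),              # Saturday or Sunday
    }


def min_max_scale(values):
    """Scale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# A hypothetical Saturday evening class from 19:30 to 21:00.
feats = temporal_features(datetime(2020, 3, 14, 19, 30), datetime(2020, 3, 14, 21, 0))
scaled = min_max_scale([30, 60, 90])
```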

For refund prediction, numeric features are binned and then one‑hot encoded. Each numeric feature is divided at its quartiles; for example, attend_ratio is split into four bins.

Resulting one‑hot vectors are shown (e.g., attend_ratio=0.3 yields (0,1,0,0)).
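The quartile binning and one‑hot step can be sketched as below. The actual cut points used by the team are not given, so the training values here are synthetic (evenly spread between 0 and 1), and the function names are assumptions.

```python
import numpy as np


def fit_quartile_edges(values):
    """Learn the three quartile cut points from training data."""
    return np.quantile(values, [0.25, 0.5, 0.75])


def one_hot_bins(values, edges):
    """Map each value to one of four bins and return one-hot rows."""
    idx = np.searchsorted(edges, values, side="right")  # bin index 0..3
    return np.eye(len(edges) + 1, dtype=int)[idx]


# Synthetic training values; real cut points depend on the real distribution.
train_attend_ratio = np.linspace(0.0, 1.0, 101)
edges = fit_quartile_edges(train_attend_ratio)
encoded = one_hot_bins(np.array([0.3]), edges)
```

With these synthetic cut points, 0.3 falls into the second bin, which matches the (0,1,0,0) vector in the example above.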

The dataset‑splitting step returns the processed dataset that is used for prediction.

Model training uses sklearn.linear_model.LogisticRegression, saving the trained model to a file.
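A minimal sketch of training and saving the model follows. The source only states that sklearn.linear_model.LogisticRegression is used and that the model is saved to a file; the pickle format, the file name refund_lr.pkl, the max_iter setting, and the synthetic data are assumptions.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic one-hot-like features; the label simply follows the first column.
X = rng.integers(0, 2, size=(200, 8))
y = X[:, 0]

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Persist the trained model (pickle is an assumed choice of format).
model_path = os.path.join(tempfile.gettempdir(), "refund_lr.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Reload it, as a later prediction job would.
with open(model_path, "rb") as f:
    restored = pickle.load(f)
```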

Prediction applies the same one‑hot encoding to new data and feeds it to the saved model.
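The important detail at prediction time is that new data is binned with the cut points learned from the training data, not with cut points recomputed on the new data. The sketch below illustrates this under assumed names (fit_edges, encode) and synthetic data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_edges(X):
    """Per-feature quartile cut points learned from the training data."""
    return [np.quantile(X[:, j], [0.25, 0.5, 0.75]) for j in range(X.shape[1])]


def encode(X, edges):
    """One-hot encode each numeric column into four bins using stored cut points."""
    blocks = []
    for j, e in enumerate(edges):
        idx = np.searchsorted(e, X[:, j], side="right")
        blocks.append(np.eye(len(e) + 1, dtype=int)[idx])
    return np.hstack(blocks)


rng = np.random.default_rng(0)
X_train = rng.random((300, 2))                 # e.g. attend_ratio, late_ratio (synthetic)
y_train = (X_train[:, 0] < 0.4).astype(int)    # synthetic refund label

edges = fit_edges(X_train)
model = LogisticRegression(max_iter=1000).fit(encode(X_train, edges), y_train)

# New students are encoded with the SAME edges before scoring.
X_new = np.array([[0.1, 0.5], [0.9, 0.5]])
refund_prob = model.predict_proba(encode(X_new, edges))[:, 1]
```

In practice the cut points would need to be stored alongside the saved model so that the prediction job can reuse them.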

Cross‑validation results show an accuracy of 89.24% and recall of 91.86%.
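Cross‑validated accuracy and recall can be obtained with scikit-learn's cross_validate, as sketched here. The fold count of 5 and the synthetic data are assumptions, so the scores it produces are unrelated to the figures reported above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 8))
# Synthetic label driven by the first two columns plus some noise.
y = ((X[:, 0] + X[:, 1] + rng.random(400)) > 1.5).astype(int)

scores = cross_validate(
    LogisticRegression(max_iter=1000),
    X,
    y,
    cv=5,                              # assumed number of folds
    scoring=["accuracy", "recall"],
)
mean_accuracy = scores["test_accuracy"].mean()
mean_recall = scores["test_recall"].mean()
```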

Result Recommendation

For refund prediction, students with a high predicted refund probability are recommended to the corresponding tutor, together with the top‑10 most important LR features so that the prediction can be interpreted. An example table shows each student’s return_pro and the hit features.
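How importance and hit features are defined is not stated in the source. One plausible reading, sketched below, ranks features by the absolute value of their LR weight and reports those top features that are active for a given student. The weights, intercept, feature names, and the helper hit_features are invented for illustration.

```python
import numpy as np


def hit_features(coef, x, names, top_k=10):
    """Among the top_k features by |weight|, return those active for this student."""
    top = np.argsort(-np.abs(coef))[:top_k]
    return [names[i] for i in top if x[i] != 0]


# Invented one-hot feature names and LR parameters.
names = ["attend_ratio_q1", "attend_ratio_q2", "late_ratio_q4", "replay_ratio_q1"]
coef = np.array([1.8, 0.4, 1.1, -0.2])
intercept = -1.5
x = np.array([1, 0, 1, 0])  # one student's one-hot vector

# LR probability: sigmoid of the weighted sum plus intercept.
return_pro = 1.0 / (1.0 + np.exp(-(coef @ x + intercept)))
hits = hit_features(coef, x, names, top_k=2)
```

For this student the score is about 0.80, and both top-weighted features are hit, which gives the tutor a concrete reason to start the conversation.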

Recommendation Application

The renewal model is already deployed for high‑school tutors; students with low renewal intent are pushed to the tutor’s OA system with reasons for one‑on‑one communication.

Recommendation Effect

Click‑through rate (teacher clicks / teacher views) for a sample week exceeds 50%.

Future Plans

Future work will continue to support key teaching metrics, for example through early intervention for potential refunds and prediction of conversion or expansion intentions, and will build segmentation models to support personalized tutoring services.

