How AI and Machine Learning Transform Investment Budget Forecasting

Based on a real-world project for a public-cloud client, this article details how combining large-model prompting with machine-learning techniques (first pure large-model forecasts, then locally weighted linear regression, and finally XGBoost) enables automated, accurate investment budget prediction and allocation, reducing analyst workload and scaling to millions of daily calls.

Alibaba Cloud Developer

A public-cloud client invests in overseas campaigns and wants to maximize returns by allocating budgets across multiple channels. Its analysts need a system that predicts optimal daily budgets, reduces manual effort, and eventually automates the entire pre- and post-investment workflow.

Business Investment Plan

Version 1: Large Model Prediction

The client provided sample data for a proof of concept. Using prompt engineering, the historical investment returns and the analysts' habits were supplied to the large language model so that it could generate budget suggestions. A sample prompt includes a role definition, a task description, constraints, and an output format.

# Role
You are an investment analyst.
# Task
Given 7 days of historical variables A, B, …, output the budget for day 8.
# Constraints
If budget increases while variable A decreases, reduce the budget, etc.
# Output Format
Budget: xxx Yuan
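
As an illustration of how such a prompt might be assembled and its reply parsed, here is a minimal sketch. The `# History` section, the function names `build_prompt` and `parse_budget`, and the variable names are assumptions made for the example; the source only shows the four sections above.

```python
import re
from typing import Mapping, Sequence

PROMPT_TEMPLATE = """# Role
You are an investment analyst.
# Task
Given 7 days of historical variables {names}, output the budget for day 8.
# History
{history}
# Constraints
If budget increases while variable A decreases, reduce the budget, etc.
# Output Format
Budget: xxx Yuan"""


def build_prompt(history: Sequence[Mapping[str, float]]) -> str:
    """Render exactly 7 daily records into the prompt text."""
    if len(history) != 7:
        raise ValueError("expected exactly 7 days of history")
    names = ", ".join(history[0].keys())
    rows = [
        f"Day {i}: " + ", ".join(f"{k}={v}" for k, v in day.items())
        for i, day in enumerate(history, start=1)
    ]
    return PROMPT_TEMPLATE.format(names=names, history="\n".join(rows))


def parse_budget(reply: str) -> float:
    """Extract the number from a reply shaped like 'Budget: 1,200.5 Yuan'."""
    match = re.search(r"Budget:\s*([0-9][0-9,]*(?:\.[0-9]+)?)\s*Yuan", reply)
    if match is None:
        raise ValueError("reply does not follow the output format")
    return float(match.group(1).replace(",", ""))
```

Validating the reply against the output format before using it keeps a malformed answer from silently becoming a budget.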

Version 2: Locally Weighted Linear Regression + Large Model

When the client supplied realistic data with over 20 variables and millions of rows, pure prompting could not capture the complex relationships. A locally weighted linear regression (LWLR) model was introduced to predict the overall daily budget, which was then fed to the large model for channel‑level allocation.

Weight Calculation

For each query point, a Gaussian kernel assigns weights to training samples based on distance.
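
In its usual textbook form, the Gaussian kernel weight of training sample $x_i$ for a query point $x$ can be written as below. The bandwidth $\tau$ is a hyperparameter; the source does not state the value used.

```latex
w_i(x) = \exp\!\left( -\frac{\lVert x_i - x \rVert^2}{2\tau^2} \right),
\qquad i = 1, \dots, m
```

A small $\tau$ makes the fit very local, while a large $\tau$ approaches ordinary least squares.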

Weighted Loss Function

The loss emphasizes samples near the query point.
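
In the standard formulation, with $w_i(x)$ the kernel weight of sample $i$ for query point $x$, $X$ the design matrix, $y$ the target vector, and $W = \mathrm{diag}\big(w_1(x), \dots, w_m(x)\big)$, the weighted loss is typically:

```latex
J(\theta) = \sum_{i=1}^{m} w_i(x)\,\big(y_i - \theta^{\top} x_i\big)^2
          = (y - X\theta)^{\top} W \,(y - X\theta)
```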

Closed‑form Solution

The optimal parameters are obtained via matrix operations.
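
For the standard weighted least-squares loss $(y - X\theta)^{\top} W (y - X\theta)$, where $W$ is the diagonal matrix of kernel weights for the query point $x$, setting the gradient $-2X^{\top} W (y - X\theta)$ to zero gives the usual closed form (assuming $X^{\top} W X$ is invertible):

```latex
\hat{\theta}(x) = \big(X^{\top} W X\big)^{-1} X^{\top} W\, y
```

Because $W$ depends on the query point, the parameters are re-solved for every prediction.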

Prediction Output
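
The prediction for a query point $x$ is the local linear fit evaluated at that point, $\hat{y} = x^{\top}\hat{\theta}(x)$. The following NumPy sketch strings the four steps together. It is a generic LWLR implementation, not the project's code; the function name `lwlr_predict` and the default bandwidth `tau` are assumptions.

```python
import numpy as np


def lwlr_predict(x_query, X, y, tau=1.0):
    """Predict y at x_query with locally weighted linear regression."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_query = np.asarray(x_query, dtype=float)

    # Add an intercept column so the local fit is affine.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    xq = np.concatenate([[1.0], x_query])

    # Gaussian kernel weights from squared distance to the query point.
    sq_dist = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-sq_dist / (2.0 * tau**2))

    # Solve the weighted normal equations (X^T W X) theta = X^T W y.
    XtW = Xb.T * w  # equals Xb.T @ diag(w) without building the matrix
    theta, *_ = np.linalg.lstsq(XtW @ Xb, XtW @ y, rcond=None)

    # Evaluate the local fit at the query point.
    return float(xq @ theta)
```

Using `np.linalg.lstsq` instead of an explicit inverse is a design choice that tolerates a near-singular $X^{\top} W X$, which can occur when few samples lie near the query point.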

Version 3: XGBoost + Large Model

Once the client supplied full production data, which contained many missing values, the team used XGBoost to predict the investment amount for each channel. The overall daily budget is the sum of these predictions, which is then adjusted with the LWLR output.

XGBoost Algorithm Principle

XGBoost builds an ensemble of decision trees using gradient boosting. The objective combines a loss term and regularization to control model complexity.
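
In the notation commonly used for XGBoost, with $K$ trees $f_k$, prediction $\hat{y}_i = \sum_{k} f_k(x_i)$, $T$ leaves per tree, and leaf weights $w_j$, the regularized objective is:

```latex
\mathcal{L} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i\big) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2
```

The library additionally offers an L1 penalty on leaf weights; it is left out here to keep the expression short.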

Second‑order Taylor Expansion

The loss is approximated by a second‑order Taylor expansion, keeping first‑order gradients and second‑order Hessians.
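
Written out in the standard notation, when tree $f_t$ is added at round $t$ to the previous prediction $\hat{y}_i^{(t-1)}$, the approximated objective is:

```latex
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big)
  + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \Big] + \Omega(f_t),
\qquad
g_i = \frac{\partial\, l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^2 l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial \big(\hat{y}_i^{(t-1)}\big)^2}
```

The first term inside the sum is constant with respect to $f_t$, so only the $g_i$ and $h_i$ terms and the regularizer $\Omega(f_t)$ drive the choice of the new tree.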

Tree Structure Learning

Splits are chosen greedily based on gain; leaf weights are solved in closed form.
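
To make the gain and leaf-weight formulas concrete, here is a small sketch for a single feature. It follows the standard expressions $w^* = -G/(H+\lambda)$ and $\text{Gain} = \tfrac{1}{2}\big[\tfrac{G_L^2}{H_L+\lambda} + \tfrac{G_R^2}{H_R+\lambda} - \tfrac{(G_L+G_R)^2}{H_L+H_R+\lambda}\big] - \gamma$, where $G$ and $H$ are sums of gradients and Hessians in a node. The function names are illustrative, and the exhaustive scan is a simplification of what the library actually does.

```python
import numpy as np


def leaf_weight(g, h, lam=1.0):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -np.sum(g) / (np.sum(h) + lam)


def split_gain(g, h, left_mask, lam=1.0, gamma=0.0):
    """Gain of splitting one node into left and right children."""
    def score(gs, hs):
        return np.sum(gs) ** 2 / (np.sum(hs) + lam)

    right_mask = ~left_mask
    children = score(g[left_mask], h[left_mask]) + score(g[right_mask], h[right_mask])
    return 0.5 * (children - score(g, h)) - gamma


def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Greedy scan over candidate thresholds of a single feature."""
    best_thr, best_gain = None, -np.inf
    # Skip the largest value: splitting there would leave the right child empty.
    for thr in np.unique(x)[:-1]:
        gain = split_gain(g, h, x <= thr, lam, gamma)
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain


# Example with squared-error loss l = 0.5 * (y - y_hat)^2 and y_hat = 0,
# so g = y_hat - y and h = 1 for every sample.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.0, 5.0, 5.0])
g, h = 0.0 - y, np.ones_like(y)
threshold, gain = best_split(x, g, h)
```

In this toy data the best threshold separates the two low targets from the two high ones, which is where the gain is largest.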

Key Optimizations

Regularization (L1/L2) and leaf‑node limits reduce over‑fitting.

Second‑order gradient information accelerates convergence and improves accuracy.

Engineering optimizations: block-structured column storage, a weighted quantile sketch for approximate split finding, and sparsity-aware handling of missing values.
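
These options map onto hyperparameters of the `xgboost` scikit-learn interface. The values below are placeholders chosen for illustration; the article does not publish the settings used in the project.

```python
import math

# Hypothetical values: the source does not state its hyperparameters.
xgb_params = {
    "objective": "reg:squarederror",
    "n_estimators": 500,
    "learning_rate": 0.05,
    "max_depth": 6,
    "max_leaves": 64,       # cap on the number of leaves per tree
    "gamma": 0.1,           # minimum loss reduction required to split
    "reg_alpha": 0.1,       # L1 penalty on leaf weights
    "reg_lambda": 1.0,      # L2 penalty on leaf weights
    "tree_method": "hist",  # histogram-based split finding
    "missing": math.nan,    # NaN cells are routed by a learned default direction
}

# Usage, assuming the xgboost package is installed and X_train / y_train exist:
#   from xgboost import XGBRegressor
#   model = XGBRegressor(**xgb_params).fit(X_train, y_train)
```

Because missing values are handled natively, NaN cells can be left in the feature matrix rather than imputed beforehand.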

Feature Engineering

Time‑based and periodic features (year, month, quarter, week, day‑of‑week, sinusoidal encodings) and rolling averages of key metrics (revenue, influence, purchases, shares) over 3, 7, and 30 days are generated for XGBoost.

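import numpy as np
import pandas as pd

# Assumes df already holds a 'day' column plus the metric columns used below
df['day'] = pd.to_datetime(df['day'])
# Rolling windows only make sense in chronological order
df = df.sort_values('day').reset_index(drop=True)

# Add date and periodic features
df['year'] = df['day'].dt.year
df['month'] = df['day'].dt.month
df['quarter'] = df['day'].dt.quarter
df['week'] = df['day'].dt.isocalendar().week.astype(int)
df['day_of_week'] = df['day'].dt.dayofweek  # Monday=0 ... Sunday=6
df['day_of_month'] = df['day'].dt.day
# Sine/cosine encoding keeps Sunday and Monday close together
df['day_sin'] = np.sin(2*np.pi*df['day_of_week']/7)
df['day_cos'] = np.cos(2*np.pi*df['day_of_week']/7)

# Rolling mean features over 3, 7, and 30 rows (assumes one row per day)
for col in ['revenue','influence','purchases','shares']:
    for window in (3, 7, 30):
        df[f'{col}_mean_{window}'] = df[col].rolling(window, min_periods=1).mean()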
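
The sinusoidal encoding is worth a short illustration. With the raw integer `day_of_week`, Sunday (6) and Monday (0) look six units apart, whereas on the unit circle they are as close as any other pair of consecutive days. A minimal sketch, with helper names chosen for this example:

```python
import numpy as np


def encode_day_of_week(day_of_week):
    """Map 0..6 (Monday..Sunday) to a point on the unit circle."""
    angle = 2 * np.pi * np.asarray(day_of_week) / 7
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)


def gap(a, b):
    """Euclidean distance between two encoded days."""
    return float(np.linalg.norm(encode_day_of_week(a) - encode_day_of_week(b)))


sunday_to_monday = gap(6, 0)
monday_to_tuesday = gap(0, 1)
monday_to_thursday = gap(0, 3)
```

Here `sunday_to_monday` equals `monday_to_tuesday`, while `monday_to_thursday` is larger, which is the ordering a weekly cycle should have.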

Compute Adjustment Values and Coefficients

After training XGBoost, the sum of channel predictions gives a daily total. The difference between this total and the LWLR total is halved to form an adjustment value, which is then distributed across channels based on contribution ratios.

Per‑channel unit cost, recent 7‑day overall cost, and normalized adjustment coefficients are calculated (images omitted for brevity).
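
A simplified sketch of the adjustment step follows. It makes two assumptions that the source does not confirm: the adjustment moves the XGBoost total halfway toward the LWLR total, and each channel's contribution ratio is its share of the XGBoost total. The unit-cost and 7-day-cost coefficients mentioned above are not reproduced, and the function name is illustrative.

```python
def adjust_channel_budgets(xgb_channel_preds, lwlr_total):
    """Shift per-channel predictions halfway toward the LWLR daily total."""
    xgb_total = sum(xgb_channel_preds.values())
    if xgb_total <= 0:
        raise ValueError("sum of channel predictions must be positive")

    # Assumed sign convention: positive when LWLR predicts a larger total.
    adjustment = (lwlr_total - xgb_total) / 2.0

    # Assumed contribution ratio: each channel's share of the XGBoost total.
    return {
        channel: pred + adjustment * (pred / xgb_total)
        for channel, pred in xgb_channel_preds.items()
    }


adjusted = adjust_channel_budgets({"channel_a": 60.0, "channel_b": 40.0}, lwlr_total=120.0)
```

With these inputs the XGBoost total is 100, the adjustment is 10, and the channels receive 6 and 4 of it respectively, so the adjusted total lands midway between the two models.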

Large Model Allocates Adjustment Values per Adset

The large model learns analyst behavior and refines the adjusted budgets, ensuring that changes stay within predefined limits (e.g., not exceeding a certain percentage of the previous day's spend).
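
A guardrail of this kind can also be enforced in code after the model replies. The sketch below assumes a symmetric limit expressed as a fraction of the previous day's spend; the 20% default and the function name are placeholders, not values from the project.

```python
def clamp_budget(proposed, previous_spend, max_change_pct=0.2):
    """Keep a proposed budget within +/- max_change_pct of yesterday's spend."""
    if previous_spend < 0 or max_change_pct < 0:
        raise ValueError("previous_spend and max_change_pct must be non-negative")
    low = previous_spend * (1 - max_change_pct)
    high = previous_spend * (1 + max_change_pct)
    return min(max(proposed, low), high)


def clamp_adsets(proposed_by_adset, previous_by_adset, max_change_pct=0.2):
    """Apply the same limit to every adset in a proposal."""
    return {
        adset: clamp_budget(budget, previous_by_adset[adset], max_change_pct)
        for adset, budget in proposed_by_adset.items()
    }
```

Applying the limit outside the prompt means an out-of-range suggestion is corrected deterministically instead of relying on the model to respect the constraint.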

Conclusion

Large models still have limited capability in pure mathematical fitting for chaotic investment scenarios, but combining them with machine‑learning models such as LWLR and XGBoost markedly improves prediction accuracy and interpretability. The deployed solution is already running in production, delivering strong results and paving the way for broader adoption in similar domains.
