How a Rolling Random Forest Strategy Predicts Bitcoin’s Weekly Direction
This article explains a Python‑based rolling random‑forest classifier that uses a 30‑day training window and selected technical indicators to forecast whether Bitcoin’s price will rise or fall over the next seven days, detailing the methodology, code, back‑test results, and limitations.
Core Concepts
Random Forest Classifier
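A random forest builds many decision trees independently of one another. Each tree is trained on a bootstrap sample of the data (bagging), and at each split it considers only a random subset of the features, which reduces the correlation between trees. The final prediction is obtained by majority vote across the trees; scikit-learn's RandomForestClassifier, used below, averages the trees' class probabilities instead and picks the class with the highest mean.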
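The following minimal sketch illustrates these ideas on synthetic data. The dataset and the parameter values (100 trees, "sqrt" feature subsets) are assumptions chosen for illustration, not the article's settings. It shows that the forest's probability for a sample is the mean of the individual trees' probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data standing in for indicator features (illustration only).
X, y = make_classification(n_samples=300, n_features=15, n_informative=5,
                           random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # number of trees (assumed value)
    max_features="sqrt",   # random feature subset considered at each split
    bootstrap=True,        # each tree is fit on a bootstrap sample (bagging)
    random_state=0,
)
forest.fit(X, y)

sample = X[:1]
# Probability of class 1 according to each individual tree.
tree_probs = np.array([tree.predict_proba(sample)[0, 1]
                       for tree in forest.estimators_])
# The forest's probability is the average of the per-tree probabilities.
forest_prob = forest.predict_proba(sample)[0, 1]

print(f"mean of tree probabilities: {tree_probs.mean():.3f}")
print(f"forest probability:         {forest_prob:.3f}")
```

The two printed numbers should agree, which is why the forest's output can be read as a soft vote rather than a strict head count.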
Rolling Prediction Evaluation
Rolling prediction simulates performance in a dynamic market by repeatedly retraining the model on a fixed‑size recent window and forecasting a future horizon. After each iteration the window slides forward one day.
Specific Steps
Define a training window (e.g., past 30 days).
Train the model on data within the window.
Predict the price direction for the next 7 days.
Slide the window forward by one day.
Repeat and aggregate predictions to compute overall metrics.
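The steps above can be sketched as a small index generator. The function name and the gap between the last training row and the prediction row are assumptions of this sketch: the gap keeps every training label (which depends on the price 7 days later) known by the day the prediction is made. The source does not show its own index arithmetic.

```python
def rolling_windows(n_rows, train_window=30, horizon=7):
    """Yield (train_start, train_end, predict_idx) for a rolling evaluation.

    train_end is exclusive. The last training row sits `horizon` rows before
    predict_idx so that its label is already observable on the prediction day
    (an assumption of this sketch, not something the source specifies).
    """
    first_predict = train_window + horizon - 1
    for predict_idx in range(first_predict, n_rows):
        train_end = predict_idx - horizon + 1
        train_start = train_end - train_window
        yield train_start, train_end, predict_idx


windows = list(rolling_windows(n_rows=40))
print(windows[0])    # (0, 30, 36)
print(windows[-1])   # (3, 33, 39)
```

Each successive tuple is shifted by one row, which corresponds to sliding the window forward by one day.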
Python Implementation
Setup and Configuration
import pandas as pd
import numpy as np
import yfinance as yf
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)
import warnings

TICKER = 'BTC-USD'
START_DATE = '2021-01-01'
PREDICTION_HORIZON = 7        # days ahead for the direction target
TRAINING_WINDOW_DAYS = 30     # size of the rolling training window
TOP_FEATURES = [
    'ROC_10', 'STOCHRSI_d', 'ADX_14', 'STOCHRSI_k', 'RSI_14',
    'STOCH_k', 'ATR_14', 'EMA_20', 'STOCH_d', 'MACD',
    'ULTOSC', 'BB_upper', 'SAR', 'Open_Close', 'MACD_hist'
]
# Random forest hyper-parameters omitted for brevity
Data Loading and Target Definition
Historical price data are downloaded with yfinance. Technical indicators are computed for each row. The target variable is set to 1 if the price after PREDICTION_HORIZON days exceeds the current price, otherwise 0.
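As a rough sketch of this step, the code below builds the 0/1 target and two of the listed indicators with pandas. It assumes that "price" means the daily closing price in a column named Close, and it uses common textbook definitions of ROC_10 and EMA_20; the source does not show its own indicator code. A synthetic random-walk series stands in for the yfinance download (which might look like yf.download(TICKER, start=START_DATE)) so that the example runs offline.

```python
import numpy as np
import pandas as pd

PREDICTION_HORIZON = 7


def add_example_indicators(df):
    """Add two of the listed features using common definitions (assumed)."""
    out = df.copy()
    # Rate of change over 10 rows, in percent.
    out["ROC_10"] = (out["Close"] / out["Close"].shift(10) - 1) * 100
    # 20-period exponential moving average.
    out["EMA_20"] = out["Close"].ewm(span=20, adjust=False).mean()
    return out


def add_target(df, horizon=PREDICTION_HORIZON):
    """Add Target = 1 if Close `horizon` rows ahead is higher, else 0."""
    out = df.copy()
    future_close = out["Close"].shift(-horizon)
    out["Target"] = (future_close > out["Close"]).astype(int)
    # The last `horizon` rows have no future price yet, so drop them.
    return out.iloc[:-horizon]


# Synthetic random-walk prices (illustration only, not market data).
rng = np.random.default_rng(0)
dates = pd.date_range("2021-01-01", periods=60, freq="D")
prices = pd.DataFrame(
    {"Close": 30000 * np.exp(np.cumsum(rng.normal(0, 0.02, 60)))},
    index=dates,
)

# dropna removes the warm-up rows where ROC_10 is not yet defined.
data = add_target(add_example_indicators(prices)).dropna()
print(data[["Close", "ROC_10", "EMA_20", "Target"]].head())
```

Dropping the final rows matters: without it, rows whose future price is unknown would silently receive a target of 0.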
Rolling Prediction Loop
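# --- Rolling prediction loop ---
all_predictions = []
all_actuals = []
all_probabilities = []

# The index arithmetic below is an assumption; the source does not show it.
# X_all_features and Y_all are assumed to be aligned, with rows lacking a target removed.
start_index = TRAINING_WINDOW_DAYS + PREDICTION_HORIZON - 1
end_index = len(X_all_features)

for i in range(start_index, end_index):
    # 1. Extract current training and prediction windows
    predict_feature_idx = actual_target_idx = i
    # Stop training PREDICTION_HORIZON rows early so every training label is known on day i
    train_end_idx = i - PREDICTION_HORIZON + 1
    train_start_idx = train_end_idx - TRAINING_WINDOW_DAYS
    X_train_window = X_all_features.iloc[train_start_idx:train_end_idx]
    Y_train_window = Y_all.iloc[train_start_idx:train_end_idx]
    X_predict_point = X_all_features.iloc[[predict_feature_idx]]

    # 2. Scale features (fit on the training window only)
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train_window)
    X_predict_scaled = scaler.transform(X_predict_point)

    # 3. Train and predict
    rf_model = RandomForestClassifier(
        # hyper-parameters here
    )
    rf_model.fit(X_train_scaled, Y_train_window)

    # 4. Store results
    all_predictions.append(rf_model.predict(X_predict_scaled)[0])
    all_actuals.append(Y_all.iloc[actual_target_idx])
    # A window containing only one class has no separate column for class 1
    classes = list(rf_model.classes_)
    proba = rf_model.predict_proba(X_predict_scaled)[0]
    all_probabilities.append(proba[classes.index(1)] if 1 in classes else 0.0)
Comprehensive Evaluation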
After the loop, aggregated predictions and actuals are used to compute accuracy, precision, recall, F1, ROC‑AUC and the confusion matrix.
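A minimal sketch of this evaluation step, using the scikit-learn metric functions imported in the setup. The evaluate helper and the small toy lists are hypothetical and serve only to show the calls; they are not the article's data.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)


def evaluate(actuals, predictions, probabilities):
    """Compute the summary metrics from the aggregated rolling results."""
    return {
        "accuracy": accuracy_score(actuals, predictions),
        "precision": precision_score(actuals, predictions, zero_division=0),
        "recall": recall_score(actuals, predictions, zero_division=0),
        "f1": f1_score(actuals, predictions, zero_division=0),
        # ROC-AUC uses the predicted probability of class 1, not the hard labels.
        "roc_auc": roc_auc_score(actuals, probabilities),
        # Rows are actual classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(actuals, predictions, labels=[0, 1]),
    }


# Toy values for illustration only.
actuals = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]
probabilities = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6, 0.7, 0.1]

results = evaluate(actuals, predictions, probabilities)
for name, value in results.items():
    print(name, value)
```

In the rolling script, all_actuals, all_predictions and all_probabilities would take the place of the toy lists.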
Results
Accuracy ≈ 68 % (majority-class baseline ≈ 51 %).
Precision ≈ 69 %.
Recall ≈ 68 %.
ROC‑AUC ≈ 0.75.
Confusion matrix (rows: actual, columns: predicted): [[122, 57], [61, 128]].
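These metrics indicate a clear improvement over the baseline: roughly two-thirds of the 7-day direction predictions are correct, and precision and recall are balanced. The article reports no significance test, and because consecutive 7-day targets overlap, the 368 predictions are not independent samples, so the margin should be read with caution.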
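The reported figures can be cross-checked directly from the confusion matrix. The short calculation below assumes the scikit-learn layout (rows are actual classes, columns are predicted classes), which is the reading under which the reported precision and recall match. ROC-AUC cannot be recovered this way because it depends on the predicted probabilities.

```python
# Reported confusion matrix, read as rows = actual, columns = predicted (assumed).
tn, fp = 122, 57
fn, tp = 61, 128
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
# Baseline: always predict the more frequent actual class.
baseline = max(tp + fn, tn + fp) / total

print(f"samples   {total}")          # 368
print(f"accuracy  {accuracy:.3f}")   # 0.679
print(f"precision {precision:.3f}")  # 0.692
print(f"recall    {recall:.3f}")     # 0.677
print(f"f1        {f1:.3f}")         # 0.684
print(f"baseline  {baseline:.3f}")   # 0.514
```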
Limitations
The script evaluates predictive ability only; it does not implement entry/exit signals, stop‑losses, or risk management.
Market non‑stationarity may cause degradation of performance on future data.
Practical deployment requires extensive testing, hyper‑parameter tuning, and feature analysis.