Logistic Regression vs KNN: Python Stock Trading Experiment

A Python enthusiast reproduces a quantitative trading strategy from a Tsinghua University book, swapping K‑Nearest Neighbors (KNN) for logistic regression. The experiment fetches three years of Kweichow Moutai stock data, engineers two features, trains and evaluates the model, and finds that logistic regression slightly underperforms the book's KNN benchmark.

Python Crawling & Data Mining

The author reproduces a quantitative trading strategy from a Tsinghua University book, replacing the original K‑Nearest Neighbors (KNN) model with logistic regression to compare performance. The script starts with its imports:

import pandas as pd
import pandas_datareader.data as web
import numpy as np
from datetime import datetime, timedelta

Three years of daily data for Kweichow Moutai (ticker 600519.SS) are retrieved from Yahoo Finance.

# Date range: the three years up to today
end = datetime.today()
start = end - timedelta(days=365*3)
# Daily prices for Kweichow Moutai from Yahoo Finance
owB = web.DataReader('600519.SS', 'yahoo', start, end)
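
When the download is not available (for instance, without network access), the remaining steps can still be exercised on a stand-in frame. The sketch below is not part of the original article: `make_fake_ohlc` is a hypothetical helper that builds random-walk prices with the same `Open`, `High`, `Low` and `Close` columns, so any numbers it yields say nothing about Moutai.

```python
import numpy as np
import pandas as pd

def make_fake_ohlc(n_days=750, seed=0):
    """Hypothetical helper: random-walk OHLC data, NOT real market prices."""
    rng = np.random.default_rng(seed)
    # Business-day index ending on an arbitrary fixed date
    idx = pd.bdate_range(end="2021-12-31", periods=n_days)
    close = 1500 + rng.normal(0, 5, n_days).cumsum()
    open_ = close + rng.normal(0, 10, n_days)
    # High and low bracket both the open and the close
    high = np.maximum(open_, close) + rng.uniform(0, 10, n_days)
    low = np.minimum(open_, close) - rng.uniform(0, 10, n_days)
    return pd.DataFrame(
        {"Open": open_, "High": high, "Low": low, "Close": close}, index=idx
    )

fake = make_fake_ohlc()
print(fake.head())
```

Assigning `owB = make_fake_ohlc()` would let the later steps run unchanged, since they only read those four columns.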

Feature engineering creates two price-difference columns and a binary target indicating whether the next day's closing price is higher (1) or not (-1).

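# Intraday features: open minus close, and the high-low range
owB['open-close'] = owB['Open'] - owB['Close']
owB['high-low'] = owB['High'] - owB['Low']
# Label: 1 if the next day's close is higher, otherwise -1
# (the final row has no next day, so it also gets -1)
owB['target'] = np.where(owB['Close'].shift(-1) > owB['Close'], 1, -1)
# Drop rows with missing values
owB = owB.dropna()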

[Figure: Stock data preview]
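
To make the feature and label construction concrete, here is a small worked example on four made-up trading days (illustrative values, not Moutai prices). It also shows a side effect of `shift(-1)`: the last row has no next-day close, the comparison with `NaN` is false, and the row is labeled -1 even though its outcome is unknown. Filtering on `next_close.notna()` is one possible way to exclude that row, offered here as an assumption rather than the article's method.

```python
import numpy as np
import pandas as pd

# Four made-up trading days (illustrative values only)
toy = pd.DataFrame({
    "Open":  [10.0, 11.0, 12.0, 11.5],
    "High":  [11.5, 12.5, 12.2, 12.0],
    "Low":   [9.5, 10.8, 11.0, 11.2],
    "Close": [11.0, 12.0, 11.5, 11.8],
})

toy["open-close"] = toy["Open"] - toy["Close"]
toy["high-low"] = toy["High"] - toy["Low"]

next_close = toy["Close"].shift(-1)  # NaN on the last row
toy["target"] = np.where(next_close > toy["Close"], 1, -1)

# Keep only rows whose next-day close is actually known
labeled = toy[next_close.notna()]

print(toy)
print(labeled)
```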

The dataset is split into features x and label y, then into training and test sets (80% for training, the remaining 20% for testing).

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
# Features and label
x = owB[['open-close', 'high-low']]
y = owB['target']
# Random 80/20 split; no random_state is set, so results vary between runs
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.8)
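
Note that `train_test_split` shuffles rows by default, so the split above mixes earlier and later trading days across both sets. As a sketch of an alternative (an assumption, not what the article did), passing `shuffle=False` keeps the rows in order, so the earliest 80% are used for training and the most recent 20% for testing. The `x_demo` and `y_demo` names are placeholders standing in for the real feature frame and label.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder data: 10 ordered rows standing in for the feature frame
x_demo = pd.DataFrame({
    "open-close": np.arange(10.0),
    "high-low": np.arange(10.0) + 1,
})
y_demo = pd.Series([1, -1] * 5)

# shuffle=False preserves row order: first 8 rows train, last 2 rows test
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, train_size=0.8, shuffle=False
)

print(x_tr.index.tolist())
print(x_te.index.tolist())
```

With date-ordered stock data, this kind of split evaluates the model only on days that come after everything it was trained on.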

A logistic regression model is trained and evaluated.

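# Fit logistic regression with scikit-learn's default settings
lr = LogisticRegression()
lr.fit(x_train, y_train)
# Accuracy on the training set, then on the held-out test set
print(lr.score(x_train, y_train))
print(lr.score(x_test, y_test))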
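
For readers unfamiliar with what the fitted model computes: a binary logistic regression scores each row with a linear function of the features and passes that score through the sigmoid to obtain the probability of the positive class. The sketch below checks this against scikit-learn on synthetic stand-in data (random numbers, not stock prices); the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic stand-ins for the two features and the +1/-1 label
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=200) > 0, 1, -1)

model = LogisticRegression().fit(X, y)

# Linear score z = w . x + b, then sigmoid(z) = P(label == 1)
z = X @ model.coef_[0] + model.intercept_[0]
p_up = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba corresponds to model.classes_[1], here the label 1
print(np.allclose(p_up, model.predict_proba(X)[:, 1]))
```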

The training accuracy is 0.5439, while the test accuracy drops to 0.5137, slightly lower than the KNN benchmark reported in the book.
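
Since the comparison is against a figure reported in the book, one way to put both models on equal footing is to fit them on the same split. The following is a minimal sketch under stated assumptions: the data are synthetic with random labels (placeholders, not stock data), and `n_neighbors=5` is simply scikit-learn's default, not necessarily the setting used in the book.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)
# Synthetic features and random +1/-1 labels: placeholders only
X = rng.normal(size=(700, 2))
y = rng.choice([-1, 1], size=700)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42
)

# n_neighbors=5 is the library default, an assumption about the benchmark
models = {
    "logistic regression": LogisticRegression(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for each model
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(name, scores[name])
```

When reading such a comparison, the test accuracy is the number to compare. KNN's training accuracy tends to look flattering because each training point counts among its own nearest neighbors.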

The author notes that the modest performance may stem from noisy data or the limited predictive power of simple price‑difference features for stock movements.
