How Association Rules and Machine Learning Reveal Stock Market Industry Linkages

This report applies association‑rule mining (Apriori) to 2018 AMAC industry index data to uncover sector linkages, then uses several machine‑learning models (KNN, Bayesian, decision tree, neural network) to predict index and stock movements, compares model performance, and suggests future improvements.

Python Crawling & Data Mining

Project Background

The securities market generates a large amount of historical transaction data, and with the rise of big data, data‑mining techniques have attracted growing attention in stock‑market research. Drawing on the existing literature, our group used various machine‑learning algorithms to study three problems: industry sector linkage, prediction of industry index rises and falls, and prediction of individual stock prices.

Part 1: Association Rules and Industry Linkage

1. Algorithm Introduction

Association‑rule mining searches massive datasets for relationships between events, uncovering valuable hidden associations. Because industry growth cycles and monetary and fiscal policies create different opportunities for different industries, sectors tend to move in related ways, a phenomenon known as industry linkage. Applying association rules to industry sectors helps reveal these linkage patterns in the stock market.

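We use support and confidence as metrics. Support measures how frequently the items in a rule occur together; confidence measures the strength of the rule, that is, how often the consequent occurs when the antecedent does.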
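For a rule X ⇒ Y evaluated over a set of trading days D, the two metrics are conventionally defined as below. The symbols X, Y, D, and t are notation introduced here for illustration and do not appear in the original report.

```latex
\mathrm{supp}(X \Rightarrow Y) = \frac{\lvert \{\, t \in D : X \cup Y \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}
```

Here each t is the set of (sector, return stage) items observed on one day, so a support of 0.2 means the rule's items appear together on 20% of trading days.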

The Apriori algorithm works in two steps: it first finds all frequent itemsets whose support meets a minimum threshold, then generates association rules from those itemsets and keeps only the strong rules whose confidence meets a minimum threshold.
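The two Apriori steps can be sketched in plain Python as follows. This is a minimal illustration, not the code used in the project; the function names `apriori` and `strong_rules`, the item labels, and the five toy "days" are all made up for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)

    def sup(items):
        return sum(items <= t for t in transactions) / n

    # Level 1: frequent single items.
    current = {frozenset([i]) for t in transactions for i in t}
    current = {c for c in current if sup(c) >= min_support}
    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = sup(c)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        current = {c for c in candidates if sup(c) >= min_support}
    return frequent

def strong_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) for rules meeting the threshold."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[ante]
                if conf >= min_confidence:
                    rules.append((ante, itemset - ante, s, conf))
    return rules

# Hypothetical toy data: one set of "sector:stage" items per trading day.
days = [
    {"chem:-1~0", "paper:-1~0", "constr:-1~0"},
    {"chem:-1~0", "constr:-1~0"},
    {"paper:-1~0", "constr:-1~0"},
    {"chem:0~1", "constr:0~1"},
    {"chem:0~1", "paper:0~1", "constr:-1~0"},
]
freq = apriori(days, min_support=0.4)
rules = strong_rules(freq, min_confidence=0.8)
for ante, cons, s, conf in rules:
    print(sorted(ante), "=>", sorted(cons), f"support={s:.2f} confidence={conf:.2f}")
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is discarded without counting as soon as one of its subsets is known to be infrequent.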

2. Data Collection

We downloaded AMAC industry index historical daily returns from Tonghuashun IFIND for 2018‑01‑02 to 2018‑08‑29.

3. Data Cleaning

Because daily changes in sector indices are small (about ‑2% to 2%), we discretized the daily returns into six stages and processed the data in R.
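The discretization step might look like the sketch below. The report says only that six stages were used; the cut points at ‑2%, ‑1%, 0%, 1%, and 2% are an assumption, chosen to be consistent with the "‑1% ≤ Δ < 0%" intervals quoted in the rules. The names `EDGES`, `LABELS`, `stage`, and `to_transaction` are hypothetical.

```python
from bisect import bisect_right

# Assumed cut points in percent (not stated in the report).
EDGES = [-2.0, -1.0, 0.0, 1.0, 2.0]
LABELS = ["<-2%", "[-2%,-1%)", "[-1%,0%)", "[0%,1%)", "[1%,2%)", ">=2%"]

def stage(daily_return_pct):
    """Map a daily return (in percent) to one of six left-closed stage labels."""
    return LABELS[bisect_right(EDGES, daily_return_pct)]

def to_transaction(returns_by_sector):
    """Turn one day's {sector: return} mapping into a set of 'sector=stage' items."""
    return {f"{sector}={stage(r)}" for sector, r in returns_by_sector.items()}

print(to_transaction({"construction": -0.4, "paper": 1.2}))
```

Each trading day then becomes one "transaction" of sector-stage items, which is the input shape association‑rule mining expects.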

4. Data Modeling

Core association‑rule code:

With a minimum support of 0.2 and a minimum confidence of 0.8, we found two rules:

When the chemical product index change is ‑1% ≤ Δ < 0%, the construction index change is likely also ‑1% ≤ Δ < 0%.

When the paper industry index change is ‑1% ≤ Δ < 0%, the construction index change is likely also ‑1% ≤ Δ < 0%.

Lowering the minimum support to 0.15 while keeping the minimum confidence at 0.8 yields 27 rules; the top 10 show significant industry linkage and are bidirectionally symmetric.

We then selected the sectors that appear most frequently in the rules (e.g., construction) and chose two stocks, Hai Bo Heavy Industry and Ya Xiang Integration, for price prediction.

Part 2: Machine‑Learning Algorithms and Price Prediction

1. Algorithm Overview

Traditional statistical time‑series models (ARMA, GARCH) require large samples that satisfy their distributional assumptions. Data‑driven models (neural networks, SVM, KNN, decision trees) place fewer requirements on the sample and can capture nonlinear relationships.

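KNN (k‑nearest neighbors): finds the k training samples nearest to a new sample and classifies it by majority vote among them.

Bayesian model: uses Bayes' theorem to estimate the probability of each class given the features, then predicts the most probable class.

Decision tree: a supervised learning algorithm that recursively splits the data on feature values to form classification rules.

Neural network: consists of input, hidden, and output layers and learns complex nonlinear patterns.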
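Of the four models, KNN is the simplest to write out in full. The sketch below assumes Euclidean distance and k = 3; neither choice is stated in the report, and `knn_predict` and the toy points are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D points in two well-separated clusters, labeled 0 (down) and 1 (up).
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (0.5, 0.5)))
print(knn_predict(train_X, train_y, (5.5, 5.5)))
```

Because the vote depends on raw distances, features on very different scales (price versus volume, for example) should be standardized first so that no single feature dominates.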

2. Data Acquisition

We again downloaded the data from Tonghuashun IFIND.

3. Data Cleaning

We selected seven market indicators (open, high, low, close, change, volume, turnover) as features; the label is whether the price goes up (1) or down (0) on the next day. We standardized the data in R and removed missing values.
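A minimal sketch of the labeling and standardization steps is shown below. It assumes the label is derived from consecutive closing prices and that a flat day counts as 0; the report does not specify either detail. The sample standard deviation is used, which matches the default behavior of R's `scale()`. The function names are hypothetical.

```python
from statistics import mean, stdev

def next_day_labels(closes):
    """Label day t as 1 if the close on day t+1 is higher, else 0. The last day gets no label."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]

def standardize(column):
    """Z-score one feature column using the sample standard deviation.

    Assumes the column is not constant (a zero standard deviation would divide by zero).
    """
    mu, sigma = mean(column), stdev(column)
    return [(v - mu) / sigma for v in column]

closes = [10.0, 11.0, 10.5, 10.5, 12.0]
print(next_day_labels(closes))
print(standardize([1, 2, 3]))
```

Note that the label list is one element shorter than the price series, so the final day's features must be dropped before training.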

4. Data Modeling

We randomly split the data into training and test sets.

We trained decision tree, Bayesian, KNN, and neural network models.

We performed cross‑validation and tuned the hyper‑parameters.
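The random split and cross‑validation steps can be sketched as follows. The 70/30 ratio, the five folds, and the helper names are assumptions made for illustration; the report does not state the values it used.

```python
import random

def train_test_split(rows, test_ratio=0.3, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_ratio)
    return shuffled[:cut], shuffled[cut:]

def k_fold_indices(n, k=5):
    """Yield (train_idx, val_idx) pairs so that each index is used for validation exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, val

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

train, test = train_test_split(list(range(10)))
folds = list(k_fold_indices(10, k=5))
print(len(train), len(test), len(folds))
```

In hyper‑parameter tuning, a candidate setting (for example, a value of k for KNN) would be scored by its mean validation accuracy across the folds, and only the chosen setting would be evaluated on the held‑out test set.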

Comparing accuracy, the neural network has the highest training accuracy but shows signs of overfitting.

The prediction results show that KNN performs best for the construction index (test accuracy 0.63), the neural network performs best for Ya Xiang Integration, and the decision tree performs worst.

Variable importance from the decision tree indicates the opening price is the most influential factor for Ya Xiang Integration.

Transaction amount is a key factor for Hai Bo Heavy Industry.

5. Conclusion

Machine‑learning algorithms achieve only moderate performance in predicting index and stock movements: overall test accuracy is around 50‑60%. The neural network shows the highest training accuracy but suffers from overfitting; the KNN and Bayesian models perform comparatively well, while the decision tree performs poorly. Future work should focus on improving the neural‑network models, expanding the data volume, and adding technical indicators.

6. Model Improvement

The dataset can be extended beyond 2018, weekly and monthly patterns can be included, and technical indicators (Bollinger Bands, KDJ, PSY) can be added as features to improve accuracy. Overfitting in the neural network also needs to be addressed.
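As an illustration of what such indicator features involve, the sketch below computes two of the three from a list of closing prices, using the conventional definitions: Bollinger Bands as a moving average plus and minus a multiple of the standard deviation, and PSY as the percentage of rising days in a window. The window lengths (20 and 12) and band width (2) are common defaults, not values taken from the report, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def bollinger(closes, window=20, width=2.0):
    """Return (middle, upper, lower) bands computed from the latest `window` closes."""
    recent = closes[-window:]
    mid = mean(recent)
    sd = pstdev(recent)
    return mid, mid + width * sd, mid - width * sd

def psy(closes, window=12):
    """Psychological line: percentage of the last `window` days that closed above the prior day.

    Requires at least window + 1 closing prices.
    """
    recent = closes[-(window + 1):]
    ups = sum(b > a for a, b in zip(recent, recent[1:]))
    return 100.0 * ups / window

closes = [10, 11, 12, 11, 12, 13]
print(psy(closes, window=5))
print(bollinger([1, 2, 3, 4, 5], window=5))
```

Each indicator would be computed per trading day over a rolling window and appended to the seven existing features before standardization.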
