How Association Rules and Machine Learning Reveal Stock Market Industry Linkages
This report analyzes 2018 AMAC industry index data using association‑rule mining and several machine‑learning models (Apriori, KNN, Bayesian, decision tree, neural network) to uncover sector linkages, predict index and stock movements, compare model performance, and suggest future improvements.
Project Background
The securities market generates a large amount of historical transaction data, and with the rise of big data, data-mining techniques have attracted growing attention in stock-market analysis. Drawing on the existing literature, our group studied three problems using several machine-learning algorithms: linkage between industry sectors, prediction of whether an industry index will rise or fall, and prediction of individual stock price movements.
1. Algorithm Introduction
Association-rule mining searches large datasets for relationships between events, uncovering valuable hidden associations. Because industry growth cycles and monetary and fiscal policies create different opportunities across sectors, the stock market shows industry linkage, in which certain sectors tend to move together. Applying association rules to industry sectors helps reveal these linkage patterns.
We use support and confidence as metrics. Support measures how often the items of a rule occur together; confidence measures the rule's strength, that is, how often the consequent occurs when the antecedent does.
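To make the two metrics concrete, here is a minimal Python sketch (the original work used R). The toy "transactions" and the item names such as `chem:-1..0` are invented for illustration and are not taken from the AMAC data.

```python
# Each set holds the discretized index moves observed on one trading day.
# Item names and values are hypothetical, for illustration only.
days = [
    {"chem:-1..0", "constr:-1..0"},
    {"chem:-1..0", "constr:-1..0"},
    {"chem:-1..0", "constr:0..1"},
    {"chem:0..1", "constr:0..1"},
    {"chem:0..1", "constr:-1..0"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

s = support({"chem:-1..0", "constr:-1..0"}, days)
c = confidence({"chem:-1..0"}, {"constr:-1..0"}, days)
print(f"support={s:.2f} confidence={c:.2f}")
```

In this toy data the pair occurs on 2 of 5 days, and the antecedent alone on 3 of 5 days, so the rule would not pass a 0.8 confidence threshold.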
Apriori algorithm: first find all frequent itemsets with support above a minimum threshold, then generate association rules and filter strong rules by confidence.
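The two Apriori phases described above can be sketched in plain Python as follows. This is an illustrative implementation under simple assumptions (small in-memory data, no performance tuning), not the R code used in the project; the function names `apriori` and `strong_rules` and the toy items are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    singles = {frozenset([i]) for t in transactions for i in t}
    current = {c: supp(c) for c in singles}
    current = {c: s for c, s in current.items() if s >= min_support}
    frequent, k = {}, 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        scored = {c: supp(c) for c in candidates}
        current = {c: s for c, s in scored.items() if s >= min_support}
    return frequent

def strong_rules(frequent, min_confidence):
    """List (antecedent, consequent, support, confidence) for strong rules."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                ante = frozenset(ante)
                conf = s / frequent[ante]  # any subset of a frequent set is frequent
                if conf >= min_confidence:
                    rules.append((ante, itemset - ante, s, conf))
    return rules

# Toy data: hypothetical "index fell slightly" events per trading day.
days = [
    {"chem_down", "constr_down"},
    {"chem_down", "constr_down"},
    {"chem_down", "constr_down"},
    {"chem_down", "constr_down", "paper_down"},
    {"paper_down"},
]
frequent = apriori(days, min_support=0.4)
rules = strong_rules(frequent, min_confidence=0.8)
for ante, cons, s, conf in rules:
    print(sorted(ante), "->", sorted(cons), f"support={s:.2f} confidence={conf:.2f}")
```

The prune step relies on the downward-closure property: an itemset can only be frequent if all of its subsets are frequent, which keeps the candidate set small.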
2. Data Collection
We downloaded AMAC industry index historical daily returns from Tonghuashun IFIND for 2018‑01‑02 to 2018‑08‑29.
3. Data Cleaning
Because daily changes in the sector indices are small (roughly -2% to 2%), we discretized the daily returns into six intervals (stages) and processed the data in R.
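A sketch of this discretization step in Python follows. The exact cut points are not stated in the source; the edges at -2%, -1%, 0%, 1%, and 2% are an assumption chosen to give six intervals, and the sector names and return values are invented.

```python
from bisect import bisect_right

# Assumed cut points in percent; the source only says "six stages".
EDGES = [-2.0, -1.0, 0.0, 1.0, 2.0]
LABELS = ["<-2%", "[-2%,-1%)", "[-1%,0%)", "[0%,1%)", "[1%,2%)", ">=2%"]

def bin_return(pct):
    """Map a daily return in percent to one of six interval labels."""
    return LABELS[bisect_right(EDGES, pct)]

# Hypothetical daily returns (percent) for two sectors over three days.
daily = {
    "chemical": [-0.4, 0.7, -2.5],
    "construction": [-0.9, 1.3, 2.0],
}

# One "transaction" per trading day: the set of sector:interval items.
n_days = len(daily["chemical"])
transactions = [
    {f"{sector}:{bin_return(series[d])}" for sector, series in daily.items()}
    for d in range(n_days)
]
for t in transactions:
    print(sorted(t))
```

Each day then becomes a set of items that an association-rule miner can consume directly.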
4. Data Modeling
The core association-rule code is not reproduced here; the results were as follows.
Using support 0.2 and confidence 0.8 we found two rules:
When the chemical product index change is ‑1% ≤ Δ < 0%, the construction index change is likely also ‑1% ≤ Δ < 0%.
When the paper industry index change is ‑1% ≤ Δ < 0%, the construction index change is likely also ‑1% ≤ Δ < 0%.
Relaxing the thresholds to support 0.15 and confidence 0.8 yields 27 rules; the top 10 show significant industry linkage with bidirectional symmetry, that is, rules between a pair of sectors tend to hold in both directions.
We then selected high‑frequency sectors (e.g., construction) and chose two stocks (Hai Bo Heavy Industry, Ya Xiang Integration) for price prediction.
Part 2: Machine Learning Algorithms and Price Prediction
1. Algorithm Overview
Traditional statistical time-series models (ARMA, GARCH) require large, well-distributed samples. Data-driven models (neural networks, SVM, KNN, decision trees) have lower sample requirements and can capture nonlinear relationships.
KNN: finds the k nearest training samples and classifies by majority vote.
Bayesian model: predicts class probabilities using Bayes' theorem.
Decision tree: supervised learning algorithm that splits data to form classification rules.
Neural network: consists of input, hidden, and output layers; learns complex patterns.
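Of the four models above, KNN is the simplest to illustrate. The following is a minimal Python sketch assuming Euclidean distance and k = 3, neither of which is specified in the source; the function name `knn_predict` and the toy points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    # Sort training samples by Euclidean distance to x.
    neighbours = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Toy two-feature data: label 1 = "up next day", 0 = "down next day".
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.5, 0.5)))
print(knn_predict(train_X, train_y, (5.5, 5.5)))
```

Because KNN is distance-based, features on different scales (for example price and volume) should be standardized first, otherwise the largest-valued feature dominates the vote.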
2. Data Acquisition
We again downloaded the data from Tonghuashun IFIND.
3. Data Cleaning
We selected seven market indicators (open, high, low, close, change, volume, turnover) as features; the label is the next day's direction (1 for up, 0 for down). We standardized the data in R and removed missing values.
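The labeling and standardization steps might look like this in Python. It is a sketch under two assumptions not stated in the source: "up" means the next close is strictly higher than today's close, and standardization is a z-score using the sample standard deviation. Only two of the seven features are shown, with invented values.

```python
import statistics

def zscore(column):
    """Standardize a column to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    return [(v - mean) / sd for v in column]

def next_day_labels(close):
    """1 if the next day's close is higher than today's, else 0."""
    return [1 if close[i + 1] > close[i] else 0 for i in range(len(close) - 1)]

# Hypothetical values for two of the seven features.
close = [10.0, 10.2, 10.1, 10.4]
volume = [100.0, 200.0, 300.0, 400.0]

y = next_day_labels(close)
# The last day has no next-day label, so its feature row is dropped.
X = list(zip(zscore(close), zscore(volume)))[:-1]

for row, label in zip(X, y):
    print([round(v, 3) for v in row], label)
```

To avoid leaking test-set information, the mean and standard deviation would normally be computed on the training rows only and then applied to the test rows.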
4. Data Modeling
Randomly split training and test sets.
Train models using decision tree, Bayesian, KNN, and neural network.
Perform cross‑validation and tune hyper‑parameters.
Compare accuracy; neural network has highest training accuracy but overfitting is observed.
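The splitting, cross-validation, and accuracy steps above can be sketched in Python as follows. The 70/30 split, five folds, and fixed seed are illustrative assumptions; the source does not give these values, and the helper names are hypothetical.

```python
import random

def train_test_split(n, test_frac=0.3, seed=0):
    """Randomly split the indices 0..n-1 into train and test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = n - round(n * test_frac)
    return idx[:cut], idx[cut:]

def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

train_idx, test_idx = train_test_split(10)
splits = list(k_fold(train_idx, k=5))
print(len(train_idx), len(test_idx), len(splits))
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

Hyper-parameters such as k in KNN would be chosen by averaging validation accuracy across the folds, with the test set held back for the final comparison. Note that a random split mixes past and future days; a chronological split is a common alternative for time-series data.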
Prediction results show KNN performs best for the construction index (test accuracy 0.63), neural network performs best for Ya Xiang Integration, and decision tree performs worst.
Variable importance from the decision tree indicates the opening price is the most influential factor for Ya Xiang Integration.
Transaction amount is a key factor for Hai Bo Heavy Industry.
5. Conclusion
Machine‑learning algorithms achieve moderate performance in predicting index and stock movements; overall test accuracy is around 50‑60%. Neural network shows the highest training accuracy but suffers from overfitting; KNN and Bayesian models also perform well, while decision tree performs poorly. Future work should focus on improving neural‑network models, expanding data volume, and adding technical indicators.
Model Improvement
The dataset can be extended beyond 2018, weekly and monthly patterns can be included, and technical indicators (Bollinger Bands, KDJ, PSY) can be added to improve accuracy. Overfitting in the neural network also needs to be addressed.
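As an illustration of how two of these indicators could be derived from closing prices, here is a Python sketch. The window lengths (20 days and a band width of 2 standard deviations for Bollinger Bands, 12 days for PSY) are conventional defaults rather than values from the source, and the price series is invented.

```python
import statistics

def bollinger(close, window=20, width=2.0):
    """Return (lower, middle, upper) bands for each full window of closes."""
    bands = []
    for i in range(window - 1, len(close)):
        chunk = close[i - window + 1 : i + 1]
        mid = statistics.fmean(chunk)      # simple moving average
        sd = statistics.pstdev(chunk)      # population standard deviation
        bands.append((mid - width * sd, mid, mid + width * sd))
    return bands

def psy(close, window=12):
    """Psychological line: percentage of up days within each window."""
    ups = [1 if close[i] > close[i - 1] else 0 for i in range(1, len(close))]
    return [
        100.0 * sum(ups[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(ups))
    ]

# Hypothetical closing prices; short windows keep the example small.
close = [10.0, 11.0, 10.0, 12.0, 13.0]
bands = bollinger(close, window=3)
psy_values = psy(close, window=2)
print(bands[1])
print(psy_values)
```

Each indicator yields one value per day once the window is full, so the resulting columns can be appended to the existing seven features after aligning dates.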