IKNet: Explainable Stock Price Forecasting with News Keywords and Technical Indicators

IKNet combines FinBERT‑derived news keywords with technical‑indicator time series, uses SHAP to quantify each feature's impact, and achieves a 32.9% RMSE reduction and 18.5% higher cumulative returns on the S&P 500 (2015‑2024) compared with RNN and Transformer baselines, while providing fine‑grained, context‑aware explanations of price movements.

Bighead's Algorithm Notes

Background

Accurate stock‑price prediction is essential for profit maximisation, asset‑allocation optimisation and risk management. Market dynamics are driven by geopolitics, macro‑economics and investor sentiment, creating non‑linear relationships that traditional linear models (ARIMA, GARCH, technical indicators) cannot capture. Machine‑learning models (SVM, Random Forest) handle non‑linearity but ignore temporal dependencies, while deep‑learning models (RNN, Transformer) focus on structured numeric inputs and struggle to integrate unstructured news text. Existing news‑driven approaches typically use document‑level sentiment scores or averaged embeddings, which hide the contribution of individual words and limit interpretability.

Problem Definition

Insufficient interpretability: document‑level sentiment or averaged embeddings prevent quantifying the effect of single keywords.

Inefficient information integration: fusion of structured technical indicators with unstructured news does not exploit semantic links between keywords and price movements.

Weak dynamic adaptability: difficulty in reliably capturing the impact of sudden external events (e.g., policy announcements, breaking news) during high‑volatility periods.

Method

FinBERT Keyword Extraction

FinBERT, a pre‑trained financial language model, processes each news article. For every token a significance score is computed as the gradient norm of the model output with respect to the token embedding: s_i = \|\partial p / \partial e_i\|. Tokens are ranked by average significance and the top‑n tokens are selected as keywords; their embeddings are recomputed with FinBERT.
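A minimal PyTorch sketch of this saliency scoring, using a toy embedding layer and pooled linear head as a stand-in for FinBERT (the real pipeline would use FinBERT's own embedding layer and classifier; all dimensions and token ids here are illustrative):

```python
import torch

# Toy stand-in for FinBERT: embedding layer + pooled linear head.
vocab_size, dim = 100, 16
embedding = torch.nn.Embedding(vocab_size, dim)
head = torch.nn.Linear(dim, 1)

def token_saliency(token_ids):
    """Score each token as the gradient norm of the model output
    with respect to that token's embedding: s_i = ||dp/de_i||."""
    e = embedding(token_ids)        # (seq_len, dim)
    e.retain_grad()                 # keep gradients on a non-leaf tensor
    p = head(e.mean(dim=0))         # pooled scalar prediction
    p.backward()
    return e.grad.norm(dim=-1)      # (seq_len,) saliency scores

ids = torch.tensor([3, 17, 42, 7])
scores = token_saliency(ids)
top_n = ids[scores.argsort(descending=True)[:2]]  # keep the top-n tokens
```

The selected `top_n` token ids would then be re-encoded with FinBERT to obtain the keyword embeddings used downstream.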

Keyword Encoding Module

Each selected keyword embedding passes through an independent non‑linear projection layer (linear transformation → ReLU → Dropout) to preserve feature separability. A GRU then captures temporal dependencies among the keyword sequence, producing the news feature vector h_{news}.
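The keyword encoding module above can be sketched as follows; the embedding, projection, and hidden sizes are illustrative assumptions, not values from the paper:

```python
import torch
import torch.nn as nn

class KeywordEncoder(nn.Module):
    """Per-keyword projection (linear -> ReLU -> Dropout), then a GRU
    over the keyword sequence. Dimensions are illustrative."""
    def __init__(self, embed_dim=768, proj_dim=64, hidden_dim=64, n_keywords=17):
        super().__init__()
        # One independent projection per keyword slot preserves separability.
        self.projections = nn.ModuleList(
            nn.Sequential(nn.Linear(embed_dim, proj_dim), nn.ReLU(), nn.Dropout(0.1))
            for _ in range(n_keywords)
        )
        self.gru = nn.GRU(proj_dim, hidden_dim, batch_first=True)

    def forward(self, keyword_embeds):  # (batch, n_keywords, embed_dim)
        projected = torch.stack(
            [proj(keyword_embeds[:, i]) for i, proj in enumerate(self.projections)],
            dim=1)                      # (batch, n_keywords, proj_dim)
        _, h_news = self.gru(projected) # final hidden state
        return h_news.squeeze(0)        # (batch, hidden_dim) = h_news

h_news = KeywordEncoder()(torch.randn(2, 17, 768))
```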

Technical Indicator Encoding Module

Technical indicators are derived from Yahoo Finance OHLCV data (17 indicators such as SMA, RSI, MACD, Bollinger Bands) over the past T days. A bidirectional LSTM processes the indicator sequence, and average pooling yields the technical feature vector h_{price}.
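A matching sketch of the indicator branch, assuming a 30-day window and an illustrative hidden size (the paper fixes only the 17 indicators):

```python
import torch
import torch.nn as nn

class IndicatorEncoder(nn.Module):
    """Bidirectional LSTM over T days of 17 indicators,
    averaged over time to produce h_price."""
    def __init__(self, n_indicators=17, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(n_indicators, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, x):      # (batch, T, n_indicators)
        out, _ = self.lstm(x)  # (batch, T, 2 * hidden_dim)
        return out.mean(dim=1) # average pooling -> h_price

h_price = IndicatorEncoder()(torch.randn(2, 30, 17))  # T = 30 days (illustrative)
```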

Feature Fusion and Prediction

The news vector h_{news} and technical vector h_{price} are concatenated into h_{combined}, passed through a non‑linear projection layer (linear → ReLU → Dropout), and fed to a regression head that outputs the next‑day closing price.
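The fusion and prediction step can be sketched like so, with feature dimensions chosen to match the illustrative encoders above:

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Concatenate h_news and h_price, apply linear -> ReLU -> Dropout,
    then regress the next-day closing price."""
    def __init__(self, news_dim=64, price_dim=128, hidden=64):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(news_dim + price_dim, hidden), nn.ReLU(), nn.Dropout(0.1))
        self.regressor = nn.Linear(hidden, 1)

    def forward(self, h_news, h_price):
        h_combined = torch.cat([h_news, h_price], dim=-1)
        return self.regressor(self.proj(h_combined)).squeeze(-1)

pred = FusionHead()(torch.randn(2, 64), torch.randn(2, 128))  # (batch,) prices
```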

SHAP Explainability Analysis

Kernel SHAP approximates the model output as a linear combination of input features (keywords + technical indicators): f(z) \approx \phi_0 + \sum_i \phi_i \cdot z_i, where \phi_i denotes the SHAP value indicating the contribution direction and magnitude of feature i. This provides fine‑grained attribution of price predictions to individual keywords and indicator components.
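The additive decomposition can be illustrated on a toy case where the Shapley values are exact: for a purely linear model f(x) = w·x + b with independent features, φ_i = w_i (x_i − E[x_i]) and φ_0 = E[f(X)]. This is only a NumPy illustration of the identity Kernel SHAP approximates, not the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)                   # weights over 5 toy features
b = 0.5
background = rng.normal(size=(100, 5))   # background (reference) data
x = rng.normal(size=5)                   # instance to explain

phi0 = (background @ w + b).mean()       # phi_0 = E[f(X)]
phi = w * (x - background.mean(axis=0))  # per-feature attributions phi_i

f_x = w @ x + b
# Additivity: f(x) = phi_0 + sum_i phi_i, exactly, for a linear model.
assert np.isclose(phi0 + phi.sum(), f_x)
```

For IKNet's non-linear network, the `shap` library's Kernel SHAP estimator would fit this same additive form locally around each prediction.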

Experiments

Dataset and Settings

The evaluation uses S&P 500 data from 2015‑2024, covering stable, trade‑war and pandemic periods. News articles are collected from Google News and filtered to roughly 2,500 items, with full texts extracted via HTML parsing. Technical indicators are computed from Yahoo Finance OHLCV data. Rolling‑window validation (3 years of training, 1 year of testing, 7 folds) prevents data leakage.
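The rolling-window scheme can be sketched in a few lines; `rolling_windows` is a hypothetical helper, and applying it to 2015‑2024 reproduces the 7 folds described above:

```python
def rolling_windows(years, train_len=3, test_len=1):
    """Generate (train_years, test_years) folds: train_len years of
    training followed by test_len years of testing, rolled forward
    one year at a time so no test year leaks into training."""
    folds = []
    for start in range(len(years) - train_len - test_len + 1):
        train = years[start:start + train_len]
        test = years[start + train_len:start + train_len + test_len]
        folds.append((train, test))
    return folds

folds = rolling_windows(list(range(2015, 2025)))
# 2015-2024 yields 7 folds: (2015-2017 -> 2018) ... (2021-2023 -> 2024)
```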

Baselines

Compared baselines include traditional Ridge regression, sequence models (LSTM, Transformer, TCN), and news‑fusion models (FinBERT‑Attention‑LSTM, FinBERT‑Sentiment‑LSTM) that rely on document‑level sentiment or embeddings.

Results

Keyword Quantity Optimisation

Using 17 keywords yields the best RMSE (61.107) and SMAPE (1.340). Both more and fewer keywords degrade performance (e.g., error rises again at 21 keywords), indicating a trade‑off between information richness and model complexity.

Prediction Performance Comparison

IKNet consistently outperforms baselines across years; for 2024, RMSE = 58.006 and SMAPE = 0.850, a 32.9% RMSE reduction versus Ridge and a 33.8% reduction versus FinBERT‑Sentiment‑LSTM. Visualisation shows IKNet closely tracks actual prices during high‑volatility periods.

Input Ablation Study

Models using only technical data or only keywords perform worse. The full model (technical + keywords) reduces 2024 RMSE from 126.405 (single‑input) to 58.006. Diebold‑Mariano tests (p < 0.05) confirm the statistical significance of the fusion advantage.

Investment Return Analysis

IKNet achieves the highest cumulative profit and Sharpe ratio in most years (e.g., 2024 profit = 23.18%, Sharpe = 1.806). During the 2022 high‑volatility period (HV = 73.79%), IKNet loses only 5.579% versus a 13.698% loss for a long‑only strategy.

SHAP Explainability Validation

Keywords such as “tumbled” and “plunged” obtain higher SHAP values than technical indicators like SMA deviation, confirming that news keywords drive predictions. Negative words (e.g., “layoffs”, “hurt”) substantially depress the predicted price, matching observed market drops (S&P 500 down 3.0%).

Tags: deep learning, SHAP, stock prediction, FinBERT, news keywords, technical indicators
Written by Bighead's Algorithm Notes, focused on AI applications in the fintech sector.