FinKario: Event‑Enhanced Financial Knowledge Graphs Boost A‑Share Sharpe Ratio to 4.9
This article reviews the FinKario paper, which introduces an event‑augmented financial knowledge graph and a two‑stage RAG retrieval strategy that together enable real‑time knowledge updates and efficient integration of long‑form research reports, yielding a Sharpe ratio of 4.9 and outperforming baseline LLMs and institutional strategies in back‑testing.
Individual investors face information overload and limited analytical capacity in a market dominated by institutions. Stock research reports are an essential input, but their long, unstructured text is hard for large language models (LLMs) to process. Moreover, market events such as earnings releases or policy changes evolve faster than traditional knowledge bases can be updated.
The paper defines two core problems: (1) how to automatically construct and continuously update a financial knowledge graph that captures rapid market events, and (2) how to enable LLMs to efficiently integrate the context of lengthy research reports.
The FinKario dataset is built with a four‑module pipeline. First, research reports from Eastmoney covering Aug 2024–Mar 2025 are converted to standardized Markdown with the open‑source tool MinerU2, and non‑informative sections are removed. Second, schema design produces an attribute sub‑graph (based on CFA handbooks and JPMorgan templates) and an event sub‑graph (designed top‑down from a Wisconsin academic‑report template and refined with the FIBO ontology). Third, knowledge filling extracts entities and relations for each timestamp via prompt‑driven LLM extraction, followed by entity normalization (e.g., merging “比亚迪股份” and “比亚迪汽车” into “比亚迪”, i.e., BYD), attribute completion from Tushare, and placeholder correction by re‑querying the original text. Fourth, a three‑stage quality‑control process checks the reliability of the combined graph.
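The entity‑normalization step in the third module can be illustrated with a minimal sketch. It assumes a hand‑written alias table and exact string matching; the article does not describe how the paper actually resolves aliases, so `ALIASES`, `normalize_entity`, `merge_triples`, and the sample `industry` relation are all hypothetical names chosen for illustration.

```python
# Hypothetical alias table for illustration only; the paper's actual
# normalization rules are not described in this article.
ALIASES = {
    "比亚迪股份": "比亚迪",
    "比亚迪汽车": "比亚迪",
}


def normalize_entity(name, aliases=ALIASES):
    """Map a surface form to its canonical name (unchanged if unknown)."""
    cleaned = name.strip()
    return aliases.get(cleaned, cleaned)


def merge_triples(triples, aliases=ALIASES):
    """Normalize head and tail of each (head, relation, tail) triple,
    then drop duplicates while preserving first-seen order."""
    seen = set()
    merged = []
    for head, relation, tail in triples:
        triple = (
            normalize_entity(head, aliases),
            relation,
            normalize_entity(tail, aliases),
        )
        if triple not in seen:
            seen.add(triple)
            merged.append(triple)
    return merged


# Two extracted triples that refer to the same company under different names.
raw = [
    ("比亚迪股份", "industry", "automobiles"),
    ("比亚迪汽车", "industry", "automobiles"),
]
merged = merge_triples(raw)
```

After normalization the two triples collapse into one, which is the effect the merge is meant to have on the graph: facts extracted from different reports attach to a single node instead of to several near‑duplicate nodes.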
The FinKario‑RAG retrieval strategy vectorizes the graph in two steps: a graph encoder Phi produces local embeddings of entities and relations, and a read‑out function rho aggregates them into a global graph vector that is stored in a vector database. Retrieval then proceeds in two stages. The coarse search encodes the user query q as h_q = Psi(q) and matches it against stock‑date vectors to obtain the top‑k_c candidates. The fine search expands these candidates with related industry and market‑cap entities to produce the top‑k_f results, which are mapped back to a sub‑graph. Finally, an analysis LLM consumes the sub‑graph and the query and outputs a prediction label (up/down), a confidence score, and a textual justification.
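The coarse‑then‑fine retrieval can be sketched as follows. This is a simplified illustration, not the paper's implementation: it assumes the stock‑date vectors and the query vector h_q are already computed (the encoders Phi, rho, and Psi are not modeled), uses cosine similarity for matching, and expands candidates only by a shared industry label, with market‑cap expansion omitted for brevity. The function names and the toy data are hypothetical.

```python
import numpy as np


def cosine_scores(h_q, vectors):
    """Cosine similarity between a query vector and each row of `vectors`."""
    q = h_q / np.linalg.norm(h_q)
    m = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return m @ q


def two_stage_retrieve(h_q, vectors, industry, k_c, k_f):
    """Coarse search picks the top-k_c rows by similarity; fine search
    expands them to every row sharing an industry with a candidate,
    then keeps the top-k_f of the expanded set."""
    scores = cosine_scores(h_q, vectors)

    # Stage 1 (coarse): indices of the k_c most similar stock-date vectors.
    coarse = np.argsort(-scores)[:k_c]

    # Stage 2 (fine): expand via industry labels (assumed expansion rule),
    # then re-rank the expanded set by the same similarity score.
    wanted = {industry[i] for i in coarse}
    expanded = [i for i in range(len(industry)) if industry[i] in wanted]
    expanded.sort(key=lambda i: -scores[i])
    return expanded[:k_f]


# Toy data: four stock-date vectors with hypothetical industry labels.
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]])
industry = ["semiconductors", "autos", "semiconductors", "banks"]
h_q = np.array([1.0, 0.0])

result = two_stage_retrieve(h_q, vectors, industry, k_c=1, k_f=2)
```

In this toy run the coarse stage selects row 0, and the fine stage pulls in row 2 because it shares the same industry, even though row 1 is closer to the query in embedding space. That is the point of the expansion: related entities enter the sub‑graph by graph relation, not only by vector similarity.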
Experiments use Eastmoney reports, Tushare price data, and Wind industry classifications (Aug 2024–Mar 2025). Evaluation metrics include annualized return (ARR), volatility (VOL), Sharpe ratio (SR), maximum drawdown (MDD), Calmar ratio (CR) and prediction accuracy (ACC). Baselines comprise market indices, general LLMs (Qwen‑3‑8B, GPT‑4o‑mini), finance‑tuned LLMs (FinMA, FinGPT) and ten top‑tier broker strategies.
Results show that FinKario‑RAG achieves cumulative net‑value growth that outpaces baselines, with ARR = 2.633 (30.8% higher than the best institutional strategy), SR = 4.926 (58.1% higher than the next best model), CR = 15.315 (+24.4%), and ACC = 0.581 (slightly above the best baselines).
Ablation studies reveal that removing the event graph drops ARR by 87.2% and SR by 81.1%, confirming its central role, while removing the attribute graph reduces ARR by 15.3%, highlighting the complementary nature of the dual‑graph design. Retrieval comparisons demonstrate that FinKario‑RAG’s ARR and SR far exceed traditional RAG (ARR 0.377, SR 0.758) and LightRAG (ARR 0.821, SR 1.313).
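For readers unfamiliar with the risk and return metrics used in the evaluation (ARR, VOL, SR, MDD, CR), the sketch below computes them from a series of daily returns. It follows common textbook conventions and is not taken from the paper: it assumes 252 trading days per year, a zero risk‑free rate by default, geometric annualization of returns, and Sharpe ratio defined as annualized excess return over annualized volatility. The function name and its parameters are hypothetical.

```python
import numpy as np


def performance_metrics(daily_returns, periods_per_year=252, risk_free=0.0):
    """Compute ARR, VOL, SR, MDD and CR from simple daily returns.
    Conventions here are assumptions, not the paper's stated formulas."""
    r = np.asarray(daily_returns, dtype=float)

    # Net asset value path, starting from an initial value of 1.0.
    nav = np.concatenate(([1.0], np.cumprod(1.0 + r)))

    # Annualized return: geometric growth rate scaled to one year.
    arr = nav[-1] ** (periods_per_year / len(r)) - 1.0

    # Annualized volatility from the sample standard deviation.
    vol = r.std(ddof=1) * np.sqrt(periods_per_year)

    # Sharpe ratio: excess annualized return per unit of volatility.
    sr = (arr - risk_free) / vol if vol > 0 else float("nan")

    # Maximum drawdown: largest peak-to-trough decline of the NAV path.
    peak = np.maximum.accumulate(nav)
    mdd = float(np.max(1.0 - nav / peak))

    # Calmar ratio: annualized return divided by maximum drawdown.
    cr = arr / mdd if mdd > 0 else float("inf")

    return {"ARR": arr, "VOL": vol, "SR": sr, "MDD": mdd, "CR": cr}


# Toy example: +10%, then -50%, then +20%.
metrics = performance_metrics([0.10, -0.50, 0.20])
```

In the toy example the NAV path is 1.0, 1.1, 0.55, 0.66, so the maximum drawdown is 50% (from 1.1 to 0.55). Because the Calmar ratio divides by MDD and the Sharpe ratio divides by VOL, a strategy can score high on both only by combining strong returns with shallow drawdowns and low volatility.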
The case study shows FinKario‑RAG concentrating allocations in high‑growth sectors such as electrical equipment, semiconductors, and healthcare, in line with the tech‑driven rally in Feb 2025, whereas baseline models spread allocations across sectors and achieve lower returns. The system also identifies entities precisely (e.g., stock code 688139.SH) and gives interpretable rationales such as “overseas market expansion boosts brand awareness,” in contrast to the vaguer suggestions of other models.
Overall, the event‑enhanced knowledge graph and the two‑stage RAG pipeline enable real‑time, semantically rich financial knowledge retrieval, leading to a Sharpe ratio of 4.9 on A‑shares and demonstrating the value of integrating dynamic events into LLM‑driven investment analysis.
