Exploring CSMD: A China‑Specific Multimodal Stock Dataset and the LightQuant Quantitative Framework
The article introduces CSMD, a high‑quality multimodal dataset built from Chinese financial news about the constituent stocks of the CSI‑300 and SSE‑50 indices. It describes how large language models (LLMs) are used to enhance factor extraction and how the data are rigorously validated. It then presents LightQuant, a modular quantitative framework. Extensive experiments show that CSMD and LightQuant outperform existing resources such as CMIN‑CN in stock trend prediction and backtesting.
