
Intelligent Investment Research and Financial Sentiment Monitoring with NLP and Big Data

This article describes how advanced natural‑language‑processing, big‑data, and deep‑learning techniques are integrated into an end‑to‑end platform for financial asset management, covering large‑scale bid‑tender text analysis, few‑shot sentiment monitoring, model architectures, data‑enhancement methods, and practical deployment results.

DataFunTalk

The financial asset‑management industry faces massive unstructured data growth, making information asymmetry a key competitive factor. To address this, the authors present an end‑to‑end intelligent investment‑research platform that leverages natural‑language‑processing (NLP), big‑data pipelines, and deep‑learning models to transform raw textual sources into structured insights for investment decisions.

The system architecture consists of three layers: an application layer exposing over 30 NLP services, a component layer providing core algorithms such as tokenization, named‑entity recognition, and dependency parsing, and a corpus layer that supplies both generic and domain‑specific training data. This modular design enables rapid development of downstream applications.
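The layered split described above can be sketched as a small component registry: core algorithms are registered once in the component layer, and the application layer composes them into downstream services. All names here are illustrative stand-ins, not the authors' actual API.

```python
# Component layer: a registry of core NLP algorithms.
COMPONENTS = {}

def component(name):
    """Register a core algorithm under a name."""
    def wrap(fn):
        COMPONENTS[name] = fn
        return fn
    return wrap

@component("tokenize")
def tokenize(text):
    # Toy tokenizer: whitespace split stands in for a real segmenter.
    return text.split()

@component("ner")
def ner(tokens):
    # Toy NER: treat capitalized tokens as entity mentions.
    return [t for t in tokens if t[:1].isupper()]

# Application layer: chain registered components into a service.
def build_service(*steps):
    def service(text):
        out = text
        for step in steps:
            out = COMPONENTS[step](out)
        return out
    return service

extract_entities = build_service("tokenize", "ner")
```

The point of the design is visible even in this toy: new services (the article cites over 30) are assembled from shared components rather than written from scratch.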

Two flagship applications are detailed: (1) a large‑scale bid‑tender text‑analysis system that automatically crawls, extracts, and structures millions of procurement documents, achieving 98% title‑extraction accuracy and 96% body‑extraction accuracy; (2) a financial sentiment‑monitoring system for WeChat groups that classifies messages into eleven categories, aggregates them without information loss, and supports multi‑dimensional hotspot analysis.
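The first stage of the bid‑tender pipeline, pulling a title and body out of crawled HTML, can be illustrated with the standard library alone. The real system reportedly learns representations of HTML tags to do this robustly; this sketch only shows the extraction step, and the tag choices are assumptions for the demo.

```python
from html.parser import HTMLParser

class TenderExtractor(HTMLParser):
    """Toy extractor: first <title>/<h1> text as title, visible text as body."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.body_parts = []
        self._stack = []  # open-tag context for the current text node

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if "title" in self._stack or "h1" in self._stack:
            self.title = self.title or text   # keep the first title seen
        elif "script" not in self._stack and "style" not in self._stack:
            self.body_parts.append(text)      # everything else is body

def extract(html):
    parser = TenderExtractor()
    parser.feed(html)
    return parser.title, " ".join(parser.body_parts)
```

A production crawler would need encoding handling, boilerplate removal, and the learned tag features the article describes; this only conveys the shape of the task.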

Key model innovations include a tag‑embedding technique that learns distributed representations of HTML tags, an improved Transformer‑based named‑entity recognizer with bigram embeddings and relative positional attention, and a lightweight three‑layer CNN classifier enhanced by position embeddings. For few‑shot scenarios, the authors employ a two‑stage pipeline: a generic NER model followed by a domain‑specific CNN that together achieve high recall (0.97) and precision (0.96) with minimal labeled data.
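The two‑stage few‑shot pipeline can be sketched end to end with stand‑in models: a generic entity finder feeds a small domain classifier. Both stages here (a regex NER and a keyword scorer) are toy substitutes for the article's Transformer NER and three‑layer CNN, used only to show how the stages compose.

```python
import re

def generic_ner(text):
    # Stand-in for the generic NER model: company-like capitalized names,
    # optionally followed by a corporate suffix.
    return re.findall(r"[A-Z][a-zA-Z]+(?: Corp| Bank| Fund)?", text)

# Toy sentiment lexicons; the real domain CNN learns these distinctions.
NEGATIVE = {"default", "downgrade", "lawsuit"}
POSITIVE = {"upgrade", "profit", "dividend"}

def domain_classify(text):
    words = set(text.lower().split())
    if words & NEGATIVE:
        return "negative"
    if words & POSITIVE:
        return "positive"
    return "neutral"

def analyze(message):
    """Stage 1 finds entities, stage 2 labels the message."""
    return {"entities": generic_ner(message),
            "sentiment": domain_classify(message)}
```

The design rationale carries over from the article: the generic first stage needs no domain labels at all, so the scarce labeled data is spent only on the small second‑stage classifier.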

Extensive experiments demonstrate that text‑augmentation methods (back‑translation, EDA, non‑core‑word replacement) consistently improve F1 scores by 5–9 points, especially when training data are scarce. A three‑stage training strategy—pre‑training word embeddings on billions of tokens, iteratively expanding the training set with high‑confidence predictions, and fine‑tuning on the target data—further boosts performance, yielding up to 48 percentage‑point gains in low‑resource settings.
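One of the augmentation methods named above, non‑core‑word replacement, is easy to sketch: domain‑critical "core" words are frozen and only the remaining words are swapped for alternatives. The core vocabulary and synonym table below are toy assumptions; the authors derive both from corpus statistics rather than hand‑written lists.

```python
import random

CORE = {"bond", "default", "yield"}            # words that must survive intact
SYNONYMS = {"company": ["firm", "issuer"],
            "announced": ["reported", "disclosed"]}

def augment(sentence, rng):
    """Replace non-core words that have known alternatives."""
    out = []
    for word in sentence.split():
        if word in CORE or word not in SYNONYMS:
            out.append(word)                   # core or unknown word: keep
        else:
            out.append(rng.choice(SYNONYMS[word]))  # non-core: swap
    return " ".join(out)

rng = random.Random(0)
aug = augment("company announced a bond default", rng)
```

Because the label‑bearing words ("bond default") are untouched, each augmented sentence keeps its original label, which is what makes this safe to use when training data are scarce.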

The paper concludes with reflections on the importance of close collaboration between technical and domain experts, the practicality of lightweight models over heavyweight Transformers for most financial NLP tasks, and future directions such as GPT‑based augmentation and cross‑modal learning.

Big Data · NLP · few-shot learning · text mining · Financial AI
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
