Building a Simple Stock Sentiment Analysis System with Python, Web Scraping, and Baidu AI

This tutorial walks through creating a stock sentiment analysis pipeline in Python by scraping news from a financial website, applying Baidu AI's sentiment API to classify headlines, aggregating positive and negative ratios per stock, and visualizing the results with matplotlib.

DataFunSummit

This article provides a step‑by‑step guide to building a simple stock sentiment analysis system using Python.

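Environment preparation: Python 3.7 and an IDE such as PyCharm. The third-party libraries lxml, requests, baidu-aip, pandas, and matplotlib are installed with pip (for example, pip install baidu-aip); re is part of the standard library and needs no installation. The code below imports from these packages.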
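Before running the scripts, it can help to confirm that the libraries are importable. The following is a minimal sketch rather than part of the original tutorial: missing_modules and REQUIRED_MODULES are hypothetical names, and the list assumes the import names requests, lxml, aip (the module provided by baidu-aip), pandas, and matplotlib.

```python
import importlib.util

# Import names assumed from the libraries listed above; the baidu-aip
# package is imported as "aip".
REQUIRED_MODULES = ["requests", "lxml", "aip", "pandas", "matplotlib"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Any name reported as missing can then be installed with pip before continuing.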

Code implementation – news crawling: Three helper functions download the news titles for a list of stock codes from stock.jrj.com.cn, work out how many result pages each stock has, and append every title to a local .txt file named after the stock code. Example snippets include:

import requests
from lxml import etree

# download news for the specified stock codes
def download_news(codes):
    for code in codes:
        print(code)
        url = "http://stock.jrj.com.cn/share," + str(code) + ",ggxw.shtml"
        parse_pages(url, code)
# parse each page
def parse_pages(url, code):
    max_page = get_max_page(url)
    for i in range(1, max_page + 1):
        if i != 1:
            url = "http://stock.jrj.com.cn/share," + str(code) + ",ggxw_" + str(i) + ".shtml"
        download_page(url, code)
# get maximum page number
def get_max_page(url):
    page_data = requests.get(url).content.decode("gbk")
    data_tree = etree.HTML(page_data)
    # str.find returns -1 (truthy) when the text is missing, so test membership instead
    if "page_newslib" in page_data:
        max_page = data_tree.xpath("//*[@class=\"page_newslib\"]//a[last()-1]/text()")
        return int(max_page[0])
    else:
        return 1
# download a single page
def download_page(url, code):
    try:
        page_data = requests.get(url).content.decode("gbk")
        data_tree = etree.HTML(page_data)
        titles = data_tree.xpath("//*[@class = \"newlist\"]//li/span/a/text()")
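        for title in titles:
            title = title + "\n"
            with open(str(code) + ".txt", "a") as file:
                file.write(title)
                file.flush()
    except Exception as exc:
        # Catch Exception rather than using a bare except, which would also trap Ctrl+C
        print("Server timed out or page could not be read:", exc)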

Define the list of stock codes (e.g., codes = [600381, 600284, 600570, 600519, 600258, 601179]) and call download_news(codes); the news titles are then saved locally, one file per stock code.
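The URL pattern that download_news and parse_pages rely on can be isolated into one small helper, which makes the pagination rule easier to see and test. This is a minimal sketch, not the author's code: build_page_url and build_all_page_urls are hypothetical names, and the pattern is simply the one that appears in the functions above (page 1 has no suffix, later pages append _N).

```python
BASE = "http://stock.jrj.com.cn/share,{code},ggxw{suffix}.shtml"


def build_page_url(code, page=1):
    """Return the news-list URL for a stock code and a 1-based page number.

    Hypothetical helper: it only mirrors the string concatenation used in
    download_news (page 1) and parse_pages (page 2 and later).
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    suffix = "" if page == 1 else "_" + str(page)
    return BASE.format(code=code, suffix=suffix)


def build_all_page_urls(code, max_page):
    """List the URLs for pages 1..max_page of one stock."""
    return [build_page_url(code, page) for page in range(1, max_page + 1)]


if __name__ == "__main__":
    for url in build_all_page_urls(600519, 3):
        print(url)
```

Building every URL from the stock code and page number, instead of reassigning a url variable inside the loop, keeps each request independent of the previous iteration.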

Sentiment analysis: Using the AipNlp client from Baidu AI's NLP service, each saved headline is classified as positive or negative. The script counts positive and negative headlines per stock, converts the counts to ratios, and writes the results to stocks.csv. Key snippets:

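import pandas as pd
from aip import AipNlp  # provided by the baidu-aip package

# analyze all stocks and save results
def analyze_stocks(codes):
    rows = []
    for code in codes:
        print(code)
        rows.append(analyze(code))
    # Build the frame once; DataFrame.append was removed in pandas 2.0
    df = pd.DataFrame(rows)
    df.to_csv('./stocks.csv')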
# sentiment analysis for a single stock
def analyze(code):
    APP_ID = 'your app id'
    API_KEY = 'your api key'
    SECRET_KEY = 'your secret key'
    positive_nums = 0
    negative_nums = 0
    count = 0
    aipNlp = AipNlp(APP_ID, API_KEY, SECRET_KEY)
    # Use a context manager so the file is closed after reading
    with open(str(code) + '.txt') as f:
        lines = f.readlines()
    for line in lines:
        if not line.isspace():
            line = line.strip()
            try:
                result = aipNlp.sentimentClassify(line)
                positive_prob = result['items'][0]['positive_prob']
                negative_prob = result['items'][0]['negative_prob']
                count += 1
                if positive_prob >= negative_prob:
                    positive_nums += 1
                else:
                    negative_nums += 1
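            except Exception:
                # Skip headlines the API could not score (for example, a response without 'items')
                pass
    if count == 0:
        # No headline was scored; avoid dividing by zero
        avg_positive = avg_negative = 0.0
    else:
        avg_positive = positive_nums / count
        avg_negative = negative_nums / count
    print('Stock code:', code, 'negative ratio:', avg_negative, 'positive ratio:', avg_positive)
    # Keys: 股票代码 = stock code, 消极比例 = negative ratio, 积极比例 = positive ratio
    return {'股票代码': code, '消极比例': avg_negative, '积极比例': avg_positive}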

Running analyze_stocks(codes) produces a CSV with the proportion of positive and negative news for each stock.
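The counting rule inside analyze can be tried without calling the Baidu service. The sketch below assumes the per-headline probabilities have already been collected as (positive_prob, negative_prob) pairs; sentiment_ratios is a hypothetical name, and the sample numbers are invented purely for illustration. It returns zeros when no headline was scored, so the division never runs on an empty count.

```python
def sentiment_ratios(probabilities):
    """Return (positive_ratio, negative_ratio) for a list of probability pairs.

    probabilities: iterable of (positive_prob, negative_prob) tuples, one per
    headline. A headline counts as positive when positive_prob >= negative_prob,
    which is the same tie-breaking rule used in analyze above.
    """
    positive = 0
    negative = 0
    for positive_prob, negative_prob in probabilities:
        if positive_prob >= negative_prob:
            positive += 1
        else:
            negative += 1
    total = positive + negative
    if total == 0:
        # No headline was scored, so avoid dividing by zero.
        return 0.0, 0.0
    return positive / total, negative / total


if __name__ == "__main__":
    # Invented values for illustration only, not real API output.
    sample = [(0.9, 0.1), (0.3, 0.7), (0.5, 0.5), (0.2, 0.8)]
    print(sentiment_ratios(sample))
```

Note that a tie counts as positive, so headlines the classifier is unsure about push the positive ratio up slightly.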

Data visualization: The CSV is read with pandas and plotted as a horizontal bar chart with matplotlib, so the positive and negative ratios can be compared across stocks.

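import matplotlib
import matplotlib.pyplot as plt
import pandas as pd

# visualize results
def show():
    # SimHei lets matplotlib render the Chinese column labels; the font must be installed on the system
    matplotlib.rcParams['font.sans-serif'] = ['SimHei']
    # Keep the minus sign readable when a CJK font is active
    matplotlib.rcParams['axes.unicode_minus'] = False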
    df = pd.read_csv('./stocks.csv', index_col='股票代码', usecols=['股票代码', '消极比例', '积极比例'])
    df.plot(kind='barh', figsize=(10, 8))
    plt.show()

The resulting chart places each stock's positive and negative ratios side by side, so stocks with mostly positive or mostly negative news coverage stand out.
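For a quick text-only check alongside the chart, the same CSV columns can be ranked by positive ratio with the standard library. This is a sketch under assumptions: rank_by_positive is a hypothetical helper, the column names are the ones written by analyze, and the CSV text in the demo is made-up sample data rather than real results.

```python
import csv
import io


def rank_by_positive(csv_text):
    """Return (stock_code, positive_ratio) pairs, highest ratio first.

    csv_text must contain the columns '股票代码' (stock code) and
    '积极比例' (positive ratio); extra columns are ignored.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(row["股票代码"], float(row["积极比例"])) for row in reader]
    return sorted(rows, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data for illustration only, not real results.
    sample = (
        "股票代码,消极比例,积极比例\n"
        "600381,0.4,0.6\n"
        "600519,0.2,0.8\n"
        "600570,0.5,0.5\n"
    )
    for code, ratio in rank_by_positive(sample):
        print(code, ratio)
```

To run it on the real file, the contents of stocks.csv could be read as UTF-8 text and passed to the function.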

Conclusion: The tutorial demonstrates how to combine web scraping, AI‑based sentiment analysis, and visualization to assess sentiment in stock news, providing a foundation for further quantitative finance experiments.
