
Building a Simple Stock Sentiment Analysis System with Python, Web Scraping, and Baidu AI

This tutorial walks through creating a stock sentiment analysis pipeline in Python by scraping news from a financial website, applying Baidu AI's sentiment API to classify headlines, aggregating positive and negative ratios per stock, and visualizing the results with matplotlib.

DataFunSummit

This article provides a step‑by‑step guide to building a simple stock sentiment analysis system using Python.

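Environment preparation : The tutorial uses Python 3.7 with an IDE such as PyCharm. The code relies on re (part of the standard library) and on the third-party packages lxml , requests , baidu-aip , pandas , and matplotlib , which are installed with pip, for example pip install baidu-aip . The scripts import these modules before the functions below are defined.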
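
A quick way to confirm the environment before running anything is to check that each module can be found. The sketch below uses only the standard library. The helper name missing_modules is hypothetical, and the module list assumes that baidu-aip is imported as aip (as in from aip import AipNlp) and that pandas is required because the later snippets use pd.

```python
from importlib.util import find_spec

# Import names assumed for this tutorial; "aip" is the module installed by baidu-aip.
REQUIRED_MODULES = ["lxml", "requests", "aip", "pandas", "matplotlib"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

Any name that is reported as missing can then be installed with pip under its package name (baidu-aip for aip).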

Code implementation – news crawling : Functions are defined to download news titles for a list of stock codes, parse pagination, and save each title to a local .txt file. Example snippets include:

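import requests
from lxml import etree

# download news titles for each stock code in the list
def download_news(codes):
    for code in codes:
        print(code)
        # first page of the stock's news list
        url = "http://stock.jrj.com.cn/share," + str(code) + ",ggxw.shtml"
        parse_pages(url, code)

# walk through every page of the news list
def parse_pages(url, code):
    max_page = get_max_page(url)
    for i in range(1, max_page + 1):
        # pages after the first carry a numeric suffix: ggxw_2.shtml, ggxw_3.shtml, ...
        if i != 1:
            url = "http://stock.jrj.com.cn/share," + str(code) + ",ggxw_" + str(i) + ".shtml"
        download_page(url, code)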
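
# get the maximum page number from the pagination block
def get_max_page(url):
    page_data = requests.get(url).content.decode("gbk")
    data_tree = etree.HTML(page_data)
    # str.find() returns -1 (truthy) when the text is missing, so test membership instead
    if "page_newslib" in page_data:
        max_page = data_tree.xpath("//*[@class=\"page_newslib\"]//a[last()-1]/text()")
        # fall back to a single page if the link is missing or is not a number
        if max_page and max_page[0].strip().isdigit():
            return int(max_page[0])
    return 1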
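
# download a single page and append its titles to <code>.txt
def download_page(url, code):
    try:
        # the 10-second timeout is an arbitrary choice; without one a stalled request can hang
        page_data = requests.get(url, timeout=10).content.decode("gbk")
        data_tree = etree.HTML(page_data)
        titles = data_tree.xpath("//*[@class = \"newlist\"]//li/span/a/text()")
        # open the file once per page rather than once per title
        with open(str(code) + ".txt", "a") as file:
            for title in titles:
                file.write(title + "\n")
    except Exception as exc:
        # usually a timeout or connection error; report it and move on to the next page
        print("Failed to download", url, "-", exc)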

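After defining the list of stock codes (e.g., codes = [600381, 600284, 600570, 600519, 600258, 601179] ) and calling download_news(codes) , each stock's news titles are saved to a local file named after its code, such as 600519.txt . Because the files are opened in append mode, running the crawler a second time adds the same titles again, so delete the old files before a fresh run.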
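
The pagination pattern used by parse_pages can be isolated in a small helper, which makes it easy to inspect the URLs before sending any requests. This is a minimal sketch: the function name build_page_url is hypothetical, and the URL pattern is simply the one that appears in the snippet above, so it may not match the current layout of the site.

```python
BASE = "http://stock.jrj.com.cn/share,"


def build_page_url(code, page=1):
    """Build the news-list URL for a stock code and a 1-based page number."""
    if page < 1:
        raise ValueError("page must be 1 or greater")
    # The first page has no numeric suffix; later pages append "_<page>".
    suffix = "ggxw.shtml" if page == 1 else f"ggxw_{page}.shtml"
    return f"{BASE}{code},{suffix}"


for page in (1, 2, 3):
    print(build_page_url(600519, page))
```

Keeping URL construction in one place also means a change in the site's URL scheme only needs to be handled once.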

Sentiment analysis : Using Baidu AI's NLP service ( AipNlp ), each saved headline is classified for sentiment. The script aggregates positive and negative counts per stock and writes the results to stocks.csv . Key snippets:

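import pandas as pd
from aip import AipNlp  # installed by the baidu-aip package

# analyze all stocks and save results
def analyze_stocks(codes):
    rows = []
    for code in codes:
        print(code)
        rows.append(analyze(code))
    # DataFrame.append was removed in pandas 2.0, so build the frame from a list of dicts
    df = pd.DataFrame(rows)
    df.to_csv('./stocks.csv')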
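
# sentiment analysis for a single stock
def analyze(code):
    APP_ID = 'your app id'
    API_KEY = 'your api key'
    SECRET_KEY = 'your secret key'
    positive_nums = 0
    negative_nums = 0
    count = 0
    aipNlp = AipNlp(APP_ID, API_KEY, SECRET_KEY)
    # read the titles saved by download_page; the with block closes the file
    with open(str(code) + '.txt') as file:
        lines = file.readlines()
    for line in lines:
        if not line.isspace():
            line = line.strip()
            try:
                result = aipNlp.sentimentClassify(line)
                positive_prob = result['items'][0]['positive_prob']
                negative_prob = result['items'][0]['negative_prob']
            except Exception:
                # skip headlines the API could not classify (for example, a response without 'items')
                continue
            count += 1
            # a tie counts as positive
            if positive_prob >= negative_prob:
                positive_nums += 1
            else:
                negative_nums += 1
    # guard against division by zero when no headline was classified
    avg_positive = positive_nums / count if count else 0
    avg_negative = negative_nums / count if count else 0
    print('Stock code:', code, 'negative ratio:', avg_negative, 'positive ratio:', avg_positive)
    # column names: 股票代码 = stock code, 消极比例 = negative ratio, 积极比例 = positive ratio
    return {'股票代码': code, '消极比例': avg_negative, '积极比例': avg_positive}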

Running analyze_stocks(codes) produces a CSV with the proportion of positive and negative news for each stock.
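
The counting rule inside analyze can be tried without calling the API. The sketch below assumes, as the snippet above does, that each response is a dict with an items list whose first entry carries positive_prob and negative_prob. The helper names and the sample responses are made up for illustration and are not real API output.

```python
def label_response(result):
    """Return 'positive', 'negative', or None when the response lacks the expected fields."""
    try:
        item = result["items"][0]
        positive, negative = item["positive_prob"], item["negative_prob"]
    except (KeyError, IndexError, TypeError):
        return None
    # Ties count as positive, matching the >= comparison in analyze().
    return "positive" if positive >= negative else "negative"


def sentiment_ratios(results):
    """Return (positive_ratio, negative_ratio) over the responses that could be labeled."""
    labels = [label for label in map(label_response, results) if label is not None]
    if not labels:
        return 0.0, 0.0  # no usable headlines, so avoid dividing by zero
    total = len(labels)
    return labels.count("positive") / total, labels.count("negative") / total


# Made-up responses for illustration only; the last one mimics a failed call.
samples = [
    {"items": [{"positive_prob": 0.9, "negative_prob": 0.1}]},
    {"items": [{"positive_prob": 0.2, "negative_prob": 0.8}]},
    {"items": [{"positive_prob": 0.5, "negative_prob": 0.5}]},
    {"error_code": 1, "error_msg": "example failure"},
]
print(sentiment_ratios(samples))
```

Note that the ratios are computed over classified headlines only, so a stock with many failed calls is judged on a smaller sample than its text file suggests.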

Data visualization : The CSV is read with pandas and plotted as a horizontal bar chart using matplotlib to visually compare sentiment ratios across stocks.

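import matplotlib
import matplotlib.pyplot as plt
import pandas as pd

# visualize results
def show():
    # SimHei is a font with Chinese glyphs, needed for the Chinese column labels
    matplotlib.rcParams['font.sans-serif'] = ['SimHei']
    # keep the minus sign rendering correctly with this font
    matplotlib.rcParams['axes.unicode_minus'] = False
    # 股票代码 = stock code, 消极比例 = negative ratio, 积极比例 = positive ratio
    df = pd.read_csv('./stocks.csv', index_col='股票代码', usecols=['股票代码', '消极比例', '积极比例'])
    df.plot(kind='barh', figsize=(10, 8))
    plt.show()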

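The resulting chart shows, for each stock, one bar for the share of positive headlines and one for the share of negative headlines, so stocks with predominantly bullish or bearish news coverage can be compared at a glance.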
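
Before plotting, the CSV can be sanity-checked in the terminal. The sketch below is not part of the original tutorial: it swaps pandas and matplotlib for the standard-library csv module and builds a rough text bar per stock. It assumes the column names written by analyze (股票代码 for stock code, 消极比例 for negative ratio, 积极比例 for positive ratio). The function name text_bars and the sample rows are made up.

```python
import csv
import io


def text_bars(csv_file, width=20):
    """Return one line per stock with text bars for the positive and negative ratios."""
    lines = []
    for row in csv.DictReader(csv_file):
        code = row["股票代码"]  # stock code
        positive = float(row["积极比例"])  # positive ratio
        negative = float(row["消极比例"])  # negative ratio
        pos_bar = "#" * round(positive * width)
        neg_bar = "#" * round(negative * width)
        # Pad each bar to the same width so the columns line up.
        lines.append(f"{code}  +{pos_bar:<{width}}  -{neg_bar:<{width}}")
    return lines


# Made-up rows standing in for ./stocks.csv.
sample = io.StringIO("股票代码,消极比例,积极比例\n600519,0.25,0.75\n600570,0.5,0.5\n")
for line in text_bars(sample):
    print(line)
```

To run it on the real output, pass an open file handle for ./stocks.csv instead of the in-memory sample.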

Conclusion : The tutorial demonstrates how to combine web scraping, AI‑based sentiment analysis, and visualization to assess stock market sentiment, providing a foundation for further quantitative finance experiments.
