Building a Simple Stock Sentiment Analysis System with Python, Web Scraping, and Baidu AI
This tutorial walks through creating a stock sentiment analysis pipeline in Python by scraping news from a financial website, applying Baidu AI's sentiment API to classify headlines, aggregating positive and negative ratios per stock, and visualizing the results with matplotlib.
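Environment preparation: Python 3.7 and an IDE such as PyCharm. The re module ships with Python. The third-party libraries lxml, requests, baidu-aip, pandas, and matplotlib are installed with pip (for example, pip install baidu-aip). The code relies on these imports: requests, etree from lxml, AipNlp from aip (the import name of baidu-aip), pandas as pd, and matplotlib together with matplotlib.pyplot as plt.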
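Before running the scripts, it can help to confirm that the third-party packages are importable. The following is a minimal sketch and not part of the original code: the helper name missing_modules and the REQUIRED_MODULES list are hypothetical, and the import names are assumptions based on the libraries listed above.

```python
import importlib.util

# Import names assumed for the packages this tutorial relies on.
# Note that the baidu-aip distribution is imported as "aip".
REQUIRED_MODULES = ["lxml", "requests", "aip", "pandas", "matplotlib"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Any name this reports as missing can then be installed with pip before moving on to the crawler.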
Code implementation – news crawling: Functions are defined to download news titles for a list of stock codes, walk through the paginated results, and append each title to a local .txt file named after the stock code. Example snippets include:
import requests
from lxml import etree

# Download news titles for each stock code in the list
def download_news(codes):
    for code in codes:
        print(code)
        url = "http://stock.jrj.com.cn/share," + str(code) + ",ggxw.shtml"
        parse_pages(url, code)

# Walk through every result page for one stock
def parse_pages(url, code):
    max_page = get_max_page(url)
    for i in range(1, max_page + 1):
        if i != 1:
            url = "http://stock.jrj.com.cn/share," + str(code) + ",ggxw_" + str(i) + ".shtml"
        download_page(url, code)

# Get the maximum page number from the pager; fall back to 1 when there is none
def get_max_page(url):
    page_data = requests.get(url, timeout=10).content.decode("gbk")
    data_tree = etree.HTML(page_data)
    # str.find() returns -1 (truthy) on a miss, so test membership instead
    if "page_newslib" in page_data:
        max_page = data_tree.xpath("//*[@class=\"page_newslib\"]//a[last()-1]/text()")
        if max_page:
            return int(max_page[0])
    return 1

# Download a single page and append its titles to <code>.txt
def download_page(url, code):
    try:
        page_data = requests.get(url, timeout=10).content.decode("gbk")
        data_tree = etree.HTML(page_data)
        titles = data_tree.xpath("//*[@class = \"newlist\"]//li/span/a/text()")
        with open(str(code) + ".txt", "a") as file:
            for title in titles:
                file.write(title + "\n")
    except (requests.RequestException, UnicodeDecodeError):
        print("Server timed out")

After defining the list of stock codes (e.g., codes = [600381, 600284, 600570, 600519, 600258, 601179]) and calling download_news(codes), the news titles are saved locally, one .txt file per stock code.
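The pagination scheme used by the crawler can be isolated into a small pure function, which makes it easy to check without touching the network. This is only a sketch: build_page_urls and BASE_URL are hypothetical names, and the URL pattern is taken from the crawler code as written, with no guarantee that the site still uses it.

```python
# URL pattern copied from the crawler; the first page has no numeric suffix.
BASE_URL = "http://stock.jrj.com.cn/share,{code},{page_name}.shtml"


def build_page_urls(code, max_page):
    """Return the news-list URLs for pages 1..max_page of one stock."""
    urls = []
    for page in range(1, max_page + 1):
        # Page 1 is "ggxw"; later pages are "ggxw_<n>".
        page_name = "ggxw" if page == 1 else "ggxw_{}".format(page)
        urls.append(BASE_URL.format(code=code, page_name=page_name))
    return urls


if __name__ == "__main__":
    for url in build_page_urls(600519, 3):
        print(url)
```

Separating URL construction from downloading keeps the loop over pages testable and avoids reassigning the url variable inside the loop.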
Sentiment analysis: Using Baidu AI's NLP client (AipNlp), each saved headline is classified for sentiment. The script counts positive and negative headlines per stock, converts the counts to ratios, and writes the results to stocks.csv. Key snippets:
import pandas as pd
from aip import AipNlp

# Analyze all stocks and save the results
def analyze_stocks(codes):
    rows = []
    for code in codes:
        print(code)
        rows.append(analyze(code))
    # DataFrame.append was removed in pandas 2.0, so build the frame from a list of dicts
    pd.DataFrame(rows).to_csv('./stocks.csv')

# Sentiment analysis for a single stock
def analyze(code):
    APP_ID = 'your app id'
    API_KEY = 'your api key'
    SECRET_KEY = 'your secret key'
    positive_nums = 0
    negative_nums = 0
    count = 0
    aipNlp = AipNlp(APP_ID, API_KEY, SECRET_KEY)
    with open(str(code) + '.txt') as file:
        lines = file.readlines()
    for line in lines:
        if not line.isspace():
            line = line.strip()
            try:
                result = aipNlp.sentimentClassify(line)
                positive_prob = result['items'][0]['positive_prob']
                negative_prob = result['items'][0]['negative_prob']
            except Exception:
                continue  # skip headlines the API could not classify
            count += 1
            if positive_prob >= negative_prob:
                positive_nums += 1
            else:
                negative_nums += 1
    # Avoid dividing by zero when no headline was classified
    avg_positive = positive_nums / count if count else 0.0
    avg_negative = negative_nums / count if count else 0.0
    print('Stock code:', code, 'negative ratio:', avg_negative, 'positive ratio:', avg_positive)
    # Keys become the CSV column names: stock code, negative ratio, positive ratio
    return {'股票代码': code, '消极比例': avg_negative, '积极比例': avg_positive}

Running analyze_stocks(codes) produces a CSV with the proportion of positive and negative news for each stock.
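The counting logic can be exercised offline by feeding it canned responses instead of calling the API. The sketch below is an illustration under assumptions: label_from_response and sentiment_ratios are hypothetical helpers, the sample dictionaries are made up, and the error payload shape (an error_code field with no items list) is assumed rather than taken from the article.

```python
def label_from_response(result):
    """Return 'positive', 'negative', or None for an unusable response."""
    items = result.get("items") if isinstance(result, dict) else None
    if not items:
        return None  # e.g. an error payload without an "items" list
    item = items[0]
    # Same rule as the tutorial code: ties count as positive.
    if item["positive_prob"] >= item["negative_prob"]:
        return "positive"
    return "negative"


def sentiment_ratios(responses):
    """Return positive/negative ratios over the usable responses, or None."""
    labels = [label_from_response(r) for r in responses]
    labels = [label for label in labels if label is not None]
    if not labels:
        return None  # nothing was classified, so no ratio is defined
    total = len(labels)
    return {
        "positive": labels.count("positive") / total,
        "negative": labels.count("negative") / total,
    }


# Made-up responses for illustration only.
SAMPLE_RESPONSES = [
    {"items": [{"positive_prob": 0.9, "negative_prob": 0.1}]},
    {"items": [{"positive_prob": 0.2, "negative_prob": 0.8}]},
    {"error_code": 18, "error_msg": "assumed error payload"},
    {"items": [{"positive_prob": 0.5, "negative_prob": 0.5}]},
]

if __name__ == "__main__":
    print(sentiment_ratios(SAMPLE_RESPONSES))
```

Returning None when nothing could be classified keeps the "no data" case distinct from a genuine ratio of zero.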
Data visualization: The CSV is read with pandas and plotted as a horizontal bar chart with matplotlib, so the positive and negative ratios can be compared across stocks.
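import matplotlib
import matplotlib.pyplot as plt
import pandas as pd

# Visualize the results
def show():
    # SimHei lets matplotlib render the Chinese column labels
    matplotlib.rcParams['font.sans-serif'] = ['SimHei']
    matplotlib.rcParams['axes.unicode_minus'] = False
    df = pd.read_csv('./stocks.csv', index_col='股票代码', usecols=['股票代码', '消极比例', '积极比例'])
    df.plot(kind='barh', figsize=(10, 8))
    plt.show()

The resulting chart shows, for each stock, the share of positive versus negative headlines, so stocks with mostly bullish or mostly bearish coverage stand out.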
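Alongside the chart, the same CSV can be turned into a simple ranking. The sketch below uses only the standard library and assumes the column names written by the analysis step; rank_by_positive_ratio is a hypothetical helper and the sample rows are invented for illustration, not real results.

```python
import csv
import io


def rank_by_positive_ratio(csv_text):
    """Return (stock code, positive ratio) pairs, highest ratio first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Column names: 股票代码 = stock code, 积极比例 = positive ratio.
    rows = [(row["股票代码"], float(row["积极比例"])) for row in reader]
    return sorted(rows, key=lambda pair: pair[1], reverse=True)


# Invented sample data in the same layout as stocks.csv.
SAMPLE_CSV = """股票代码,消极比例,积极比例
600381,0.4,0.6
600519,0.2,0.8
600570,0.5,0.5
"""

if __name__ == "__main__":
    for code, ratio in rank_by_positive_ratio(SAMPLE_CSV):
        print(code, ratio)
```

To run it on real output, the text of stocks.csv can be read from disk and passed in place of SAMPLE_CSV.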
Conclusion: The tutorial demonstrates how to combine web scraping, AI-based sentiment analysis, and visualization to assess stock market sentiment, providing a foundation for further quantitative finance experiments.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.