
How Python Powers Financial Research: From Web Scraping to Large‑Scale Text Analysis

The "Academic Path" lecture series at Tsinghua University taught non‑CS students practical Python techniques—including Requests, AJAX, Selenium, PDF parsing, and multiprocessing—to extract, process, and analyze financial data, culminating in hands‑on case studies of ESG news, A‑share reports, and IFC databases.

Data Party THU

Event Overview

The "Academic Path" lecture series on Python for research was recently co‑hosted by the Graduate Association of the Tsinghua University PBC School of Finance (Wudaokou) and the Tsinghua Data Science Association. Aimed at students without a computer‑science background, the three‑part series progressed from a basic session to an advanced one and then to hands‑on practice, giving participants a cross‑disciplinary toolbox.

First Session – Static Web Scraping with Requests

The inaugural talk introduced the Python Requests library and demonstrated how to extract specific content from static webpages, using the Sina Finance ESG channel as an example. It guided beginners through the first step of data acquisition.
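To make the static-scraping step concrete, here is a minimal sketch, not the code shown in the lecture. It assumes the target page lists articles as ordinary `<a>` links in the HTML. The names `LinkCollector`, `parse_links`, and `fetch_html` are hypothetical, and the User-Agent header is an illustrative choice. Parsing uses only the standard library so the logic can be tried on any HTML string; the download step relies on the third-party `requests` package.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (title, absolute_url) pairs for every <a href=...> in a page."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                self.links.append((title, self._href))
            self._href = None


def parse_links(html, base_url=""):
    """Return all non-empty links found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def fetch_html(url, timeout=10):
    """Download a static page with Requests (hypothetical helper)."""
    import requests  # third-party; imported here so parsing works without it

    response = requests.get(
        url, headers={"User-Agent": "Mozilla/5.0"}, timeout=timeout
    )
    response.raise_for_status()
    # Let Requests guess the encoding from the body; useful for Chinese pages.
    response.encoding = response.apparent_encoding
    return response.text
```

A typical call would be `parse_links(fetch_html(page_url), page_url)`, where `page_url` is whatever listing page is being studied. In practice the links would then be filtered down to the article URLs of interest.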

Second Session – Dynamic Data Capture and PDF Parsing

The second lecture explained AJAX principles, showing how to use browser developer tools (F12) to locate JSON endpoints and retrieve backend data. Selenium was presented as a fallback for complex interactive pages. The session also covered key techniques with PyMuPDF and pdfplumber for parsing financial PDF documents.
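The two ideas in this session can be sketched as follows. This is an illustration under stated assumptions, not the lecture's own code. It assumes the JSON endpoint found in the browser's Network tab accepts a `page` query parameter and returns its records under a `"data"` key; both are hypothetical, since real endpoints differ. All function names are made up for the example. The PDF helper uses `pdfplumber`; PyMuPDF offers a comparable route.

```python
import json


def extract_records(payload, list_key="data"):
    """Return the list stored under list_key in a JSON payload (dict or str)."""
    if isinstance(payload, str):
        payload = json.loads(payload)
    if not isinstance(payload, dict):
        return []
    records = payload.get(list_key, [])
    return records if isinstance(records, list) else []


def collect_pages(fetch_page, max_pages=5):
    """Call fetch_page(page_number) until a page comes back empty."""
    rows = []
    for page in range(1, max_pages + 1):
        records = extract_records(fetch_page(page))
        if not records:
            break  # an empty page is treated as the end of the data
        rows.extend(records)
    return rows


def fetch_json_page(url, page, timeout=10):
    """Request one page from a JSON endpoint (the 'page' parameter is assumed)."""
    import requests  # third-party

    response = requests.get(
        url,
        params={"page": page},
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=timeout,
    )
    response.raise_for_status()
    return response.json()


def pdf_to_text(path):
    """Extract plain text from every page of a PDF with pdfplumber."""
    import pdfplumber  # third-party

    with pdfplumber.open(path) as pdf:
        # extract_text() can return nothing for image-only pages.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)
```

Passing the fetcher in as a function, for example `collect_pages(lambda p: fetch_json_page(endpoint_url, p))`, keeps the paging logic separate from the network call, so it can be checked with canned responses before any real requests are sent.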

Final Session – Large‑Scale Processing and Text Mining

The concluding session focused on a real‑world research case: bulk downloading and processing A‑share annual reports from CNINFO (Juchao). Multiprocessing was used to speed up crawling substantially, and the downloaded reports were then analyzed in parallel, with word‑frequency statistics and sentiment analysis. An additional example demonstrated Selenium‑based data collection from the International Finance Corporation (IFC) database.
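The text-analysis half of this pipeline can be sketched as below. This is a simplified illustration rather than the method presented in the session. It assumes the reports have already been converted to plain text, tokenizes with a regular expression that only matches Latin letters, and scores sentiment with a tiny hypothetical word list. Chinese annual reports would need a word segmenter (for example `jieba`) and a proper financial sentiment dictionary instead.

```python
import re
from collections import Counter
from multiprocessing import Pool

TOKEN_RE = re.compile(r"[A-Za-z]+")

# Hypothetical toy lexicons, for illustration only.
POSITIVE = frozenset({"growth", "profit", "improve"})
NEGATIVE = frozenset({"loss", "risk", "decline"})


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return [word.lower() for word in TOKEN_RE.findall(text)]


def count_words(text):
    """Word frequencies for one document; runs inside a worker process."""
    return Counter(tokenize(text))


def merge_counts(counters):
    """Add per-document counters into one corpus-level counter."""
    total = Counter()
    for counter in counters:
        total.update(counter)
    return total


def sentiment_score(counts):
    """(positive - negative) / (positive + negative), or 0.0 if no hits."""
    pos = sum(counts[word] for word in POSITIVE)
    neg = sum(counts[word] for word in NEGATIVE)
    hits = pos + neg
    return 0.0 if hits == 0 else (pos - neg) / hits


def parallel_word_counts(texts, processes=4):
    """Count words across documents using a pool of worker processes."""
    with Pool(processes=processes) as pool:
        return merge_counts(pool.map(count_words, texts))
```

When run as a script, `parallel_word_counts` should be called under an `if __name__ == "__main__":` guard, which the `multiprocessing` module requires on platforms that start workers by spawning a fresh interpreter. The same `Pool.map` pattern applies to the download step, although network-bound crawling is often limited by the server rather than the CPU.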

Takeaways

The series addressed common challenges in interdisciplinary research—difficulty obtaining data and low processing efficiency—by providing systematic programming training and practical case studies. Participants left with hands‑on experience using Python as a research productivity tool and a stronger awareness of how digital methods can drive academic innovation.

Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
