How to Scrape GDP Data with Python and Save to CSV in Minutes
This article shows how to use Python's requests and lxml libraries together with the standard csv module to crawl GDP data from a website, parse its HTML table, and write the extracted rank, region, GDP value, and year into a CSV file. It includes a complete example aimed at web-scraping beginners.
Introduction
The author received a request to modify a Python web‑scraping script that originally used pandas to fetch GDP data but stored the results in an inconvenient way. The goal is to retrieve ranking, region, GDP, and year information from a target site and write it directly to a CSV file.
import requests
from lxml import etree
import csv
import time
import pandas as pd


def gdpData(page):
    url = f'https://www.hongheiku.com/category/gdjsgdp/page/{page}'
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'}
    resp = requests.get(url, headers=headers)
    # print(resp.text)
    data(resp.text)


file = open('data.csv', mode='a', encoding='utf-8', newline='')
csv_write = csv.DictWriter(file, fieldnames=['排名', '地区', 'GDP', '年份'])
csv_write.writeheader()


def data(text):
    e = etree.HTML(text)
    lst = e.xpath('//*[@id="tablepress-48"]/tbody/tr[@class="even"]')
    for l in lst:
        no = l.xpath('./td[1]/center/span/text()')
        name = l.xpath('./td[2]/a/center/text()')
        team = l.xpath('./td[3]/center/text()')
        year = l.xpath('./td[4]/center/text()')
        data_dict = {'排名': no, '地区': name, 'GDP': team, '年份': year}
        print(f'排名:{no} 地区:{name} GDP:{team} 年份:{year} ')
        csv_write.writerow(data_dict)
    file.close()


url = 'https://www.hongheiku.com/category/gdjsgdp'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'}
resp = requests.get(url, headers=headers)
# print(resp.text)
data(resp.text)
e = etree.HTML(resp.text)
#//*[@id="tablepress-48"]/tbody/tr[192]/td[3]/center
count = e.xpath('//div[@class="pagination pagination-multi"][last()]/ul/li[last()]/span/text()')[0].split(' ')[1]
for index in range(int(count) - 1):
    gdpData(index + 2)

Implementation
The revised script moves the CSV file handling inside the data function and writes the header before the rows. It also joins each field into a plain string: every XPath query returns a list of text nodes, and writing that list directly puts a value such as ['1'] into the cell instead of 1. With these changes the script is self-contained and ready to run.
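To see why the join matters, here is a minimal sketch that uses only the standard library. The column names (rank, region), the sample values, and the helper write_row are hypothetical and not part of the original script; the sketch only mimics the list-shaped values that lxml's xpath() returns for text() queries.

```python
import csv
import io


def write_row(row):
    """Write one dict row to an in-memory CSV and return the resulting text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=['rank', 'region'])
    writer.writerow(row)
    return buffer.getvalue()


# Made-up sample values shaped like XPath text() results (lists of strings).
raw = {'rank': ['1'], 'region': ['Region A']}
# Joining each list yields plain strings.
joined = {key: ''.join(value) for key, value in raw.items()}

raw_line = write_row(raw)
joined_line = write_row(joined)
print(raw_line.strip())     # ['1'],['Region A']
print(joined_line.strip())  # 1,Region A
```

The csv module converts non-string values with str(), which is why the unjoined row carries the brackets and quotes of the list into the file.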
import csv

import requests
from lxml import etree

# Column names: rank, region, GDP, year
FIELDNAMES = ['排名', '地区', 'GDP', '年份']
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'}


def gdpData(page):
    # Fetch one numbered listing page and pass its HTML to data()
    url = f'https://www.hongheiku.com/category/gdjsgdp/page/{page}'
    resp = requests.get(url, headers=headers)
    data(resp.text)


def data(text):
    # Append the rows of one page to data.csv; the with block closes the file
    with open('data.csv', mode='a', encoding='utf-8', newline='') as file:
        csv_write = csv.DictWriter(file, fieldnames=FIELDNAMES)
        # Write the header only while the file is still empty,
        # so it is not repeated for every page
        if file.tell() == 0:
            csv_write.writeheader()
        e = etree.HTML(text)
        lst = e.xpath('//*[@id="tablepress-48"]/tbody/tr[@class="even"]')
        for l in lst:
            # xpath() returns a list of text nodes; join it into one string
            # (an empty list becomes '' instead of raising IndexError)
            no = ''.join(l.xpath('./td[1]/center/span/text()'))
            name = ''.join(l.xpath('./td[2]/a/center/text()'))
            team = ''.join(l.xpath('./td[3]/center/text()'))  # GDP value
            year = ''.join(l.xpath('./td[4]/center/text()'))
            data_dict = {'排名': no, '地区': name, 'GDP': team, '年份': year}
            print(f'排名:{no} 地区:{name} GDP:{team} 年份:{year} ')
            csv_write.writerow(data_dict)


# The first page has no page number in its URL
url = 'https://www.hongheiku.com/category/gdjsgdp'
resp = requests.get(url, headers=headers)
data(resp.text)
e = etree.HTML(resp.text)
# Total page count: second space-separated token of the last pagination item
count = e.xpath('//div[@class="pagination pagination-multi"][last()]/ul/li[last()]/span/text()')[0].split(' ')[1]
# Fetch the remaining pages, 2 through count
for index in range(int(count) - 1):
    gdpData(index + 2)

Running the script writes all extracted rows into data.csv, which can then be opened in Excel or processed further.
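Because data.csv is opened in append mode once per page, a header written on every call would be repeated throughout the file. One way to guard against that is sketched below under stated assumptions: the function name append_rows, the English column names, the sample rows, and the temporary file are all hypothetical and serve only to illustrate the pattern.

```python
import csv
import os
import tempfile

FIELDS = ['rank', 'region', 'gdp', 'year']  # hypothetical English column names


def append_rows(path, rows):
    """Append rows to a CSV file, writing the header only if the file is empty."""
    with open(path, mode='a', encoding='utf-8', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        # In append mode the position starts at the end of the file,
        # so 0 means nothing has been written yet.
        if f.tell() == 0:
            writer.writeheader()
        writer.writerows(rows)


# Simulate two pages being written one after the other (made-up sample data).
path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
append_rows(path, [{'rank': '1', 'region': 'Region A', 'gdp': '100', 'year': '2022'}])
append_rows(path, [{'rank': '2', 'region': 'Region B', 'gdp': '90', 'year': '2022'}])

with open(path, encoding='utf-8', newline='') as f:
    lines = f.read().splitlines()
print(lines)
```

The same check does not protect against rerunning the whole script: rows from a second run would still be appended after the first, so deleting data.csv between runs may be the simpler choice.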
Conclusion
The article provides a complete, step‑by‑step solution for a Python web‑scraping task that fetches GDP rankings from a public website and stores the results in a CSV file, illustrating how to combine requests, lxml, and the standard csv module for reliable data extraction.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
