Scrape Lianjia Real‑Estate Listings with Python and Export to Word
This guide walks through building a Python web‑scraper that fetches house names, prices and popularity from Lianjia, formats the data into a Word template, and generates individual Word documents while demonstrating key libraries, URL handling, HTML parsing, and best‑practice considerations.
Introduction
With the growing importance of housing data, the article demonstrates how to crawl second‑hand house listings from Lianjia (bj.lianjia.com) using Python and generate individual Word documents containing the house name, price and popularity.
Project Goal
Extract house name, price, and attention metrics, fill a Word template, and produce separate Word files for each listing.
Target Site and Libraries
Target URL pattern: https://bj.lianjia.com/ershoufang/pg{}/ where the page number is formatted into the URL.
Required libraries: requests , time , lxml .
Implementation Details
1. Class Definition
import requests
from lxml import etree
import time
class LianJia(object):
def __init__(self):
self.url = "https://bj.lianjia.com/ershoufang/pg{}/"
self.headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
}
def get_page(self, url):
html = requests.get(url=url, headers=self.headers).content.decode("utf-8")
self.page_process(html)
def page_process(self, html):
parse_html = etree.HTML(html)
items = parse_html.xpath('//*[@id="content"]/div[1]/ul/li')
for li in items:
house = {}
house["名称"] = li.xpath('.//div[@class="infoclear"]//div[@class="title"]/a/text()')[0].strip()
house["价格"] = li.xpath(".//div[@class='priceInfo']/div[@class='totalPrice']/span/text()")[0].strip() + "万"
house["关注度"] = li.xpath('.//div[@class="info clear"]//div[@class="followInfo"]//text()')[0].strip()
self.save_to_word(house)
def save_to_word(self, house_dict):
with open('房子.doc', 'a', encoding='utf-8') as f:
f.write(str(house_dict) + "
")
print(house_dict)
def main(self):
for pg in range(1, 101):
url = self.url.format(pg)
print("=" * 50)
self.get_page(url)
time.sleep(1.4)
if __name__ == "__main__":
spider = LianJia()
spider.main()2. Running the Spider
Execute the script; the console will display progress and the generated Word file (named “房子.doc”) will contain the scraped data.
Result Showcase
Conclusion
Do not scrape excessively to avoid overloading the server.
The project helps understand recent housing price trends.
Python web crawling combined with Word automation provides a practical way to collect and present real‑estate data.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
