Scrape Lianjia Real‑Estate Listings with Python and Export to Word

This guide builds a Python web scraper that fetches house names, prices, and popularity figures from Lianjia, formats the data for a Word template, and generates individual Word documents. Along the way it covers the key libraries, URL handling, HTML parsing, and good crawling practice.

Python Crawling & Data Mining

Jun 1, 2020

Introduction

Housing data is increasingly in demand. This article shows how to crawl second‑hand house listings from Lianjia (bj.lianjia.com) with Python and generate Word documents containing each house's name, price, and popularity (关注度).

Project Goal

Extract each listing's name, price, and popularity (关注度), fill a Word template with those values, and produce a separate Word file for each listing.

Target Site and Libraries

Target URL pattern: https://bj.lianjia.com/ershoufang/pg{}/ where {} is replaced by the page number. The script below crawls pages 1 through 100.
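
As a small illustration of the pattern above, the page URLs can be generated with str.format. The helper name build_page_urls and its default page range are assumptions made for this sketch and do not appear in the original script.

```python
BASE_URL = "https://bj.lianjia.com/ershoufang/pg{}/"


def build_page_urls(first_page=1, last_page=3):
    """Return the listing-page URLs for first_page..last_page, inclusive."""
    # build_page_urls is a hypothetical helper introduced for illustration.
    return [BASE_URL.format(page) for page in range(first_page, last_page + 1)]


if __name__ == "__main__":
    for url in build_page_urls():
        print(url)
```

Building the list up front keeps URL construction separate from fetching, which makes it easy to crawl only a few pages while testing.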

Required libraries: requests, lxml, and time. The first two are third‑party packages; time is part of the standard library.

Implementation Details

1. Class Definition

import requests
from lxml import etree
import time

class LianJia(object):
    def __init__(self):
        self.url = "https://bj.lianjia.com/ershoufang/pg{}/"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
        }

    def get_page(self, url):
        html = requests.get(url=url, headers=self.headers).content.decode("utf-8")
        self.page_process(html)

    def page_process(self, html):
        parse_html = etree.HTML(html)
        items = parse_html.xpath('//*[@id="content"]/div[1]/ul/li')
        for li in items:
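            house = {}
            # 名称 = listing title. The class attribute is "info clear" (with a space).
            house["名称"] = li.xpath('.//div[@class="info clear"]//div[@class="title"]/a/text()')[0].strip()
            # 价格 = total price, reported in units of 万 (10,000 yuan).
            house["价格"] = li.xpath(".//div[@class='priceInfo']/div[@class='totalPrice']/span/text()")[0].strip() + "万"
            # 关注度 = popularity, taken from the listing's follow info text.
            house["关注度"] = li.xpath('.//div[@class="info clear"]//div[@class="followInfo"]//text()')[0].strip()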
            self.save_to_word(house)

    def save_to_word(self, house_dict):
        # Append one line per listing; 房子 means "house".
        with open('房子.doc', 'a', encoding='utf-8') as f:
            f.write(str(house_dict) + "\n")
            print(house_dict)

    def main(self):
        for pg in range(1, 101):
            url = self.url.format(pg)
            print("=" * 50)
            self.get_page(url)
            time.sleep(1.4)

if __name__ == "__main__":
    spider = LianJia()
    spider.main()
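
Each XPath query in page_process indexes its result with [0], which raises IndexError whenever a list item lacks the expected element, for instance if one item has a different layout. A minimal sketch of a more defensive helper follows. The name first_text and its default parameter are hypothetical and not part of the original code; the helper operates on the plain list of strings that an XPath text() query returns, and unlike [0].strip() it also skips whitespace‑only entries.

```python
def first_text(nodes, default=""):
    """Return the first non-empty stripped string in nodes, or default."""
    for node in nodes:
        text = str(node).strip()
        if text:
            return text
    return default


if __name__ == "__main__":
    # These lists stand in for what li.xpath('.../text()') would return.
    print(first_text(["  Sample listing title  "]))
    print(first_text([], default="N/A"))
```

Inside page_process, a call such as first_text(li.xpath(...)) could then replace li.xpath(...)[0].strip(), so one malformed item does not stop the whole crawl.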

2. Running the Spider

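Run the script. For each page the console prints a separator line followed by the dictionary for every listing, and each listing is appended as one line to the file 房子.doc. Note that the script writes plain UTF‑8 text under a .doc extension rather than producing a true Word document.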
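
The script appends text to a single file, whereas the stated goal is one Word file per listing. The following is a rough sketch of one way to get there using only the standard library, relying on the fact that a .docx file is a ZIP archive of XML parts. The function name write_listing_docx, the output path, and the sample values are assumptions for illustration; a dedicated library such as python-docx would likely be the more usual choice when a styled template has to be filled.

```python
import os
import tempfile
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)


def write_listing_docx(house, path):
    """Write one listing dict to path as a minimal .docx, one paragraph per field."""
    # write_listing_docx is a hypothetical helper, not part of the original script.
    paragraphs = "".join(
        '<w:p><w:r><w:t xml:space="preserve">{}: {}</w:t></w:r></w:p>'.format(
            escape(str(key)), escape(str(value))
        )
        for key, value in house.items()
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="{}"><w:body>{}</w:body></w:document>'
    ).format(W_NS, paragraphs)
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", RELS)
        archive.writestr("word/document.xml", document)
    return path


if __name__ == "__main__":
    # Made-up sample values; a real run would pass the scraped dictionary.
    sample = {"名称": "Sample listing", "价格": "500万", "关注度": "sample follow info"}
    target = os.path.join(tempfile.gettempdir(), "listing_1.docx")
    print("wrote", write_listing_docx(sample, target))
```

Calling this once per listing with a distinct file name (a page and item counter, for example) would yield separate documents instead of one growing text file.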

Conclusion

Keep the request rate low so the crawl does not overload the server; the script pauses 1.4 seconds between pages for this reason.

The scraped data offers a snapshot of recent second‑hand housing prices.

Python web crawling combined with Word automation provides a practical way to collect and present real‑estate data.
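
To make the advice about not overloading the server more concrete, here is a minimal sketch of a delay helper that adds random jitter to a base pause, so requests are neither too frequent nor perfectly periodic. The names polite_delay and crawl_pages and their parameters are hypothetical; the 1.4‑second base matches the time.sleep(1.4) call in the script, while the jitter range is an arbitrary assumption.

```python
import random
import time


def polite_delay(base=1.4, jitter=0.6, rng=random):
    """Return a pause in seconds drawn uniformly from [base, base + jitter]."""
    return base + rng.uniform(0.0, jitter)


def crawl_pages(pages, fetch, sleep=time.sleep):
    """Call fetch(page) for each page, pausing between consecutive calls."""
    results = []
    for index, page in enumerate(pages):
        if index > 0:
            # No pause is needed before the very first request.
            sleep(polite_delay())
        results.append(fetch(page))
    return results


if __name__ == "__main__":
    # A stand-in fetch function; a real one would issue the HTTP request.
    # The no-op sleep keeps this demo instant.
    pages = crawl_pages(
        [1, 2, 3],
        fetch=lambda page: "page {}".format(page),
        sleep=lambda seconds: None,
    )
    print(pages)
```

Passing sleep as a parameter keeps the pacing logic testable without actually waiting.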
