When Web Crawlers Cross the Line: A Legal Case Study on Unauthorized Data Scraping
This article recounts how a Chinese fintech company's automated web crawler, built to query a municipal residence‑permit system, overloaded the server and triggered police action and criminal charges against the CTO and the programmer. It closes with lessons on the legal risks of large‑scale data scraping.
Development
KG Company was founded in 2014, initially focusing on internet finance and later shifting to technology services centered on mortgage‑related loans. To support its business, the company needed frequent access to a municipal residence‑permit website for property and school‑district information, and looking that information up by hand was inefficient.
In December 2017 the CTO assigned a newly hired programmer to create a timed, automated crawler that could query and download the required data. By January 2018 the programmer had received a basic data‑capture script and begun modifying it. In March 2018 the program was deployed on an Alibaba‑type cloud server and could:
• Connect to the city’s residence‑permit system;
• Retrieve property address, building code, and related details;
• Generate tens of thousands of requests per hour.
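To put the last capability in perspective, 36,000 requests per hour works out to a sustained 10 requests per second, a load many public‑service sites are not sized for. The source does not include the crawler's code or its exact request rate, so the following is only a minimal sketch of how a crawler might cap its own rate. The `Throttle` class and its `max_per_hour` parameter are hypothetical names introduced for illustration.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive requests.

    Hypothetical helper for illustration; not taken from the crawler
    described in the article.
    """

    def __init__(self, max_per_hour, clock=time.monotonic, sleep=time.sleep):
        if max_per_hour <= 0:
            raise ValueError("max_per_hour must be positive")
        # Seconds that must pass between two requests.
        self.min_interval = 3600.0 / max_per_hour
        self._clock = clock
        self._sleep = sleep
        # No request sent yet, so the first call never waits.
        self._next_allowed = float("-inf")

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following request one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

A crawler loop would call `throttle.wait()` before every request. With `Throttle(max_per_hour=360)`, for example, requests are spaced at least 10 seconds apart no matter how fast the rest of the loop runs.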
The retrieved information was stored on the company’s cloud server and later used to monitor real‑estate listings from agencies such as Lianjia and Qfang.
Incident
On April 27, 2018 the residence‑permit system crashed; investigators traced the overload to high‑frequency requests but could not locate the source IP because logs were missing. A second incident occurred on May 2, and this time the source IP was captured and reported.
On May 17 the cloud provider informed KG that its server IP had been locked by cyber‑police for suspected attacks. The CTO contacted the development team, who explained that a new CAPTCHA on the target site had caused the crawler to fail, and that the failing crawler had then sent aggressive requests nobody had intended.
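The source does not describe the crawler's retry logic, but a crawler that immediately retries every failed request is one plausible way a new CAPTCHA could turn into a flood of traffic. A minimal sketch of the opposite behavior, assuming a hypothetical zero‑argument `fetch` callable that raises an exception on failure, is to wait longer after each failure and stop entirely after a fixed number of attempts:

```python
import time


class TooManyFailures(RuntimeError):
    """Raised when the crawler should stop instead of retrying again."""


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call fetch() until it succeeds, backing off after each failure.

    fetch is a hypothetical callable for illustration: it returns a result
    on success and raises an exception on failure, for example when a
    CAPTCHA page comes back instead of data.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception as exc:
            if attempt == max_attempts - 1:
                # Give up and surface the problem to a human operator.
                raise TooManyFailures(
                    f"gave up after {max_attempts} attempts"
                ) from exc
            # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

The design choice here is that repeated failure is treated as a signal to stop and investigate, not as a reason to send more requests.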
During the outage the residence‑permit platform, serving over 5.3 million registered users, was unable to process self‑service applications, on‑site registrations, or inter‑departmental queries, causing widespread disruption and exposing millions of building‑code records.
Investigation and Confession
In August 2018 the CTO and programmer were detained. Police seized the crawler’s source code, logs, and a database dump containing roughly 29 million property records. Forensic analysis confirmed that the program repeatedly sent massive, unauthorized queries to the target URL, effectively downloading the database.
Both defendants claimed they only scraped publicly available information to improve business efficiency, denied any profit motive, and asserted they were unaware of the system’s capacity limits.
Judgment
The court found both men guilty of interfering with a computer information system that served more than 50,000 users, causing the system to be inoperable for over an hour. The CTO, who authorized and oversaw the crawler’s development, was sentenced to three years in prison as the principal offender. The programmer, acting under instruction, was sentenced to one year and six months as an accomplice.
The case underscores that while routine web scraping can be a legitimate business tool, large‑scale, unregulated crawling can breach legal thresholds and result in severe criminal liability.
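Lesson: Before launching any automated data‑collection project, ask “What are the legal and technical risks?” and make sure that responsibility for compliance is shared by everyone involved, from the person who authorizes the project to the person who writes the code.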