When Web Crawlers Cross the Line: A Legal Case Study on Unauthorized Data Scraping

This article recounts how a Chinese fintech company built an automated web crawler to query a municipal residence‑permit system, how the crawler overloaded the server and triggered police action, and how the CTO and the programmer ended up facing criminal charges. It closes with lessons on the legal risks of large‑scale data scraping.

Java High-Performance Architecture

Development

KG Company was founded in 2014, initially focusing on internet finance and later shifting to technology services centered on mortgage‑related loans. To support its business, the company needed frequent access to a municipal residence‑permit website for property and school‑district information, and looking up that information by hand was slow and inefficient.

In December 2017 the CTO assigned a newly hired programmer to create a scheduled, automated crawler that could query and download the required data. By January 2018 the programmer had received a basic data‑capture script and begun modifying it. In March 2018 the program was deployed on an Alibaba‑type cloud server, where it could:

• Connect to the city’s residence‑permit system;
• Retrieve property address, building code, and related details;
• Generate tens of thousands of requests per hour.
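To put the request volume above in perspective: tens of thousands of requests per hour works out to several requests every second, sustained around the clock. The Python sketch below converts an hourly volume to a per-second rate and shows one common way to cap a client's request rate. It is not the code from the case; the `Throttle` class, the 36,000 figure, and the 2-second interval are illustrative assumptions.

```python
import time


def requests_per_second(requests_per_hour: float) -> float:
    """Convert an hourly request volume into an average per-second rate."""
    return requests_per_hour / 3600.0


class Throttle:
    """Enforce a minimum interval between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval_s: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


if __name__ == "__main__":
    # Assumed example volume: 36,000 requests per hour.
    print(requests_per_second(36000))  # average requests per second
```

Calling `wait()` before every request with a 2-second interval limits a client to at most 1,800 requests per hour, however many records remain to be fetched. A throttle only bounds load; it says nothing about whether the access itself is authorized.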

The retrieved information was stored on the company’s cloud server and later used to monitor real‑estate listings from agencies such as Lianjia and Qfang.

Incident

On April 27, 2018 the residence‑permit system crashed. Investigators traced the overload to high‑frequency requests but could not identify the source IP because the logs were missing. A second overload occurred on May 2, and this time the IP address was captured and reported.

On May 17 the cloud provider informed KG that its server IP had been locked by cyber‑police for suspected attacks. The CTO contacted the development team, who attributed the problem to a new CAPTCHA on the target site: once the crawler’s queries started failing, it began sending requests far more aggressively than intended.
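The failure mode described here, a crawler that responds to errors by sending more requests, is usually avoided by waiting longer after each failure and eventually stopping. The Python sketch below shows one way to do that. `fetch_with_backoff`, its parameters, and the `fetch` callable are hypothetical names for illustration and are not taken from the case.

```python
import time
from typing import Callable, Optional


class GiveUp(Exception):
    """Raised when the crawler should stop instead of retrying further."""


def fetch_with_backoff(
    fetch: Callable[[], Optional[str]],
    max_attempts: int = 4,
    base_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() until it returns a result, waiting longer after each failure.

    fetch() is assumed to return None on failure (for example, when a CAPTCHA
    page comes back instead of data). After max_attempts failures, stop.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay_s.
            sleep(min(base_delay_s * 2 ** attempt, max_delay_s))
    raise GiveUp(f"no result after {max_attempts} attempts; stopping")
```

The design choice that matters is the final `raise`: when the target starts rejecting requests, the job halts and a person decides what to do next, rather than the program retrying indefinitely.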

During the outage the residence‑permit platform, serving over 5.3 million registered users, was unable to process self‑service applications, on‑site registrations, or inter‑departmental queries, causing widespread disruption and exposing millions of building‑code records.

Investigation and Confession

In August 2018 the CTO and programmer were detained. Police seized the crawler’s source code, logs, and a database dump containing roughly 29 million property records. Forensic analysis confirmed that the program repeatedly sent massive, unauthorized queries to the target URL, effectively downloading the database.

Both defendants claimed they only scraped publicly available information to improve business efficiency, denied any profit motive, and asserted they were unaware of the system’s capacity limits.

Judgment

The court found both men guilty of interfering with a computer information system that served more than 50,000 users, causing the system to be inoperable for over an hour. The CTO, who authorized and oversaw the crawler’s development, was sentenced to three years in prison as the principal offender. The programmer, acting under instruction, was sentenced to one year and six months as an accomplice.

The case underscores that while routine web scraping can be a legitimate business tool, large‑scale, unregulated crawling can breach legal thresholds and result in severe criminal liability.

Lesson: Before launching any automated data‑collection project, ask “What are the legal and technical risks?” and make sure responsibility for compliance is shared across the team rather than left to a single person.
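One small technical check that can be part of answering that question is reading the target site's robots.txt before any crawl. The Python sketch below uses the standard library's `urllib.robotparser`. The robots.txt content, the `example-bot` user agent, the `example.com` URLs, and the `preflight` function are all hypothetical. robots.txt is a voluntary convention, so passing this check does not by itself establish that access is authorized.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would download the site's own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /query/
Crawl-delay: 10
"""


def preflight(robots_txt: str, user_agent: str, url: str) -> dict:
    """Report whether robots.txt permits fetching url and any requested crawl delay."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        "allowed": parser.can_fetch(user_agent, url),
        "crawl_delay_s": parser.crawl_delay(user_agent),
    }


if __name__ == "__main__":
    print(preflight(ROBOTS_TXT, "example-bot", "https://example.com/query/building?id=1"))
```

A result with `allowed` set to `False` is a clear signal to stop and seek permission; a requested crawl delay, when present, gives a lower bound for the interval between requests.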

