When Web Crawlers Turn Criminal: A Real‑World Data Scraping Case Study

This article recounts how a fintech company's automated web‑scraping tool overloaded a municipal residence‑permit system, exposed large volumes of data, and led to the prosecution of the company's CTO and a programmer. The case highlights the severe legal risks of unchecked crawling.

21CTO

Background

KG Company, founded in 2014, shifted from internet finance to technology in 2017, offering loan‑replacement services that required frequent queries to a city’s residence‑permit website for property and school‑district information.

To improve efficiency, the product team proposed an automated crawler to retrieve and download data from the government portal.

Development

In December 2017, the CTO assigned a new programmer to build a scheduled data‑capture program. In January 2018 the programmer received source code and began modifying it. In March 2018 the program was deployed on an Alibaba Cloud server. It queried the residence‑permit system and extracted property addresses, building codes, and related fields, with traffic reaching tens of thousands of requests per hour.

The collected data were stored on the company’s cloud server and later used to monitor real‑estate listings from agencies such as Centaline and Lianjia.

Incident

On 27 April 2018 the residence‑permit system experienced a crash; logs were missing, and the cause was initially suspected to be a malicious attack. A second outage occurred on 2 May 2018, after which the system’s administrators captured the offending IP address and reported it.

On 17 May 2018 the cloud provider informed KG that the police had blocked its server's IP address over alleged attacks. The CTO then learned that the crawler had not been updated to handle a newly added captcha and, as a result, had been generating a massive volume of unauthorized requests.
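The failure described above, a crawler that keeps sending requests after the target starts returning something it no longer understands, is the kind of situation a simple circuit breaker is meant to catch. The following is a minimal sketch, not KG's actual code. `crawl_with_breaker`, `fetch`, `looks_valid`, and `CircuitOpen` are hypothetical names introduced for illustration, and the threshold of three consecutive failures is an arbitrary assumption.

```python
from typing import Callable, Iterable, List


class CircuitOpen(Exception):
    """Raised when too many consecutive responses look wrong."""


def crawl_with_breaker(
    queries: Iterable[str],
    fetch: Callable[[str], str],          # hypothetical: performs one request
    looks_valid: Callable[[str], bool],   # hypothetical: True if body is usable data
    max_consecutive_failures: int = 3,    # assumed threshold, tune per target
) -> List[str]:
    """Fetch each query, but stop as soon as responses stop making sense.

    An unexpected page (for example a captcha challenge) counts as a failure.
    After several failures in a row the crawl aborts instead of retrying blindly.
    """
    results: List[str] = []
    failures = 0
    for query in queries:
        body = fetch(query)
        if looks_valid(body):
            results.append(body)
            failures = 0  # a good response resets the counter
        else:
            failures += 1
            if failures >= max_consecutive_failures:
                raise CircuitOpen(
                    f"stopped after {failures} consecutive unexpected responses"
                )
    return results
```

With a guard like this, a change on the target site halts the job after a handful of requests and forces a human to look at it, rather than letting the job run unattended for weeks.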

During the outage the crawler issued about 183 queries per second and retrieved roughly 1.51 million records, leaking extensive building‑code data in the process. The load left the residence‑permit services unusable for more than 5 million registered users.
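To put these figures in perspective, the short calculation below converts the reported rate into an hourly volume and estimates how long that rate would need to be sustained to reach the reported record count. It assumes one record per query and a constant rate, neither of which the source states, so the result is only a rough order-of-magnitude check.

```python
# Figures reported in the case; the one-record-per-query mapping is an assumption.
QUERIES_PER_SECOND = 183
TOTAL_RECORDS = 1_510_000

# Hourly volume implied by the peak rate.
queries_per_hour = QUERIES_PER_SECOND * 3600

# Time needed to reach the record count at that constant rate.
hours_to_reach_total = TOTAL_RECORDS / QUERIES_PER_SECOND / 3600

print(f"{queries_per_hour:,} queries per hour")          # 658,800 queries per hour
print(f"about {hours_to_reach_total:.2f} hours of load")  # about 2.29 hours of load
```

Under those assumptions, the peak rate is more than ten times the "tens of thousands of requests per hour" described for normal operation, and the implied duration is comfortably longer than the one-hour figure cited in the judgment.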

Legal Investigation

In August 2018 the CTO and programmer were detained. Police seized the application source code, logs, and database backups containing about 29 million property records. Forensic analysis confirmed that the crawler performed high‑frequency, automated queries that overwhelmed the target system.

Both defendants claimed they only scraped publicly available information to improve business efficiency, denied any intent to attack, and argued they were following internal directives.

Judgment

The court found that the defendants violated national regulations by interfering with a computer information system that serves more than 50,000 users, keeping it from operating normally for more than one hour, which constituted a serious offense.

Consequently, the CTO, identified as the principal offender, received a three‑year prison sentence, while the programmer, deemed an accomplice, was sentenced to one year and six months.

Takeaway

This case illustrates the legal dangers of unregulated web crawling, emphasizing the need for risk assessment, compliance checks, and responsible engineering practices when accessing third‑party systems.
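As one illustration of what "responsible engineering practices" can look like in code, the sketch below checks a site's robots.txt rules and enforces a minimum delay between requests. It uses Python's standard `urllib.robotparser`. `Throttle`, `build_policy`, the user‑agent string, and the one‑second default interval are hypothetical choices made for this example. Honoring robots.txt and throttling are engineering courtesies only; they do not by themselves establish that scraping a given system is authorized.

```python
import time
from typing import Callable, Optional, Tuple
from urllib.robotparser import RobotFileParser


class Throttle:
    """Enforces a minimum interval between consecutive requests."""

    def __init__(
        self,
        min_interval_s: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least min_interval_s has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_policy(
    robots_txt: str,
    user_agent: str,
    default_interval_s: float = 1.0,  # assumed conservative default
) -> Tuple[RobotFileParser, Throttle]:
    """Parse robots.txt text and derive a throttle from its Crawl-delay, if any."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    interval = default_interval_s if delay is None else max(default_interval_s, float(delay))
    return parser, Throttle(interval)


# Usage with an inline robots.txt (no network access needed for the demo).
ROBOTS = "User-agent: *\nCrawl-delay: 5\nDisallow: /private/\n"
parser, throttle = build_policy(ROBOTS, "example-bot")
print(parser.can_fetch("example-bot", "https://example.org/private/x"))  # False
print(parser.can_fetch("example-bot", "https://example.org/public"))     # True
print(throttle.min_interval_s)                                           # 5.0
```

In a real crawler, each request would be preceded by a `can_fetch` check and a `throttle.wait()` call, so that the request rate stays bounded no matter how many queries are queued.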

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
