
API Anti-Crawling Architecture and Effectiveness at Bilibili

Bilibili combats API abuse with a two‑layer anti‑crawling system: gateway‑side signature verification plus a GAIA risk‑control engine integrated into APIGW. The system unifies device‑data reporting, applies reusable rule packages, triggers a range of human‑verification challenges, and has blocked billions of malicious requests daily with over 85% recall on key interfaces, while preventing crawler‑induced service outages.


1. Background

Interface anti‑crawling, also referred to as API security, is a fundamental issue for websites and apps, and it grows more pressing as platform scale and the number of functional APIs increase. Crawlers consume bandwidth and compute resources, leak user privacy, and enable malicious activities such as mass‑following and fraudulent transactions.

At Bilibili, many APIs are targeted, including video info, user info, comment, live‑room danmaku/gift, live‑lottery, and main‑site activity interfaces.

2. Anti‑crawling data‑flow framework

The request flow contains two layers of anomaly detection: (1) a gateway‑side signature verification component deployed on APIGW that blocks obvious malicious calls, and (2) a risk‑control layer (GAIA engine) that evaluates device, IP, and account features to trigger front‑end human verification (captcha, login prompt, etc.).
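The two‑layer decision described above can be sketched as a simple pipeline. This is an illustrative model only: the function and field names (`verify_signature`, `gaia_evaluate`, `buvid`) and the decision labels are assumptions, not Bilibili's actual APIs.

```python
from dataclasses import dataclass

@dataclass
class Request:
    path: str
    sign: str    # client-computed signature
    ip: str
    buvid: str   # device ID
    ua: str

def verify_signature(req: Request) -> bool:
    # Layer 1: gateway-side signature check (stubbed here; a real check
    # recomputes the signature from the request parameters).
    return bool(req.sign)

def gaia_evaluate(req: Request) -> str:
    # Layer 2: risk engine scores device/IP/account features and decides
    # whether to pass the request or demand a human challenge.
    if not req.buvid:
        return "login"
    if "python-requests" in req.ua.lower():
        return "captcha"
    return "pass"

def handle(req: Request) -> str:
    if not verify_signature(req):
        return "reject"          # obvious malicious call blocked at APIGW
    decision = gaia_evaluate(req)
    return "forward" if decision == "pass" else decision

print(handle(Request("/x/web-interface/view", "abc", "1.2.3.4", "dev1", "Mozilla/5.0")))  # → forward
```

Note the asymmetry: layer 1 rejects outright, while layer 2 prefers recoverable challenges so that false positives cost a real user a captcha rather than access.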

2.1 Data integration to risk control

Initially, each business service reported data to the risk engine via custom interfaces, which required significant development effort. To handle hundreds of APIs, a generic flow was built by integrating the risk engine directly into APIGW, allowing unified reporting of common device parameters (IP, buvid, UA, Referer) without per‑service code changes.
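The point of the APIGW integration is that the common device parameters are extracted once at the gateway and reported identically for every API. A minimal sketch, assuming hypothetical header and field names (the article does not document the actual reporting schema):

```python
# APIGW-side extraction of the common risk fields for unified reporting.
def extract_risk_fields(headers: dict, client_ip: str, api: str) -> dict:
    return {
        "api": api,
        "ip": client_ip,
        "buvid": headers.get("X-Buvid", ""),   # device-ID header name assumed
        "ua": headers.get("User-Agent", ""),
        "referer": headers.get("Referer", ""),
    }

report = extract_risk_fields(
    {"User-Agent": "Mozilla/5.0", "Referer": "https://www.bilibili.com/"},
    client_ip="203.0.113.7",
    api="/x/web-interface/view",
)
print(report["ua"])   # → Mozilla/5.0
```

Because the schema is fixed at the gateway, onboarding a new API is a configuration change rather than per‑service reporting code.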

Benefits include reduced resource consumption, since abnormal traffic is blocked at the gateway before reaching downstream services, and a roughly ten‑fold increase in integration speed (from 2–5 APIs per week to more than 10 per day).

2.2 Risk perception and strategy iteration

Short‑term near‑real‑time monitoring detects traffic spikes and feature distribution anomalies. Alerts are triggered for sudden traffic surges or abnormal ratios of illegal device IDs or UA strings.
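A minimal sketch of the spike‑detection side, assuming a simple moving‑average baseline (the article does not specify the detection algorithm): flag a window whose request count exceeds a multiple of the recent mean.

```python
from collections import deque

class SpikeMonitor:
    """Flag a traffic spike when the current window exceeds k x the recent mean."""
    def __init__(self, history: int = 10, k: float = 3.0):
        self.window_counts = deque(maxlen=history)  # per-window request counts
        self.k = k

    def observe(self, count: int) -> bool:
        # Only alert once a full baseline of past windows has accumulated.
        is_spike = (
            len(self.window_counts) == self.window_counts.maxlen
            and count > self.k * (sum(self.window_counts) / len(self.window_counts))
        )
        self.window_counts.append(count)
        return is_spike

m = SpikeMonitor(history=5, k=3.0)
for c in [100, 110, 95, 105, 100]:   # baseline traffic
    m.observe(c)
print(m.observe(600))   # → True
```

The same windowed comparison applies to feature ratios (illegal device IDs, UA strings) instead of raw counts.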

2.2.1 Strategy deployment

Strategies are grouped into three categories:

Frequency‑control (e.g., high request count per user or IP within X minutes)

Abnormal‑aggregation (e.g., high illegal UA proportion, high emulator proportion)

Parameter‑value checks (e.g., illegal Referer, script‑like UA)

Reusable rule packages are created to achieve high precision and broad applicability, allowing rapid gray‑release after minimal manual verification.
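A rule package of the three categories above can be sketched as a list of composable predicates over the reported features. Thresholds and field names here are illustrative assumptions, not Bilibili's production values:

```python
# Illustrative rule package; any matching rule flags the traffic.
RULES = [
    # Frequency control: requests per device within the window.
    lambda f: f["req_count_5min"] > 300,
    # Abnormal aggregation: share of emulator devices hitting this API.
    lambda f: f["emulator_ratio"] > 0.5,
    # Parameter-value check: script-like User-Agent.
    lambda f: "python" in f["ua"].lower() or "curl" in f["ua"].lower(),
]

def evaluate(features: dict) -> bool:
    """Return True if any rule in the package flags the traffic."""
    return any(rule(features) for rule in RULES)
```

Packaging rules this way is what enables rapid gray‑release: a vetted package is attached to a new API wholesale instead of rebuilding strategies per interface.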

2.3 Abnormal traffic handling

After detection, various mitigation methods are applied, ranging from simple toast‑style rejections to data poisoning, graphic/behavioral captchas, SMS verification, and login dialogs. The verification flow involves:

APIGW forwards the request to the risk engine, which returns a verification decision.

The client SDK receives a verification code and requests the appropriate verification service.

The user completes the challenge (captcha, SMS, login).

The verification result is sent back to the risk engine for final judgment.
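The round trip above hinges on the risk engine, not the client, deciding whether a challenge was actually issued and solved. A minimal sketch with an assumed token scheme (the real verification-code format and service names are not disclosed):

```python
import secrets

PENDING = {}  # verification token -> challenge type

def issue_challenge(kind: str) -> str:
    # Risk engine records what it asked for and hands a token to the client SDK.
    token = secrets.token_hex(8)
    PENDING[token] = kind
    return token

def submit_result(token: str, solved: bool) -> str:
    # Final judgment stays server-side: an unknown or reused token is rejected.
    kind = PENDING.pop(token, None)
    if kind is None:
        return "reject"
    return "pass" if solved else "reject"

t = issue_challenge("captcha")
print(submit_result(t, solved=True))   # → pass
```

Popping the token on first use makes each challenge single‑shot, so a solved captcha cannot be replayed across requests.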

2.4 Gateway signature component design

The architecture consists of a signature SDK, web gateway, business gateway, risk platform, and front‑end. Key features include high security (random key generation with obfuscation), configurability, high performance (in‑memory verification), low cost (shared keys for initial rollout), and resilience (automatic exemption on massive failures).
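The resilience property, automatic exemption on massive failures, amounts to a circuit breaker: if verification fails for an implausibly large share of recent requests (suggesting a key‑distribution or SDK problem rather than an attack), the gateway temporarily waves traffic through. A sketch with assumed thresholds:

```python
from collections import deque

class ExemptionBreaker:
    """Exempt signature checks when nearly all recent requests fail them."""
    def __init__(self, window: int = 1000, fail_ratio: float = 0.9):
        self.results = deque(maxlen=window)   # True = verified, False = failed
        self.fail_ratio = fail_ratio

    def record(self, verified: bool) -> None:
        self.results.append(verified)

    def exempt(self) -> bool:
        # Require a full window before tripping, so a handful of early
        # failures cannot disable verification.
        if len(self.results) < self.results.maxlen:
            return False
        failures = self.results.count(False)
        return failures / len(self.results) >= self.fail_ratio
```

The high threshold matters: real crawler traffic rarely pushes the global failure ratio near 100%, but a botched key rollout does.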

2.4.1 Signature encryption process

The five steps are:

Key generation (periodic job stores a key pair in Redis).

Key distribution (first page‑load API delivers the key to the client).

Key obfuscation (field shifting, salting, mapping).

Signature construction (concatenate parameters, timestamp, and key, then encrypt to produce sign ).

Gateway verification (server recomputes the signature and validates it, reporting the result to the risk engine).
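Steps 4 and 5 can be sketched with HMAC‑SHA256 standing in for the article's unspecified encryption; the parameter ordering, timestamp field, and skew window are assumptions:

```python
import hashlib
import hmac
import time

def build_sign(params: dict, key: bytes, ts: int) -> str:
    # Step 4: concatenate sorted parameters plus timestamp, then produce
    # a keyed digest as the `sign` value.
    payload = "&".join(f"{k}={params[k]}" for k in sorted(params)) + f"&ts={ts}"
    return hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()

def gateway_verify(params: dict, ts: int, sign: str, key: bytes,
                   max_skew: int = 60) -> bool:
    # Step 5: recompute in memory and compare in constant time,
    # rejecting stale timestamps to block replayed signatures.
    if abs(int(time.time()) - ts) > max_skew:
        return False
    return hmac.compare_digest(build_sign(params, key, ts), sign)

key = b"shared-demo-key"
ts = int(time.time())
sign = build_sign({"aid": "12345"}, key, ts)
print(gateway_verify({"aid": "12345"}, ts, sign, key))   # → True
```

Because verification only needs the key and the request itself, it runs entirely in gateway memory, which is what makes the in‑memory, high‑performance property in 2.4 achievable.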

3. Anti‑crawling effectiveness

3.1 General API impact

Qualitative assessment shows that traffic spikes are smoothed after risk deployment. Quantitatively, billions of abnormal requests are identified daily, with recall rates above 85% for key interfaces. No service outages caused by crawlers were observed in Q3 2023.

3.2 Special interface cases

Live‑connection abuse: sampling of broadcast messages reduces data leakage; daily abnormal connections decreased by 25%.

Gold‑seed exchange abuse: detection of coordinated low‑value gifting and exchange patterns led to account bans.

Follow‑spam: massive batch follows from malicious accounts were identified and mitigated.

4. Summary and future outlook

Key pain points and solutions include high integration effort (solved by APIGW integration), parameter tampering (addressed by gateway signature), strategy complexity (unified rule packages), captcha bypass (diverse verification methods), and accidental large‑scale false positives (monitoring and circuit‑breakers).

Future work will focus on:

Deploying lightweight risk engines for ultra‑high‑QPS interfaces.

Developing metrics for low‑frequency, long‑term crawling behaviors.

Introducing model‑based, behavior‑sequence detection to complement rule‑based methods.

Tags: API security, risk control, traffic analysis, Bilibili, anti-crawling, gateway verification, signature encryption
Written by Bilibili Tech

Provides introductions and tutorials on Bilibili-related technologies.
