Information Security · 19 min read

API Anti‑Crawling and Security Architecture: Risk Detection, Strategy, and Effectiveness at Bilibili

This article details Bilibili's comprehensive anti‑crawling system, covering the background of API abuse, the data‑flow framework, risk perception, strategy iteration, verification mechanisms, gateway signing design, and the measurable impact on normal and special‑case interfaces.

High Availability Architecture

1. Background of API anti‑crawling – API abuse threatens platform resources, user privacy, and business operations. Bilibili identifies vulnerable interfaces such as video info, user info, comments, live‑stream messages, and activity data.

2. Anti‑crawling data‑flow framework – Traffic first passes through the API gateway (APIGW), where a signature‑verification component blocks obviously malicious calls; the remaining traffic flows to the GAIA risk engine for feature‑based anomaly detection, which triggers front‑end verification (CAPTCHA, login prompts) when needed.
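The gateway‑then‑risk‑engine pipeline above can be sketched as a simple decision chain. This is a minimal illustration, not Bilibili's implementation: `verify_signature`, `gaia_risk_score`, and the 0.8 threshold are placeholder assumptions standing in for the real APIGW component and GAIA engine.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    path: str
    params: dict = field(default_factory=dict)
    signature: str = ""
    device_id: str = ""

# Hypothetical stand-ins for the gateway signature check and the GAIA
# risk engine; names, logic, and thresholds are illustrative only.
def verify_signature(req: Request) -> bool:
    return req.signature != ""            # placeholder: real check validates the signed payload

def gaia_risk_score(req: Request) -> float:
    return 0.9 if req.device_id == "" else 0.1   # placeholder feature-based scoring

def handle(req: Request) -> str:
    if not verify_signature(req):
        return "block"                    # obvious abuse is stopped at the APIGW layer
    if gaia_risk_score(req) > 0.8:
        return "verify"                   # push a CAPTCHA or login prompt to the front end
    return "pass"                         # forward to the business service
```

The key design point is the ordering: cheap signature validation filters the bulk of malicious traffic before the more expensive risk scoring runs.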

2.1 Data integration with risk engine – Initially, each service reported data individually, which was labor‑intensive. Integrating risk reporting into APIGW enabled unified, code‑free onboarding, raising efficiency from 2‑5 interfaces per week to over 10 per day.

2.2 Risk perception and strategy iteration – Near‑real‑time monitoring detects traffic spikes and feature anomalies (e.g., abnormal UA or device ratios). Strategies include frequency limits, abnormal aggregation, and parameter‑value checks, with reusable rule groups applied automatically to new interfaces.
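The frequency‑limit strategy mentioned above can be illustrated with a sliding‑window counter keyed by device or user. This is a generic sketch of the technique, not the platform's actual rule engine; the class name and limits are assumptions.

```python
from collections import defaultdict, deque
import time

class FrequencyLimiter:
    """Sliding-window rate limit per key (e.g. device ID). Illustrative only."""

    def __init__(self, max_calls: int, window_s: float):
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls = defaultdict(deque)   # key -> timestamps of recent calls

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.calls[key]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] > self.window_s:
            q.popleft()
        if len(q) >= self.max_calls:
            return False                  # over the limit: flag as abnormal
        q.append(now)
        return True
```

A reusable rule group, as described in the article, would bundle several such checks (frequency, parameter‑value validation, aggregation anomalies) so new interfaces inherit them without custom code.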

2.3 Abnormal traffic handling – Multiple mitigation methods (toast rejection, data poisoning, various CAPTCHA types, SMS, login dialogs) are deployed based on risk level, achieving ~99% coverage.
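Risk‑tiered mitigation like this is naturally expressed as a dispatch table. The mapping below is a hypothetical example of the pattern; the article names the mitigation types but not which risk level each is bound to.

```python
from enum import IntEnum

class RiskLevel(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

# Illustrative level-to-mitigation mapping; the article lists toast
# rejection, data poisoning, CAPTCHAs, SMS, and login dialogs as the
# available actions, but the exact assignment here is an assumption.
MITIGATIONS = {
    RiskLevel.LOW: "pass",
    RiskLevel.MEDIUM: "captcha",
    RiskLevel.HIGH: "sms_verify",
    RiskLevel.CRITICAL: "toast_reject",
}

def mitigate(level: RiskLevel) -> str:
    return MITIGATIONS[level]
```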

2.4 Gateway signature component – A mixed‑key signing scheme encrypts request parameters; the gateway validates signatures, reports to risk engine, and can block suspicious calls. The architecture comprises a signing SDK, web gateway, business gateway, risk platform, and front‑end.

2.4.1 Signing process – Key generation, distribution, obfuscation, signature construction, and gateway verification occur in five steps, ensuring only legitimate requests pass.
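A mixed‑key signing flow of this shape can be sketched with HMAC. Bilibili's actual key generation and obfuscation are not public, so the scheme below is an assumption: an app secret and a gateway‑issued session key are combined into one HMAC key, and the gateway recomputes the signature over canonically ordered parameters.

```python
import hashlib
import hmac
from urllib.parse import urlencode

def build_signature(params: dict, app_secret: bytes, session_key: bytes) -> str:
    """Client side: sign the request parameters (illustrative scheme)."""
    canonical = urlencode(sorted(params.items()))        # deterministic param ordering
    mixed_key = hashlib.sha256(app_secret + session_key).digest()  # "mixed" key derivation
    return hmac.new(mixed_key, canonical.encode(), hashlib.sha256).hexdigest()

def gateway_verify(params: dict, signature: str,
                   app_secret: bytes, session_key: bytes) -> bool:
    """Gateway side: recompute and compare in constant time."""
    expected = build_signature(params, app_secret, session_key)
    return hmac.compare_digest(expected, signature)
```

Sorting parameters before signing means client and gateway agree on the byte string regardless of transmission order; any tampering with a parameter value invalidates the signature.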

3. Effectiveness of anti‑crawling – Quantitative metrics show billions of abnormal requests blocked daily, recall rates above 85%, and no service outages due to crawlers in Q3 2023. Special interfaces (live‑stream connections, gold‑seed exchanges, follow actions) also saw significant reductions in malicious activity.

4. Summary and future outlook – The project delivered fast onboarding, timely risk perception, layered mitigation, and reproducible results. Future work includes lightweight engine deployment, advanced crawler behavior modeling, and AI‑driven risk identification.

Tags: gateway, API security, risk mitigation, Bilibili, anti-crawling, risk detection, verification
Written by High Availability Architecture, the official account for High Availability Architecture.