How Bilibili Built a Scalable Anti‑Crawling System: Architecture, Data Flow, and Real‑World Impact
The article details Bilibili's comprehensive anti‑crawling solution, covering the problem background, a two‑layer detection framework integrated with APIGW and GAIA, risk perception, strategy iteration, verification mechanisms, quantitative results, and future improvement directions, all illustrated with concrete examples and performance numbers.
Background of API anti‑crawling
API anti‑crawling (API security) protects platform interfaces that expose large amounts of data. Crawlers consume bandwidth and CPU, can expose private user data, and can flood activity‑related APIs (video info, user info, comments, live‑room danmaku/gifts, live lotteries, site‑wide events) while providing no value to the platform.
Anti‑crawling data‑flow framework
Two detection layers are deployed:
Gateway‑side signature verification on APIGW intercepts obvious malicious calls.
Risk‑control side (GAIA engine) analyses device, IP, and account features reported from the gateway; on anomaly it triggers front‑end human verification (captcha, login prompt, etc.).
Request flow: front‑end → APIGW → GAIA → anomaly signal → front‑end verification flow.
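A minimal Python sketch of the two‑layer decision, for illustration only. `Request`, `risk_score`, the 0.5 threshold and the toy scoring rules are hypothetical stand‑ins; the real GAIA engine evaluates configured strategies rather than hard‑coded checks, and treating a failed signature as an outright reject is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REJECT = "reject"   # layer 1: obviously malicious, stopped at the gateway
    VERIFY = "verify"   # layer 2: anomalous, sent to front-end human verification


@dataclass
class Request:
    path: str
    ip: str
    buvid: str
    user_agent: str
    signature_ok: bool  # outcome of the gateway-side signature check


def risk_score(req: Request) -> float:
    """Hypothetical stand-in for GAIA: toy rules producing a score in [0, 1]."""
    score = 0.0
    if not req.buvid:
        score += 0.5                       # missing device ID
    if "python-requests" in req.user_agent.lower():
        score += 0.5                       # scripted client
    return min(score, 1.0)


def handle(req: Request, verify_threshold: float = 0.5) -> Decision:
    if not req.signature_ok:               # layer 1: APIGW signature verification
        return Decision.REJECT
    if risk_score(req) >= verify_threshold:  # layer 2: risk engine
        return Decision.VERIFY
    return Decision.ALLOW
```

The ordering mirrors the request flow: the cheap signature check runs first so the risk engine only sees traffic that already passed it.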
Risk engine integration
Initially, each business service modified its own API to report data to GAIA, which allowed fine‑grained features but required heavy development. To cover hundreds of interfaces, the risk engine was instead integrated directly into APIGW, which centrally reports generic device parameters (IP, buvid, the device identifier, UA, referer).
Per‑service integration (each service modifies its own API): can add business‑specific features; requires substantial backend work; integration speed of 2‑5 interfaces per week.
APIGW unified collection: no business code changes; pre‑filters abnormal traffic at the gateway; limited to generic device/network parameters; integration speed of more than 10 interfaces per day.
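The gateway approach works because every feature comes from the request itself. Below is a hypothetical sketch of the event an APIGW plugin might emit; the field names, reading buvid from a `buvid3` cookie, and the `missing` list are assumptions, not the production schema.

```python
import time
from typing import Dict, Mapping

# Generic fields the gateway can see for every interface, per the article.
GENERIC_FIELDS = ("ip", "buvid", "user_agent", "referer")


def build_risk_event(api_path: str, client_ip: str,
                     headers: Mapping[str, str],
                     cookies: Mapping[str, str]) -> Dict[str, object]:
    """Assemble a hypothetical risk-report event from gateway-visible data only."""
    event: Dict[str, object] = {
        "api": api_path,
        "ts": int(time.time()),
        "ip": client_ip,
        "buvid": cookies.get("buvid3", ""),           # assumed cookie name
        "user_agent": headers.get("User-Agent", ""),  # real gateways normalize header case
        "referer": headers.get("Referer", ""),
    }
    # Record empty generic fields; crawlers often omit or malform them.
    event["missing"] = [f for f in GENERIC_FIELDS if not event[f]]
    return event
```

Nothing here touches the business handler, so the same plugin can be enabled route by route, which is consistent with the much faster integration rate.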
Risk perception and strategy iteration
After traffic is fed into GAIA, continuous risk perception identifies abnormal flows.
Near‑real‑time spike monitoring: Detects sudden traffic spikes that deviate from normal periodic patterns. When a spike exceeds a configured threshold, an alert is raised, indicating possible crawler activity or a sudden event.
Feature‑based anomaly monitoring: Crawlers often send malformed parameters. The system monitors abnormal ratios of illegal device IDs, illegal UA strings, etc., to flag suspicious traffic.
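Both monitors can be sketched in a few lines of Python. The spike check compares the current count with the median of the same time slot in earlier periods, so regular daily peaks are not flagged; the ratio check measures what share of a window's requests fail validity rules. The thresholds, the buvid regex, and the UA markers are hypothetical, since the article does not publish its rules.

```python
import re
from collections import Counter
from statistics import median
from typing import Dict, Iterable, List, Mapping, Sequence


def is_spike(current: int, same_slot_history: Sequence[int],
             ratio_threshold: float = 3.0, min_count: int = 1000) -> bool:
    """Flag a spike relative to the same time slot in earlier periods (e.g. previous days)."""
    if current < min_count or not same_slot_history:
        return False                      # too little traffic, or no baseline yet
    baseline = median(same_slot_history)
    if baseline == 0:
        return True                       # heavy traffic where there normally is none
    return current / baseline >= ratio_threshold


# Hypothetical validity rules; the real buvid/UA checks are not described.
BUVID_RE = re.compile(r"^[A-Za-z0-9-]{20,64}$")
SCRIPT_UA_MARKERS = ("python-requests", "curl/", "go-http-client", "scrapy")


def illegal_reasons(event: Mapping[str, str]) -> List[str]:
    reasons = []
    if not BUVID_RE.match(event.get("buvid", "")):
        reasons.append("illegal_buvid")
    ua = event.get("user_agent", "").lower()
    if not ua or any(marker in ua for marker in SCRIPT_UA_MARKERS):
        reasons.append("illegal_ua")
    return reasons


def abnormal_ratios(events: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Share of requests in a window that fail each validity rule."""
    counts: Counter = Counter()
    total = 0
    for event in events:
        total += 1
        counts.update(illegal_reasons(event))
    return {reason: n / total for reason, n in counts.items()} if total else {}
```

In practice an alert would compare each ratio with its usual level for that interface rather than with a fixed value.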
Abnormal traffic disposition
Multiple disposition methods balance security and user experience:
Reject (toast) – low cost, high user awareness, requires accurate rules.
Data poisoning (mock) – high cost, low user awareness, needs business cooperation.
Captcha (image, Geetest, self‑built) – broad applicability, moderate security.
SMS/Phone verification – higher barrier, ties to user account.
Login popup – integrates with authentication, higher cost.
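One way to encode these trade‑offs is a small policy function. The ordering below (precise rule leads to reject, business support to mock, anonymous user to login, very high risk to SMS, otherwise captcha) and every threshold are hypothetical; the article lists the options but not how Bilibili chooses among them.

```python
from enum import Enum


class Disposition(Enum):
    PASS = "pass"
    REJECT = "reject"    # toast; only when the rule is highly precise
    MOCK = "mock"        # poisoned data; only where the business supports it
    CAPTCHA = "captcha"
    SMS = "sms"
    LOGIN = "login"


def choose_disposition(risk: float, rule_precision: float,
                       mock_supported: bool, logged_in: bool) -> Disposition:
    """Hypothetical policy trading off security against user experience."""
    if risk < 0.3:
        return Disposition.PASS
    if rule_precision >= 0.99:
        # Accurate rules make a visible rejection acceptable.
        return Disposition.REJECT
    if mock_supported:
        # Low user awareness: the crawler keeps running on useless data.
        return Disposition.MOCK
    if not logged_in:
        # Pushes anonymous traffic into the authentication flow.
        return Disposition.LOGIN
    if risk >= 0.8:
        # Higher barrier tied to the account.
        return Disposition.SMS
    return Disposition.CAPTCHA
```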
Disposition flow:
Front‑end request reaches APIGW, which returns a verification decision.
Verification SDK calls the verification gateway to fetch the challenge.
User completes the challenge on the page.
SDK sends the answer to the verification gateway for validation.
If the Geetest challenge passes, GAIA re‑checks the request for signs of known captcha‑cracking tools.
Final result is returned to the client; successful verification data is reported back to GAIA for whitelisting.
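The verification round trip above can be modeled as a small in‑memory service. `VerificationGateway`, the single‑use token bound to a buvid, and the 10‑minute whitelist window are assumptions for illustration; the two callables stand in for the captcha vendor's validation and GAIA's cracking‑tool re‑check, whose real interfaces are not described.

```python
import secrets
import time
from typing import Callable, Dict, Optional


class VerificationGateway:
    """Hypothetical in-memory model of the challenge/verify round trip."""

    def __init__(self,
                 captcha_check: Callable[[str, str], bool],
                 crack_tool_check: Callable[[str], bool],
                 whitelist_ttl: float = 600.0) -> None:
        self.captcha_check = captcha_check        # stand-in for the captcha vendor (e.g. Geetest)
        self.crack_tool_check = crack_tool_check  # stand-in for GAIA's re-check; True = looks clean
        self.whitelist_ttl = whitelist_ttl        # assumed pass-through window in seconds
        self._pending: Dict[str, str] = {}        # challenge token -> buvid
        self._whitelist: Dict[str, float] = {}    # buvid -> expiry timestamp

    def issue_challenge(self, buvid: str) -> str:
        token = secrets.token_urlsafe(16)
        self._pending[token] = buvid
        return token

    def submit(self, token: str, buvid: str, answer: str,
               now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Tokens are single-use and bound to the device that requested them.
        if self._pending.pop(token, None) != buvid:
            return False
        if not self.captcha_check(token, answer):
            return False
        if not self.crack_tool_check(buvid):
            return False
        # Success is recorded so later requests from this device skip verification for a while.
        self._whitelist[buvid] = now + self.whitelist_ttl
        return True

    def is_whitelisted(self, buvid: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return self._whitelist.get(buvid, 0.0) > now
```

Binding each token to one device and consuming it on first use prevents a solved challenge from being shared across a crawler farm.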
Effectiveness
Ordinary interface results
Traffic curves became smooth after the risk controls were deployed, indicating that crawler spikes were mitigated. Daily detections of abnormal requests number in the hundreds of millions, and recall on key interfaces exceeds 85%. Since Q3 2023, no service outages caused by crawler attacks have been observed.
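For reference, the recall figure presumably follows the standard definition. The article does not say how confirmed crawler requests (the ground truth) are labeled, so this is an assumption about how the number is computed:

```latex
\text{Recall} = \frac{TP}{TP + FN}
```

where TP counts crawler requests the system flagged and FN counts crawler requests it missed. Recall above 85% therefore means fewer than 15 of every 100 crawler requests on a key interface go undetected.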
Special interface cases
Live‑room long connections: Black‑market actors scraped danmaku, user room‑entry events, and revenue data. Sampling‑based mock responses combined with risk rules cut daily connection attempts by 25% on a service that handles millions of connections.
Gold‑seed activity: Coordinated low‑amount gifting and bulk gold‑seed exchanges were detected, leading to targeted account bans and activity shutdowns.
Follow spam: Batch follow attacks produced clear traffic spikes. Analysis of user and relationship data identified malicious accounts driving traffic to illegal sites; customized rules blocked these accounts.
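The sampling behind the live‑room mock responses could look like the following sketch. Hashing a stable client key (rather than calling `random()`) keeps each client consistently in or out of the sample, so a crawler cannot simply retry until it receives real data. `in_sample`, `respond`, the salt, and the 50% default rate are hypothetical.

```python
import hashlib
from typing import Any, Dict


def in_sample(key: str, rate: float, salt: str = "mock-v1") -> bool:
    """Deterministically place `key` (e.g. a device ID) in a sample of roughly `rate`."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64   # uniform in [0, 1)
    return bucket < rate


def respond(key: str, suspicious: bool, real: Dict[str, Any],
            mock: Dict[str, Any], rate: float = 0.5) -> Dict[str, Any]:
    # Only traffic already flagged as suspicious is eligible for poisoning;
    # the sampling rate limits the damage if a rule misfires on real users.
    if suspicious and in_sample(key, rate):
        return mock
    return real
```

Changing the salt reshuffles which clients are sampled, which is one way to rotate the poisoned population without touching the rules.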
Gateway signature verification component
To prevent plain‑text API calls from being replayed, a mixed‑key digital signature scheme was built.
Architecture (five modules):
Signature SDK: Generates and rotates masked keys, validates timestamps, performs signature verification, provides per‑endpoint rate limiting, and reports verification results to risk control.
Web gateway: Supplies masked keys to the front‑end.
Business gateway: Integrates the SDK verification logic.
Risk platform: Consumes verification data as features for GAIA strategies.
Front‑end: Retrieves masked keys, mixes them, and generates a sign parameter for each request.
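The SDK's per‑endpoint rate limiting is not described in detail; a token bucket is one common way to implement it and is shown here as a substitute technique. Keying buckets by (endpoint, client) rather than by endpoint alone, the limits table, and the injectable clock are assumptions made for the sketch.

```python
import time
from typing import Callable, Dict, Tuple


class EndpointRateLimiter:
    """Token bucket per (endpoint, client) pair."""

    def __init__(self, limits: Dict[str, Tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = limits    # endpoint -> (tokens per second, burst capacity)
        self.clock = clock
        # (endpoint, client) -> (tokens left, time of last refill)
        self._buckets: Dict[Tuple[str, str], Tuple[float, float]] = {}

    def allow(self, endpoint: str, client: str) -> bool:
        if endpoint not in self.limits:
            return True                           # unlisted endpoints are not limited
        rate, burst = self.limits[endpoint]
        now = self.clock()
        tokens, last = self._buckets.get((endpoint, client), (burst, now))
        tokens = min(burst, tokens + (now - last) * rate)   # refill since last call
        if tokens >= 1.0:
            self._buckets[(endpoint, client)] = (tokens - 1.0, now)
            return True
        self._buckets[(endpoint, client)] = (tokens, now)
        return False
```

A bucket allows short bursts up to its capacity while holding the long‑run rate to the configured value, which suits interfaces where legitimate clients occasionally fire several calls at once.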
Signature workflow (five steps):
Key generation: Web gateway job periodically creates a key pair and stores it in Redis.
Key distribution: The first page‑load request fetches the raw key.
Key masking: Front‑end applies field shifting, salting, and mapping to produce a masked key.
Signature construction: Parameters and timestamp are concatenated and encrypted with the masked key to produce sign, which is appended to the request.
Gateway verification: Business gateway recomputes the expected signature using the same algorithm; if it matches, the request is accepted and the result is reported to GAIA.
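A compact sketch of the masking, signing, and verification steps. The article does not publish its masking transform or signing algorithm, so the rotation, stride‑based index mapping, salt, the `ts`/`sign` parameter names, the 300‑second clock‑skew window, and the use of HMAC‑SHA256 as the keyed hash are all hypothetical choices. Because the masking code ships to the browser, it raises the cost of reproducing signatures rather than acting as a cryptographic secret.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional
from urllib.parse import urlencode

SHIFT = 5            # hypothetical rotation amount ("field shifting")
STRIDE = 7           # hypothetical index stride ("mapping"); coprime with the 32-char key
SALT = "demo-salt"   # hypothetical salt
MAX_SKEW = 300       # hypothetical timestamp tolerance in seconds


def mask_key(key_a: str, key_b: str) -> str:
    """Mix the two raw key halves into a masked key (front-end and gateway share this)."""
    raw = key_a + key_b
    n = len(raw)
    if n == 0:
        raise ValueError("raw keys must not be empty")
    shifted = raw[SHIFT % n:] + raw[:SHIFT % n]
    # With gcd(STRIDE, n) == 1 this is a permutation of the characters.
    mapped = "".join(shifted[(i * STRIDE) % n] for i in range(n))
    return mapped + SALT


def compute_sign(params: Dict[str, str], masked_key: str, ts: int) -> str:
    """Keyed hash over the sorted parameters plus the timestamp."""
    canonical = urlencode(sorted(dict(params, ts=str(ts)).items()))
    return hmac.new(masked_key.encode(), canonical.encode(), hashlib.sha256).hexdigest()


def client_sign(params: Dict[str, str], key_a: str, key_b: str, ts: int) -> Dict[str, str]:
    """Front-end side: attach ts and sign to the outgoing query."""
    signed = dict(params, ts=str(ts))
    signed["sign"] = compute_sign(params, mask_key(key_a, key_b), ts)
    return signed


def gateway_verify(query: Dict[str, str], key_a: str, key_b: str,
                   now: Optional[int] = None) -> bool:
    """Business-gateway side: check freshness, then recompute and compare the signature."""
    params = dict(query)
    sign = params.pop("sign", "")
    ts_raw = params.pop("ts", "")
    if not ts_raw.isdecimal():
        return False
    ts = int(ts_raw)
    now = int(time.time()) if now is None else now
    if abs(now - ts) > MAX_SKEW:
        return False
    expected = compute_sign(params, mask_key(key_a, key_b), ts)
    return hmac.compare_digest(expected.encode(), sign.encode())
```

The timestamp window only narrows replays to a few minutes; blocking replays inside that window would additionally need a one‑time nonce tracked on the server.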
Future outlook
Deploy lightweight risk engines for high‑QPS interfaces.
Develop metrics for low‑frequency, long‑term crawling patterns.
Introduce model‑based behavior‑sequence detection to complement rule‑based methods.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
