PB‑Level Ad‑hoc Query Practice with Flink: Threat Hunting Platform Architecture and IO‑Reducing Optimizations
This article details 360's Threat Hunting platform built on Flink. It covers the platform's evolution, architecture, block‑index design, Hilbert‑curve data ordering, LIKE push‑down, join optimizations, and Alluxio caching, along with future plans for BI and multi‑user concurrency, all aimed at efficient PB‑scale data querying.
The article summarizes a presentation by 360 Gov‑Enterprise Security Group’s data engineers on their Flink‑based Threat Hunting platform, originally shared at Flink Forward Asia 2020.
Platform evolution : The platform began as a UEBA system (2017), gained an offline HQL query language (2018), and moved to a real‑time HQL version (2019) that handles PB‑scale data. Throughout, the team continuously refined its storage and query mechanisms.
Architecture design : Data originates from Elasticsearch (historical) and Kafka (real‑time) and is synchronized to ORC files. A scheduling system parses HQL, performs operator caching, and pushes predicates down to an index database. Cached intermediate results are reused when possible; otherwise, computation starts from the ORC files.
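The reuse of cached intermediate results can be pictured with a minimal sketch. It assumes a query is a linear chain of operators and that results are cached under a hash of each chain prefix; the article does not say how cache keys are actually derived, and the names `plan_key`, `OperatorCache`, `execute`, `scan_orc`, and `apply_op` are hypothetical.

```python
import hashlib
import json


def plan_key(operators):
    """Stable key for an operator chain (assumed: a list of JSON-serializable dicts)."""
    canonical = json.dumps(operators, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class OperatorCache:
    """In-memory stand-in for the platform's intermediate-result cache."""

    def __init__(self):
        self._results = {}

    def put(self, operators, result):
        self._results[plan_key(operators)] = result

    def longest_cached_prefix(self, operators):
        """Return (n, result) for the longest cached prefix, or (0, None) if none."""
        for n in range(len(operators), 0, -1):
            key = plan_key(operators[:n])
            if key in self._results:
                return n, self._results[key]
        return 0, None


def execute(operators, cache, scan_orc, apply_op):
    """Resume from the longest cached prefix; fall back to scanning ORC files."""
    n, data = cache.longest_cached_prefix(operators)
    if n == 0:
        data = scan_orc()  # nothing reusable: start from the raw files
    for i in range(n, len(operators)):
        data = apply_op(operators[i], data)
        cache.put(operators[: i + 1], data)  # cache every new prefix for later queries
    return data
```

With this shape, a follow-up query that only appends a filter to an earlier query resumes from the cached result of the shared prefix instead of rescanning the files.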
Block index structure : To avoid costly row‑level indexes, the platform stores a block‑level index (min/max, Bloom filter, bitmap, inverted) in a database. The index prunes aggressively: for roughly 85% of queries, more than 90% of files are eliminated before any data is read. Keeping the indexes in a database provides transaction support and high compression.
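A minimal sketch of block‑level pruning follows. It assumes one index entry per file holding a min/max range on a timestamp column and a Bloom filter on a source‑IP column. The column names, the `BlockIndex` layout, and the toy `BloomFilter` are illustrative assumptions, not the platform's actual schema.

```python
import hashlib
from dataclasses import dataclass


class BloomFilter:
    """Toy Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, value):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


@dataclass
class BlockIndex:
    path: str
    min_ts: int
    max_ts: int
    src_ip_bloom: BloomFilter


def build_index(path, rows):
    """Summarize one file's rows into a single index entry."""
    bloom = BloomFilter()
    for row in rows:
        bloom.add(row["src_ip"])
    timestamps = [row["ts"] for row in rows]
    return BlockIndex(path, min(timestamps), max(timestamps), bloom)


def prune(entries, ts_from, ts_to, src_ip):
    """Return only the files that may hold rows with ts in [ts_from, ts_to] and the given src_ip."""
    keep = []
    for entry in entries:
        if entry.max_ts < ts_from or entry.min_ts > ts_to:
            continue  # min/max range does not overlap the query range
        if not entry.src_ip_bloom.might_contain(src_ip):
            continue  # Bloom filter rules the value out
        keep.append(entry.path)
    return keep
```

Both checks are conservative: a file is dropped only when the index proves it cannot match, so pruning never changes query results, only the number of files read.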
IO‑reducing optimizations :
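Data is sorted along a Hilbert space‑filling curve during ingestion, so rows that are close in several columns at once land in the same blocks; this raises the theoretical upper bound on how much the block index can prune.
The ORC API is extended to push down LIKE predicates, so that whole row groups can be skipped by checking the pattern against the column dictionary.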
Join performance is boosted by pre‑filtering files with Bloom indexes and performing cost‑based pre‑joins for broadcast and hash joins.
Alluxio is employed as a cloud‑native caching layer, bringing data closer to Flink and reducing remote storage reads.
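To illustrate why Hilbert ordering helps block pruning, here is a minimal sketch using the standard two‑dimensional Hilbert index computation. It assumes two columns already quantized to integers in [0, 2^order). The article does not state which columns or how many dimensions the platform uses, and the helper names `sort_into_blocks` and `block_ranges` are hypothetical.

```python
def hilbert_index(order, x, y):
    """Position of cell (x, y) along a Hilbert curve over a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def sort_into_blocks(points, order, block_size):
    """Order points along the curve, then cut the sequence into fixed-size blocks."""
    ordered = sorted(points, key=lambda p: hilbert_index(order, p[0], p[1]))
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


def block_ranges(block):
    """Per-block min/max for each column, as a min/max index would record them."""
    xs = [p[0] for p in block]
    ys = [p[1] for p in block]
    return (min(xs), max(xs)), (min(ys), max(ys))
```

On a full 8 × 8 grid cut into blocks of 16 points, each Hilbert‑ordered block covers a 4 × 4 square, so its min/max range is narrow in both columns. Sorting by the first column and then the second instead yields blocks that each span the full range of the second column, so a predicate on that column alone prunes nothing.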
Performance tests on a 249 TB Elasticsearch cluster show the optimized HQL engine often matches or exceeds ES query speed, especially for high‑selectivity queries.
Future plans : Extend the platform for BI dashboards, containerization, JVM warm‑up, and enhanced multi‑user concurrency.
Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies