PB‑Level Ad‑hoc Query Practice with Flink: Threat Hunting Platform Architecture and IO‑Reducing Optimizations

This article details 360's Flink‑based Threat Hunting platform: its evolution, architecture, block‑index design, Hilbert‑curve data ordering, LIKE predicate push‑down, join optimizations, and Alluxio caching, plus future plans for BI and multi‑user concurrency. All of these aim at efficient querying of PB‑scale data.

Big Data Technology Architecture

Jul 20, 2021

The article summarizes a presentation by 360 Gov‑Enterprise Security Group’s data engineers on their Flink‑based Threat Hunting platform, originally shared at Flink Forward Asia 2020.

Platform evolution : The platform began as a UEBA system (2017), gained an offline query language, HQL (2018), and reached a real‑time HQL version (2019) that handles PB‑scale data. Throughout, the team kept refining its storage and query mechanisms.

Architecture design : Data comes from Elasticsearch (historical) and Kafka (real‑time) and is synchronized to ORC files. A scheduling system parses each HQL query, checks an operator‑level cache, and pushes predicates down to an index database. Cached intermediate results are reused when available; otherwise computation starts from the ORC files.
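The operator caching described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the tuple-based plan format, the `OperatorCache` class, and the in-memory `ORC_TABLES` stand-in are hypothetical and are not the platform's actual scheduler or API. The idea shown is only that an intermediate result is keyed by the operator subtree that produced it, so a later query sharing that subtree skips the scan.

```python
import hashlib

# Hypothetical stand-in for ORC files on storage (assumption for illustration).
ORC_TABLES = {
    "events": [
        {"proc": "cmd.exe", "host": "a"},
        {"proc": "ls", "host": "b"},
        {"proc": "cmd.exe", "host": "c"},
    ]
}


class OperatorCache:
    """Holds intermediate results keyed by a hash of the operator subtree."""

    def __init__(self):
        self.results = {}
        self.scans = 0  # counts how often the base data had to be read


def plan_key(plan):
    # Plans are nested tuples of strings, so repr() is a stable serialization.
    return hashlib.sha256(repr(plan).encode("utf-8")).hexdigest()


def execute(plan, cache):
    """Evaluate a plan, reusing any cached subtree result."""
    key = plan_key(plan)
    if key in cache.results:
        return cache.results[key]

    op = plan[0]
    if op == "scan":                      # ("scan", table_name)
        cache.scans += 1
        rows = list(ORC_TABLES[plan[1]])
    elif op == "filter":                  # ("filter", (field, value), child)
        field, value = plan[1]
        rows = [r for r in execute(plan[2], cache) if r[field] == value]
    elif op == "project":                 # ("project", (field, ...), child)
        rows = [{f: r[f] for f in plan[1]} for r in execute(plan[2], cache)]
    else:
        raise ValueError(f"unknown operator: {op}")

    cache.results[key] = rows
    return rows


cache = OperatorCache()
suspicious = ("filter", ("proc", "cmd.exe"), ("scan", "events"))
first = execute(suspicious, cache)
# A follow-up query that extends the same subtree reuses the cached filter.
hosts = execute(("project", ("host",), suspicious), cache)
```

In this sketch the second query never touches the base data, because the filter subtree is already cached. A real system would also need eviction and invalidation when new data arrives, which is left out here.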

Block index structure : To avoid costly row‑level indexes, the platform keeps a block‑level index (min/max, Bloom filter, bitmap, inverted) in a database, so files can be pruned before they are read: for about 85% of queries, more than 90% of files are skipped. The index lives in a database for transaction support and high compression.
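As a rough illustration of block-level pruning, the sketch below keeps one index entry per file with a min/max range on a timestamp column and a Bloom filter on an IP column, then filters the file list before any file is opened. The names (`BlockIndexEntry`, `build_entry`, `prune`), the column names, and the toy Bloom filter are assumptions for this example. The real index is stored in a database and also includes bitmap and inverted indexes, which are not shown.

```python
import hashlib
from dataclasses import dataclass


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


@dataclass
class BlockIndexEntry:
    path: str
    min_ts: int
    max_ts: int
    src_ip_bloom: BloomFilter


def build_entry(path, rows):
    """Summarize one file (block) at write time; rows must be non-empty."""
    bloom = BloomFilter()
    for row in rows:
        bloom.add(row["src_ip"])
    timestamps = [row["ts"] for row in rows]
    return BlockIndexEntry(path, min(timestamps), max(timestamps), bloom)


def prune(entries, ts_from, ts_to, src_ip):
    """Return only the files that could contain a matching row."""
    survivors = []
    for entry in entries:
        if entry.max_ts < ts_from or entry.min_ts > ts_to:
            continue  # min/max says the time ranges do not overlap
        if not entry.src_ip_bloom.might_contain(src_ip):
            continue  # Bloom filter says the IP is definitely absent
        survivors.append(entry.path)
    return survivors


entries = [
    build_entry("part-0.orc", [{"ts": 100, "src_ip": "10.0.0.1"},
                               {"ts": 199, "src_ip": "10.0.0.2"}]),
    build_entry("part-1.orc", [{"ts": 200, "src_ip": "10.0.0.3"},
                               {"ts": 299, "src_ip": "10.0.0.1"}]),
]
candidates = prune(entries, 200, 300, "10.0.0.1")
```

Here `part-0.orc` is dropped by the min/max check alone, so only `part-1.orc` would be read. Because a Bloom filter can return false positives, pruning of this kind is conservative: it may keep a file that has no match, but it never drops one that does.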

IO‑reducing optimizations :

Data is sorted using a Hilbert space‑filling curve during ingestion, improving theoretical pruning limits.

ORC API is extended to push down LIKE predicates, allowing row‑group level skipping via dictionary checks.

Join performance is boosted by pre‑filtering files with Bloom indexes and performing cost‑based pre‑joins for broadcast and hash joins.

Alluxio is employed as a cloud‑native caching layer, bringing data closer to Flink and reducing remote storage reads.
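To make the Hilbert‑curve ordering mentioned above more concrete, here is a minimal two‑column sketch. It uses the standard iterative mapping from a grid cell to its position along a 2D Hilbert curve and sorts rows by that position. The helper names (`quantize`, `sort_by_hilbert`), the column names, and the restriction to two columns are assumptions for illustration; the platform's actual ingestion code and its choice of sort columns are not described in the source.

```python
def hilbert_index(order, x, y):
    """Position of cell (x, y) along a Hilbert curve on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def quantize(value, lo, hi, order):
    """Map a value in [lo, hi] to an integer cell in [0, 2**order - 1]."""
    cells = 1 << order
    if hi == lo:
        return 0
    cell = int((value - lo) / (hi - lo) * cells)
    return min(max(cell, 0), cells - 1)


def sort_by_hilbert(rows, col_a, col_b, order=8):
    """Sort non-empty rows by the Hilbert position of two numeric columns."""
    a_vals = [r[col_a] for r in rows]
    b_vals = [r[col_b] for r in rows]
    a_lo, a_hi = min(a_vals), max(a_vals)
    b_lo, b_hi = min(b_vals), max(b_vals)

    def key(row):
        return hilbert_index(
            order,
            quantize(row[col_a], a_lo, a_hi, order),
            quantize(row[col_b], b_lo, b_hi, order),
        )

    return sorted(rows, key=key)


rows = [{"ts": t, "ip": i} for t in range(4) for i in range(4)]
ordered = sort_by_hilbert(rows, "ts", "ip", order=2)
```

Rows that are adjacent in this order are also close in both columns, so when the sorted stream is cut into files, each file tends to cover a narrow range of both `ts` and `ip`. That keeps per‑file min/max ranges tight on more than one column, which a plain single‑column sort cannot do.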

Performance tests on a 249 TB Elasticsearch cluster show the optimized HQL engine often matches or exceeds ES query speed, especially for high‑selectivity queries.

Future plans : Extend the platform for BI dashboards, containerization, JVM warm‑up, and enhanced multi‑user concurrency.
