How to Build a Real‑Time 8‑Hour Hot‑Article Ranking System at Massive Scale

This article explains how to design a distributed backend that ingests millions of clicks per second, stores data with Kafka and HDFS, computes top‑N articles using sliding windows and periodic aggregation, handles node failures, and mitigates click fraud, all while balancing accuracy and resource usage.

Data Reception

To handle a click rate of about 1 M/s, the system uses multiple HTTP‑based client processes that batch and forward click events to a fleet of server instances. Each server parses incoming parameters and forwards aggregated data to a storage layer, optionally using short‑term in‑process windows before sending.
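
As an illustration of the batching step, here is a minimal Java sketch; ClickEvent, ReportClient, the queue size, and the flush cadence are assumptions for the example, not details from the original design.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Illustrative event and transport types, not part of the original design.
record ClickEvent(int postId, long userId, long timestamp) {}
interface ReportClient { void send(List<ClickEvent> batch); }

class ClickBatcher {
    private final BlockingQueue<ClickEvent> queue = new LinkedBlockingQueue<>(100_000);
    private final ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();
    private final ReportClient client;   // wraps the HTTP call to the server fleet

    ClickBatcher(ReportClient client) {
        this.client = client;
        // flush once per second so each HTTP request carries many clicks
        flusher.scheduleAtFixedRate(this::flush, 1, 1, TimeUnit.SECONDS);
    }

    void onClick(ClickEvent event) {
        queue.offer(event);              // drop on overflow rather than block the click path
    }

    private void flush() {
        List<ClickEvent> batch = new ArrayList<>(5_000);
        while (queue.drainTo(batch, 5_000) > 0) {   // keep draining until the queue is empty
            client.send(batch);                     // one HTTP POST per batch of up to 5,000 clicks
            batch.clear();
        }
    }
}

Batching trades a second or so of latency for far fewer requests, which is what keeps the server fleet manageable at roughly a million clicks per second.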

Data Storage

Because storing all click data in memory is prohibitively expensive, the design combines Kafka for high‑throughput durable queuing, HDFS for long‑term offline analysis, and a hybrid approach where Kafka streams feed both real‑time processors and periodic HDFS writes.
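
As a sketch of the write path, the following snippet assumes a single Kafka topic (here called article-clicks) that the real-time workers consume directly while a separate consumer group or connector job dumps it to HDFS; the topic name, class name, and configuration values are illustrative.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

class ClickLog {
    private final KafkaProducer<String, String> producer;

    ClickLog(String brokers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokers);
        props.put("acks", "1");            // trade a little durability for throughput
        props.put("linger.ms", "50");      // let the producer batch sends under heavy load
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    void append(String clickJson) {
        // real-time workers read this topic directly; a periodic job copies it to HDFS
        // for long-term offline analysis
        producer.send(new ProducerRecord<>("article-clicks", clickJson));
    }
}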

Distributed Top‑N Algorithm

The core idea is divide and merge. As a simple example, suppose a user table is sharded into 1024 sub-tables by user-ID hash. Each sub-table can compute its own top-100 scores with a simple SQL query, and the results from all sub-tables are merged and re-sorted to obtain the global top-100; per-article click counts can be handled the same way.

select id, score from user order by score desc limit 100

For multiple shards, the following pseudo‑code runs a top‑N query on each shard and aggregates the candidates.

candidates = []
for k in range(1024):
    # each sub-table returns its own top-100 (query() is a placeholder for the data-access layer)
    rows = query(f"select id, score from user_{k} order by score desc limit 100")
    candidates.extend(rows)
# merge: sort all 1024 * 100 candidate rows by score, descending
candidates.sort(key=lambda row: row[1], reverse=True)
# the global top-100 is the head of the merged list
top100 = candidates[:100]

Sliding Window

The system maintains a precise 8‑hour sliding window by assigning each click to a one‑minute slot. Slots are created, expired, and aggregated to keep the window up‑to‑date while limiting memory usage.

class HitSlot {
    long timestamp;                                 // start time of this one-minute slot (ms)
    Map<Integer, Integer> hits = new HashMap<>();   // postId => hit count within the slot
    HitSlot(long timestamp) { this.timestamp = timestamp; }
    void onHit(int postId, int hits) { this.hits.merge(postId, hits, Integer::sum); }
}

class WindowSlots {
    HitSlot currentSlot;
    LinkedList<HitSlot> historySlots = new LinkedList<>();
    Map<Integer, Integer> topHits = new HashMap<>();    // this node's local top-n
    void onHit(int postId, int hits) {
        long ts = System.currentTimeMillis();
        if (this.currentSlot == null) {
            this.currentSlot = new HitSlot(ts);
        } else if (ts - this.currentSlot.timestamp > 60 * 1000) {
            // the current one-minute slot is full: archive it and open a new one
            this.historySlots.addLast(this.currentSlot);
            this.currentSlot = new HitSlot(ts);
        }
        this.currentSlot.onHit(postId, hits);
    }
    void onBeat() {
        if (historySlots.isEmpty()) return;
        long ts = System.currentTimeMillis();
        // drop every slot that has fallen out of the 8-hour window
        while (!historySlots.isEmpty() && ts - historySlots.getFirst().timestamp > 8 * 60 * 60 * 1000L) {
            historySlots.removeFirst();
        }
        // recompute the local top-n from the remaining slots (helpers elided, as in the original)
        topHits = topn(aggregateSlots(historySlots));
    }
}

Timed Tasks

Each node runs a periodic task that expires old slots, recomputes its local top‑N, and reports the results to a central aggregator. The aggregator merges local top‑N lists to produce the global ranking, which can be refreshed at a lower frequency.

class HotPostsAggregator {
    Map<Integer, Map<Integer, Integer>> localTopnPosts = new HashMap<>();  // nodeId => that node's top-n posts
    Map<Integer, Integer> globalTopnPosts = new HashMap<>();               // postId => hits
    void onBeat() {
        // merge every node's local top-n, re-sort, and keep the global top 100
        Map<Integer, Integer> merged = new HashMap<>();
        localTopnPosts.values().forEach(m -> m.forEach((post, hits) -> merged.merge(post, hits, Integer::sum)));
        globalTopnPosts = merged.entrySet().stream()
                .sorted(Map.Entry.<Integer, Integer>comparingByValue().reversed())
                .limit(100)
                .collect(LinkedHashMap::new, (acc, e) -> acc.put(e.getKey(), e.getValue()), Map::putAll);
        // store the merged result to Redis so read traffic hits a precomputed list
    }
    void onLocalReport(int nodeId, Map<Integer, Integer> topnPosts) {
        // receive (and overwrite) the latest local hot posts reported by a node
        localTopnPosts.put(nodeId, topnPosts);
    }
}
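
For completeness, a node's periodic task might look like the sketch below. NodeBeatTask and the 10-second beat are assumptions; in a real deployment the report would travel over the network (for example RPC or Redis) rather than an in-process method call.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class NodeBeatTask {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    void start(int nodeId, WindowSlots window, HotPostsAggregator aggregator) {
        timer.scheduleAtFixedRate(() -> {
            window.onBeat();                                   // expire old slots, refresh the local top-n
            aggregator.onLocalReport(nodeId, window.topHits);  // report the local ranking upstream
        }, 10, 10, TimeUnit.SECONDS);                          // assumed 10-second beat; tune per load
    }
}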

Hashing

Kafka partitions are aligned with the number of worker nodes; article IDs are hashed to ensure that clicks for the same article always go to the same partition, allowing each node to handle a distinct subset of articles and reduce memory pressure.
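
A minimal sketch of the routing rule, assuming the article ID is used as the Kafka record key (the class and topic name are illustrative): Kafka's default partitioner hashes the key, so all clicks for one article land in the same partition and therefore on the same worker.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

class ClickRouter {
    private final KafkaProducer<String, String> producer;

    ClickRouter(KafkaProducer<String, String> producer) { this.producer = producer; }

    void publish(int postId, String clickJson) {
        // same key => same partition, so one worker owns all counting for this article
        producer.send(new ProducerRecord<>("article-clicks", String.valueOf(postId), clickJson));
    }
}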

Consumer Failure

If a worker node crashes, its 8‑hour window data is lost. To mitigate this, the system can periodically checkpoint the node’s state to local storage or a database, enabling fast recovery on restart. Spark‑Streaming’s built‑in checkpointing can also be used.
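
One possible checkpointing sketch, assuming the node's window state is Serializable and a local path is available; the class, the temp-file scheme, and the use of plain Java serialization are illustrative choices, not details from the original.

import java.io.*;
import java.nio.file.*;

class WindowCheckpointer {
    private final Path file;

    WindowCheckpointer(Path file) { this.file = file; }

    void save(Serializable windowState) throws IOException {
        // write to a temp file first, then swap it in, so a crash mid-write
        // never corrupts the previous checkpoint
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            out.writeObject(windowState);
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    Object loadOrNull() throws IOException, ClassNotFoundException {
        if (!Files.exists(file)) return null;   // first start: nothing to restore
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        }
    }
}

Running save every few minutes bounds the loss after a crash to one checkpoint interval instead of the full 8-hour window.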

Click Deduplication

To avoid inflating counts from malicious repeated clicks, the client can filter rapid duplicate clicks from the same user, while allowing spaced‑out clicks that reflect genuine rereading. Server‑side detection of abnormal patterns (e.g., many clicks from a single IP) can further reduce fraud, possibly using IP blocking or manual review.
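
A minimal duplicate-click filter might look like the following sketch; the five-minute threshold, the composite key, and the class name are assumptions used to illustrate the idea.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ClickDeduplicator {
    private static final long MIN_INTERVAL_MS = 5 * 60 * 1000;            // assumed threshold; tune as needed
    private final Map<Long, Long> lastClick = new ConcurrentHashMap<>();  // (userId, postId) => last counted ts

    boolean shouldCount(long userId, int postId, long nowMs) {
        long key = (userId << 20) ^ postId;   // cheap composite key; a real system might use "userId:postId"
        Long prev = lastClick.get(key);
        if (prev != null && nowMs - prev < MIN_INTERVAL_MS) {
            return false;                     // rapid repeat click: ignore it
        }
        lastClick.put(key, nowMs);            // spaced-out click: count it and remember when
        return true;                          // note: not strictly atomic, and a production version
                                              // would also expire old entries to bound memory
    }
}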
