Building a High‑Performance Content Moderation System with Trie, Aho‑Corasick, Redis, and Go

This article details how to design and implement a scalable, low‑cost content moderation pipeline that combines a local Trie + Aho‑Corasick engine, Redis‑based hot‑updates, MySQL persistence, and third‑party machine‑review fallback to achieve millisecond‑level response, high accuracy, and controllable costs.

Rare Earth Juejin Tech Community
Project‑stage Review Mechanism

When I joined a new social project, the "Moments" feature was extremely slow: a post could take several minutes to appear, which is fatal for user engagement. The root cause was the review mechanism: only two customer service agents handled all reviews (avatars, nicknames, moments), and they went off duty at night, precisely when user activity peaked.

We introduced a shift‑based schedule with two agents per shift, covering morning‑to‑afternoon and afternoon‑to‑midnight, allowing remote reviews. This reduced latency dramatically.

New Problems After User Surge

Massive user growth caused review pressure to explode. Adding more agents still couldn't keep up, and relying solely on third‑party machine review was costly and error‑prone.

“The cost is too high, find a way to reduce it!”

We needed a technical solution to lower machine‑review calls while preserving user experience and detection accuracy.

Problems and Core Goals

The pressure came from four directions:
Local review capacity insufficient

Machine‑review cost too high

Mis‑detections increase complaints and require double handling

Management demands cost reduction without hurting experience

We defined the following objectives:

Reduce machine‑review calls by intercepting obvious cases locally.

Guarantee sub‑second user experience for moments/comments.

Minimize complaint volume from false positives.

Enable real‑time rule updates via Redis Pub/Sub.

Keep third‑party review as a fallback, not the primary path.

Step 1: Build a Local Blacklist System

We created a MySQL table api_sensitive_words to store high-confidence sensitive words (blacklist, whitelist, and normal entries), with fields for type, category, source, status, and hit count. An index on keyword keeps lookups and the bulk loads that feed Trie construction fast.

CREATE TABLE `api_sensitive_words` (
  `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'Auto-increment ID',
  `keyword` VARCHAR(255) NOT NULL COMMENT 'Sensitive word',
  `type` ENUM('BLACK','WHITE','NORMAL') DEFAULT 'NORMAL' COMMENT 'Type: blacklist/whitelist/normal',
  `category` ENUM('PORN','POLITICS','TERROR','AD','INSULT','OTHER') DEFAULT 'OTHER' COMMENT 'Category',
  `source` ENUM('HUMAN','VENDOR','AUDIT') DEFAULT 'HUMAN' COMMENT 'Source',
  `status` TINYINT(1) DEFAULT 1 COMMENT 'Status: 1 enabled, 0 disabled',
  `hit_count` BIGINT DEFAULT 0 COMMENT 'Hit count',
  `updated_by` VARCHAR(64) DEFAULT NULL COMMENT 'Last operator',
  `updated_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'Last updated',
  PRIMARY KEY (`id`),
  KEY `idx_keyword` (`keyword`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Sensitive words table';

Key benefits:

Admins can flexibly maintain the word list.

Local Trie matches avoid any external request.

Future extensions for whitelist, hot‑updates, and mis‑detection handling.

Step 2: High‑Performance Matching with Trie + Aho‑Corasick

We implemented a Go library that builds a Trie from the word list and augments it with Aho-Corasick failure links, so matching runs in O(n) time in the input length (plus the number of matches), independent of dictionary size.

type TrieNode struct {
    children map[rune]*TrieNode
    fail     *TrieNode
    isEnd    bool
    word     string
}

type ACTrie struct {
    root *TrieNode
    mu   sync.RWMutex
}

// Build constructs the Trie and wires the failure links.
func (ac *ACTrie) Build(words []string) { /* build Trie and failure links */ }

// Match returns every keyword found in text.
func (ac *ACTrie) Match(text string) []string { /* return all matched keywords */ }

Aho-Corasick itself matches exact substrings; combined with a normalization pass that strips separators and maps homophones and emoji variants to canonical forms, the engine also catches common obfuscations efficiently.
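A minimal sketch of such a normalization pass, run before Trie matching. The variant table here is a hypothetical example; a real system would load homophone and variant mappings from configuration:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// variants maps common character substitutions back to canonical
// letters. Illustrative only; load the real table from config.
var variants = map[rune]rune{
	'0': 'o', // digit-for-letter substitution
	'@': 'a',
}

// normalize lower-cases the input, folds full-width characters to
// ASCII, applies the variant table, and drops spaces, punctuation,
// and symbols (emoji fall under Unicode symbol categories).
func normalize(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		// Fold full-width forms (U+FF01..U+FF5E) to ASCII.
		if r >= 0xFF01 && r <= 0xFF5E {
			r -= 0xFEE0
		}
		if v, ok := variants[r]; ok {
			r = v
		}
		if unicode.IsSpace(r) || unicode.IsPunct(r) || unicode.IsSymbol(r) {
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(normalize("F r e e-M0ney!!")) // freemoney
}
```

After this pass, "F r e e-M0ney!!" and "freemoney" hit the same Trie entry, so one blacklist word covers a family of obfuscations.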

Step 3: Architecture Design

The overall flow:

┌───────────────────────┐
│ api_sensitive_words   │ ← MySQL persistent store (all words)
└───────────┬───────────┘
            │  backend admin adds/updates words
    ┌───────▼──────────┐
    │   Redis cache    │ ← latest word list
    └───────┬──────────┘
            │  Redis Pub/Sub (sensitive:update)
    ┌───────▼──────────┐
    │  Go Trie engine  │ ← in-memory Trie + AC, real-time matching
    └───────┬──────────┘
            │
    user request → local match
    ├─ hit  → block/flag
    └─ miss → optional third-party audit
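The hot-update leg of this flow boils down to atomically swapping in a freshly compiled rule snapshot whenever an update event arrives. A runnable sketch under simplified assumptions: the compiled snapshot is a plain word set standing in for the built ACTrie, and update events come from a local channel, which in production would be fed by a Redis subscription (e.g. go-redis: `sub := rdb.Subscribe(ctx, "sensitive:update")`, then ranging over `sub.Channel()` and reloading from Redis/MySQL):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// wordSet is an immutable snapshot of the compiled rules; in the real
// service this would be the built Trie.
type wordSet struct{ words map[string]bool }

func compile(words []string) *wordSet {
	m := make(map[string]bool, len(words))
	for _, w := range words {
		m[w] = true
	}
	return &wordSet{words: m}
}

// engine hot-swaps snapshots atomically: readers never block on rebuilds.
type engine struct{ current atomic.Value }

func (e *engine) Reload(words []string) { e.current.Store(compile(words)) }
func (e *engine) Hit(w string) bool     { return e.current.Load().(*wordSet).words[w] }

// listenUpdates drains update events and rebuilds the snapshot; in
// production each event would trigger a re-read of the word list.
func listenUpdates(e *engine, updates <-chan []string, done chan<- struct{}) {
	for words := range updates {
		e.Reload(words)
	}
	close(done)
}

func main() {
	e := &engine{}
	e.Reload([]string{"spam"})
	fmt.Println(e.Hit("spam"), e.Hit("scam")) // true false

	updates := make(chan []string, 1)
	done := make(chan struct{})
	go listenUpdates(e, updates, done)
	updates <- []string{"spam", "scam"} // simulate a sensitive:update message
	close(updates)
	<-done
	fmt.Println(e.Hit("scam")) // true
}
```

The atomic.Value swap is what makes "new rules take effect within seconds" cheap: request goroutines keep reading the old snapshot until the store completes, with no lock contention on the hot path.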

Advantages:

99% of requests are answered by in‑memory matching (milliseconds).

Only suspicious or unmatched content triggers third‑party audit, drastically cutting cost.

Hot‑updates via Redis ensure new rules take effect within seconds.

Step 4: Machine‑Review Fallback and Feedback Loop

When local matching misses, we call a vendor audit service (e.g., Shumei, Tianwang). The vendor returns one of three statuses:

Status   Meaning              Our Action
PASS     Content safe         Allow passage, but record candidate words for later review.
REVIEW   Potential risk       Store in api_sensitive_candidates for manual verification.
REJECT   Definite violation   Block immediately and log the candidate.
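The three-status mapping can be sketched as a small dispatcher. The Action names are our own illustrative labels, not the vendor's API; note the default fails closed into manual review rather than open:

```go
package main

import "fmt"

// Action is what our pipeline does with content after a vendor verdict.
type Action string

const (
	ActionAllow Action = "ALLOW" // pass, but record candidate words
	ActionHold  Action = "HOLD"  // queue in api_sensitive_candidates
	ActionBlock Action = "BLOCK" // reject immediately and log candidate
)

// decide maps the three vendor statuses from the table above to actions.
// Anything unexpected is treated like REVIEW, i.e. held for a human.
func decide(status string) Action {
	switch status {
	case "PASS":
		return ActionAllow
	case "REJECT":
		return ActionBlock
	default: // "REVIEW" or an unknown status
		return ActionHold
	}
}

func main() {
	for _, s := range []string{"PASS", "REVIEW", "REJECT"} {
		fmt.Println(s, "->", decide(s))
	}
}
```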

Candidate words are stored in api_sensitive_candidates with fields for vendor, risk level, and status (PENDING, CONFIRMED, REJECTED). After manual review:

Confirmed violations are added to the blacklist (type = BLACK) and the Trie is hot‑updated.

Confirmed false positives are added to the whitelist (type = WHITE) to prevent future blocks.

Step 5: Intelligent Word‑Library Evolution

We automate the evolution cycle:

Machine‑review → candidate table.

Human verification → blacklist/whitelist.

Redis publish → Go service rebuilds Trie instantly.

This creates a self‑learning system where each audit improves future detection.

Risk Scoring for Selective Machine Review

Not every unmatched request goes to the vendor. We compute a risk score based on:

Obfuscation patterns (spaces, emojis, phonetic variants) – +1‑2.

External links or contact info – +2.

Similarity to blacklist words – +1‑2.

Context weight (nickname, private message, time of day) – +1‑3.

Account/device signals (new account, rapid posting, IP sharing) – +1‑2.

Template or bulk‑post signatures – +1.

Thresholds T1/T2 decide:

Score < T1 → direct pass.

T1 ≤ Score < T2 → send to vendor for fallback.

Score ≥ T2 → block or route to manual review.
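The scoring and routing above can be sketched as follows. The exact weights and the T1/T2 values are illustrative assumptions; in practice they would be tuned against audit outcomes:

```go
package main

import "fmt"

// Signals are the per-request features described above. Weights follow
// the ranges in the list; exact values here are assumptions.
type Signals struct {
	Obfuscation    int  // 0-2: spaces, emoji, phonetic variants
	HasContactInfo bool // external links or contact details (+2)
	BlacklistSim   int  // 0-2: similarity to known blacklist words
	ContextWeight  int  // 0-3: nickname, private message, time of day
	AccountRisk    int  // 0-2: new account, rapid posting, shared IP
	TemplateMatch  bool // bulk-post / template signature (+1)
}

// Illustrative thresholds.
const (
	T1 = 3 // below T1: direct pass
	T2 = 6 // at or above T2: block or manual review
)

func riskScore(s Signals) int {
	score := s.Obfuscation + s.BlacklistSim + s.ContextWeight + s.AccountRisk
	if s.HasContactInfo {
		score += 2
	}
	if s.TemplateMatch {
		score++
	}
	return score
}

// route applies the T1/T2 decision from the text.
func route(score int) string {
	switch {
	case score < T1:
		return "PASS"
	case score < T2:
		return "VENDOR_AUDIT"
	default:
		return "BLOCK_OR_MANUAL"
	}
}

func main() {
	s := Signals{Obfuscation: 1, HasContactInfo: true, ContextWeight: 1}
	fmt.Println(riskScore(s), route(riskScore(s))) // 4 VENDOR_AUDIT
}
```

Only the middle band pays for a vendor call, which is exactly where the machine-review savings come from.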

Additional safeguards include random human sampling and user‑report channels, feeding results back into the model.

Results and Benefits

Machine‑review calls reduced by >70%.

False‑positive rate dropped below 1%.

Customer‑service workload cut by ~50%.

System scales to high traffic with sub‑second latency.

Conclusion

The final architecture combines a fast in‑memory Trie + Aho‑Corasick engine, Redis‑driven hot updates, and a controlled machine‑review fallback with a feedback loop, delivering a cost‑effective, high‑performance content safety solution suitable for any large‑scale social platform.


Written by Rare Earth Juejin Tech Community (Juejin, a tech community that helps developers grow).