How to Build a Real‑Time Sensitive Word Detection Service in Go
This article explains how to design, implement, and deploy a high‑performance Go service that uses an upgraded sego tokenizer to load custom sensitive‑word dictionaries, provide JSON‑RPC detection, support hot‑reloading, and scale across multiple data centers for live‑stream platforms.
Background
Live‑stream platforms need to filter prohibited content, including political, illegal, religious, violent, and copyright‑infringing terms, as well as platform‑specific spam such as competitor poaching, vulgarity, and advertisements. This article describes a custom sensitive‑word detection service built for the Huajiao live‑streaming system.
System Overview
The service is implemented in Go and extends the open‑source sego word‑segmentation library. It loads a self‑maintained sensitive‑word dictionary in seconds, segments text using a shortest‑path algorithm with dynamic programming, and returns the word type, attributes, hit status, and matched words through a JSON‑RPC interface. A clustered deployment provides high throughput and elastic scalability.
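The returned fields can be modelled as Go types on the client side. The sketch below is inferred from the sample response in the Testing section of this article; the names `DetectResponse`, `Segment`, and `decodeResponse` are hypothetical, and the reading of `Newtyp` as the type and `Pos` as the attributes is an assumption rather than something the service's source confirms.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Segment mirrors one entry of the "Segment" array in the sample response.
// The type name is hypothetical; the JSON keys come from the sample output.
type Segment struct {
	Newtyp string `json:"Newtyp"` // assumed to be the word type
	Pos    string `json:"Pos"`    // assumed to be the attributes, joined with "|"
	Text   string `json:"Text"`   // the matched word
}

// DetectResponse mirrors the top-level JSON object (hypothetical name).
type DetectResponse struct {
	Hit     bool                `json:"Hit"`
	HitMap  map[string][]string `json:"HitMap"`
	Segment []Segment           `json:"Segment"`
}

// decodeResponse parses a raw response body into a DetectResponse.
func decodeResponse(body []byte) (DetectResponse, error) {
	var r DetectResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	sample := []byte(`{"Hit":true,"HitMap":{"3":["加微"]},"Segment":[{"Newtyp":"2","Pos":"3","Text":"加微"}]}`)
	r, err := decodeResponse(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Hit, r.HitMap["3"], r.Segment[0].Text)
}
```

Typed decoding lets a caller branch on `Hit` first and only inspect `HitMap` when a message was actually flagged.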
Architecture
Key Features
Customized upgrade of sego for Huajiao scenarios.
Returns word type, attributes, hit flag, and hit‑word list.
Dictionary generation is decoupled from the detection service.
Hot updates of the dictionary within seconds, triggered by a single command.
JSON‑RPC API.
Multi‑data‑center deployment for elastic scaling.
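For the hot‑update feature listed above, one way to keep reloads safe while requests are in flight is to build the new dictionary off to the side and publish it with a single atomic store. This is a minimal sketch of that general pattern, not the service's actual mechanism; `dictionary`, `swapDict`, and `isSensitive` are hypothetical names, and a plain map stands in for the segmenter's internal structures.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// dictionary is a hypothetical stand-in for a loaded segmenter dictionary.
type dictionary struct {
	words map[string]bool
}

// current holds the *dictionary in use; readers never observe a half-built one.
var current atomic.Value

// swapDict builds a fresh dictionary and publishes it in one atomic step.
func swapDict(words []string) {
	d := &dictionary{words: make(map[string]bool, len(words))}
	for _, w := range words {
		d.words[w] = true
	}
	current.Store(d)
}

// isSensitive checks a word against whichever dictionary is current.
// It reports false if no dictionary has been published yet.
func isSensitive(word string) bool {
	d, ok := current.Load().(*dictionary)
	return ok && d.words[word]
}

func main() {
	swapDict([]string{"加微"})
	fmt.Println(isSensitive("加微"))

	// A later reload replaces the whole dictionary at once.
	swapDict([]string{"广告"})
	fmt.Println(isSensitive("加微"), isSensitive("广告"))
}
```

Readers pay only an atomic load per lookup, and a reload that fails midway can simply skip the store and leave the previous dictionary in place.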
Sensitive‑Word Dictionary Format
The dictionary is a CSV‑like file in which each line contains four columns: sensitive word, frequency, attribute, and type. The attribute and type columns let the business categorise words and flag the scenarios in which each word applies.
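A line in this format could be parsed as sketched below. The article does not state the column separator, so whitespace is an assumption here, as are the `Entry` type, the `parseLine` helper, and the sample line itself; splitting the attribute column on "|" follows what HitFilter does with the "Pos" field.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Entry is a hypothetical in-memory form of one dictionary line.
type Entry struct {
	Word       string
	Frequency  int
	Attributes []string // attribute column split on "|"
	Type       string
}

// parseLine parses "word frequency attribute type".
// Whitespace-separated columns are an assumption, not confirmed by the source.
func parseLine(line string) (Entry, error) {
	f := strings.Fields(line)
	if len(f) != 4 {
		return Entry{}, fmt.Errorf("want 4 columns, got %d", len(f))
	}
	freq, err := strconv.Atoi(f[1])
	if err != nil {
		return Entry{}, fmt.Errorf("bad frequency %q: %v", f[1], err)
	}
	return Entry{
		Word:       f[0],
		Frequency:  freq,
		Attributes: strings.Split(f[2], "|"),
		Type:       f[3],
	}, nil
}

func main() {
	// Illustrative line only; real dictionary content is not shown in the article.
	e, err := parseLine("加微 100 3|4|5|6 2")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", e)
}
```

Validating the column count and frequency when the dictionary is generated keeps malformed lines from reaching the detection service.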
Core Code
Automatic Hot‑Reload
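    func init() {
        flag.Parse()
        c := cron.New()
        // Schedule a reload every *reloadInterval (e.g. "@every 30s").
        // Fail fast on a malformed interval instead of silently never reloading.
        if err := c.AddFunc("@every "+*reloadInterval, reloadDict); err != nil {
            panic(err)
        }
        c.Start()
    }

    // reloadDict reloads the dictionary from *dict and logs the start and end of each run.
    func reloadDict() {
        logToFile(logFile, fmt.Sprintf("reload %d start interval : %s %s",
            *port, *reloadInterval, time.Now().Format("2006/01/02 15:04:05")))
        segmenter.LoadDictionary(*dict)
        logToFile(logFile, fmt.Sprintf("reload %d end :%s %s",
            *port, *reloadInterval, time.Now().Format("2006/01/02 15:04:05")))
    }
Hit Filtering with Custom Return Values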
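    // HitFilter groups the matched words by attribute.
    // Each word's "Pos" field holds one or more attributes joined with "|".
    // It returns the attribute-to-words map and whether anything was hit.
    func HitFilter(text string, words []map[string]string) (map[string][]string, bool) {
        hitMap := make(map[string][]string)
        // Walk the segments from last to first, as the original code does.
        for i := len(words) - 1; i >= 0; i-- {
            for _, pos := range strings.Split(words[i]["Pos"], "|") {
                hitMap[pos] = append(hitMap[pos], words[i]["Text"])
            }
        }
        return hitMap, len(hitMap) > 0
    }
Deployment Guide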
Prerequisites
Go version 1.11.2 or later
Build
    cd $project_dir && go build -o ./bin/segoserver *.go
Run Service
    ./bin/segoserver --port=8080 --dict=/tmp/segoserver-user-dict.txt --reloadInterval=30s
Parameters
port: listening port of the detection service.
dict: absolute path to the sensitive‑word dictionary file.
reloadInterval: time interval for automatic hot‑reloading (e.g., 30s).
Dictionary Generation
Refresh the dictionary file with an external script and place it at the path specified by --dict (e.g., /tmp/segoserver-user-dict.txt). The service picks up the new file at its next scheduled reload.
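Because the service re‑reads this file on a timer, a generator that writes it in place could be caught mid‑write by a reload. A common way to avoid that is to write to a temporary file in the same directory and rename it over the target. The sketch below shows the pattern in Go, assuming Go 1.16 or later for `os.CreateTemp`; `writeDictAtomically` and `demoRoundTrip` are hypothetical helpers, and the article does not say how the original generation script works.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeDictAtomically writes data to a temp file next to path, then renames it
// into place, so a reader sees either the old file or the complete new one.
func writeDictAtomically(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".dict-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // cleans up on failure; harmless after a successful rename

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	// CreateTemp uses mode 0600; widen it so a service under another user can read.
	if err := tmp.Chmod(0o644); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demoRoundTrip writes two versions into a scratch directory and returns
// the content a reader would see afterwards.
func demoRoundTrip() (string, error) {
	dir, err := os.MkdirTemp("", "dictdemo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "segoserver-user-dict.txt")
	for _, v := range []string{"old\n", "new\n"} {
		if err := writeDictAtomically(path, []byte(v)); err != nil {
			return "", err
		}
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	got, err := demoRoundTrip()
	if err != nil {
		panic(err)
	}
	fmt.Print(got)
}
```

The rename is only atomic when the temporary file and the target are on the same filesystem, which is why the temp file is created in the target's directory.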
Testing
Example request:
    curl -i 'http://127.0.0.1:8080/json?text=加微'
Sample response:
    {"Hit":true,"HitMap":{"3":["加微"],"4":["加微"],"5":["加微"],"6":["加微"]},"Segment":[{"Newtyp":"2","Pos":"3|4|5|6","Text":"加微"}]}
Extended Discussion
Use Cases
Live‑stream chat (bullet comments) where messages are broadcast and searchable.
User nicknames, signatures, comments, and status updates.
Different scenarios require distinct sensitive‑word policies, and policies may change frequently, demanding rapid dictionary updates.
Effectiveness and Limitations
Traditional keyword filters work well for Chinese characters, numbers, and letters but perform poorly on special characters.
Sensitive‑word filtering alone cannot block all spam; it should be combined with intelligent anti‑spam models for secondary analysis.
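To illustrate the special‑character limitation: a message such as "加*微" no longer contains the dictionary word "加微" as a contiguous string. One common mitigation, which the article does not describe and which is shown here only as an assumption, is to normalise text before detection by dropping everything except letters and digits. The `normalize` helper is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalize keeps letters and digits (Chinese characters count as letters),
// lower-cases them, and drops all other runes.
func normalize(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return unicode.ToLower(r)
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

func main() {
	raw := "加*微 V-X"
	fmt.Println(strings.Contains(raw, "加微"))            // the raw text evades an exact match
	fmt.Println(strings.Contains(normalize(raw), "加微")) // the normalised text does not
}
```

Stripping separators can also join unrelated neighbouring words into a false positive, which is one more reason to pair the filter with a secondary anti‑spam model.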
Huajiao Technology
The Huajiao Technology channel shares the latest Huajiao app tech on an irregular basis, offering a learning and exchange platform for tech enthusiasts.
