How to Build a Real‑Time Sensitive Word Detection Service in Go

This article explains how to design, implement, and deploy a high‑performance Go service that uses an upgraded sego tokenizer to load custom sensitive‑word dictionaries, provide JSON‑RPC detection, support hot‑reloading, and scale across multiple data centers for live‑stream platforms.

Huajiao Technology

Background

Live‑stream platforms must filter prohibited content, including political, illegal, religious, violent, and copyright‑infringing terms, as well as platform‑specific spam such as competitor poaching, vulgarity, and advertisements. This article describes the custom sensitive‑word detection service built for the Huajiao live‑streaming system.

System Overview

The service is written in Go and extends the open‑source sego word‑segmentation library. It loads a self‑maintained sensitive‑word dictionary within seconds, segments text by finding the shortest path with dynamic programming, and returns the word type, attributes, hit status, and matched words through a JSON‑RPC interface. Deployed as a cluster, it delivers high throughput and scales elastically.
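According to the sego project, its dictionary is stored in a double‑array trie and segmentation picks the frequency‑based shortest path using dynamic programming. The sketch below shows the idea in simplified form and is not sego's code: it uses a plain map instead of a trie, an assumed cost of log(total) − log(freq) per word, and an assumed fixed penalty for runes missing from the dictionary. The cheapest path through all possible word boundaries is the chosen segmentation.

```go
package main

import (
	"fmt"
	"math"
)

// Segment splits text into the minimum-cost sequence of words, where a
// dictionary word costs log(total) - log(freq). Runes missing from the
// dictionary become single-rune tokens with an assumed fixed penalty.
func Segment(text string, freq map[string]float64) []string {
	total, maxLen := 1.0, 1
	for w, f := range freq {
		total += f
		if n := len([]rune(w)); n > maxLen {
			maxLen = n
		}
	}
	unknown := math.Log(total) + 10 // assumed out-of-dictionary penalty

	runes := []rune(text)
	n := len(runes)
	cost := make([]float64, n+1) // cost[j]: cheapest way to segment runes[:j]
	prev := make([]int, n+1)     // prev[j]: start of the last word ending at j
	for j := 1; j <= n; j++ {
		cost[j] = math.Inf(1)
	}
	for i := 0; i < n; i++ {
		for j := i + 1; j <= n && j-i <= maxLen; j++ {
			c := unknown
			if f, ok := freq[string(runes[i:j])]; ok && f > 0 {
				c = math.Log(total) - math.Log(f)
			} else if j-i > 1 {
				continue // only single runes may be unknown
			}
			if cost[i]+c < cost[j] {
				cost[j], prev[j] = cost[i]+c, i
			}
		}
	}
	// Walk back from the end to recover the chosen words.
	var words []string
	for j := n; j > 0; j = prev[j] {
		words = append([]string{string(runes[prev[j]:j])}, words...)
	}
	return words
}

func main() {
	// Hypothetical frequencies, for illustration only.
	dict := map[string]float64{"加": 50, "微": 50, "信": 30, "加微": 100, "微信": 200}
	fmt.Println(Segment("加微信", dict)) // [加 微信]
	fmt.Println(Segment("加微", dict))   // [加微]
}
```

With these made‑up frequencies, "加|微信" beats "加微|信" because 50 × 200 exceeds 100 × 30, so the summed log costs are lower.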

Architecture

Service architecture diagram

Key Features

Customized upgrade of sego for Huajiao scenarios.

Returns word type, attributes, hit flag, and hit‑word list.

Dictionary generation is decoupled from the detection service.

Dictionary hot updates that take effect within seconds, triggered by a single command.

JSON‑RPC API.

Multi‑data‑center deployment for elastic scaling.

Sensitive‑Word Dictionary Format

The dictionary is a CSV‑like text file in which each line has four columns: sensitive word, frequency, attribute, and type. The attribute and type columns let each business categorise words and flag the scenes in which they apply.
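The article does not give the exact separators, so the parser below is a sketch under two assumptions: columns are whitespace‑separated, as in upstream sego dictionaries, and the attribute column holds pipe‑separated IDs, as the sample response's "Pos":"3|4|5|6" suggests. The Entry type, ParseEntry, and the example line are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Entry is one dictionary line (hypothetical type).
type Entry struct {
	Word  string
	Freq  int
	Attrs []string // assumed pipe-separated scene flags, e.g. "3|4|5|6"
	Type  string
}

// ParseEntry parses a line such as "加微 10 3|4|5|6 2".
// Whitespace-separated columns are an assumption borrowed from upstream sego.
func ParseEntry(line string) (Entry, error) {
	f := strings.Fields(line)
	if len(f) != 4 {
		return Entry{}, fmt.Errorf("want 4 columns, got %d in %q", len(f), line)
	}
	freq, err := strconv.Atoi(f[1])
	if err != nil {
		return Entry{}, fmt.Errorf("bad frequency %q: %v", f[1], err)
	}
	return Entry{Word: f[0], Freq: freq, Attrs: strings.Split(f[2], "|"), Type: f[3]}, nil
}

func main() {
	e, err := ParseEntry("加微 10 3|4|5|6 2")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", e)
}
```

Rejecting malformed lines with an error, rather than skipping them silently, makes a broken generator output visible before it reaches the detection service.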

Core Code

Automatic Hot‑Reload

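import (
    "flag"
    "fmt"
    "log"
    "time"

    "github.com/robfig/cron" // assumed cron v1, whose AddFunc returns only an error
)

// port, dict, reloadInterval (flags), logFile, logToFile and the shared
// segmenter are defined elsewhere in the service.
func init() {
    flag.Parse()
    c := cron.New()
    // Fail fast on a malformed interval instead of silently never reloading.
    if err := c.AddFunc("@every "+*reloadInterval, reloadDict); err != nil {
        log.Fatalf("invalid --reloadInterval %q: %v", *reloadInterval, err)
    }
    c.Start()
}

func reloadDict() {
    const layout = "2006/01/02 15:04:05"
    logToFile(logFile, fmt.Sprintf("reload %d start interval: %s %s",
        *port, *reloadInterval, time.Now().Format(layout)))
    // Caution: reloading the shared segmenter in place can race with
    // requests that are segmenting text at the same moment.
    segmenter.LoadDictionary(*dict)
    logToFile(logFile, fmt.Sprintf("reload %d end: %s %s",
        *port, *reloadInterval, time.Now().Format(layout)))
}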
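reloadDict calls LoadDictionary on the segmenter that live requests are using. A safer pattern, sketched here as an assumption rather than the service's code, builds a complete new dictionary off to the side and then publishes it in one step with atomic.Value from sync/atomic. Dict, buildDict, swapDict, and isHit are hypothetical stand‑ins; in the service, buildDict would create a fresh segmenter and call LoadDictionary(*dict) on it.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Dict is a hypothetical stand-in for a fully loaded segmenter.
type Dict struct {
	Version int
	Words   map[string]bool
}

// current always holds a complete *Dict; it is never mutated in place.
var current atomic.Value

// buildDict is a hypothetical loader that runs away from live traffic.
func buildDict(version int, words ...string) *Dict {
	d := &Dict{Version: version, Words: make(map[string]bool)}
	for _, w := range words {
		d.Words[w] = true
	}
	return d
}

// swapDict builds the new dictionary first, then publishes it atomically.
func swapDict(version int, words ...string) {
	current.Store(buildDict(version, words...))
}

// isHit loads the pointer once, so a request sees one consistent
// dictionary even if a reload happens while it runs.
func isHit(word string) bool {
	return current.Load().(*Dict).Words[word]
}

func main() {
	swapDict(1, "加微")
	fmt.Println(isHit("加微"), isHit("私聊")) // true false
	swapDict(2, "私聊")
	fmt.Println(isHit("加微"), isHit("私聊")) // false true
}
```

Because the old dictionary is never modified, requests already holding it finish safely and the garbage collector reclaims it afterwards; the cost is briefly holding two dictionaries in memory during each reload.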

Hit Filtering with Custom Return Values

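import "strings"

// HitFilter groups matched words by attribute. Each element of words is one
// segment with a "Text" field (the word) and a "Pos" field holding
// pipe-separated attribute IDs such as "3|4|5|6". The text parameter is
// currently unused.
func HitFilter(text string, words []map[string]string) (map[string][]string, bool) {
    hitMap := make(map[string][]string)
    for i := len(words) - 1; i >= 0; i-- {
        for _, pos := range strings.Split(words[i]["Pos"], "|") {
            // strings.Split("", "|") returns [""]; skip segments without
            // attributes so they are not reported as hits under an empty key.
            if pos == "" {
                continue
            }
            hitMap[pos] = append(hitMap[pos], words[i]["Text"])
        }
    }
    return hitMap, len(hitMap) > 0
}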
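The handler that serialises the result is not shown in the article. As a sketch, feeding the sample segment from the Testing section through HitFilter and a hypothetical Response struct (field names copied from the sample response) produces that sample JSON exactly, because encoding/json writes struct fields in declaration order and sorts map keys. Response and buildResponse are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Response is a hypothetical struct mirroring the sample response.
type Response struct {
	Hit     bool
	HitMap  map[string][]string
	Segment []map[string]string
}

// HitFilter groups matched words by attribute, skipping empty attributes.
func HitFilter(text string, words []map[string]string) (map[string][]string, bool) {
	hitMap := make(map[string][]string)
	for i := len(words) - 1; i >= 0; i-- {
		for _, pos := range strings.Split(words[i]["Pos"], "|") {
			if pos == "" {
				continue
			}
			hitMap[pos] = append(hitMap[pos], words[i]["Text"])
		}
	}
	return hitMap, len(hitMap) > 0
}

// buildResponse runs HitFilter and serialises the result. encoding/json
// sorts map keys, so the output is deterministic.
func buildResponse(text string, words []map[string]string) ([]byte, error) {
	hitMap, hit := HitFilter(text, words)
	return json.Marshal(Response{Hit: hit, HitMap: hitMap, Segment: words})
}

func main() {
	words := []map[string]string{{"Newtyp": "2", "Pos": "3|4|5|6", "Text": "加微"}}
	out, err := buildResponse("加微", words)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```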

Deployment Guide

Prerequisites

Go version 1.11.2 or later

Build

cd $project_dir && go build -o ./bin/segoserver *.go

Run Service

./bin/segoserver --port=8080 --dict=/tmp/segoserver-user-dict.txt --reloadInterval=30s

Parameters

--port: listening port of the detection service.

--dict: absolute path to the sensitive‑word dictionary file.

--reloadInterval: interval between automatic dictionary hot reloads (for example, 30s).

Dictionary Generation

Regenerate the dictionary file with an external script and place it at the path given by --dict (e.g., /tmp/segoserver-user-dict.txt). The running service picks up the new file at its next scheduled reload.

Testing

Example request:

curl -i http://127.0.0.1:8080/json?text=加微

Sample response:

{"Hit":true,"HitMap":{"3":["加微"],"4":["加微"],"5":["加微"],"6":["加微"]},"Segment":[{"Newtyp":"2","Pos":"3|4|5|6","Text":"加微"}]}

Extended Discussion

Use Cases

Live‑stream chat (bullet comments) where messages are broadcast and searchable.

User nicknames, signatures, comments, and status updates.

Different scenarios require distinct sensitive‑word policies, and policies may change frequently, demanding rapid dictionary updates.
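Since a word's attribute column can list several IDs (the sample word 加微 carries 3|4|5|6), one detection call can serve every scenario: each caller checks only the ID for its own scene, and changing where a word applies becomes a dictionary edit rather than a code change. The scene‑to‑ID mapping and blockedFor below are hypothetical; the article does not say which ID denotes which scenario.

```go
package main

import "fmt"

// sceneIDs is a hypothetical mapping from business scenario to the
// attribute ID used in the dictionary.
var sceneIDs = map[string]string{
	"chat":      "3",
	"nickname":  "4",
	"signature": "7",
}

// blockedFor returns the matched words that violate one scenario's policy.
func blockedFor(hitMap map[string][]string, scene string) ([]string, bool) {
	id, ok := sceneIDs[scene]
	if !ok {
		return nil, false
	}
	words := hitMap[id]
	return words, len(words) > 0
}

func main() {
	// HitMap taken from the sample response for "加微".
	hitMap := map[string][]string{"3": {"加微"}, "4": {"加微"}, "5": {"加微"}, "6": {"加微"}}
	for _, scene := range []string{"chat", "nickname", "signature"} {
		words, blocked := blockedFor(hitMap, scene)
		fmt.Println(scene, blocked, words)
	}
}
```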

Effectiveness and Limitations

Traditional keyword filters work well on Chinese characters, numbers, and letters but perform poorly on text containing special characters, for example symbols inserted between the characters of a sensitive word.

Sensitive‑word filtering alone cannot block all spam; it should be combined with intelligent anti‑spam models for secondary analysis.
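One common mitigation for the special‑character weakness, not described as part of this service, is to normalise text before detection. The hypothetical Normalize below folds full‑width ASCII (U+FF01 to U+FF5E) to half‑width, lowercases letters, and drops every rune that is not a letter or number, so "加*微" becomes "加微". Dropping punctuation can join fragments of neighbouring words and create false positives, so a filter might check both the raw and the normalised text.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Normalize folds full-width ASCII to half-width, lowercases, and keeps
// only letters (including CJK characters) and numbers.
func Normalize(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= 0xFF01 && r <= 0xFF5E {
			r -= 0xFEE0 // full-width form to its ASCII counterpart
		}
		if unicode.IsLetter(r) || unicode.IsNumber(r) {
			b.WriteRune(unicode.ToLower(r))
		}
	}
	return b.String()
}

func main() {
	fmt.Println(Normalize("加*微"))   // 加微
	fmt.Println(Normalize("加 微 Ｖ")) // 加微v
}
```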

Effectiveness illustration
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
