Tagged articles
12 articles
Baidu Geek Talk
Nov 16, 2022 · Artificial Intelligence

How Baidu’s Ernie‑SimCSE Uses Contrastive Learning to Crush Spam Promotion

This article explains how Baidu's anti‑spam team tackled large‑scale promotional spam on Baidu Zhidao by combining the Ernie pretrained model with SimCSE contrastive learning, detailing the problem background, traditional methods, text‑representation stages, the SimCSE approach, training pipeline, optimizations, and experimental results.

Ernie, NLP, SimCSE
15 min read
360 Tech Engineering
Nov 13, 2019 · Artificial Intelligence

Text Anti‑Spam Techniques and TextCNN Model for Real‑Time Spam Detection on the Huajiao Platform

This article introduces the Huajiao platform's text anti‑spam architecture, analyzes spam categories and challenges, compares rule‑based and machine‑learning approaches, details traditional NLP methods and the TextCNN deep‑learning model, provides its TensorFlow implementation, and describes the online deployment workflow.

CNN, NLP, TensorFlow
14 min read
High Availability Architecture
Jul 12, 2018 · Information Security

Evolution of Zhihu’s Anti‑Cheat System “Wukong”: Architecture, Strategies, and Lessons Learned

This article chronicles the three‑generation evolution of Zhihu’s anti‑cheat platform Wukong, detailing its business context, spam taxonomy, multi‑layered control methods, architectural redesigns, strategy language improvements, graph‑based risk analysis, and the continuous integration of big‑data and machine‑learning techniques to combat content and behavior spam.

Big Data, anti-cheat, graph-analysis
23 min read
dbaplus Community
Oct 30, 2017 · Big Data

How to Build a Real‑Time Spam Monitoring System with Apache Storm

This article walks through the design, deployment, and code implementation of a real‑time spam detection pipeline using Apache Storm, comparing it with Hadoop, detailing cluster setup, topology components, data flow, and how to package and run the solution on a distributed Storm cluster.

Apache Storm, Big Data, Hibernate
13 min read
Nightwalker Tech
Mar 2, 2017 · Information Security

Techniques and Tools for Anti‑Spam Content Filtering in PHP

The discussion outlines practical anti‑spam strategies—including text length limits, keyword replacement, trie‑based data structures, AC automata, Bayesian and vector‑similarity algorithms, and PHP extensions such as libdatrie—while also sharing performance metrics and resource links for implementing robust content filtering systems.

PHP, Trie, content filtering
4 min read
21CTO
Mar 4, 2016 · Artificial Intelligence

How Do We Analyze Influence and Spam on Sina Weibo? Algorithms Explained

This article introduces a range of algorithms for Sina Weibo—including tag propagation, user similarity via LDA, time‑aware weighting, community detection, PageRank‑based influence ranking, and spam user identification—to illustrate how social network analysis can uncover user interests, influence, and malicious behavior.

LDA, PageRank, Social network
17 min read
21CTO
Oct 24, 2015 · Artificial Intelligence

Building an Offline Recommendation System with Mahout: Practical Steps and Tips

This article walks through the end‑to‑end process of building an offline recommendation system using Mahout, covering data collection, filtering, storage, various collaborative‑filtering algorithms, similarity measures, evaluation metrics, parameter tuning, AB testing, and spam‑fighting strategies.

Mahout, collaborative filtering, machine learning
16 min read
Qunar Tech Salon
Oct 10, 2015 · Fundamentals

Overview of Search Engine Architecture and Core Technologies

This article provides a comprehensive overview of search engine evolution, core technologies such as crawling, indexing, retrieval and link analysis, platform foundations including cloud storage and computing, and techniques for improving search results through anti‑spam, user‑intent analysis, deduplication and caching.

Link Analysis, cloud computing, crawling
15 min read

Social Network Analysis on Weibo: Label Propagation, User Similarity, Community Detection, Influence Ranking, and Spam User Identification

This article introduces a series of algorithms for analyzing the Weibo social network, including label propagation, LDA‑based user similarity, time‑aware and interaction‑aware similarity measures, community detection, influence ranking via PageRank variants, and methods for identifying spam users, illustrating how these techniques can be applied to large‑scale social media data.

Big Data, Social Network Analysis, influence ranking
19 min read
Architect
May 22, 2015 · Big Data

Weibo Social Network Analysis: Label Propagation, Similarity Measures, Community Detection, Influence Ranking and Spam User Identification

The article presents a comprehensive overview of algorithms for analyzing Weibo’s social network, covering label propagation, user similarity via LDA, temporal and interaction factors, community detection, influence ranking using PageRank variants, and methods for identifying spam accounts.

LDA, Social Network Analysis, community-detection
16 min read