How Do We Analyze Influence and Spam on Sina Weibo? Algorithms Explained

This article introduces a range of algorithms for Sina Weibo—including tag propagation, user similarity via LDA, time‑aware weighting, community detection, PageRank‑based influence ranking, and spam user identification—to illustrate how social network analysis can uncover user interests, influence, and malicious behavior.

21CTO

Weibo is a widely used social platform where users regularly create original posts, repost, reply, read, follow, and mention others. Understanding user interests, influence, and detecting spam requires a suite of algorithms that model both content and network structure.

Tag Propagation

Each user is assigned one or more interest tags. The basic assumption is that a user's friends or followers share the same interests. The algorithm iteratively updates tags based on the most frequent tags among a user's connections, optionally weighting friends and followers differently.

1. Initialize tags for a subset of users.

2. For each user, count the tags of their friends and followers and assign the most frequent tag(s).

3. Repeat step 2 until tag assignments stabilize.
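The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's implementation: the input format (`friends[u]` as the set of accounts `u` follows, `followers[u]` as the set following `u`), keeping seed tags fixed, and the `friend_weight`/`follower_weight` parameters are all hypothetical choices.

```python
from collections import Counter

def propagate_tags(friends, followers, seed_tags,
                   friend_weight=1.0, follower_weight=1.0, max_iter=20):
    """Iterative tag propagation (sketch).

    friends[u]   -> set of users that u follows      (hypothetical input format)
    followers[u] -> set of users that follow u
    seed_tags[u] -> initial tags for a labelled subset of users
    """
    users = set(friends) | set(followers) | set(seed_tags)
    tags = {u: set(seed_tags.get(u, ())) for u in users}   # step 1: initialise
    for _ in range(max_iter):
        new_tags = {}
        for u in users:
            if u in seed_tags:                  # assumption: seed tags stay fixed
                new_tags[u] = set(seed_tags[u])
                continue
            votes = Counter()                   # step 2: count neighbours' tags
            for v in friends.get(u, ()):
                for t in tags.get(v, ()):
                    votes[t] += friend_weight
            for v in followers.get(u, ()):
                for t in tags.get(v, ()):
                    votes[t] += follower_weight
            if votes:
                best = max(votes.values())      # keep every tag tied for most frequent
                new_tags[u] = {t for t, c in votes.items() if c == best}
            else:
                new_tags[u] = tags[u]
        if new_tags == tags:                    # step 3: stop once assignments stabilise
            break
        tags = new_tags
    return tags
```

The `max_iter` cap matters in practice: with ties, tag assignments can oscillate instead of settling.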

User Similarity Calculation

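When the simple tag‑propagation assumption fails (not every friend or follower shares a user's interests), similarity between users is computed so that more similar neighbors count for more. All of a user's posts are aggregated and represented as a bag‑of‑words vector; similarity can be measured with cosine similarity or, after normalizing the word counts into probability distributions, KL divergence. A more sophisticated method uses LDA to obtain a topic distribution for each user and then compares these distributions.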
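A minimal sketch of the bag‑of‑words comparison, assuming whitespace tokenization (real Weibo text, which is mostly Chinese, would first need a word segmenter). The additive smoothing constant `eps` in the KL function is an assumption, added so that a word missing from one user does not cause a division by zero.

```python
import math
from collections import Counter

def bag_of_words(posts):
    """Aggregate all of a user's posts into one word-count vector (naive tokenisation)."""
    return Counter(w for p in posts for w in p.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (1 = same direction, 0 = no shared words)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def kl_divergence(a, b, eps=1e-9):
    """KL(a || b) after turning counts into smoothed distributions; asymmetric, 0 when equal."""
    vocab = a.keys() | b.keys()
    ta = sum(a.values()) + eps * len(vocab)
    tb = sum(b.values()) + eps * len(vocab)
    kl = 0.0
    for w in vocab:
        p = (a[w] + eps) / ta
        q = (b[w] + eps) / tb
        kl += p * math.log(p / q)
    return kl
```

Because KL divergence is asymmetric, a symmetric variant such as `kl_divergence(a, b) + kl_divergence(b, a)` may be preferable when the result is used as a pairwise similarity.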

LDA Generation Process

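1. For each document (here, the aggregated posts of one user), draw its topic distribution from a Dirichlet prior.

2. For each word position, draw a topic from the document's topic distribution.

3. From the chosen topic, draw a word according to that topic's word distribution.

4. Repeat steps 2 and 3 until the document is fully generated.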
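To make the generative story concrete, the following sketch samples a toy document from fixed topic‑word distributions using only the standard library. It shows the generative direction only; fitting LDA to real posts (inferring topics from observed words) is the reverse problem and is normally done with a library such as gensim. The toy topics and the `alpha` values are illustrative assumptions.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word, alpha, length, rng):
    """Generate one document under the LDA generative model.

    topic_word[k][word] = P(word | topic k); alpha = Dirichlet prior, one value per topic.
    """
    theta = sample_dirichlet(alpha, rng)          # the document's topic distribution
    words = []
    for _ in range(length):
        k = rng.choices(range(len(theta)), weights=theta)[0]            # draw a topic
        vocab = list(topic_word[k])
        w = rng.choices(vocab, weights=[topic_word[k][v] for v in vocab])[0]  # draw a word
        words.append(w)
    return theta, words
```

For example, with `topics = [{"goal": 0.5, "match": 0.5}, {"song": 0.5, "album": 0.5}]`, a document whose `theta` leans towards the first topic will mostly contain football words.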

The resulting per‑user topic vectors are compared with cosine similarity or KL divergence, and the scores are used to weight each neighbor's vote in the tag‑propagation step.
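One way to use the topic vectors during propagation is to weight each neighbor's vote by its topic similarity to the user. The sketch below assumes topic distributions have already been inferred and uses cosine similarity; `weighted_tag_vote` is a hypothetical helper, not part of any library.

```python
import math
from collections import Counter

def cosine(p, q):
    """Cosine similarity between two topic distributions given as equal-length lists."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0

def weighted_tag_vote(user, neighbours, topics, tags):
    """One similarity-weighted propagation step for a single user.

    topics[u]: u's LDA topic distribution (assumed already inferred)
    tags[u]:   current tag set of u
    """
    votes = Counter()
    for v in neighbours:
        w = cosine(topics[user], topics[v])   # similar neighbours get a louder vote
        for t in tags.get(v, ()):
            votes[t] += w
    if not votes:
        return set()
    best = max(votes.values())
    return {t for t, c in votes.items() if c == best}
```

With one topically similar "sports" neighbor and two dissimilar "music" neighbors, a plain majority vote would pick "music", while the weighted vote picks "sports".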

Time and Network Factors

Interests evolve over time, so similarity should consider recent posts. Selecting the N most recent posts (e.g., the latest 50) for each user before LDA training captures temporal dynamics. Additionally, interaction types such as reposts, replies, and mentions provide extra network signals: higher repost or @ frequency between two users suggests greater similarity.
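A sketch of the two extra signals: truncating each user's history to the most recent posts before topic modeling, and scoring pairwise interactions. The per‑kind weights in `INTERACTION_WEIGHTS`, the `1 - exp(-strength)` squashing, and the mixing weight `lam` are hypothetical choices; the article only states that more frequent interaction suggests greater similarity.

```python
import heapq
import math

# Hypothetical per-interaction weights; the article gives no values.
INTERACTION_WEIGHTS = {"repost": 1.0, "reply": 0.8, "mention": 0.5}

def recent_posts(posts, n=50):
    """Keep the texts of the n most recent posts; posts are (timestamp, text) pairs."""
    return [text for _, text in heapq.nlargest(n, posts, key=lambda p: p[0])]

def interaction_strength(events, a, b):
    """Weighted count of reposts, replies and @-mentions between a and b, in either direction.

    events: iterable of (source, target, kind) tuples.
    """
    return sum(INTERACTION_WEIGHTS.get(kind, 0.0)
               for src, dst, kind in events
               if {src, dst} == {a, b})

def combined_similarity(topic_sim, strength, lam=0.3):
    """Blend topic similarity with interaction strength (both end up in [0, 1]).

    1 - exp(-strength) maps an unbounded count into [0, 1); lam is an assumed mixing weight.
    """
    return (1 - lam) * topic_sim + lam * (1 - math.exp(-strength))
```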

Community Detection

Communities are groups of tightly connected users. Two similarity measures are introduced:

Common‑friend similarity (Jaccard of friend sets).

Common‑follower similarity (Jaccard of follower sets).

These measures, combined with shortest‑path similarity, can be fused (e.g., weighted sum) and fed into clustering algorithms such as K‑Means or DBSCAN to obtain community clusters.
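The community pipeline can be sketched end to end with the standard library: Jaccard similarity on friend and follower sets, a shortest‑path term, a weighted fusion, and DBSCAN on the distance `1 - similarity`. The fusion weights `w`, the `1 / (1 + hops)` path similarity, and the `eps`/`min_pts` values are assumptions for illustration.

```python
from collections import deque

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def undirected(friends):
    """Undirected adjacency built from follow edges (used only for shortest paths)."""
    adj = {}
    for u, vs in friends.items():
        for v in vs:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj

def hop_distance(adj, src, dst):
    """Breadth-first search hop count; None when dst is unreachable."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

def fused_similarity(u, v, friends, followers, adj, w=(0.4, 0.4, 0.2)):
    """Weighted sum of friend Jaccard, follower Jaccard and a path term (weights assumed)."""
    d = hop_distance(adj, u, v)
    path_sim = 0.0 if d is None else 1.0 / (1 + d)   # assumption: 1 / (1 + hops)
    return (w[0] * jaccard(friends.get(u, set()), friends.get(v, set()))
            + w[1] * jaccard(followers.get(u, set()), followers.get(v, set()))
            + w[2] * path_sim)

def dbscan(users, dist, eps, min_pts):
    """Minimal DBSCAN over a pairwise distance function; label -1 marks noise."""
    labels, cluster = {}, 0
    for u in users:
        if u in labels:
            continue
        neighbours = [v for v in users if dist(u, v) <= eps]
        if len(neighbours) < min_pts:
            labels[u] = -1
            continue
        cluster += 1
        labels[u] = cluster
        seeds = deque(neighbours)
        while seeds:
            v = seeds.popleft()
            if labels.get(v) == -1:
                labels[v] = cluster           # former noise point becomes a border point
            if v in labels:
                continue
            labels[v] = cluster
            v_neighbours = [x for x in users if dist(v, x) <= eps]
            if len(v_neighbours) >= min_pts:  # v is a core point: keep expanding
                seeds.extend(v_neighbours)
    return labels
```

DBSCAN is used here because it only needs a pairwise distance function; K‑Means, by contrast, needs users embedded as vectors so that cluster centroids can be averaged.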

Influence Calculation

Borrowing from PageRank, influence is propagated through the follower network. The algorithm iteratively distributes influence weight from each user to the users they follow until convergence.

1. Assign equal initial influence to all users.

2. Distribute each user's influence equally among the users they follow.

3. Update each user's influence as the sum of contributions from their followers.

4. Repeat steps 2 and 3 until the influence scores stabilize.
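A power‑iteration sketch of the four steps. Two details are assumptions borrowed from standard PageRank rather than stated in the article: a damping factor (0.85 here) that mixes in a small uniform share so that scores converge, and spreading the influence of users who follow nobody evenly across all users so that total influence stays constant.

```python
def influence_rank(follows, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank-style influence on the follow graph (sketch).

    follows[u] = set of users that u follows; u passes influence to each of them.
    """
    users = set(follows) | {v for vs in follows.values() for v in vs}
    n = len(users)
    score = {u: 1.0 / n for u in users}                  # step 1: equal initial influence
    for _ in range(max_iter):
        # assumption: users who follow nobody spread their score over everyone
        dangling = sum(score[u] for u in users if not follows.get(u))
        new = {u: (1 - damping + damping * dangling) / n for u in users}
        for u, followed in follows.items():
            if followed:
                share = damping * score[u] / len(followed)   # step 2: split equally
                for v in followed:
                    new[v] += share                          # step 3: sum incoming shares
        converged = sum(abs(new[u] - score[u]) for u in users) < tol
        score = new
        if converged:                                        # step 4: stop when stable
            break
    return score
```

For `follows = {"a": {"c"}, "b": {"c"}, "c": {"a"}, "d": {"c"}}`, user `c`, followed by three accounts, ends up with the highest score, and the scores still sum to 1.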

Additional factors—such as activity level, post quality (repost and reply counts), and interaction networks (reply, repost, @)—can be incorporated to refine the ranking.
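The article does not specify how these factors are combined; the following is a hypothetical linear blend. Each signal is min‑max normalized, post quality is taken as the logarithm of received repost plus reply counts, and the weights `(0.6, 0.2, 0.2)` are placeholders to be tuned.

```python
import math

def refined_influence(pagerank, activity, reposts, replies, weights=(0.6, 0.2, 0.2)):
    """Blend network influence with activity and post quality (sketch).

    pagerank[u]: score from the follow-graph propagation
    activity[u]: e.g. posts per day (hypothetical activity measure)
    reposts[u], replies[u]: total repost / reply counts received
    """
    def normalise(d):
        lo, hi = min(d.values()), max(d.values())
        return {u: (x - lo) / (hi - lo) if hi > lo else 0.0 for u, x in d.items()}

    quality = {u: math.log1p(reposts.get(u, 0) + replies.get(u, 0)) for u in pagerank}
    act = {u: activity.get(u, 0.0) for u in pagerank}
    pr, act, qual = normalise(pagerank), normalise(act), normalise(quality)
    w_pr, w_act, w_q = weights
    return {u: w_pr * pr[u] + w_act * act[u] + w_q * qual[u] for u in pagerank}
```

With these weights, a user with slightly lower PageRank but much higher activity and engagement can overtake a passive but well‑followed account.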

Topic and Domain Factors

Influence scores can be applied to specific topics. By retrieving posts related to a trending topic (using hashtags or LDA‑derived topics) and running the influence algorithm on them, one can identify opinion leaders for that topic or domain.
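A sketch of topic‑specific ranking: keep only posts carrying a given hashtag, build a reposter‑to‑author graph from them, and run the same PageRank‑style iteration on that subgraph. The post dictionary keys (`user`, `text`, `reposted_from`) are a hypothetical schema; the regular expression assumes Weibo's `#topic#` hashtag format.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#([^#]+)#")   # Weibo encloses topics in a pair of '#' signs

def topic_repost_graph(posts, topic):
    """Edges reposter -> original author, restricted to posts tagged #topic#.

    posts: list of dicts with hypothetical keys 'user', 'text', 'reposted_from'.
    """
    graph = defaultdict(set)
    for p in posts:
        if topic in HASHTAG.findall(p["text"]) and p.get("reposted_from"):
            graph[p["user"]].add(p["reposted_from"])
    return graph

def opinion_leaders(posts, topic, damping=0.85, iters=100):
    """Rank users in the topic's repost graph with a PageRank-style iteration."""
    graph = topic_repost_graph(posts, topic)
    users = set(graph) | {v for vs in graph.values() for v in vs}
    if not users:
        return []
    n = len(users)
    score = {u: 1.0 / n for u in users}
    for _ in range(iters):
        dangling = sum(score[u] for u in users if u not in graph)
        new = {u: (1 - damping + damping * dangling) / n for u in users}
        for u, authors in graph.items():
            for v in authors:
                # reposting passes influence to the original author
                new[v] += damping * score[u] / len(authors)
        score = new
    return sorted(users, key=score.get, reverse=True)
```

Using the repost graph instead of the follow graph is one possible choice; the same iteration could run on followers restricted to the users active in the topic.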

Spam User Identification

Spam accounts exhibit distinctive patterns: regular posting intervals (low entropy), high @‑mention ratios, excessive URLs, and mismatched content between posts and linked pages. Structural cues such as abnormal follower‑friend ratios and lack of triadic closure also help. These features can be fed into classifiers (logistic regression, decision trees, Naïve Bayes) to flag spam users, and a PageRank‑style propagation can further estimate spam probabilities.
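A sketch of the feature‑plus‑classifier approach, using logistic regression trained by plain gradient descent. It covers four of the listed cues (posting‑interval entropy, @‑mention ratio, URL ratio, follower‑to‑friend ratio); content mismatch with linked pages and triadic closure are omitted for brevity. The 60‑second binning, the substring checks for `@` and `http`, and the training hyperparameters are assumptions, and a real system would train on many labelled accounts rather than toy data.

```python
import math
from collections import Counter

def interval_entropy(timestamps, bin_seconds=60):
    """Shannon entropy (bits) of binned gaps between consecutive posts; low = machine-like."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if not gaps:
        return 0.0
    counts = Counter(int(g // bin_seconds) for g in gaps)
    total = len(gaps)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def user_features(timestamps, posts, followers, friends):
    """[interval entropy, @-mention ratio, URL ratio, log follower/friend ratio]."""
    n = max(len(posts), 1)
    mention_ratio = sum("@" in p for p in posts) / n
    url_ratio = sum("http" in p for p in posts) / n
    ff_ratio = math.log1p(followers / (friends + 1))
    return [interval_entropy(timestamps), mention_ratio, url_ratio, ff_ratio]

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    """Spam probability; w[-1] is the bias term."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the log-loss (no regularisation)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w
```

The classifier's output probability could also serve as the initial score for the PageRank‑style spam propagation mentioned above.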

Conclusion

The presented algorithms provide a foundation for analyzing Sina Weibo data. While real‑world systems are more complex, the discussed methods—tag propagation, similarity via LDA, time‑aware weighting, community detection, influence ranking, and spam detection—demonstrate how social‑network analysis can uncover hidden patterns and improve platform services.

