Social Network Analysis and Graph Database Solution for 58 Community Using JanusGraph and Spark GraphX

This article describes how the 58 community builds a large‑scale social network graph, evaluates graph databases such as Neo4j, JanusGraph and HugeGraph, implements centrality metrics with Spark GraphX, and designs a JanusGraph‑based pipeline for detecting valuable and fraudulent users.

58 Tech

As the 58 community deepens its social network analysis applications, the increasing complexity of data demands a solution capable of mining valuable users, analyzing their relationships, and detecting cheating users among tens of millions of members.

58 Community Network Overview

The network is a graph in which each node represents a user and each edge represents an interaction between two users, such as a post, comment, follow, or like. Linking users through these behaviors produces a single large graph that can be analyzed as a whole.
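The construction described above can be sketched in a few lines of plain Python. This is only an illustration, not the 58 community's pipeline: the user ids, the record layout `(source, target, action)`, and the helper name `build_graph` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical interaction records: (source_user, target_user, action).
# The ids and actions below are made up for illustration.
interactions = [
    ("u1", "u2", "FOLLOW"),
    ("u1", "u3", "LIKE"),
    ("u2", "u3", "COMMENT"),
    ("u3", "u1", "FOLLOW"),
    ("u1", "u2", "LIKE"),
]

def build_graph(records):
    """Return {source: {target: {action: count}}} built from interaction records."""
    graph = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for src, dst, action in records:
        # Repeated interactions of the same kind are accumulated as a count.
        graph[src][dst][action] += 1
    return graph

graph = build_graph(interactions)
```

Keeping a per-action count on each edge mirrors the idea that one pair of users can be linked by several kinds of behavior at once.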

Graph Database Research

Popular graph databases such as Neo4j, JanusGraph and HugeGraph were compared. The comparison table shows support for scalability, storage engines, transactions, partitioning, full‑text search, indexing, and other features.

| Feature | Neo4j | JanusGraph | HugeGraph |
|---|---|---|---|
| Scalability | Not supported | Supported | Supported |
| Storage Engine | Standalone | Supports HBase, Cassandra, etc. | Supports HBase, Cassandra, MySQL, etc. |
| Transactions | Not supported | Supported | RC‑level supported |
| Graph Partition | Not supported | Supported | Supported |
| Full‑text Search | Lucene | ES, Solr, Lucene | Built‑in |
| In‑Memory Store | Supported | Supported | Supported |
| Secondary Index | Supported | Supported | Supported |
| Range Index | Supported | Not supported | Supported |
| Persistence | Supported | Supported | Supported |
| Composite Index | Supported | Supported | Supported |

Neo4j and JanusGraph both provide good query capabilities, but Neo4j lacks a distributed architecture and JanusGraph lacks built‑in graph algorithms. Spark is therefore used for large‑scale graph computation, with JanusGraph as the storage layer.

Social Network Centrality

Centrality measures how central a node is within the network. Three common metrics are degree centrality, closeness centrality, and betweenness (intermediary) centrality.

Degree Centrality

Degree centrality is the number of direct connections a node has. For a node $v$ with neighbor set $N(v)$:

$$C_D(v) = |N(v)|$$

High‑degree users are typically “big V” users, that is, verified accounts with many followers.
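As a minimal sketch of the definition, the function below counts distinct neighbors from an undirected edge list in plain Python. It is not the article's Spark job; the edge list and the name `degree_centrality` are assumptions for illustration. In Spark GraphX the comparable values come from the `degrees`, `inDegrees`, and `outDegrees` members of a graph.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Count the distinct direct neighbours of each node in an undirected edge list."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)
    return {node: len(adj) for node, adj in neighbours.items()}

# Hypothetical toy graph: "a" is connected to everyone else.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
degrees = degree_centrality(edges)
```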

Closeness Centrality

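Closeness centrality measures how near a node is to all other nodes, based on the sum of its shortest‑path distances to them. It is defined as the number of other nodes divided by that total distance, so a smaller total distance gives a higher score. With $N$ nodes and $d(v, u)$ the shortest‑path distance from $v$ to $u$:

$$C_C(v) = \frac{N - 1}{\sum_{u \neq v} d(v, u)}$$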

Users with high closeness are well‑connected to many others, indicating strong social influence.
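A small sketch of this metric for an unweighted graph, using breadth‑first search in plain Python. The adjacency dictionary and the function name are assumptions for illustration. The sketch also assumes that only nodes reachable from the source are counted, which matches the definition above when the graph is connected.

```python
from collections import deque

def closeness_centrality(adj, source):
    """Return (reachable nodes) / (sum of BFS distances) for `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    total = sum(dist.values())
    # An isolated node has no distances to sum; report 0.0 for it.
    return (len(dist) - 1) / total if total > 0 else 0.0

# Hypothetical path graph a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
closeness_b = closeness_centrality(adj, "b")
closeness_a = closeness_centrality(adj, "a")
```

On this path, the interior node `b` scores higher than the end node `a` because its total distance to the other nodes is smaller.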

Betweenness (Intermediary) Centrality

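Betweenness centrality measures how often a node lies on the shortest paths between other pairs of nodes. With $\sigma_{st}$ the number of shortest paths from $s$ to $t$ and $\sigma_{st}(v)$ the number of those paths that pass through $v$:

$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$$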

High‑betweenness users act as bridges between community clusters, facilitating information flow.
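One common way to compute this metric is Brandes' algorithm, sketched below in plain Python for an unweighted, undirected graph. The source does not say which algorithm the 58 community runs on Spark, so this is only an illustration of the definition. The adjacency dictionary is a made‑up example, and every neighbor is assumed to appear as a key.

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm on an unweighted, undirected adjacency dictionary."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                       # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}    # dependency of s on each node
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in score.items()}

# Hypothetical path graph a - b - c - d: b and c are the bridges.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
betweenness = betweenness_centrality(adj)
```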

JanusGraph Architecture

JanusGraph supports massive graph storage, real‑time traversal, and OLAP analytics. Its architecture includes OLTP query, OLAP computation, transaction management, and compatibility with multiple storage back‑ends (Cassandra, HBase) and index back‑ends (Elasticsearch, Solr).

JanusGraph Cluster

The cluster diagram shows JanusGraph nodes connected to HBase for storage and Elasticsearch for indexing.

58 Community User Graph Framework

Users are modeled as nodes with properties (id, age, gender, level, degree, closeness, betweenness, etc.) and edges represent follow, like, and comment actions with attributes such as timestamp and count.

Node label: User

Edge labels: FOLLOW, LIKE, COMMENT

Node properties: node_id, age, name, degree, closeness, betweenness, …

Edge properties: date, values, …
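The model above can be written down as plain Python data classes to make the shape of a vertex and an edge concrete. This is a sketch, not JanusGraph schema code: the property types, the defaults, and the reading of `values` as an interaction count are assumptions, and properties elided with "…" in the lists are left out.

```python
from dataclasses import dataclass
from typing import Optional

# Edge labels listed in the article.
EDGE_LABELS = {"FOLLOW", "LIKE", "COMMENT"}

@dataclass
class User:
    """Vertex with label User; types and defaults are assumed."""
    node_id: str
    name: str = ""
    age: Optional[int] = None
    degree: int = 0
    closeness: float = 0.0
    betweenness: float = 0.0

@dataclass
class Interaction:
    """Edge between two users; `values` is assumed to be an interaction count."""
    label: str
    src: str
    dst: str
    date: str
    values: int = 1

    def __post_init__(self):
        if self.label not in EDGE_LABELS:
            raise ValueError(f"unknown edge label: {self.label}")

alice = User(node_id="u1", name="alice", age=30)
follow = Interaction(label="FOLLOW", src="u1", dst="u2", date="2020-01-01")
```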

Bulk Import into JanusGraph

Initial imports via JanusGraph server were slow for large datasets. To improve performance, the import tool was extended to connect directly to HBase and Elasticsearch, support batch transactions, enable multi‑worker parallel writes, and automatically create schema and indexes.

The optimized tool significantly reduced import time, as shown in the speed comparison chart.
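The batching and multi‑worker ideas can be illustrated with a short Python sketch. It does not talk to JanusGraph, HBase, or Elasticsearch: `write_batch` is a hypothetical stand‑in for one batch transaction, and the batch size and worker count are arbitrary example values.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(rows, batch_size):
    """Yield consecutive slices of `rows` with at most `batch_size` items each."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

def write_batch(batch):
    """Hypothetical stand-in for one batch transaction; returns rows written.

    A real importer would open a transaction against the storage back-end,
    add the vertices or edges in `batch`, and commit once.
    """
    return len(batch)

def bulk_import(rows, batch_size=1000, workers=4):
    """Write `rows` in batches using a pool of parallel workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(write_batch, chunked(rows, batch_size)))

# Hypothetical input: 2500 vertex records.
rows = [{"node_id": i} for i in range(2500)]
written = bulk_import(rows, batch_size=1000, workers=4)
```

Committing once per batch rather than once per record, and spreading batches over several workers, are the two levers the paragraph above describes.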

System Results and Demonstration

The pipeline automatically identifies cheating users (high degree, low closeness/betweenness) and high‑value users (high scores on all centrality metrics), improving community health and user experience.
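The rule described above can be sketched as a simple threshold classifier. The cut‑off values and label names below are hypothetical, since the source gives no thresholds, and reading "low closeness/betweenness" as "both low" is also an assumption.

```python
# Hypothetical cut-offs; real values would be tuned on the community's data.
THRESHOLDS = {
    "degree_high": 1000,
    "closeness_low": 0.05,
    "closeness_high": 0.30,
    "betweenness_low": 0.0001,
    "betweenness_high": 0.01,
}

def classify_user(degree, closeness, betweenness, t=THRESHOLDS):
    """Label a user from three centrality scores using fixed thresholds."""
    if degree < t["degree_high"]:
        return "normal"
    # High on every metric: treated as a high-value user.
    if closeness >= t["closeness_high"] and betweenness >= t["betweenness_high"]:
        return "high_value"
    # Many connections but peripheral in the network: flagged for review.
    if closeness <= t["closeness_low"] and betweenness <= t["betweenness_low"]:
        return "suspected_cheating"
    return "normal"

label = classify_user(degree=5000, closeness=0.01, betweenness=0.0)
```

A user flagged this way is only a candidate for review, because the thresholds are illustrative.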

Conclusion and Outlook

Integrating JanusGraph with Spark GraphX enables effective mining of high‑value users in the 58 community. Future work will explore additional graph algorithms, community detection, link analysis, and richer user tagging to support broader business scenarios.
