How NTE Algorithm Accelerates New Common‑Friend Discovery in Billion‑Scale Graphs
Introducing the NTE (New Triangle Enumeration) algorithm, a divide‑and‑conquer approach that transforms the computation of newly added common friends in massive social graphs into efficient triangle enumeration tasks, with detailed implementations using GraphX‑based GTE, join‑based JTE, and sort‑based STE methods.
Background and Idea
Common friends are a typical social feature widely used in recommendation, advertising, and gaming. When user volume reaches billions, computing the full common‑friend list monthly becomes costly and loses timeliness. Computing newly added common friends is therefore more valuable.
New common‑friend discovery can be modeled as new triangle enumeration: when users A and B both add friend C, a new triangle A‑B‑C is formed.
Model Introduction
The proposed New Triangle Enumeration (NTE) algorithm classifies new triangles into three types based on the number of new edges: new1 (one new edge), new2 (two new edges), and new3 (three new edges). Each type is processed with a dedicated computation mode.
Algorithm Implementation
Part1: new3 Triangle Computation
All three edges are new; the workload is smallest. We use the GraphX‑based GTE (Graph based Triangle Enumeration) algorithm.
1. Collect neighbor information
Build an initial graph from the new edge list, then traverse each vertex to gather its neighbors as vertex attributes.
2. Compute common friends
For each edge, intersect the neighbor sets of its two endpoints to obtain common friends.
3. Form friend triangles
Combine each edge with its common friends to create ordered triangles (A<B<C) and send them to the vertex with the smallest id.
4. Aggregate triangles per vertex
Collect triangles arriving at each vertex from different edges and deduplicate them.
Part2: new2 Triangle Computation
new2 triangles consist of two new edges and one existing edge. We apply the Join‑based Triangle Enumeration (JTE) algorithm, which joins the new edge set with itself to find candidate pairs, maps them to a key ((B,C), A), then joins with the historical edge set to confirm the third edge.
Part3: new1 Triangle Computation
new1 triangles have one new edge and two existing edges, representing the heaviest workload. We use the Sort‑based Triangle Enumeration (STE) algorithm:
1. Connect directed edges
Read new and historical edge sets, keep only directed edges (srcId<dstId), and join new edge A‑B with historical edge A‑C to produce tuples (A,(B,C)).
2. Order filtering
Filter tuples to ensure B<C, guaranteeing ordered triples.
3. Key conversion
Transform each tuple so that the potential edge D‑E becomes the join key.
4. Triangle formation
Join the transformed tuples with the historical edge set to verify D‑E, then convert to ordered triangle (A,D,E).
Algorithm Tuning
Repartition tables after loading from TDW to mitigate data skew.
Set executor_cores to 3 or 4 to fully utilize CPU and memory for Spark tasks.
Use edge‑partitioning strategy (EdgePartition2D) instead of default vertex partitioning for large edge counts.
Split large Spark jobs into smaller tasks to reduce shuffle volume and avoid memory overflow.
References
Elenberg E R, Shanmugam K, Borokhovich M, et al. Beyond triangles: A distributed framework for estimating 3‑profiles of large graphs. KDD 2015.
Kim J, Han W S, Lee S, et al. OPT: a new framework for overlapped and parallel triangulation in large‑scale graphs. SIGMOD 2014.
Park H M, Silvestri F, Kang U, et al. MapReduce triangle enumeration with guarantees. CIKM 2014.
Cui Y, Xiao D, Loguinov D. On Efficient External‑Memory Triangle Listing. ICDM 2016.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
