How to Group Connected People Using Pandas and NetworkX in Python

An experienced Python user demonstrates how to group related individuals into connected components using pandas for data manipulation and networkx for graph analysis, providing complete code examples, visualizations, and step-by-step explanations to help readers solve similar connectivity problems.

Python Crawling & Data Mining

1. Introduction

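Hello, I'm Pi Pi. A group member asked how to use ChatGPT to solve a data analysis problem: given pairs of people who are connected, assign everyone who is linked, directly or through others, to the same group.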

2. Implementation

The first solution uses pandas to assign a group number to each person based on connections. The following code demonstrates the process:

import pandas as pd

data = [
    ['刘备', '关羽'], ['刘备', '张飞'],
    ['曹操', '夏侯'], ['张飞', '诸葛'],
    ['夏侯', '荀彧'], ['孙权', '鲁肃']
]

# Column names: 发起 = initiator, 接收 = receiver
df = pd.DataFrame(data, columns=['发起', '接收'])

# Create an empty dictionary to store name‑to‑group mapping
groups = {}

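# Iterate over each row of the DataFrame
for _, row in df.iterrows():
    sender = row['发起']
    receiver = row['接收']

    if sender not in groups and receiver not in groups:
        # Neither name has been seen yet: open a new group for both
        group = max(groups.values()) + 1 if groups else 1
        groups[sender] = groups[receiver] = group
    elif sender not in groups:
        # Only the receiver is known: the sender joins the receiver's group
        groups[sender] = groups[receiver]
    elif receiver not in groups:
        # Only the sender is known: the receiver joins the sender's group
        groups[receiver] = groups[sender]
    elif groups[sender] != groups[receiver]:
        # Both are known but in different groups: merge the receiver's group
        # into the sender's so the result does not depend on row order
        old, new = groups[receiver], groups[sender]
        for name in list(groups):
            if groups[name] == old:
                groups[name] = new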

# Add the group column to the DataFrame (组别 = group number)
df['组别'] = df['发起'].map(groups)
print(df)

# Output the groups as a dictionary
result = {}
for k, v in groups.items():
    if v not in result:
        result[v] = k
    else:
        result[v] += "," + k
print(result)

Running the script produces a DataFrame with a new "组别" (group number) column and a dictionary that maps each group number to a comma-separated string of its members.

Figure: Pandas result
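For comparison, the same grouping can be done without any library by using a disjoint-set (union-find) structure, a standard technique for connectivity problems. The sketch below is not from the original article: the function name `group_pairs` and the helper `find` are hypothetical, and it assumes each pair is a two-element list of names as in the data above.

```python
def group_pairs(pairs):
    """Return a list of groups (each a list of names) for linked pairs."""
    parent = {}  # maps each name to its parent in the disjoint-set forest

    def find(name):
        # Register unseen names as their own root
        parent.setdefault(name, name)
        # Walk up to the root, shortening the path along the way
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    # Collect the members of each group under their root
    members = {}
    for name in list(parent):
        members.setdefault(find(name), []).append(name)
    return list(members.values())


data = [
    ['刘备', '关羽'], ['刘备', '张飞'],
    ['曹操', '夏侯'], ['张飞', '诸葛'],
    ['夏侯', '荀彧'], ['孙权', '鲁肃']
]

for number, names in enumerate(group_pairs(data), start=1):
    print(number, ",".join(names))
```

Because two groups are merged whenever a pair links them, the result does not depend on the order in which the pairs are listed.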

Another approach leverages the networkx library to find connected components directly:

import networkx as nx

g = nx.Graph()
data = [
    ['刘备', '关羽'], ['刘备', '张飞'],
    ['曹操', '夏侯'], ['张飞', '诸葛'],
    ['夏侯', '荀彧'], ['孙权', '鲁肃']
]

g.add_edges_from(data)

for sub_g in nx.connected_components(g):
    g_node = g.subgraph(sub_g).nodes()
    print(g_node)

The output prints the nodes of each connected component, one component per line. For this data there are three components.
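To get a group-number column like the one in the pandas version, the components can be mapped back onto the DataFrame. The following is a minimal sketch rather than the author's code: it assumes the same column names as above, uses `nx.from_pandas_edgelist` to build the graph, and introduces a hypothetical dictionary `node_to_group`. The numbers follow whatever order networkx yields the components in, so treat them as labels only.

```python
import networkx as nx
import pandas as pd

data = [
    ['刘备', '关羽'], ['刘备', '张飞'],
    ['曹操', '夏侯'], ['张飞', '诸葛'],
    ['夏侯', '荀彧'], ['孙权', '鲁肃']
]
df = pd.DataFrame(data, columns=['发起', '接收'])

# Build an undirected graph with one edge per row
g = nx.from_pandas_edgelist(df, source='发起', target='接收')

# Number the components from 1 and record the number for every node
node_to_group = {
    node: group_id
    for group_id, nodes in enumerate(nx.connected_components(g), start=1)
    for node in nodes
}

# Both endpoints of a row share a component, so either column works here
df['组别'] = df['发起'].map(node_to_group)
print(df)
```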

The networkx library can also draw the graph through matplotlib:

from matplotlib import pyplot as plt
import networkx as nx

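# Use a font with Chinese glyphs so the names render (SimHei must be installed)
plt.rcParams['font.sans-serif'] = ['SimHei']
# Keep the minus sign readable when a non-default font is active
plt.rcParams['axes.unicode_minus'] = False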

g = nx.Graph()
data = [
    ['刘备', '关羽'], ['刘备', '张飞'],
    ['曹操', '夏侯'], ['张飞', '诸葛'],
    ['夏侯', '荀彧'], ['孙权', '鲁肃']
]

g.add_edges_from(data)
nx.draw_networkx(g)
plt.show()  # needed to display the figure when run as a script

The resulting plot visualizes the relationships among the individuals.
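The groups are easier to tell apart if each connected component gets its own color. This is a hedged sketch, not part of the original article: the names `component_of` and `node_colors` and the output file `components.png` are hypothetical, the `Set2` colormap is an arbitrary choice, and the figure is rendered off-screen and saved to a file instead of opened in a window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen and save to a file
from matplotlib import pyplot as plt
import networkx as nx

plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

data = [
    ['刘备', '关羽'], ['刘备', '张飞'],
    ['曹操', '夏侯'], ['张飞', '诸葛'],
    ['夏侯', '荀彧'], ['孙权', '鲁肃']
]
g = nx.Graph()
g.add_edges_from(data)

# Record the index of the component each node belongs to
component_of = {
    node: index
    for index, nodes in enumerate(nx.connected_components(g))
    for node in nodes
}

# node_color must follow the order of g.nodes()
node_colors = [component_of[node] for node in g.nodes()]

# Numeric colors are mapped through the colormap, one shade per component
nx.draw_networkx(g, node_color=node_colors, cmap=plt.cm.Set2)
plt.savefig("components.png")  # hypothetical output file name
```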

3. Summary

This article showcases a typical graph connectivity problem in Python, offering both pandas and networkx solutions with complete code and visual results.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Python Crawling & Data Mining