Master pandas merge: Combine Multiple DataFrames Like a Pro
This tutorial explains how to horizontally merge three pandas DataFrames on column A using concat, join, and merge, demonstrates handling missing values, shows iterative merging with itertools.accumulate, and provides practical code snippets for flexible data‑frame combination.
Data Requirement
We need to horizontally merge three DataFrames on column A so that the final result contains the columns A, num, label, count, similar to a SQL join.
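The article does not show the three input tables, so the sketch below builds hypothetical stand-ins named a, b, and c to match the later snippets. The values are invented for illustration. The keys in column A overlap only partly, so an outer join has something to fill with NaN.

```python
import pandas as pd

# Hypothetical sample data; the original tables are not shown in the article.
a = pd.DataFrame({'A': ['x', 'y', 'z'], 'num': [1, 2, 3]})
b = pd.DataFrame({'A': ['x', 'y', 'w'], 'label': ['red', 'green', 'blue']})
c = pd.DataFrame({'A': ['x', 'z', 'w'], 'count': [10, 30, 40]})

# Columns the merged result should contain, as stated in the requirement.
target_columns = ['A', 'num', 'label', 'count']
```

Each frame contributes the key A plus one value column, which is the layout the requirement describes.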
Requirement Breakdown
In Python, list methods such as append and extend and the + operator grow lists, while update modifies dictionaries. pandas offers comparable tools for DataFrames: concat, join, and merge. concat stacks any number of DataFrames along an axis and aligns them on the labels of the other axis; it does not match rows on a key column. join is a DataFrame method (there is no top-level pd.join function) that aligns on the index of the other DataFrame, so the key column is typically set as the index first. merge is available both as pd.merge and as a DataFrame method; it combines exactly two DataFrames at a time, a left and a right, and uses the column names they share as the key when on is omitted.
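To make the differences concrete, here is a minimal sketch on two hypothetical two-row frames (left and right are illustrative names, not from the article). It shows that concat pairs rows by index label, join aligns on the index, and merge matches on the key column.

```python
import pandas as pd

# Hypothetical frames; note that the keys in column A are in a different order.
left = pd.DataFrame({'A': ['x', 'y'], 'num': [1, 2]})
right = pd.DataFrame({'A': ['y', 'x'], 'label': ['green', 'red']})

# concat is a top-level function. With axis=1 it aligns on the row index,
# not on column A, so these un-indexed frames are paired by labels 0 and 1.
by_position = pd.concat([left, right], axis=1)

# join is a DataFrame method that aligns on the index of the other frame.
by_index = left.set_index('A').join(right.set_index('A'))

# merge matches rows on the shared column A directly.
by_key = left.merge(right, on='A')

# join exists only as a method, never as pd.join.
has_top_level_join = hasattr(pd, 'join')
```

In by_position the first row pairs key x with label green, which is wrong for this data. That is why concat needs A in the index before it can be used for this task.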
Solution
For side‑by‑side merging, concat with axis=1 solves the problem, provided each DataFrame is indexed by A first, because concat aligns rows on the index rather than on a key column.
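A minimal sketch of the concat approach follows. It assumes the hypothetical sample frames a, b, and c defined in the block, and that A is unique within each frame, since concat cannot align duplicated index labels here.

```python
import pandas as pd

# Hypothetical sample data (not from the article).
a = pd.DataFrame({'A': ['x', 'y', 'z'], 'num': [1, 2, 3]})
b = pd.DataFrame({'A': ['x', 'y', 'w'], 'label': ['red', 'green', 'blue']})
c = pd.DataFrame({'A': ['x', 'z', 'w'], 'count': [10, 30, 40]})

# concat aligns on the index, so move the key column A into the index first.
indexed = [df.set_index('A') for df in (a, b, c)]

# axis=1 places the frames side by side; join='outer' keeps every key.
wide = pd.concat(indexed, axis=1, join='outer')

# Restore A as an ordinary column (rename_axis makes the name explicit).
wide = wide.rename_axis('A').reset_index()
```

The result has the columns A, num, label, count. Keys missing from a frame show NaN in that frame's column.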
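join performs the same index-based alignment as a DataFrame method:
# Use a as the base DataFrame and join b to it
# join aligns on the index, so the key column A is set as the index on both sides
# on is omitted because both frames are already indexed by A
# how defaults to 'left'; here we use 'outer' to keep keys from either side
a.set_index('A').join(b.set_index('A'), how='outer')
The above joins on A; keys present in only one DataFrame get NaN in the other's columns, and the index keeps the name A.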
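The NaN values left by an outer join can be inspected and filled afterwards. The sketch below uses hypothetical sample frames, and the fill defaults (0 and 'unknown') are illustrative choices, not the article's.

```python
import pandas as pd

# Hypothetical sample data (not from the article).
a = pd.DataFrame({'A': ['x', 'y', 'z'], 'num': [1, 2, 3]})
b = pd.DataFrame({'A': ['x', 'y', 'w'], 'label': ['red', 'green', 'blue']})

joined = a.set_index('A').join(b.set_index('A'), how='outer')

# Count the gaps the outer join introduced in each column.
missing_per_column = joined.isna().sum()

# Fill each column with a default suited to its type (illustrative defaults).
filled = joined.fillna({'num': 0, 'label': 'unknown'})
```

Because NaN is a float, the integer column num is upcast to float64 by the join. If integers are required, it can be cast back with astype(int) after filling.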
Joining multiple tables at once:
# Passing a list joins every DataFrame on its index in a single call
# Adding on='A' here would raise a ValueError, because on is not supported when joining multiple DataFrames
a.set_index('A').join([b.set_index('A'), c.set_index('A')], how='outer')
When merging two DataFrames with merge, omitting the on parameter uses the column names they have in common as the join key.
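The effect of omitting on can be checked directly. The sketch below uses hypothetical sample frames. b_extra is an invented variant of b that also carries a num column, showing what changes when more than one column name is shared.

```python
import pandas as pd

# Hypothetical sample data (not from the article).
a = pd.DataFrame({'A': ['x', 'y', 'z'], 'num': [1, 2, 3]})
b = pd.DataFrame({'A': ['x', 'y', 'w'], 'label': ['red', 'green', 'blue']})

# With A as the only shared column, omitting on gives the same result.
explicit = a.merge(b, how='outer', on='A')
implicit = a.merge(b, how='outer')
same_result = explicit.equals(implicit)

# Hypothetical variant of b that also shares a 'num' column with a.
b_extra = b.assign(num=[1, 99, 5])

# Without on, both A and num form the key, so ('y', 2) and ('y', 99) do not match.
implicit_extra = a.merge(b_extra, how='outer')

# With on='A', the clashing num columns are kept apart as num_x and num_y.
explicit_extra = a.merge(b_extra, how='outer', on='A')
```

Passing on explicitly is the safer habit when the frames may share more than the intended key column.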
# Three equivalent ways to merge
pd.merge(a, b, how='outer', on='A')
a.merge(b, how='outer', on='A')
a.merge(b, how='outer')
To merge more than two DataFrames, chain merges:
a.merge(b, how='outer', on='A').merge(c, how='outer', on='A')
For many DataFrames, a loop or itertools.accumulate can automate the process:
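from itertools import accumulate
import pandas as pd

data_list = [a, b, c]
# accumulate yields every intermediate result: a, then a+b, then a+b+c
res_iter = accumulate(data_list, lambda x, y: x.merge(y, how='outer', on='A'))
# Exhaust the iterator and keep only the last item, the fully merged DataFrame
*_, result = res_iter
result
Summary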
There are multiple ways to merge DataFrames; choose the one that fits your workflow. Chained merge calls are the easiest to read for a handful of tables, while accumulate scales to any number of tables without writing each merge by hand.
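When only the final table is needed, functools.reduce is a close relative of accumulate that returns just the last result instead of every intermediate one. The sketch below swaps in that technique. merge_all is a hypothetical helper name and the sample frames are invented for illustration.

```python
from functools import reduce

import pandas as pd


def merge_all(frames, key='A', how='outer'):
    """Fold pairwise merges over frames (merge_all is a hypothetical helper)."""
    return reduce(
        lambda left, right: left.merge(right, how=how, on=key),
        frames,
    )


# Hypothetical sample data (not from the article).
a = pd.DataFrame({'A': ['x', 'y', 'z'], 'num': [1, 2, 3]})
b = pd.DataFrame({'A': ['x', 'y', 'w'], 'label': ['red', 'green', 'blue']})
c = pd.DataFrame({'A': ['x', 'z', 'w'], 'count': [10, 30, 40]})

result = merge_all([a, b, c])
```

The helper works for a list of any length of one or more frames. reduce raises a TypeError on an empty list, so callers may want to guard against that case.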