Master pandas merge: Combine Multiple DataFrames Like a Pro

This tutorial explains how to horizontally merge three pandas DataFrames on column A using concat, join, and merge, demonstrates handling missing values, shows iterative merging with itertools.accumulate, and provides practical code snippets for flexible data‑frame combination.

Python Crawling & Data Mining

Data Requirement

We need to horizontally merge three DataFrames on column A so that the final result contains the columns A, num, label, count, similar to a SQL join.
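The article does not show the input data, so the sketch below builds three small frames with the column names from the requirement. The values are hypothetical and chosen only so that each frame covers a slightly different set of keys in A.

```python
import pandas as pd

# Hypothetical sample data: the values are illustrative, not taken from the article
a = pd.DataFrame({"A": ["x", "y", "z"], "num": [1, 2, 3]})
b = pd.DataFrame({"A": ["x", "y", "w"], "label": ["red", "green", "blue"]})
c = pd.DataFrame({"A": ["y", "z", "w"], "count": [10, 20, 30]})

# Each frame shares the key column A but covers a different set of keys
for name, df in {"a": a, "b": b, "c": c}.items():
    print(name, list(df.columns), sorted(df["A"]))
```

Because no key appears in all three frames except y, an outer join is needed to keep every row.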

Requirement Breakdown

In Python, append, extend, and + grow lists, and update modifies dictionaries. pandas offers analogous functions for DataFrames: concat, join, and merge. concat stacks objects along one axis and aligns them on the labels of the other axis, so it does not need a shared key column. join is a DataFrame method (called on an instance, as in a.join(b)) that aligns on the index by default, so the key column is usually set as the index first. merge combines a left and a right DataFrame on key columns; it works pairwise and, when on is omitted, uses every column name the two frames have in common.

Solution

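For side-by-side merging, concat with axis=1 solves the problem once A is set as the index, because it aligns rows on the index rather than on a column. join aligns on the index in the same way: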

# Use a as the base DataFrame and join b onto it
# join aligns on the index, so set the key column A as the index on both sides
# on is not needed because A is already the index of both frames
# how defaults to 'left'; here we use 'outer'
a.set_index('A').join(b.set_index('A'), how='outer')

The above joins on A. Keys present in only one DataFrame get NaN in the other's columns, and A stays as the index name (call reset_index() to turn it back into a column).
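For comparison, here is a minimal sketch of the concat route, assuming the hypothetical sample values below and a key that is unique within each frame. It is one way to do it, not necessarily the author's.

```python
import pandas as pd

# Hypothetical sample data (illustrative values only)
a = pd.DataFrame({"A": ["x", "y", "z"], "num": [1, 2, 3]})
b = pd.DataFrame({"A": ["x", "y", "w"], "label": ["red", "green", "blue"]})

# concat aligns rows on the index, so the key column becomes the index first
wide = pd.concat([a.set_index("A"), b.set_index("A")], axis=1, join="outer")

# rename_axis makes sure the index is called A before it is restored as a column
wide = wide.rename_axis("A").reset_index()
print(wide)
```

Keys that exist in only one frame (z and w here) end up with NaN in the other frame's column.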

Joining multiple tables at once:

# Passing a list works as long as every frame is indexed by the key and on is omitted
# Adding on='A' to this call raises ValueError: joining a list is only supported on the index
a.set_index('A').join([b.set_index('A'), c.set_index('A')], how='outer')
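The sketch below tries both variants of the list join on hypothetical sample data: one without on, and one with on to capture the resulting error. The variable names joined and error_message are introduced here for illustration.

```python
import pandas as pd

# Hypothetical sample data (illustrative values only)
a = pd.DataFrame({"A": ["x", "y", "z"], "num": [1, 2, 3]})
b = pd.DataFrame({"A": ["x", "y", "w"], "label": ["red", "green", "blue"]})
c = pd.DataFrame({"A": ["y", "z", "w"], "count": [10, 20, 30]})

frames = [df.set_index("A") for df in (a, b, c)]

# Without on, join accepts a list and aligns every frame on its index
joined = frames[0].join(frames[1:], how="outer")
print(joined)

# Passing on together with a list of frames is rejected with a ValueError
error_message = None
try:
    frames[0].join(frames[1:], on="A", how="outer")
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The list form also expects the non-key column names to be distinct across frames, which holds here for num, label, and count.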

With merge, omitting the on parameter makes every column name shared by both DataFrames a join key; here that is only A.

# Three equivalent ways to merge (equivalent because A is the only shared column)
pd.merge(a, b, how='outer', on='A')
a.merge(b, how='outer', on='A')
a.merge(b, how='outer')
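As a quick check of the equivalence, the sketch below compares the three calls on hypothetical sample data. It then shows what changes when a second column name is shared, using a made-up frame d that is not part of the article.

```python
import pandas as pd

# Hypothetical sample data (illustrative values only)
a = pd.DataFrame({"A": ["x", "y", "z"], "num": [1, 2, 3]})
b = pd.DataFrame({"A": ["x", "y", "w"], "label": ["red", "green", "blue"]})

r1 = pd.merge(a, b, how="outer", on="A")
r2 = a.merge(b, how="outer", on="A")
r3 = a.merge(b, how="outer")

# All three produce the same frame because A is the only shared column
pd.testing.assert_frame_equal(r1, r2)
pd.testing.assert_frame_equal(r1, r3)

# Hypothetical frame d also has a num column, so omitting on changes the keys
d = pd.DataFrame({"A": ["x", "y"], "num": [1, 99]})
implicit = a.merge(d, how="outer")          # keys: A and num
explicit = a.merge(d, how="outer", on="A")  # key: A only, num gets suffixes
print(implicit)
print(explicit)
```

Spelling out on='A' keeps the join key stable if the frames later gain other columns with matching names.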

To merge more than two DataFrames, chain merges:

a.merge(b, how='outer', on='A').merge(c, how='outer', on='A')
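An outer merge leaves NaN wherever a key is missing from one of the frames, and that also turns integer columns into floats. The sketch below shows one possible clean-up on hypothetical sample data; the fill values (0 and "unknown") are assumptions, and the right choice depends on what a missing value means in your data.

```python
import pandas as pd

# Hypothetical sample data (illustrative values only)
a = pd.DataFrame({"A": ["x", "y", "z"], "num": [1, 2, 3]})
b = pd.DataFrame({"A": ["x", "y", "w"], "label": ["red", "green", "blue"]})
c = pd.DataFrame({"A": ["y", "z", "w"], "count": [10, 20, 30]})

merged = a.merge(b, how="outer", on="A").merge(c, how="outer", on="A")

# Keys absent from a frame produce NaN, which turns integer columns into float
print(merged.dtypes)

# One possible clean-up: fill the gaps per column, then restore integer dtypes
cleaned = merged.fillna({"num": 0, "count": 0, "label": "unknown"})
cleaned = cleaned.astype({"num": "int64", "count": "int64"})
print(cleaned)
```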

For many DataFrames, a loop or itertools.accumulate can automate the process. accumulate yields every intermediate result, so the last item it produces is the full merge:

from itertools import accumulate
import pandas as pd

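data_list = [a, b, c]
# accumulate yields a, then a merged with b, then that result merged with c
res_iter = accumulate(data_list, lambda x, y: x.merge(y, how='outer', on='A'))
# Exhaust the iterator and keep only the last item, which is the full merge
*_, result = res_iter
result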
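When only the final frame is needed, functools.reduce is a common alternative to accumulate because it returns the last value directly. This is a swapped-in technique rather than the article's own, sketched on hypothetical sample data.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample data (illustrative values only)
a = pd.DataFrame({"A": ["x", "y", "z"], "num": [1, 2, 3]})
b = pd.DataFrame({"A": ["x", "y", "w"], "label": ["red", "green", "blue"]})
c = pd.DataFrame({"A": ["y", "z", "w"], "count": [10, 20, 30]})

data_list = [a, b, c]

# reduce folds the list pairwise and returns only the final merged frame
result = reduce(
    lambda left, right: left.merge(right, how="outer", on="A"),
    data_list,
)
print(result)
```

accumulate remains the better fit when the intermediate merges are also of interest, for example to see how the row count grows at each step.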

Summary

There are multiple ways to merge DataFrames; choose the one that fits your workflow. Chained merge calls are the easiest to read for a handful of tables, while accumulate applies the same merge to a list of any length without writing each step by hand.

Written by

Python Crawling & Data Mining
