Mastering Pandas: Merge, Join, and Concat Techniques for Efficient Data Merging
This guide walks through Pandas' three core data‑combination methods—merge, join, and concat—explaining their syntax, key parameters, and join types (inner, outer, left, right) with clear code examples and practical tips for seamless DataFrame integration.
Merge Method
Pandas provides high-performance, in-memory operations for combining Series or DataFrame objects: merge and join perform SQL-like joins on common columns or indexes, while concat stacks objects along an axis.
merge combines tables using column labels (or index) as keys. Its basic syntax is:
result = pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, ...)
Key parameters:
left / right: DataFrames to merge.
how: Merge type – 'inner' (intersection), 'outer' (union), 'left', or 'right'.
on: Column(s) to join on (must exist in both frames).
left_on / right_on: Columns to join when names differ.
left_index / right_index: Use index as join key when True.
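The left_on / right_on and right_index parameters are easiest to see on frames whose key columns have different names. The following is a minimal sketch under that assumption; the employees and salaries frames and their column names are hypothetical and not part of the original article.

```python
import pandas as pd

# Hypothetical sample data: the key column has a different name in each frame.
employees = pd.DataFrame({'emp_id': [1, 2, 3], 'name': ['Ann', 'Bob', 'Cy']})
salaries = pd.DataFrame({'id': [1, 2, 4], 'salary': [100, 200, 400]})

# left_on / right_on name the key column in each frame; both key columns are kept.
by_column = pd.merge(employees, salaries, left_on='emp_id', right_on='id', how='inner')
print(by_column)

# right_index=True matches the left column against the right frame's index instead.
by_index = pd.merge(employees, salaries.set_index('id'),
                    left_on='emp_id', right_index=True, how='inner')
print(by_index)
```

With differently named keys, both key columns appear in the first result, so one of them usually gets dropped afterwards.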
Example (Code Listing 1) demonstrates various join types using two sample DataFrames:
# Code Listing 1 – merge examples
import pandas as pd
left = pd.DataFrame({'key1': ['K0','K1','K2'], 'key2': ['K0','K1','K0'], 'A': ['A0','A1','A2'], 'B': ['B0','B1','B2']})
right = pd.DataFrame({'key1': ['K0','K1','K2'], 'key2': ['K0','K0','K0'], 'C': ['C0','C1','C2'], 'D': ['D0','D1','D2']})
print('left:\n', left)
print('right:\n', right)
# inner join on key1
result1 = pd.merge(left, right, on='key1')
print('Inner join on key1:\n', result1)
# inner join on both keys
result2 = pd.merge(left, right, on=['key1','key2'])
print('Inner join on key1 & key2:\n', result2)
# outer join on both keys
result3 = pd.merge(left, right, how='outer', on=['key1','key2'])
print('Outer join on key1 & key2:\n', result3)
# left join on both keys
result4 = pd.merge(left, right, how='left', on=['key1','key2'])
print('Left join on key1 & key2:\n', result4)
# right join on both keys
result5 = pd.merge(left, right, how='right', on=['key1','key2'])
print('Right join on key1 & key2:\n', result5)
The output shows how rows are matched, how overlapping non-key columns are suffixed (e.g., key2_x and key2_y when joining on key1 only), and how missing values are filled with NaN in outer, left, or right joins.
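The suffixing and NaN behaviour described above can be checked directly. The sketch below uses trimmed versions of the sample frames. The suffixes and indicator arguments do not appear in the original listing and are shown only as one possible way to inspect a merge.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['K0', 'K1', 'K2'], 'key2': ['K0', 'K1', 'K0'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K2'], 'key2': ['K0', 'K0', 'K0'], 'C': ['C0', 'C1', 'C2']})

# Joining on key1 only leaves key2 in both frames; suffixes replaces the default _x / _y.
renamed = pd.merge(left, right, on='key1', suffixes=('_left', '_right'))
print(renamed.columns.tolist())

# indicator=True adds a _merge column: 'both', 'left_only', or 'right_only'.
flagged = pd.merge(left, right, how='outer', on=['key1', 'key2'], indicator=True)
print(flagged)
```

Rows marked left_only or right_only are the ones whose columns from the other frame are filled with NaN.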
Join Method
The join method provides a concise, index-based way to merge DataFrames. It supports the same join types as merge but defaults to how='left' rather than 'inner'. Its signature is:
result = data.join(other, on=None, how='left', ...)
Key parameters:
data: Left DataFrame.
other: Right DataFrame.
on: Column(s) from data to align with other's index.
how: Join type – 'left', 'right', 'inner', or 'outer'.
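As a quick illustration of the how parameter, the sketch below runs a right and an outer join on two small frames with partly overlapping indexes. The frames are hypothetical and kept to one column each for readability.

```python
import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1', 'A2']}, index=['K0', 'K1', 'K2'])
right = pd.DataFrame({'C': ['C0', 'C2', 'C3']}, index=['K0', 'K2', 'K3'])

# how='right' keeps every index label of `right`; A is NaN where `left` has no match.
right_joined = left.join(right, how='right')
print(right_joined)

# how='outer' keeps the union of both indexes.
outer_joined = left.join(right, how='outer')
print(outer_joined)
```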
Example (Code Listing 2) shows left and inner joins on index labels, the equivalent merge calls, and a join on a column via the on parameter:
# Code Listing 2 – join examples
import pandas as pd
left = pd.DataFrame({'A': ['A0','A1','A2'], 'B': ['B0','B1','B2']}, index=['K0','K1','K2'])
right = pd.DataFrame({'C': ['C0','C2','C3'], 'D': ['D0','D2','D3']}, index=['K0','K2','K3'])
print('left:\n', left)
print('right:\n', right)
# left join using index
result1 = left.join(right)
print('Left join (join):\n', result1)
# inner join using index
result2 = left.join(right, how='inner')
print('Inner join (join):\n', result2)
# left join using merge with index
result3 = pd.merge(left, right, left_index=True, right_index=True, how='left')
print('Left join (merge):\n', result3)
# join on a column key
left2 = pd.DataFrame({'key': ['K0','K1','K0'], 'A': ['A0','A1','A2'], 'B': ['B0','B1','B2']})
print('left2:\n', left2)
result4 = left2.join(right, on='key')
print('Left2 join on key (join):\n', result4)
result5 = pd.merge(left2, right, left_on='key', right_index=True, how='left')
print('Left2 join on key (merge):\n', result5)
Concat Method
The concat function concatenates a sequence of Pandas objects along a particular axis, optionally performing inner or outer set‑logic on the other axis.
result = pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, ...)
Key parameters:
objs: List or dict of Series/DataFrames to concatenate.
axis: 0 for vertical (row) concatenation, 1 for horizontal (column) concatenation.
join: 'outer' (union) or 'inner' (intersection).
ignore_index: If True, creates a new integer index.
keys: Labels for hierarchical indexing when objects share axis labels.
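The effect of join, ignore_index, and keys is clearest when the inputs do not share all columns. The following is a small sketch on two hypothetical frames (top and bottom are illustrative names, not from the original article).

```python
import pandas as pd

top = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
bottom = pd.DataFrame({'B': ['B2', 'B3'], 'C': ['C2', 'C3']})

# Default join='outer': union of columns, missing cells become NaN.
outer = pd.concat([top, bottom], ignore_index=True)
print(outer)

# join='inner': only columns present in every frame (here just B) survive.
inner = pd.concat([top, bottom], join='inner', ignore_index=True)
print(inner)

# keys labels each input, producing a two-level (hierarchical) row index.
labelled = pd.concat([top, bottom], keys=['first', 'second'])
print(labelled.loc['second'])
```

Selecting with labelled.loc['second'] recovers the rows that came from the second input, which is the main reason to pass keys.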
Example (Code Listing 3) demonstrates vertical concatenation of three DataFrames, horizontal concatenation with keys, index resetting with ignore_index, and an inner horizontal concatenation of frames whose indexes only partly overlap:
# Code Listing 3 – concat examples
import pandas as pd
df1 = pd.DataFrame({'A': ['A0','A1','A2'], 'B': ['B0','B1','B2'], 'C': ['C0','C1','C2'], 'D': ['D0','D1','D2']}, index=[0,1,2])
df2 = pd.DataFrame({'A': ['A3','A4','A5'], 'B': ['B3','B4','B5'], 'C': ['C3','C4','C5'], 'D': ['D3','D4','D5']}, index=[3,4,5])
df3 = pd.DataFrame({'A': ['A6','A7','A8'], 'B': ['B6','B7','B8'], 'C': ['C6','C7','C8'], 'D': ['D6','D7','D8']}, index=[6,7,8])
# vertical outer concat
result1 = pd.concat([df1, df2, df3])
print('Vertical concat:\n', result1)
# horizontal outer concat with keys
result2 = pd.concat([df1, df2], axis=1, keys=['df1','df2'])
print('Horizontal concat with keys:\n', result2)
# horizontal concat using join method for comparison
result3 = df1.join(df2, how='outer', lsuffix='_df1', rsuffix='_df2')
print('Horizontal concat via join:\n', result3)
# vertical concat with ignore_index
result4 = pd.concat([df1, df3], ignore_index=True)
print('Vertical concat with new index:\n', result4)
# inner horizontal concat of result1 and another df
df4 = pd.DataFrame({'B': ['B2','B3','B6'], 'D': ['D2','D3','D6'], 'F': ['F2','F3','F6']}, index=[2,3,6])
result5 = pd.concat([result1, df4], axis=1, join='inner', keys=['result1','df4'])
print('Inner horizontal concat:\n', result5)
Tips
merge, join, and concat all return a new object and do not modify the originals.
join is often the more concise choice for index-based merges, while merge offers more flexibility with column keys.
concat is ideal for stacking DataFrames vertically or horizontally, especially when the frames share the same columns or index.
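The first two tips can be checked with a short sketch. The frames below are hypothetical, and the comparison only shows that the two spellings agree for this particular input.

```python
import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1']}, index=['K0', 'K1'])
right = pd.DataFrame({'C': ['C0', 'C1']}, index=['K0', 'K2'])
left_before = left.copy()

# Index-based left join, written two ways.
joined = left.join(right)
merged = pd.merge(left, right, left_index=True, right_index=True, how='left')

# Both calls return new objects; `left` itself is unchanged.
print(left.equals(left_before))
# For this input the join and merge spellings give the same frame.
print(joined.equals(merged))
```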
Python Crawling & Data Mining
