Master Pandas Data Merging: concat, merge, append, and join Explained
This tutorial explains pandas' four primary data merging functions (concat, merge, append, and join), covering their usage, key parameters, and practical code examples to help you combine DataFrames efficiently in various scenarios.
Data merging is a crucial step in data processing, and pandas offers four common methods for combining DataFrames flexibly: concat(), merge(), append(), and join().
1. concat()
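concat() can concatenate two or more DataFrames (or Series) along rows (axis=0) or columns (axis=1). By default it stacks rows (axis=0) and takes the union (join='outer') of the labels on the other axis, filling cells that exist in only one input with NaN.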
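A minimal sketch of that default behavior, using two small illustrative frames (the names left and right are made up for this example). Stacking rows takes the union of the columns, while axis=1 places the frames side by side and aligns them on the row index.

```python
import pandas as pd

# Two frames whose columns only partly overlap (illustrative data)
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# axis=0 (default): rows are stacked, the columns become the union a, b, c,
# and cells missing from one input are filled with NaN
stacked = pd.concat([left, right])

# axis=1: frames are placed side by side, aligned on their row index;
# the shared column name "b" simply appears twice
side_by_side = pd.concat([left, right], axis=1)

print(stacked)
print(side_by_side)
```

Note that stacked keeps the original index labels, so 0 and 1 each appear twice.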
Usage
pd.concat(
objs,
axis=0,
join='outer',
ignore_index=False,
keys=None,
levels=None,
names=None,
verify_integrity=False,
sort=False,
copy=True,
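)
Main parameters
objs: a sequence or mapping of DataFrame/Series objects to concatenate.
axis: 0 for rows (index), 1 for columns.
join: 'inner' (intersection) or 'outer' (union) of the labels on the other axis. Default is 'outer'.
ignore_index: if True, discard the original labels along the concatenation axis and use a continuous range 0, 1, ..., n-1.
keys: add an outer level of a hierarchical index along the concatenation axis, one key per input object.
names: names for the levels of the resulting hierarchical index.
verify_integrity: if True, raise a ValueError when the concatenation axis would contain duplicate labels.
sort: sort the non-concatenation axis if it is not already aligned (has no effect with join='inner').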
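To show how join and verify_integrity change the result, here is a short sketch on the same kind of illustrative frames (left, right, inner, and duplicate_error are hypothetical names chosen for this example):

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# join='inner' keeps only the columns present in every input (here just "b");
# ignore_index=True renumbers the stacked rows 0..3
inner = pd.concat([left, right], join="inner", ignore_index=True)

# verify_integrity=True raises ValueError because both inputs use
# the index labels 0 and 1, so the result would contain duplicates
try:
    pd.concat([left, right], verify_integrity=True)
except ValueError as err:
    duplicate_error = str(err)
else:
    duplicate_error = None

print(inner)
print(duplicate_error)
```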
Example
import pandas as pd

df1 = pd.DataFrame({
'char': ['a', 'b'],
'num': [1, 2]
})
df2 = pd.DataFrame({
'char': ['b', 'c'],
'num': [3, 4]
})
# Simple row-wise concatenation (default outer join)
pd.concat([df1, df2])
Resetting the index:
pd.concat([df1, df2], ignore_index=True)
Adding hierarchical keys:
pd.concat([df1, df2], keys=['d1', 'd2'])
Labeling the index levels:
pd.concat([df1, df2], keys=['d1', 'd2'], names=['DF Name', 'Row ID'])
2. merge()
merge() performs database-style joins between two DataFrames, aligning rows on key column(s) or on the index. By default it performs an inner join, and if no keys are specified it joins on every column whose name appears in both DataFrames.
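Because the default key is the intersection of column names, two frames that share more than one column are matched on all of them. A minimal sketch with illustrative data (left, right, both_keys, and name_only are made-up names):

```python
import pandas as pd

left = pd.DataFrame({"name": ["A1", "B1", "C1"], "grade": [60, 70, 80]})
right = pd.DataFrame({"name": ["B1", "C1", "D1"], "grade": [70, 90, 100]})

# No `on` given: "name" and "grade" are both shared, so a row survives
# only if BOTH values match (only B1/70 does)
both_keys = left.merge(right)

# Restricting the key to "name": the other shared column "grade" is kept
# from both sides, with the default suffixes _x and _y
name_only = left.merge(right, on="name")

print(both_keys)
print(name_only)
```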
Usage
pd.merge(
left,
right,
how='inner',
on=None,
left_on=None,
right_on=None,
left_index=False,
right_index=False,
sort=False,
suffixes=('_x', '_y'),
copy=True,
indicator=False,
validate=None,
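)
Key parameters
left, right: DataFrames to join.
how: 'left', 'right', 'outer', 'inner', or 'cross' (pandas 1.2+). Default 'inner'.
on: column name(s) to join on; if omitted (and no index keys are given), uses the intersection of column names.
left_on, right_on: column(s) from the left/right DataFrame when the key names differ.
left_index, right_index: use the left/right index as the join key instead of columns.
suffixes: suffixes appended to overlapping non-key column names. Default ('_x', '_y').
sort: sort the result by the join keys.
indicator: if True, add a '_merge' column recording whether each row came from the left frame, the right frame, or both.
validate: check the key relationship ('one_to_one', 'one_to_many', 'many_to_one', 'many_to_many') and raise a MergeError if it is violated.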
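The indicator and validate parameters are handy for checking a join. A minimal sketch with illustrative key columns (left, right, dup_right, and validation_failed are hypothetical names):

```python
import pandas as pd

left = pd.DataFrame({"key": ["k1", "k2", "k3"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["k2", "k3", "k4"], "rval": [20, 30, 40]})

# how='outer' keeps keys from both sides; indicator=True adds a "_merge"
# column with the values "left_only", "right_only", or "both"
outer = pd.merge(left, right, on="key", how="outer", indicator=True)

# validate='one_to_one' raises MergeError if a key repeats on either side
dup_right = pd.concat([right, right.head(1)])  # "k2" now appears twice
try:
    pd.merge(left, dup_right, on="key", validate="one_to_one")
except pd.errors.MergeError:
    validation_failed = True
else:
    validation_failed = False

print(outer)
```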
Example
df1 = pd.DataFrame({
'name': ['A1', 'B1', 'C1'],
'grade': [60, 70, 80]
})
df2 = pd.DataFrame({
'name': ['B1', 'C1', 'D1'],
'grade': [70, 80, 100]
})
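# Default inner join on all common columns ('name' and 'grade'): a row survives only if both values match
df1.merge(df2)
Outer join (union):
df1.merge(df2, how='outer')
Joining on differently named key columns (the overlapping 'grade' column receives the default suffixes '_x' and '_y'):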
df1 = pd.DataFrame({
'name1': ['A1', 'B1', 'B1', 'C1'],
'grade': [60, 70, 80, 90]
})
df2 = pd.DataFrame({
'name2': ['B1', 'C1', 'D1', 'E1'],
'grade': [70, 80, 90, 100]
})
df1.merge(df2, left_on='name1', right_on='name2')
Adding custom suffixes to overlapping columns:
df1.merge(df2, left_on='name1', right_on='name2', suffixes=('_1', '_2'))
3. append()
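append() adds rows from another DataFrame or Series to the end of a DataFrame and returns a new object; the caller is not modified. It is essentially a shortcut for concat(..., axis=0) with fewer parameters. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, use pd.concat() instead.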
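Since append() accepted a list of frames, migrating to pandas 2.x mostly means putting the caller at the front of the list passed to pd.concat(). A sketch under that assumption (df3 and combined are illustrative names not in the original):

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list("BC"))
df3 = pd.DataFrame([[9, 10]], columns=list("AB"))

# pandas < 2.0: df1.append([df2, df3], ignore_index=True)
# pandas 2.x: list the caller first, then everything passed to append()
combined = pd.concat([df1, df2, df3], ignore_index=True)

print(combined)
```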
Usage
df1.append(
other,
ignore_index=False,
verify_integrity=False,
sort=False,
)
Parameters
other: DataFrame, Series, or list of these to append.
ignore_index: if True, reset the index to a continuous range.
verify_integrity: raise a ValueError if the result would contain duplicate index labels.
sort: sort the columns if the columns of the caller and other are not aligned.
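Appending a single Series as a new row was a common use of append(). One way to do this in pandas 2.x, sketched with illustrative data, is to turn the Series into a one-row DataFrame first (new_row is a made-up name):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))

# The new row as a Series; its index labels become the column names
new_row = pd.Series({"A": 5, "B": 6})

# pandas < 2.0: df = df.append(new_row, ignore_index=True)
# pandas 2.x: to_frame().T turns the Series into a one-row DataFrame
df = pd.concat([df, new_row.to_frame().T], ignore_index=True)

print(df)
```

When the frame has a default RangeIndex, df.loc[len(df)] = [5, 6] is another common way to add one row in place.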
Example
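import pandas as pd  # DataFrame.append() requires pandas < 2.0

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('BC'))
# Append df2 to df1 (columns that are not aligned are filled with NaN)
df1.append(df2)  # pandas 2.x: pd.concat([df1, df2])
Appending with index reset:
df1.append(df2, ignore_index=True)  # pandas 2.x: pd.concat([df1, df2], ignore_index=True)
4. join()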
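join() combines the columns of two DataFrames by aligning on the index: by default it matches the caller's index against other's index, and with the on argument it matches a column (or columns) of the caller against other's index. Its default is a left join (how='left'), which makes it a convenient shortcut for that common case.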
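A minimal sketch of index-on-index joining with illustrative frames (left, right, left_join, and inner_join are made-up names), comparing the default left join with how='inner':

```python
import pandas as pd

left = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=["K0", "K1", "K2"])
right = pd.DataFrame({"B": ["B0", "B2", "B3"]}, index=["K0", "K2", "K3"])

# Default how='left': every index label of `left` is kept; B is NaN for K1,
# and K3 (present only in `right`) is dropped
left_join = left.join(right)

# how='inner': only labels present in both indexes (K0 and K2) survive
inner_join = left.join(right, how="inner")

print(left_join)
print(inner_join)
```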
Usage
df1.join(
other,
on=None,
how='left',
lsuffix='',
rsuffix='',
sort=False,
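)
Parameters
other: DataFrame, Series (its name must be set), or a list of DataFrames to join.
on: column name(s) in the caller to match against other's index; if omitted, the caller's index is used.
how: 'left', 'right', 'outer', 'inner'. Default 'left'.
lsuffix, rsuffix: suffixes for overlapping column names; if column names overlap and no suffix is given, join() raises a ValueError.
sort: sort the result by the join keys.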
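Unlike merge(), join() has empty default suffixes, so overlapping column names must be resolved explicitly. A short sketch with illustrative frames (raised and joined are hypothetical names):

```python
import pandas as pd

left = pd.DataFrame({"val": [1, 2]}, index=["x", "y"])
right = pd.DataFrame({"val": [10, 20]}, index=["x", "y"])

# Both frames have a "val" column; without suffixes join() raises ValueError
try:
    left.join(right)
except ValueError:
    raised = True
else:
    raised = False

# Supplying lsuffix/rsuffix disambiguates the overlapping column names
joined = left.join(right, lsuffix="_left", rsuffix="_right")

print(joined)
```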
Example
import pandas as pd

df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
'val': ['V0', 'V1', 'V2', 'V3', 'V4']
})
df2 = pd.DataFrame({
'B': ['B3', 'B4', 'B5'],
'val': ['V3', 'V4', 'V5']
})
# Join on the "val" column by setting it as the index in both frames
df1.set_index('val').join(df2.set_index('val'))
Equivalent join using the on argument (df1 keeps its original index, and "val" stays a regular column):
df1.join(df2.set_index('val'), on='val')
Outer join example:
df1.join(df2.set_index('val'), on='val', how='outer')
Summary of the four methods
concat() can combine pandas objects along either axis and can optionally add a hierarchical index. join() is primarily used for column-wise merging based on row indexes. merge() provides database-style joins, aligning on columns or indexes. append() and join() can be seen as simplified versions of concat() and merge(), respectively, with fewer parameters and greater ease of use.
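To make the "join() is a simplified merge()" point concrete, here is a sketch reusing the join section's data: a left join on a column matched against the other frame's index can be written either way (lookup, via_join, and via_merge are illustrative names).

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3", "A4"],
                    "val": ["V0", "V1", "V2", "V3", "V4"]})
df2 = pd.DataFrame({"B": ["B3", "B4", "B5"],
                    "val": ["V3", "V4", "V5"]})
lookup = df2.set_index("val")

# join(): match df1's "val" column against lookup's index (default how='left')
via_join = df1.join(lookup, on="val")

# The same operation spelled out with merge()
via_merge = df1.merge(lookup, left_on="val", right_index=True, how="left")

print(via_join.equals(via_merge))
```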
Choosing the appropriate method depends on the specific merging scenario; practicing these functions will reinforce your understanding.
Python Crawling & Data Mining
