
Understanding Deep vs Shallow Copy in pandas: A Practical Guide

This article explains how pandas' copy method works with deep and shallow options, demonstrates their effects on mutable and immutable data, and shows how to safely modify DataFrames using Python's copy module to avoid unintended changes to the original data.

Python Crawling & Data Mining

Rescuing pandas (9) – A Brief Look at Deep and Shallow Copy in pandas

Many users avoid pandas because copying behavior can be confusing; this series aims to help them fall in love with pandas again.

Platform

Windows 10

Python 3.8

pandas >= 1.2.4

Data Requirement

When you want to modify a pandas object without affecting the original DataFrame, you often use df.copy(). However, if the data contains mutable objects, the copy may not behave as expected. The official documentation clarifies this.

.copy(deep=True)

Function signature and description:

Signature: df.copy(deep: 'bool_t' = True) -> 'FrameOrSeries'
Docstring:
Make a copy of this object's indices and data.

When ``deep=True`` (default), a new object is created with a copy of the calling object's data and indices. Modifications to the copy do not affect the original.

When ``deep=False``, a new object is created without copying the data or index (only references are copied). Changes to the original data are reflected in the shallow copy and vice versa.

Parameters
----------
deep : bool, default True
    Make a deep copy, including a copy of the data and the indices.
    With ``deep=False`` neither the indices nor the data are copied.

Returns
-------
copy : Series or DataFrame
    Object type matches caller.

Notes
-----
When ``deep=True``, data is copied but Python objects are not copied recursively—only references are copied. This differs from ``copy.deepcopy`` which recursively copies objects.

Index objects are copied when ``deep=True``, but the underlying NumPy array is not copied for performance reasons. Since ``Index`` is immutable, the data can be safely shared.

The documentation notes that deep copy does not recursively copy mutable Python objects, and that immutable Index objects are safely shared.
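The note about references can be checked directly with identity tests. The following is a minimal sketch, assuming a small hypothetical frame (`demo`, with a `tags` column) that is not part of the article's data; it only shows that a pandas deep copy holds the very same list objects as the original.

```python
import pandas as pd

# Hypothetical frame, for illustration only
demo = pd.DataFrame({'n': [1, 2], 'tags': [['a', 'b'], ['c']]})
demo_deep = demo.copy(deep=True)

# The copy is a distinct DataFrame object...
print(demo_deep is demo)                                # False
# ...but each cell of the object column still points at the same list
print(demo_deep.at[0, 'tags'] is demo.at[0, 'tags'])    # True
```

The second check is what makes in-place edits to nested lists or dicts visible through every copy.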

Function Usage

Creating a DataFrame with mutable and immutable columns:

import pandas as pd

data = {
    'A': [1, 2, 3],
    'B': [['厉害', '真棒'], ['值得鼓励', '继续加油'], ['相信未来--勇闯天涯']],
    'C': [
        {'key': '试一试', 'value': 'try'},
        {'key': '看一看', 'value': 'look'},
        {'key': '拍一拍', 'value': 'tickle'}
    ]
}
df = pd.DataFrame(data)
df_shallow = df.copy(deep=False)  # shallow copy
df_copy = df.copy()               # deep copy

Modifying column A (immutable integer values) in the original affects the shallow copy but not the deep copy:

df.loc[0, 'A'] = 50

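After this change, df_shallow reflects the modification because it shares its underlying data with df, while df_copy remains unchanged. This confirms that df.copy(deep=False) creates a shallow copy on the pandas version listed above; newer releases with Copy-on-Write enabled may no longer propagate the change to the shallow copy.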
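The column A behavior can be reproduced with a reduced frame. This is a sketch under the assumption that only the integer column matters here, so columns B and C are left out. The shallow-copy line carries no expected output because the result depends on the pandas version and on whether Copy-on-Write is enabled.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
df_shallow = df.copy(deep=False)  # shallow copy
df_copy = df.copy()               # deep copy

df.loc[0, 'A'] = 50

print(df['A'].tolist())        # [50, 2, 3]
print(df_copy['A'].tolist())   # [1, 2, 3]  (deep copy is isolated)
# Version-dependent: shows 50 when the shallow copy shares data with df
print(df_shallow['A'].tolist())
```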

Modifying a list inside column B in place affects every copy, including the deep copy, because all of them hold references to the same list objects:

df.loc[0, 'B'][0] = 'lihai'

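The change also propagates to the original data dict used to build the DataFrame, since the DataFrame stores references to the same list objects rather than copies of them.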
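A short check of this propagation, assuming a hypothetical two-row stand-in for column B rather than the article's full data:

```python
import pandas as pd

# Hypothetical stand-in for the article's column B
data = {'B': [['x', 'y'], ['z']]}
df = pd.DataFrame(data)
df_shallow = df.copy(deep=False)
df_copy = df.copy()

# Mutate the list stored in the cell, in place
df.loc[0, 'B'][0] = 'changed'

print(df_copy.loc[0, 'B'])      # ['changed', 'y']
print(df_shallow.loc[0, 'B'])   # ['changed', 'y']
print(data['B'][0])             # ['changed', 'y']
```

The assignment never goes through a pandas setter; it edits the list object itself, which is why no copy mode of df.copy() can shield against it.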

To modify mutable objects safely without affecting the original, copy the nested object itself with Python's copy or deepcopy before changing it:

from copy import copy, deepcopy

def value_upper(dic):
    """Create a shallow copy of the dict and uppercase its 'value' field"""
    dic = copy(dic)  # use deepcopy for deeper nesting
    dic['value'] = dic['value'].upper()
    return dic

# Apply without altering the original column
df_copy['C'].apply(value_upper)  # does not change df_copy['C']

Assigning the result back updates only the copy:

df_copy['C'] = df_copy['C'].apply(value_upper)
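To see why the copy inside value_upper matters, the sketch below contrasts it with a variant that mutates the dict in place. The frame is a reduced, hypothetical version of column C, and `value_upper_inplace` and `df_other` are names introduced here for illustration only.

```python
from copy import copy
import pandas as pd

# Reduced, hypothetical version of the article's column C
df = pd.DataFrame({'C': [{'key': 'k1', 'value': 'try'},
                         {'key': 'k2', 'value': 'look'}]})
df_copy = df.copy()

def value_upper(dic):
    """Copy the dict first, then uppercase its 'value' field."""
    dic = copy(dic)
    dic['value'] = dic['value'].upper()
    return dic

def value_upper_inplace(dic):
    """Illustrative counter-example: mutates the shared dict directly."""
    dic['value'] = dic['value'].upper()
    return dic

df_copy['C'] = df_copy['C'].apply(value_upper)
safe_copy = df_copy.at[0, 'C']['value']    # 'TRY'
safe_original = df.at[0, 'C']['value']     # 'try' (original untouched)

df_other = df.copy()
df_other['C'].apply(value_upper_inplace)
leaked = df.at[0, 'C']['value']            # 'TRY' (leaked into the original)

print(safe_copy, safe_original, leaked)
```

A shallow copy.copy is enough here because only a top-level key is reassigned; dicts with deeper nesting would call for deepcopy, as the comment in value_upper notes.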

Summary

When modifying pandas objects, use df.copy() to avoid unintended changes to the original DataFrame. For data containing mutable objects, even a pandas deep copy copies only references to those objects, so you need to apply Python's copy.deepcopy to the cell values themselves, or copy the nested structures explicitly, to achieve true isolation.
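One possible way to get that isolation for a whole frame is sketched below. `isolated_copy` is a hypothetical helper, not a pandas API; it assumes unique column names and deep-copies every cell in object-dtype columns, which can be slow on large frames.

```python
from copy import deepcopy
import pandas as pd

def isolated_copy(frame):
    """Hypothetical helper: copy a frame and deep-copy cells in object columns."""
    out = frame.copy()
    for col in out.columns:
        if out[col].dtype == object:
            # Replace each cell with an independent deep copy
            out[col] = out[col].map(deepcopy)
    return out

df = pd.DataFrame({'A': [1, 2], 'B': [['x'], ['y', 'z']]})
df_iso = isolated_copy(df)

# In-place mutation of a nested list now stays local to the copy
df_iso.at[0, 'B'].append('new')

print(df.at[0, 'B'])       # ['x']
print(df_iso.at[0, 'B'])   # ['x', 'new']
```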

Understanding the distinction between pandas' copy and Python's deepcopy helps prevent subtle bugs in data processing pipelines.

Written on March 20, 2022

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
