Advanced Pandas Data Manipulation Techniques
This article provides a comprehensive guide to using Pandas for complex queries, data type conversion, sorting, adding and modifying data, advanced filtering, iteration, and functional operations, offering numerous code examples that illustrate how to efficiently clean, transform, and analyze tabular data in Python.
Complex Queries
Demonstrates logical operations, slicing, .loc , .iloc , function filtering, comparison functions, df.query , df.eval , and df.filter for selecting data based on conditions.
<code># Logical operation examples
df.Q1 > 36
~(df.Q1 < 60) & (df['team'] == 'C')
# Slice with logical expression
df[df['Q1'] == 8]
# .loc example
df.loc[df['Q1'] > 90, 'Q1':]
# df.query example
df.query('Q1 > Q2 > 90')
</code>Data Type Conversion
Shows how to infer types, specify dtypes, use astype , and convert to datetime or timedelta.
<code># Infer types
df = df.infer_objects()
# Specify dtypes on read
df = pd.read_excel(data, dtype={'team':'string','Q1':'int32'})
# Convert with astype
df['Q1'] = df['Q1'].astype('int32')
# Convert to datetime
t = pd.Series(['20200801','20200802'])
</code>Data Sorting
Explains index sorting with sort_index , value sorting with sort_values , mixed sorting, and nsmallest / nlargest for selecting extreme values.
<code># Index sorting
df.sort_index()
# Value sorting
df.sort_values('Q4')
df.sort_values(by=['team','Q1'], ascending=[True, False])
# Smallest / largest
s.nsmallest(3)
df.nlargest(5, 'Q1')
</code>Adding and Modifying Data
Details modifying values, replacing data, filling missing values, renaming columns/indexes, adding columns via assignment, insert , assign , evaluating expressions, adding rows, appending DataFrames, and deleting rows or NaNs.
<code># Modify a single value
df.iloc[0,0] = 'Lily'
# Replace values
df.replace(0, 5)
# Fill missing values
df.fillna(0)
df.rename(columns={'team':'class'})
df['foo'] = df.Q1 + df.Q2
df.insert(2, 'total', df.sum(1))
df.assign(total=df.sum(1), Q=100)
# Add a new row
df.loc[100] = ['tom','A',88,88,88,88]
# Append another DataFrame
df = df.append(df2)
# Drop NaNs
df.dropna()
</code>Advanced Filtering
Introduces df.where , np.where , df.mask , and df.lookup for sophisticated filtering.
<code># df.where example
df.where(df > 70)
# np.where example
np.where(df >= 60, 'Pass', 'Fail')
# df.mask example
df.mask(s > 80)
# df.lookup example
df.lookup([1,3,4], ['Q1','Q2','Q3'])
</code>Iteration
Covers iterating over Series, DataFrame rows with iterrows , itertuples , items , and column iteration.
<code># Iterate over a column
for i in df.name:
print(i)
# iterrows
for idx, row in df.iterrows():
print(idx, row['name'], row.Q1)
# itertuples
for row in df.itertuples():
print(row)
</code>Function Application
Describes pipe , apply , applymap , map , agg , transform , and copy methods for functional data processing.
<code># pipe chaining
(df.pipe(h)
.pipe(g, arg1=a)
.pipe(f, arg2=b, arg3=c))
# apply on a column
df.name.apply(lambda x: x.lower())
# applymap on entire DataFrame
df.applymap(len)
# map on a Series
df.team.map({'A':'Class1','B':'Class2'})
# aggregation
df.agg(['max','min'])
# transform
df.transform(lambda x: x*2)
# copy
df_copy = df.copy()
</code>Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.