Top 25 pandas tricks for efficient data analysis in Python
This tutorial presents 25 practical pandas techniques for data analysis, covering version checking, DataFrame creation, column renaming, row and column reversal, selecting columns by dtype, type conversion, memory reduction, reading and concatenating multiple files, handling missing values, string splitting, Series expansion, aggregation, pivot tables, binning continuous data, DataFrame styling, and profiling. The main techniques are summarized below.
Version and installation : Check pandas version with pd.__version__ and view dependency versions using pd.show_versions().
Creating DataFrames : Build a DataFrame from a dictionary or using np.random.rand() for larger datasets, and display it.
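As a minimal sketch of both approaches, the snippet below builds one small DataFrame from a dictionary and one larger DataFrame from random numbers. The column names and shapes are illustrative assumptions, not taken from the original article.

```python
import numpy as np
import pandas as pd

# Small DataFrame from a dictionary: keys become column names.
df_small = pd.DataFrame({'col_one': [100, 200], 'col_two': [300, 400]})

# Larger DataFrame of random floats in [0, 1): 4 rows, 8 columns.
# The labels a-h are an arbitrary choice for illustration.
df_large = pd.DataFrame(np.random.rand(4, 8), columns=list('abcdefgh'))

print(df_small)
print(df_large.shape)
```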
Renaming columns : Use df.rename({'old_name':'new_name'}, axis='columns'), assign directly with df.columns = ['col_one','col_two'], or replace spaces via df.columns = df.columns.str.replace(' ', '_').
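The three renaming options can be compared side by side on a toy DataFrame. The column names here are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'col one': [100, 200], 'col two': [300, 400]})

# 1. Rename selected columns with a mapping; returns a new DataFrame.
renamed = df.rename({'col one': 'col_one'}, axis='columns')

# 2. Overwrite every column name at once (the list length must match).
df_all = df.copy()
df_all.columns = ['col_one', 'col_two']

# 3. Replace spaces with underscores in all column names.
df_clean = df.copy()
df_clean.columns = df_clean.columns.str.replace(' ', '_')

print(list(renamed.columns))
print(list(df_clean.columns))
```

The mapping form is convenient when only a few names change, while the string method scales to many columns without listing them.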
Row and column reversal : Reverse rows with df.loc[::-1].reset_index(drop=True) and reverse columns with df.loc[:, ::-1].
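A short sketch of both reversals on a made-up three-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

# Reverse the row order, then rebuild the index so it starts at 0 again.
rows_reversed = df.loc[::-1].reset_index(drop=True)

# Reverse the column order; rows are left untouched.
cols_reversed = df.loc[:, ::-1]

print(rows_reversed)
print(cols_reversed)
```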
Selecting by dtype : Retrieve numeric columns using df.select_dtypes(include='number'), or object columns using df.select_dtypes(include='object').
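The following sketch selects columns by dtype on a hypothetical DataFrame. The text column is created with an explicit object dtype as an assumption, so the example behaves the same regardless of how a given pandas version stores strings by default.

```python
import pandas as pd

df = pd.DataFrame({
    # Explicit object dtype (an assumption made for this illustration).
    'name': pd.Series(['apple', 'pear'], dtype=object),
    'qty': [1, 2],
    'price': [1.5, 2.5],
})

# Integer and float columns are both matched by 'number'.
numeric = df.select_dtypes(include='number')

# Object columns only.
text = df.select_dtypes(include='object')

print(list(numeric.columns))
print(list(text.columns))
```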
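Type conversion : Convert object columns to numeric with df['col'] = df['col'].astype(float). This raises an error if any entry cannot be parsed as a number, so for columns with non‑numeric entries use pd.to_numeric(df['col'], errors='coerce'), which turns those entries into NaN, then fill the NaNs with df['col'].fillna(0).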
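To illustrate the difference between the two conversions, here is a small sketch with a hypothetical column that contains a '-' placeholder.

```python
import pandas as pd

# Hypothetical column of numeric strings with one placeholder entry.
df = pd.DataFrame({'col': ['1.1', '2.2', '-']})

# A clean column converts directly with astype.
clean = pd.Series(['1', '2']).astype(float)

# df['col'].astype(float) would raise ValueError because of '-',
# so coerce unparseable entries to NaN, then fill them with 0.
df['col'] = pd.to_numeric(df['col'], errors='coerce').fillna(0)

print(df['col'].tolist())  # [1.1, 2.2, 0.0]
```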
Memory reduction : Examine memory usage via df.memory_usage(deep=True), read only the needed columns with the usecols argument of pd.read_csv, and convert object columns that hold few distinct values to the category dtype to shrink size.
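A rough sketch of these steps is shown below. It builds a CSV in memory so that it runs without external files; the column names and values are invented for the example.

```python
import io
import pandas as pd

# Build a small in-memory CSV with a heavily repeated text column.
continents = ['Europe', 'Asia', 'Africa']
lines = ['country,continent,beer']
for i in range(3000):
    lines.append(f'c{i},{continents[i % 3]},{i % 100}')
csv_text = '\n'.join(lines)

# Read only the columns that are actually needed.
df = pd.read_csv(io.StringIO(csv_text), usecols=['continent', 'beer'])
before = df.memory_usage(deep=True).sum()

# Store the repeated strings as integer codes plus a small lookup table.
df['continent'] = df['continent'].astype('category')
after = df.memory_usage(deep=True).sum()

print(before, after)
```

The saving comes from the repetition: a column with mostly unique strings would gain little from the category dtype.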
Combining multiple files : Use glob.glob('stocks*.csv') to list files, sort them, and concatenate with pd.concat([pd.read_csv(f) for f in files], ignore_index=True) for row‑wise merging or axis=1 for column‑wise merging.
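The sketch below writes two tiny CSV files to a temporary directory and then combines them both ways. The file contents and column names are placeholders invented for the example.

```python
import glob
import os
import tempfile

import pandas as pd

# Create two placeholder files matching the stocks*.csv pattern.
tmp_dir = tempfile.mkdtemp()
pd.DataFrame({'Symbol': ['A'], 'Close': [1.0]}).to_csv(
    os.path.join(tmp_dir, 'stocks1.csv'), index=False)
pd.DataFrame({'Symbol': ['B'], 'Close': [2.0]}).to_csv(
    os.path.join(tmp_dir, 'stocks2.csv'), index=False)

# glob returns files in arbitrary order, so sort for a stable result.
files = sorted(glob.glob(os.path.join(tmp_dir, 'stocks*.csv')))

# Row-wise: stack the files and renumber the index from 0.
by_rows = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)

# Column-wise: place the files side by side, aligned on the index.
by_cols = pd.concat([pd.read_csv(f) for f in files], axis=1)

print(by_rows)
print(by_cols)
```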
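Handling missing data : Identify missing values with df.isna().sum(), compute the fraction missing per column with df.isna().mean(), drop every column that contains any missing value with df.dropna(axis='columns'), or drop only columns with more than 10% missing values using df.dropna(thresh=int(0.9*len(df)), axis='columns'). Without the axis argument, thresh is applied to rows rather than columns.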
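Here is a small sketch of these checks on an invented ten-row DataFrame. Note that axis='columns' is passed to dropna in both calls so that columns, not rows, are dropped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': range(10),                    # no missing values
    'b': [np.nan] + [1.0] * 9,         # 10% missing
    'c': [np.nan] * 5 + [1.0] * 5,     # 50% missing
})

missing_counts = df.isna().sum()       # missing values per column
missing_share = df.isna().mean()       # fraction missing per column

# Drop every column that has at least one missing value.
complete_cols = df.dropna(axis='columns')

# Keep only columns with at least 90% non-missing values.
mostly_complete = df.dropna(thresh=int(0.9 * len(df)), axis='columns')

print(missing_counts)
print(list(mostly_complete.columns))
```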
String splitting : Split a column into multiple columns using df['name'].str.split(' ', expand=True) and assign the result back to the DataFrame.
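As an illustration, the sketch below splits a hypothetical name column into three new columns. The new column names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({'name': ['John Arthur Doe', 'Jane Ann Smith']})

# expand=True returns a DataFrame with one column per piece.
parts = df['name'].str.split(' ', expand=True)

# Assign the pieces back under new (hypothetical) column names.
df[['first', 'middle', 'last']] = parts

print(df)
```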
Expanding Series : Convert a Series of lists into separate columns with df['list_col'].apply(pd.Series) and join the result to the original DataFrame with pd.concat([df, expanded], axis='columns').
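A minimal sketch of the expansion, using invented data and hypothetical names for the new columns:

```python
import pandas as pd

df = pd.DataFrame({
    'col_one': ['a', 'b', 'c'],
    'list_col': [[10, 40], [20, 50], [30, 60]],
})

# Each list becomes a row of a new DataFrame with columns 0 and 1.
expanded = df['list_col'].apply(pd.Series)
expanded.columns = ['new1', 'new2']  # hypothetical names

# Attach the new columns next to the original ones.
result = pd.concat([df, expanded], axis='columns')

print(result)
```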
Aggregations : Group by a column and aggregate with df.groupby('order_id')['item_price'].agg(['sum','count']), or use transform('sum') to add the total back to each row.
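The difference between agg and transform can be seen on a toy orders table. The data and the total_price column name are made up for illustration.

```python
import pandas as pd

orders = pd.DataFrame({
    'order_id': [1, 1, 2, 3, 3, 3],
    'item_price': [2.0, 3.0, 10.0, 1.0, 1.5, 2.5],
})

# agg collapses each group to one row per order.
summary = orders.groupby('order_id')['item_price'].agg(['sum', 'count'])

# transform keeps the original shape, repeating each group's total
# on every row of that group.
orders['total_price'] = (
    orders.groupby('order_id')['item_price'].transform('sum')
)

print(summary)
print(orders)
```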
Pivot tables : Create summary tables with df.pivot_table(index='Sex', columns='Pclass', values='Survived', aggfunc='mean', margins=True) and generate cross‑tabulations using pd.crosstab(df['Sex'], df['Pclass']).
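The sketch below applies both calls to a five-row stand-in table. The values are invented and are not the real passenger data the column names suggest.

```python
import pandas as pd

df = pd.DataFrame({
    'Sex': ['female', 'female', 'male', 'male', 'male'],
    'Pclass': [1, 3, 1, 3, 3],
    'Survived': [1, 1, 0, 0, 1],
})

# Mean of Survived for each Sex/Pclass pair; margins=True adds an
# 'All' row and column holding the overall means.
table = df.pivot_table(index='Sex', columns='Pclass', values='Survived',
                       aggfunc='mean', margins=True)

# Count of rows for each Sex/Pclass pair.
counts = pd.crosstab(df['Sex'], df['Pclass'])

print(table)
print(counts)
```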
Categorizing continuous data : Bin ages into categories using
pd.cut(df['Age'], bins=[0,18,25,99], labels=['child','young adult','adult']).
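A short sketch of the binning on a few invented ages follows. By default pd.cut uses right-closed intervals, so an age of exactly 18 falls into the first bin; the age_group column name is an assumption.

```python
import pandas as pd

df = pd.DataFrame({'Age': [5, 18, 22, 40]})

# Bins are (0, 18], (18, 25], (25, 99]; one label per bin.
df['age_group'] = pd.cut(df['Age'], bins=[0, 18, 25, 99],
                         labels=['child', 'young adult', 'adult'])

print(df)
```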
Styling DataFrames : Format columns with
df.style.format({'Date':'{:%m-%d-%Y}', 'Close':'${:,.2f}', 'Volume':'{:,}'}), highlight minima/maxima, and add bar charts with .bar(subset=['Volume']).
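The following sketch chains these styling calls on a two-row placeholder table. The data and colors are assumptions. DataFrame.style depends on the optional jinja2 package, so the example guards against it being absent.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2016-10-03', '2016-10-04']),
    'Close': [1234.5, 31.5],
    'Volume': [1000000, 2500000],
})

# One format string per column, applied only when displaying.
formats = {'Date': '{:%m-%d-%Y}', 'Close': '${:,.2f}', 'Volume': '{:,}'}

try:
    styled = (
        df.style.format(formats)
        .highlight_min('Close', color='red')         # lowest Close
        .highlight_max('Close', color='lightgreen')  # highest Close
        .bar(subset=['Volume'])                      # in-cell bar chart
    )
    html = styled.to_html()
except ImportError:
    # jinja2 is not installed, so no styled output is produced.
    html = None
```

In a notebook, evaluating styled in a cell renders the formatted table directly; the underlying DataFrame values are not modified.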
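Profiling : Quickly explore a new dataset with pandas_profiling.ProfileReport(df), which generates an interactive HTML report. The pandas-profiling package has since been renamed ydata-profiling, so recent installations import ProfileReport from ydata_profiling instead.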
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
