
Essential Pandas Techniques for Data Analysis in Python

This article presents a comprehensive guide to essential Pandas operations, including creating Series and DataFrames, common methods for data selection, indexing, grouping, reading and writing files, handling missing values, sorting, statistical analysis, and data transformation, with practical code examples for each feature.

Python Programming Learning Circle

Oct 11, 2021


For data analysis in Python, Pandas is an essential library alongside NumPy and Matplotlib.

1. Creating the Two Main Pandas Data Structures

| No. | Method | Description |
| --- | --- | --- |
| 1 | pd.Series(object, index=[]) | Create a Series. The object can be a list, ndarray, dict, or a row/column from a DataFrame. |
| 2 | pd.DataFrame(data, columns=[], index=[]) | Create a DataFrame. Columns and index specify column and row labels in order. |

Example: create a DataFrame

import numpy as np
import pandas as pd

# The city strings deliberately contain stray whitespace and mixed case
df = pd.DataFrame(
    {
        "id": [1001, 1002, 1003, 1004, 1005, 1006],
        "date": pd.date_range('20130102', periods=6),
        "city": ["Beijing ", "SH", " guangzhou ", "Shenzhen", "shanghai", "BEIJING "],
        "age": [23, 44, 54, 32, 34, 32],
        "category": ["100-A", "100-B", "110-A", "110-C", "210-A", "130-F"],
        "price": [1200, np.nan, 2133, 5433, np.nan, 4432],
    },
    columns=["id", "date", "city", "category", "age", "price"],
)
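
The table also lists several ways to build a Series. The sketch below tries each of them; the labels and values are illustrative and do not come from the article's dataset.

```python
import numpy as np
import pandas as pd

# From a list, with explicit (illustrative) labels
s_list = pd.Series([23, 44, 54], index=["a", "b", "c"])

# From a dict: the keys become the index
s_dict = pd.Series({"Beijing": 1200, "Shenzhen": 5433})

# From a NumPy array: a default 0..n-1 integer index is used
s_arr = pd.Series(np.array([1.5, 2.5, 3.5]))

# A single DataFrame column is itself a Series
frame = pd.DataFrame({"age": [23, 44], "price": [1200.0, np.nan]})
s_col = frame["age"]

print(s_list["b"])           # label-based lookup
print(list(s_dict.index))
print(type(s_col).__name__)
```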

2. Common DataFrame Methods

| No. | Method | Description |
| --- | --- | --- |
| 1 | df.head() | View the first five rows. |
| 2 | df.tail() | View the last five rows. |
| 3 | pandas.qcut() | Discretize a variable into equal-size bins based on quantiles. |
| 4 | pandas.cut() | Discretize based on specified bins. |
| 5 | pandas.date_range() | Generate a date index. |
| 6 | df.apply() | Apply a function along a given axis. |
| 7 | Series.value_counts() | Count occurrences of each value. |
| 8 | df.reset_index() | Reset the index; with drop=True the old index is discarded. |

Example: reset the index

df.reset_index()
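
To make the difference between cut and qcut concrete, here is a minimal sketch using the "age" values from the example DataFrame. The bin edges and labels are assumptions chosen for illustration.

```python
import pandas as pd

ages = pd.Series([23, 44, 54, 32, 34, 32])  # the "age" values from the example above

# pd.cut: the caller chooses the bin edges (edges and labels are illustrative)
by_edges = pd.cut(ages, bins=[0, 30, 40, 60], labels=["<=30", "31-40", "41-60"])

# pd.qcut: bin edges are derived from quantiles, so bins hold similar counts
by_quantile = pd.qcut(ages, q=2, labels=["lower", "upper"])

print(by_edges.value_counts().sort_index())
print(by_quantile.value_counts())

frame = pd.DataFrame({"age": ages, "group": by_edges})

# apply with axis=0 calls the function once per column
spread = frame[["age"]].apply(lambda col: col.max() - col.min(), axis=0)

# After filtering, drop=True discards the old row labels instead of keeping them as a column
older = frame[frame["age"] > 30].reset_index(drop=True)
print(older)
```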

3. Indexing

| No. | Method | Description |
| --- | --- | --- |
| 1 | .values | Convert DataFrame to a 2-D ndarray. |
| 2 | .append(idx) | Concatenate another Index, producing a new Index. |
| 3 | .insert(loc, e) | Insert an element at the given location. |
| 4 | .delete(loc) | Delete the element at the given location. |
| 5 | .union(idx) | Compute the union of two indexes. |
| 6 | .intersection(idx) | Compute the intersection of two indexes. |
| 7 | .difference(idx) | Compute the difference, returning a new Index (called .diff in very old pandas versions). |
| 8 | .reindex(index, columns, fill_value, method, limit, copy) | Reorder or change the index, introducing missing values where needed. |
| 9 | .drop() | Delete specified rows or columns. |
| 10 | .loc[row_label, col_label] | Label-based access to a specific cell. |
| 11 | .iloc[row_pos, col_pos] | Position-based access to a specific cell. |

Example: extract a single row by label

df.loc[3]
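
The Index methods in the table all return new objects rather than modifying the index in place. A short sketch with two made-up indexes shows the set operations and reindex side by side.

```python
import pandas as pd

left = pd.Index(["a", "b", "c"])    # illustrative labels
right = pd.Index(["b", "c", "d"])

both = left.union(right)            # labels in either index
shared = left.intersection(right)   # labels in both
only_left = left.difference(right)  # labels only in `left`

# Index objects are immutable: insert/delete/append return new Index objects
inserted = left.insert(1, "x")
deleted = left.delete(0)
appended = left.append(right)

# reindex aligns data to new labels; labels with no match get fill_value
s = pd.Series([1, 2, 3], index=left)
realigned = s.reindex(["c", "b", "z"], fill_value=0)

print(list(both), list(shared), list(only_left))
print(realigned)
```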

4. Selecting and Recombining Data

| No. | Method | Description |
| --- | --- | --- |
| 1 | df[val] | Select a single column or a list of columns; also works with boolean arrays or slices. |
| 2 | df.loc[val] | Label-based row selection. |
| 3 | df.loc[:, val] | Select columns by label. |
| 4 | df.iloc[val] | Position-based row selection. |
| 5 | df.iloc[where_i, where_j] | Position-based selection of rows and columns. |
| 6 | df.at[row_label, col_label] | Scalar access by label. |
| 7 | df.iat[i, j] | Scalar access by integer position. |
| 8 | reindex | Select rows or columns by label, creating a new object. |
| 9 | get_value | Get a scalar value by label. Removed in pandas 1.0; use .at or .iat instead. |
| 10 | set_value | Set a scalar value by label. Removed in pandas 1.0; use .at or .iat instead. |

Example: select rows and columns by position

df.iloc[:3, :2]  # first three rows, first two columns
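
Label-based and position-based access are easy to confuse when the index is not the default 0..n-1 range. The following sketch uses a small made-up frame with id-like row labels to contrast them.

```python
import pandas as pd

frame = pd.DataFrame(
    {"city": ["Beijing", "Shenzhen", "Shanghai"], "age": [23, 32, 34]},
    index=[1001, 1002, 1003],  # illustrative row labels
)

row_by_label = frame.loc[1002]                       # the row whose label is 1002
row_by_position = frame.iloc[1]                      # the second row, whatever its label
cities_over_30 = frame.loc[frame["age"] > 30, "city"]  # boolean mask plus column label
cell_by_label = frame.at[1003, "age"]                # scalar access by label
cell_by_position = frame.iat[0, 0]                   # scalar access by position

# Assignment through .at covers what set_value used to do
frame.at[1001, "age"] = 24
print(frame)
```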

5. Sorting

| No. | Function | Description |
| --- | --- | --- |
| 1 | .sort_index(axis=0, ascending=True) | Sort by index values. |
| 2 | Series.sort_values(axis=0, ascending=True) | Sort a Series by its values. |
| 3 | DataFrame.sort_values(by, axis=0, ascending=True) | Sort a DataFrame by one or more columns. |

Example: sort by index

df.sort_index()
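
Sorting by values is the more common case in practice. Here is a small sketch on made-up data, covering a single key, multiple keys with mixed directions, and a return to the original row order.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {"city": ["SH", "BJ", "SZ", "BJ"], "price": [np.nan, 2133.0, 5433.0, 1200.0]}
)

# Single key, descending; missing values are placed last by default
by_price = frame.sort_values(by="price", ascending=False)

# Two keys: city ascending, then price descending within each city
by_city_then_price = frame.sort_values(by=["city", "price"], ascending=[True, False])

# sort_index restores the original row order after a value sort
restored = by_price.sort_index()

print(by_price)
print(by_city_then_price)
```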

6. Correlation and Statistical Analysis

| No. | Method | Description |
| --- | --- | --- |
| 1 | .idxmin() | Index label of the minimum value. |
| 2 | .idxmax() | Index label of the maximum value. |
| 3 | .argmin() | Integer position of the minimum value. |
| 4 | .argmax() | Integer position of the maximum value. |
| 5 | .describe() | Statistical summary of each column. |
| 6 | .sum() | Sum of each column. |
| 7 | .count() | Count of non-NaN values. |
| 8 | .mean() | Arithmetic mean. |
| 9 | .median() | Median value. |
| 10 | .var() | Variance. |
| 11 | .std() | Standard deviation. |
| 12 | .corr() | Correlation matrix. |
| 13 | .cov() | Covariance matrix. |
| 14 | .corrwith() | Correlation of each column/row with another Series or DataFrame. |
| 15 | .min() | Minimum value. |
| 16 | .max() | Maximum value. |
| 17 | .diff() | First difference (useful for time series). |
| 18 | .mode() | Mode(s): the most frequent value(s). |
| 19 | .quantile() | Quantile calculation (q between 0 and 1). |
| 20 | .isin() | Boolean mask indicating membership in a collection. |
| 21 | .unique() | Array of unique values. |
| 22 | .value_counts() | Frequency of each value. |

Example: check whether each "city" value is Beijing, ignoring case and surrounding whitespace

df['city'].str.strip().str.lower().isin(['beijing'])
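
A few of the statistical methods from the table, applied to a small made-up frame, may help show what each one returns. The numbers are illustrative only.

```python
import pandas as pd

frame = pd.DataFrame(
    {"age": [23, 44, 54, 32], "price": [1200.0, 2133.0, 5433.0, 4432.0]}
)

print(frame.describe())                        # count, mean, std, min, quartiles, max

oldest_label = frame["age"].idxmax()           # index label of the largest age
oldest_position = frame["age"].argmax()        # integer position of the largest age
median_price = frame["price"].median()
upper_quartile = frame["age"].quantile(0.75)   # linear interpolation by default
age_price_corr = frame["age"].corr(frame["price"])  # Pearson correlation
price_changes = frame["price"].diff()          # first row is NaN: nothing to subtract

print(oldest_label, median_price, upper_quartile)
```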

7. Grouping

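| No. | Method | Description |
| --- | --- | --- |
| 1 | DataFrame.groupby() | Split the rows into groups by one or more keys so each group can be aggregated or transformed. |
| 2 | pandas.cut() | Bin data based on numeric intervals to reveal patterns. |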

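Example: groupby usage

# `salaries` is assumed to be an existing DataFrame with a 'name' column
group_by_name = salaries.groupby('name')
print(type(group_by_name))  # a DataFrameGroupBy object; nothing is computed until you aggregate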
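
Since the article never defines salaries, the sketch below builds a hypothetical stand-in to show what happens after grouping: aggregating per group, and grouping by bins produced with pd.cut. The column names, values, and bin edges are all assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the `salaries` table used above
salaries = pd.DataFrame(
    {"name": ["Ann", "Bob", "Ann", "Bob", "Cy"], "salary": [100, 200, 300, 400, 500]}
)

group_by_name = salaries.groupby("name")
totals = group_by_name["salary"].sum()                  # one value per name
summary = group_by_name["salary"].agg(["mean", "count"])  # several statistics at once

# Grouping by pd.cut bins instead of an existing column (edges are illustrative)
bands = pd.cut(salaries["salary"], bins=[0, 250, 500])
per_band = salaries.groupby(bands, observed=True)["salary"].count()

print(totals)
print(summary)
print(per_band)
```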

8. Reading and Writing Text Formats

| No. | Method | Description |
| --- | --- | --- |
| 1 | read_csv | Read comma-separated data from a file, URL, or file-like object. |
| 2 | read_table | Read general delimited data; the default separator is a tab. |
| 3 | read_fwf | Read fixed-width formatted data (no delimiter). |
| 4 | read_clipboard | Read data from the clipboard; useful for converting web tables. |
| 5 | read_excel | Read Excel XLS or XLSX files. |
| 6 | read_hdf | Read HDF5 files written by pandas. |
| 7 | read_html | Read all tables from an HTML document. |
| 8 | read_json | Read JSON strings or files. |
| 9 | read_msgpack | Read binary-encoded pandas data. Removed in pandas 1.0. |
| 10 | read_pickle | Read any Python object stored with pickle. |
| 11 | read_sas | Read SAS data sets. |
| 12 | read_sql | Read SQL query results into a DataFrame. |
| 13 | read_stata | Read Stata file formats. |
| 14 | read_feather | Read Feather binary file format. |

Example: import CSV or Excel

import pandas as pd

# Both readers already return a DataFrame; the first line is used as the header by default
df = pd.read_csv('name.csv')
df = pd.read_excel('name.xlsx')
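
To try the readers without files on disk, an in-memory buffer can stand in for a file. This sketch uses made-up CSV text and also uses DataFrame.to_csv, which is not in the table above, to show the matching write step.

```python
import io

import pandas as pd

# Illustrative CSV text; the empty field in the second row becomes NaN
csv_text = "id,city,price\n1001,Beijing,1200\n1002,Shanghai,\n1003,Shenzhen,5433\n"

# read_csv accepts any file-like object, so StringIO stands in for a file
frame = pd.read_csv(io.StringIO(csv_text))

# Write back out without the row index, then read again to check the round trip
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(frame)
print(round_trip.equals(frame))
```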

9. Handling Missing Data

| No. | Method | Description |
| --- | --- | --- |
| 1 | .fillna(value, method, limit, inplace) | Fill missing values. |
| 2 | .dropna() | Drop rows/columns with missing data. |
| 3 | .info() | Show summary information about the DataFrame. |
| 4 | .isnull() | Boolean mask indicating missing values. |

Example: view basic information of the data table

df.info()
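
The other methods in the table usually appear together: count what is missing, then either fill or drop. A minimal sketch on made-up data follows; filling with the column mean is just one possible choice.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"id": [1001, 1002, 1003], "price": [1200.0, np.nan, 2133.0]})

# isnull gives a boolean mask; summing it counts missing values per column
missing_per_column = frame.isnull().sum()

# Option 1: fill the gaps, here with the column mean (an illustrative choice)
filled = frame.fillna({"price": frame["price"].mean()})

# Option 2: drop the rows where price is missing
dropped = frame.dropna(subset=["price"])

print(missing_per_column)
print(filled)
print(dropped)
```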

10. Data Transformation

| No. | Method | Description |
| --- | --- | --- |
| 1 | .replace(old, new) | Replace old values with new ones; can accept lists for multiple replacements. |
| 2 | .duplicated() | Detect duplicate rows, returning a boolean Series. |
| 3 | .drop_duplicates() | Remove duplicate rows and return a new DataFrame. |

Example: drop duplicate city values

df['city'].drop_duplicates()
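
On raw text like the city column, duplicates are only detected once the values are normalized. The sketch below uses made-up city strings in the style of the example DataFrame. It also uses the .str.strip() and .str.lower() string methods, which are not in the table above, and the "sh" to "shanghai" mapping is an assumption.

```python
import pandas as pd

frame = pd.DataFrame({"city": ["Beijing ", "SH", "shanghai", "BEIJING ", "SH"]})

# Normalize whitespace and case first, otherwise "Beijing " and "BEIJING " differ
cleaned = frame["city"].str.strip().str.lower()

# replace matches whole values, not substrings: only the value "sh" is changed
cleaned = cleaned.replace("sh", "shanghai")

is_repeat = cleaned.duplicated()           # True for every occurrence after the first
unique_cities = cleaned.drop_duplicates()  # keeps the first occurrence of each value

print(is_repeat.tolist())
print(unique_cities.tolist())
```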

Conclusion

The article lists common Pandas methods; understanding basic concepts such as Series and DataFrames will make data processing and analysis with Pandas much easier.
