Essential Pandas Techniques for Data Analysis in Python
This article presents a comprehensive guide to essential Pandas operations, including creating Series and DataFrames, common methods for data selection, indexing, grouping, reading and writing files, handling missing values, sorting, statistical analysis, and data transformation, with practical code examples for each feature.
For data analysis in Python, Pandas is as essential as NumPy and Matplotlib.
1. Creating Pandas' Two Main Data Structures

| No. | Method | Description |
|---|---|---|
| 1 | pd.Series(object, index=[]) | Create a Series. The object can be a list, ndarray, dict, or a row/column from a DataFrame. |
| 2 | pd.DataFrame(data, columns=[], index=[]) | Create a DataFrame. columns and index give the column and row labels, in order. |

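The constructors above accept several input types. The following is a minimal sketch of each Series source (list, dict, ndarray, DataFrame column) plus a DataFrame built with explicit columns= and index= labels; the variable names and sample values are illustrative only.

```python
import numpy as np
import pandas as pd

# From a list, with explicit index labels
s_list = pd.Series([10, 20, 30], index=["a", "b", "c"])

# From a dict: the keys become the index labels
s_dict = pd.Series({"x": 1.5, "y": 2.5})

# From a NumPy ndarray: a default integer index 0..n-1 is created
s_arr = pd.Series(np.arange(3))

# A DataFrame from a 2-D list; columns= and index= assign labels in order
frame = pd.DataFrame([[23, 1200.0], [44, 2133.0]],
                     columns=["age", "price"], index=["r1", "r2"])

# Selecting one DataFrame column returns a Series
s_col = frame["age"]

print(s_list["b"])
print(s_dict.index.tolist())
print(type(s_col).__name__)
```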
Example: create a DataFrame
<code>import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"id": [1001, 1002, 1003, 1004, 1005, 1006],
     "date": pd.date_range('20130102', periods=6),
     "city": ["Beijing ", "SH", " guangzhou ", "Shenzhen", "shanghai", "BEIJING "],
     "age": [23, 44, 54, 32, 34, 32],
     "category": ["100-A", "100-B", "110-A", "110-C", "210-A", "130-F"],
     "price": [1200, np.nan, 2133, 5433, np.nan, 4432]},
    columns=["id", "date", "city", "category", "age", "price"],  # column order
)</code>
2. Common DataFrame Methods

| No. | Method | Description |
|---|---|---|
| 1 | df.head() | View the first five rows. |
| 2 | df.tail() | View the last five rows. |
| 3 | pandas.qcut() | Discretize a variable into equal-size bins based on quantiles. |
| 4 | pandas.cut() | Discretize based on specified bins. |
| 5 | pandas.date_range() | Generate a date index. |
| 6 | df.apply() | Apply a function along a given axis. |
| 7 | Series.value_counts() | Count occurrences of each value. |
| 8 | df.reset_index() | Reset the index; with drop=True the old index is discarded. |

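To make the difference between cut and qcut concrete, and to show apply along each axis and reset_index with and without drop=True, here is a small sketch; the sample ages, scores, and bin labels are made up for illustration.

```python
import pandas as pd

ages = pd.Series([23, 44, 54, 32, 34, 32])

# cut: you choose the bin edges; intervals are right-closed, e.g. (30, 40]
age_band = pd.cut(ages, bins=[20, 30, 40, 60], labels=["21-30", "31-40", "41-60"])

# qcut: edges come from quantiles, so each bin holds roughly the same count
age_half = pd.qcut(ages, q=2, labels=["lower", "upper"])

print(age_band.value_counts().to_dict())

scores = pd.DataFrame({"math": [90, 70], "art": [60, 100]})
# axis=0 (default): the function receives one column at a time
col_range = scores.apply(lambda col: col.max() - col.min())
# axis=1: the function receives one row at a time
row_mean = scores.apply(lambda row: row.mean(), axis=1)

sorted_scores = scores.sort_values("math")           # index is now [1, 0]
renumbered = sorted_scores.reset_index(drop=True)    # index back to 0..n-1, old one discarded
kept = sorted_scores.reset_index()                   # old index kept as an 'index' column
```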
Example: reset index
<code>df_inner.reset_index()  # df_inner is not defined in this article; any DataFrame, such as df above, works</code>
3. Indexing

| No. | Method | Description |
|---|---|---|
| 1 | .values | Convert a DataFrame to a 2-D NumPy ndarray (.to_numpy() is the recommended equivalent). |
| 2 | .append(idx) | Concatenate another Index, producing a new Index. |
| 3 | .insert(loc, e) | Insert an element at the given position, returning a new Index. |
| 4 | .delete(loc) | Delete the element at the given position, returning a new Index. |
| 5 | .union(idx) | Compute the union of two indexes. |
| 6 | .intersection(idx) | Compute the intersection of two indexes. |
| 7 | .difference(idx) | Compute the set difference, returning a new Index. |
| 8 | .reindex(index, columns, fill_value, method, limit, copy) | Conform to new row or column labels, inserting missing values (or fill_value) for labels that did not exist. |
| 9 | .drop() | Delete specified rows or columns. |
| 10 | .loc[row_label, col_label] | Label-based access to a specific cell. |
| 11 | .iloc[row_pos, col_pos] | Position-based access to a specific cell. |

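The Index methods in the table never modify an Index in place; each returns a new object. A minimal sketch with two illustrative label sets, followed by reindex, drop, and the loc/iloc pair on a tiny DataFrame:

```python
import pandas as pd

left = pd.Index(["a", "b", "c"])
right = pd.Index(["b", "c", "d"])

union = left.union(right)               # a, b, c, d
common = left.intersection(right)       # b, c
only_left = left.difference(right)      # a
combined = left.append(right)           # duplicates are kept
inserted = left.insert(1, "z")          # a, z, b, c
deleted = left.delete(0)                # b, c
# `left` itself is unchanged: Index objects are immutable

frame = pd.DataFrame({"v": [1, 2, 3]}, index=left)
# reindex: labels missing from the original rows get fill_value
reindexed = frame.reindex(["c", "a", "x"], fill_value=0)
# drop: remove rows by label (use columns=... to drop columns)
dropped = frame.drop(index="b")

by_label = frame.loc["c", "v"]          # label-based cell access
by_position = frame.iloc[2, 0]          # position-based access to the same cell
```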
Example: extract a single row by label
<code>df_inner.loc[3]</code>
4. Selecting and Recombining Data

| No. | Method | Description |
|---|---|---|
| 1 | df[val] | Select a single column or a list of columns; a boolean array or a slice selects rows instead. |
| 2 | df.loc[val] | Label-based row selection. |
| 3 | df.loc[:, val] | Select columns by label. |
| 4 | df.iloc[val] | Position-based row selection. |
| 5 | df.iloc[where_i, where_j] | Position-based selection of rows and columns. |
| 6 | df.at[row_label, col_label] | Scalar access by label. |
| 7 | df.iat[i, j] | Scalar access by integer position. |
| 8 | reindex | Select rows or columns by label, creating a new object. |
| 9 | get_value | Get a scalar value by label (removed in pandas 1.0; use .at or .iat). |
| 10 | set_value | Set a scalar value by label (removed in pandas 1.0; use .at or .iat). |

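A short sketch of the selection forms above, assuming an illustrative three-row DataFrame with integer row labels. It also shows .at standing in for the removed get_value and set_value, and the differing slice semantics of loc (inclusive) and iloc (exclusive).

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["beijing", "shanghai", "shenzhen"],
     "age": [23, 44, 54],
     "price": [1200.0, 2133.0, 5433.0]},
    index=[101, 102, 103],
)

ages = df["age"]                        # one column -> Series
subset = df[["city", "price"]]          # list of columns -> DataFrame
older = df[df["age"] > 30]              # boolean array -> matching rows
by_label = df.loc[102:103, ["city"]]    # label slices include both endpoints
by_position = df.iloc[0:2, 1]           # position slices exclude the stop
price_102 = df.at[102, "price"]         # scalar read by label (instead of get_value)
df.at[102, "price"] = 2000.0            # scalar write by label (instead of set_value)
age_first = df.iat[0, 1]                # scalar read by integer position
```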
Example: select rows and columns by position
<code>df_inner.iloc[:3, :2]  # first three rows, first two columns</code>
5. Sorting

| No. | Function | Description |
|---|---|---|
| 1 | .sort_index(axis=0, ascending=True) | Sort by index values. |
| 2 | Series.sort_values(axis=0, ascending=True) | Sort a Series by its values. |
| 3 | DataFrame.sort_values(by, axis=0, ascending=True) | Sort a DataFrame by one or more columns. |

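Sorting by several columns takes a list for by and, optionally, a matching list for ascending. A small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["sh", "bj", "sh", "bj"], "age": [34, 23, 44, 32]},
    index=[3, 1, 4, 2],
)

by_index = df.sort_index()                             # row labels 1, 2, 3, 4
age_sorted = df["age"].sort_values()                   # Series.sort_values
by_age_desc = df.sort_values(by="age", ascending=False)
# City ascending first, then age descending within each city
by_city_age = df.sort_values(by=["city", "age"], ascending=[True, False])
# axis=1 sorts the column labels instead of the rows
cols_sorted = df.sort_index(axis=1)
```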
Example: sort by index
<code>df_inner.sort_index()</code>
6. Correlation and Statistical Analysis

| No. | Method | Description |
|---|---|---|
| 1 | .idxmin() | Index label of the minimum value. |
| 2 | .idxmax() | Index label of the maximum value. |
| 3 | .argmin() | Integer position of the minimum value (Series only). |
| 4 | .argmax() | Integer position of the maximum value (Series only). |
| 5 | .describe() | Summary statistics (count, mean, std, min, quartiles, max) for each numeric column. |
| 6 | .sum() | Sum of each column. |
| 7 | .count() | Number of non-NaN values. |
| 8 | .mean() | Arithmetic mean. |
| 9 | .median() | Median value. |
| 10 | .var() | Variance. |
| 11 | .std() | Standard deviation. |
| 12 | .corr() | Correlation matrix. |
| 13 | .cov() | Covariance matrix. |
| 14 | .corrwith() | Correlation of each column/row with another Series or DataFrame. |
| 15 | .min() | Minimum value. |
| 16 | .max() | Maximum value. |
| 17 | .diff() | First difference (useful for time series). |
| 18 | .mode() | Mode(s): the most frequent value(s). |
| 19 | .quantile() | Quantile at q, where 0 <= q <= 1 (default 0.5, the median). |
| 20 | .isin() | Boolean mask indicating membership in a collection. |
| 21 | .unique() | Array of unique values, in order of appearance. |
| 22 | .value_counts() | Frequency of each value. |

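A sketch of several of these statistics on illustrative data, contrasting idxmax (a label) with argmax (a position), showing that count and mean skip NaN by default, and normalizing strings before an exact isin match:

```python
import numpy as np
import pandas as pd

age = pd.Series([23, 44, 54, 32], index=["a", "b", "c", "d"])
price = pd.Series([1200.0, np.nan, 2133.0, 5433.0])

max_label = age.idxmax()          # index label of the maximum
max_pos = age.argmax()            # integer position of the maximum
median_age = age.quantile(0.5)    # linear interpolation between 32 and 44
steps = age.diff()                # first element is NaN (no predecessor)

n_prices = price.count()          # the NaN is not counted
mean_price = price.mean()         # NaN is skipped by default (skipna=True)

frame = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8]})
summary = frame.describe()                # count, mean, std, min, quartiles, max
corr_xy = frame.corr().loc["x", "y"]      # y is exactly 2 * x, so close to 1.0

cities = pd.Series(["Beijing ", "SH", "BEIJING "])
# isin() compares exactly, so strip whitespace and lower-case first
is_beijing = cities.str.strip().str.lower().isin(["beijing"])
```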
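Example: check whether the "city" column equals "beijing"
<code>df_inner['city'] = df_inner['city'].str.strip().str.lower()  # isin() is exact, so normalize "Beijing " first
df_inner['city'].isin(['beijing'])</code>
7. Grouping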
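
| No. | Method | Description |
|---|---|---|
| 1 | DataFrame.groupby() | Split rows into groups by one or more keys so each group can be aggregated, transformed, or filtered. |
| 2 | pandas.cut() | Bin data based on numeric intervals to reveal patterns. |
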
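The two functions combine naturally: group by a column to aggregate, or bin a numeric column with pd.cut and group by the bins. A minimal sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["beijing", "shanghai", "beijing", "shenzhen", "shanghai"],
    "age": [23, 44, 54, 32, 34],
    "price": [1200.0, 2100.0, 2133.0, 5433.0, 4432.0],
})

# Split by city, aggregate each group, combine into one Series
mean_price = df.groupby("city")["price"].mean()
# Several statistics at once, one column per statistic
stats = df.groupby("city")["price"].agg(["count", "sum", "max"])

# Bin ages into right-closed intervals (20, 35] and (35, 60], then group by bin
df["age_band"] = pd.cut(df["age"], bins=[20, 35, 60], labels=["21-35", "36-60"])
count_by_band = df.groupby("age_band", observed=True).size()
```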
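Example: groupby usage
<code>import pandas as pd
salaries = pd.DataFrame({"name": ["Alice", "Bob", "Alice"], "salary": [5000, 6000, 5500]})  # hypothetical sample data
group_by_name = salaries.groupby('name')
print(type(group_by_name))  # a DataFrameGroupBy object</code>
8. Reading and Writing Text Formats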

| No. | Method | Description |
|---|---|---|
| 1 | read_csv | Read comma-separated data from a file, URL, or file-like object. |
| 2 | read_table | Read tab-separated data (default separator is a tab). |
| 3 | read_fwf | Read fixed-width formatted data (no delimiter). |
| 4 | read_clipboard | Read data from the clipboard (parsed as with read_csv); handy for tables copied from web pages. |
| 5 | read_excel | Read Excel XLS or XLSX files. |
| 6 | read_hdf | Read HDF5 files written by pandas. |
| 7 | read_html | Read all tables from an HTML document. |
| 8 | read_json | Read JSON data from a file path, URL, or file-like object. |
| 9 | read_msgpack | Read binary-encoded pandas data (msgpack); removed in pandas 1.0. |
| 10 | read_pickle | Read any Python object stored with pickle. |
| 11 | read_sas | Read SAS data sets. |
| 12 | read_sql | Read SQL query results into a DataFrame. |
| 13 | read_stata | Read Stata file formats. |
| 14 | read_feather | Read Feather binary file format. |

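Because read_csv and read_table accept any file-like object, io.StringIO can stand in for a file when experimenting. The sketch below, using a small illustrative CSV string, reads comma- and tab-separated text and does a to_csv round trip:

```python
import io

import pandas as pd

csv_text = "id,city,price\n1001,Beijing,1200\n1002,SH,\n1003,guangzhou,2133\n"

# Empty fields become NaN, so the price column is parsed as float64
df = pd.read_csv(io.StringIO(csv_text))

# The same data, tab-separated; a tab is read_table's default separator
tab_text = csv_text.replace(",", "\t")
df_tab = pd.read_table(io.StringIO(tab_text))

# Write back to CSV text (index=False omits the row labels), then read it again
buffer = io.StringIO()
df.to_csv(buffer, index=False)
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
```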
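Example: import CSV or Excel
<code>import pandas as pd
df = pd.read_csv('name.csv')     # already a DataFrame; column names come from line 1 (pass header=1 to use line 2)
df = pd.read_excel('name.xlsx')  # .xlsx files need the openpyxl package installed</code>
9. Handling Missing Data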
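
| No. | Method | Description |
|---|---|---|
| 1 | .fillna(value, method, limit, inplace) | Fill missing values. The method argument is deprecated since pandas 2.1; use .ffill() or .bfill() instead. |
| 2 | .dropna() | Drop rows/columns with missing data. |
| 3 | .info() | Print a concise summary: index, column dtypes, non-null counts (which reveal missing values), and memory usage. |
| 4 | .isnull() | Boolean mask that is True where values are missing (alias: .isna()). |
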
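A compact sketch of the detect, fill, and drop workflow on an illustrative price column with two gaps. Filling with the mean is one common choice, not the only one; ffill is the non-deprecated replacement for fillna(method="ffill").

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["Beijing", "SH", "guangzhou", "shanghai"],
    "price": [1200.0, np.nan, 2133.0, np.nan],
})

missing_mask = df["price"].isnull()                   # True where price is NaN
n_missing = int(missing_mask.sum())
filled_mean = df["price"].fillna(df["price"].mean())  # NaN -> mean of the known prices
filled_forward = df["price"].ffill()                  # carry the last valid value forward
complete_rows = df.dropna()                           # keep only rows with no NaN
```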
Example: view basic information of the data table
<code>df.info()</code>
10. Data Transformation

| No. | Method | Description |
|---|---|---|
| 1 | .replace(old, new) | Replace old values with new ones; can accept lists for multiple replacements. |
| 2 | .duplicated() | Detect duplicate rows, returning a boolean Series. |
| 3 | .drop_duplicates() | Remove duplicate rows and return a new DataFrame. |

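Duplicate detection is exact, so messy strings such as "Beijing " and "BEIJING " only count as duplicates after cleaning. This sketch, with illustrative data, normalizes text, applies a list-based replace, and then removes duplicates:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Beijing ", "SH", "BEIJING ", "shanghai"],
    "age": [23, 44, 23, 34],
})

# Normalize text first so spelling variants compare equal
df["city"] = df["city"].str.strip().str.lower()
# List form: each old value maps to the new value at the same position
df["city"] = df["city"].replace(["sh"], ["shanghai"])

dup_mask = df.duplicated()                     # True for rows repeating an earlier row
unique_rows = df.drop_duplicates()             # keeps the first occurrence by default
unique_cities = df["city"].drop_duplicates()   # also works on a single column
last_kept = df.drop_duplicates(subset=["city"], keep="last")
```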
Example: drop duplicate city values
<code>df['city'].drop_duplicates()</code>
Conclusion
This article lists common Pandas methods. A solid grasp of the two core structures, Series and DataFrame, makes data processing and analysis with Pandas much easier.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.