Essential Pandas Functions for Data Analysis in Python

This article introduces Python's pandas library as a powerful open‑source alternative to MATLAB for data modeling competitions, covering basic, intermediate, and advanced functions—including data I/O, inspection, logical filtering, visualization, aggregation, and integration with tqdm for progress tracking—complete with code examples.

Python Programming Learning Circle

Aug 6, 2021

In modeling competitions, MATLAB is often used for scientific computing, but it is a commercial product that requires a paid license. The open‑source Python ecosystem is a free and versatile alternative for matrix operations, scientific calculations, visualization, and machine learning, with pandas handling the data loading and processing.

Basic Functions

1. Read data: after import pandas as pd, data = pd.read_csv('newfile.csv') loads a CSV file into a DataFrame (read_excel, read_clipboard, and read_sql do the same for other sources).

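2. Write data: data.to_csv('2_newfile.csv', index=None) writes the DataFrame to a CSV file without the row index; index=False is the more common spelling (to_excel, to_json, and to_pickle cover other formats).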
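The read and write calls above can be tried together as a round trip. This is a minimal sketch: the sample values and the temporary directory are assumptions for illustration, and only the file name 2_newfile.csv comes from the text.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; column names follow the article's placeholders.
data = pd.DataFrame({"column_1": ["apple", "pear", "fig"], "column_2": [3, 5, 8]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "2_newfile.csv")
    # index=False keeps the row index out of the file.
    data.to_csv(path, index=False)
    # Read the file back into a new DataFrame.
    restored = pd.read_csv(path)

print(restored)
```

Writing without the index matters here: with the default index=True, the file gains an extra unnamed column that shows up again when the file is read back.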

3. Inspect data: data.shape returns a (rows, columns) tuple; data.describe() computes summary statistics (count, mean, standard deviation, min, quartiles, max) for the numeric columns.

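4. View data: data.head(3) shows the first three rows; data.tail() shows the last five rows by default; data.loc[8] accesses the row whose index label is 8 (the ninth row under the default 0‑based index); data.loc[8, 'column_1'] accesses a specific cell; data.loc[range(4,6)] selects the rows labeled 4 and 5. Use data.iloc for access by position rather than by label.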
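The inspection and viewing calls can be checked on a small frame. The sketch below assumes a hypothetical 10-row DataFrame with the default integer index, so index labels and positions happen to coincide.

```python
import pandas as pd

# Hypothetical 10-row frame with the default index 0..9.
data = pd.DataFrame({"column_1": list("abcdefghij"), "column_2": range(10)})

print(data.shape)       # (number of rows, number of columns)
print(data.describe())  # summary statistics for the numeric columns

first_three = data.head(3)  # first three rows
last_five = data.tail()     # tail() defaults to the last five rows

row_8 = data.loc[8]               # row with index label 8
cell = data.loc[8, "column_1"]    # single cell by row label and column name
rows_4_5 = data.loc[range(4, 6)]  # rows labeled 4 and 5
by_position = data.iloc[8]        # ninth row by position, whatever its label
```

Once a frame has been filtered or sorted, labels and positions no longer line up, which is when the difference between loc and iloc starts to matter.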

Intermediate Functions

1. Count occurrences: data['column_1'].value_counts() returns how often each distinct value occurs, most frequent first.
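A minimal sketch of value_counts on made-up data; the normalize=True variant is an addition not mentioned in the text and returns shares instead of raw counts.

```python
import pandas as pd

# Hypothetical categorical column.
data = pd.DataFrame({"column_1": ["a", "b", "a", "c", "a", "b"]})

counts = data["column_1"].value_counts()                # counts per distinct value
shares = data["column_1"].value_counts(normalize=True)  # same, as fractions of the total

print(counts)
print(shares)
```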

2. Apply functions to a column: data['column_1'].map(len) applies len to each value, and calls can be chained, e.g., data['column_1'].map(len).map(lambda x: x/100).plot(); data.apply(sum) applies a function to each column of the DataFrame; data.applymap() applies a function to every cell (pandas 2.1 deprecates applymap in favor of DataFrame.map).
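The three levels (per value, per column, per cell) can be compared side by side. This is a sketch on hypothetical data; for the cell-wise case it combines apply with Series.map, an assumption made here so the example does not depend on whether the installed pandas still ships applymap.

```python
import pandas as pd

# Hypothetical frame: one text column and two numeric columns.
data = pd.DataFrame({
    "column_1": ["apple", "kiwi", "banana"],
    "column_2": [1, 2, 3],
    "column_3": [10, 20, 30],
})

# Per value: map applies the function to each element of one column.
lengths = data["column_1"].map(len)
scaled = lengths.map(lambda x: x / 100)

# Per column: apply passes each column (a Series) to the function.
numeric = data[["column_2", "column_3"]]
column_totals = numeric.apply(sum)
row_totals = numeric.apply(sum, axis=1)  # axis=1 passes each row instead

# Per cell: map every value of every column.
doubled = numeric.apply(lambda col: col.map(lambda x: x * 2))
```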

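3. Progress monitoring with tqdm:

from tqdm import tqdm

tqdm.pandas()

Then replace map / apply / applymap with progress_map / progress_apply / progress_applymap, e.g., data['column_1'].progress_map(lambda x: x.count('e')). Older code uses tqdm_notebook().pandas(), but tqdm_notebook is deprecated; in Jupyter, from tqdm.auto import tqdm selects a notebook‑friendly progress bar.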

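4. Correlation and scatter matrix: data.corr() gives the pairwise correlation matrix of the numeric columns (in pandas 2.0 and later, pass numeric_only=True if the DataFrame also holds non‑numeric columns); data.corr().applymap(lambda x: int(x*100)/100) truncates each value to two decimals, whereas data.corr().round(2) rounds it; pd.plotting.scatter_matrix(data, figsize=(12,8)) plots every pair of numeric columns against each other.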
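A small sketch of the correlation step on hypothetical data. It selects the numeric columns explicitly and uses round(2) rather than the truncating lambda; both choices are assumptions made here, not the original article's code.

```python
import pandas as pd

# Hypothetical frame: y rises with x, z falls as x rises.
data = pd.DataFrame({
    "column_1": ["a", "b", "c", "d"],
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [4.0, 3.0, 2.0, 1.0],
})

# Keep only numeric columns so the text column does not interfere.
corr = data.select_dtypes(include="number").corr()

# Round to two decimals for display.
rounded = corr.round(2)
print(rounded)
```

The same numeric subset can be passed to pd.plotting.scatter_matrix, which additionally requires matplotlib to be installed.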

Advanced Operations

1. SQL‑style merge: join two DataFrames on shared key columns (an inner join by default):

data.merge(other_data, on=['column_1','column_2','column_3'])
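To see what the merge returns, here is a minimal sketch with two hypothetical frames that share two key columns. The names left_value and right_value and the how="left" variant are illustrative assumptions.

```python
import pandas as pd

# Two hypothetical frames; only the first two key pairs match.
data = pd.DataFrame({
    "column_1": ["a", "b", "c"],
    "column_2": [1, 2, 3],
    "left_value": [10, 20, 30],
})
other_data = pd.DataFrame({
    "column_1": ["a", "b", "d"],
    "column_2": [1, 2, 4],
    "right_value": [100, 200, 400],
})

# Default inner join: only rows whose keys appear in both frames survive.
inner = data.merge(other_data, on=["column_1", "column_2"])

# Left join: keep every row of data, filling missing matches with NaN.
left = data.merge(other_data, on=["column_1", "column_2"], how="left")
```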

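2. Group by:

data.groupby('column_1')['column_2'].sum().reset_index()

sums column_2 within each column_1 group; reset_index() then turns the group keys back into a regular column, so the result is an ordinary DataFrame.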
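A short sketch of the group-by step on made-up data, using the built-in sum() aggregation of the grouped column:

```python
import pandas as pd

# Hypothetical data: three groups in column_1.
data = pd.DataFrame({
    "column_1": ["a", "b", "a", "b", "c"],
    "column_2": [1, 2, 3, 4, 5],
})

# Sum column_2 per group, then move the group keys from the index into a column.
totals = data.groupby("column_1")["column_2"].sum().reset_index()
print(totals)
```

Without reset_index(), the result is a Series indexed by the group keys, which is less convenient to merge or export.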

3. Row iteration:

dictionary = {}
for i, row in data.iterrows():
    dictionary[row['column_1']] = row['column_2']

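The iterrows() method yields (index, row) pairs, with each row returned as a Series. It is slow on large DataFrames, so prefer vectorized operations where they exist.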
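The loop can be run end to end on a small hypothetical frame. The dict(zip(...)) line is an alternative suggested here, not part of the original article; it builds the same mapping without iterating row by row.

```python
import pandas as pd

# Hypothetical two-column frame.
data = pd.DataFrame({"column_1": ["a", "b", "c"], "column_2": [1, 2, 3]})

# Row-by-row version: i is the index label, row is a Series.
dictionary = {}
for i, row in data.iterrows():
    dictionary[row["column_1"]] = row["column_2"]

# Column-wise alternative that avoids creating a Series per row.
vectorized = dict(zip(data["column_1"], data["column_2"]))

print(dictionary)
```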
