Master Pandas: Essential Data Manipulation Techniques for Python Beginners
This guide introduces pandas, the essential Python library for data science. It covers getting started, data import/export, basic DataFrame operations, logical filtering, visualization with matplotlib, progress bars with tqdm, and advanced techniques like merging, grouping, and iterating, to help beginners become efficient data analysts.
Why pandas?
Python is open‑source and powerful, but the abundance of packages can overwhelm newcomers. pandas stands out as the indispensable data‑science library because it bundles many functionalities into a single, easy‑to‑use package.
Getting started
```python
import pandas as pd
```
By convention, pandas is imported as pd, and every later call goes through that alias.
Reading data
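```python
data = pd.read_csv('my_file.csv')
data = pd.read_csv('my_file.csv', sep=';', encoding='latin-1', nrows=1000, skiprows=[2, 5])
```
sep specifies the delimiter (e.g., ; for French CSV files, which use the comma as the decimal separator). encoding='latin-1' decodes files saved in Latin-1, which covers French accented characters. nrows limits how many rows are read, and skiprows omits specific lines, given as 0-indexed line numbers in the file (line 0 is the header).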
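To see sep, encoding, nrows and skiprows working together without a real file, here is a minimal sketch. The semicolon-separated content, its column names and the in-memory io.BytesIO buffer are hypothetical stand-ins for 'my_file.csv'; pd.read_csv accepts a binary buffer together with an encoding argument.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated file with French accents, encoded in Latin-1
# and held in memory so the example needs no file on disk.
text = (
    "nom;ville;annee\n"   # line 0: header
    "Zoé;Paris;1990\n"    # line 1
    "Hélène;Lyon;1985\n"  # line 2 (skipped below)
    "Jérôme;Nice;1992\n"  # line 3
    "Anaïs;Lille;1990\n"  # line 4
    "Loïc;Brest;1979\n"   # line 5 (skipped below)
    "Maëlle;Caen;2001\n"  # line 6
)
raw = io.BytesIO(text.encode("latin-1"))

# Skip file lines 2 and 5, then keep at most 3 data rows.
df = pd.read_csv(raw, sep=";", encoding="latin-1", nrows=3, skiprows=[2, 5])
print(df)
```

Because the bytes were written as Latin-1 and read back with the same encoding, the accented names survive intact. Reading them with the wrong encoding would garble them or raise a UnicodeDecodeError.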
Common readers: read_csv and read_excel. Other useful readers: read_clipboard and read_sql.
Writing data
```python
data.to_csv('my_new_file.csv', index=False)
```
Setting index=False prevents pandas from writing the row index as an extra, unnamed first column.
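The extra column matters when the file is read back. This small sketch uses a hypothetical two-row DataFrame and calls to_csv without a path, which makes pandas return the CSV text as a string instead of writing a file:

```python
import io

import pandas as pd

df = pd.DataFrame({"city": ["Paris", "Lyon"], "year_born": [1990, 1985]})

# With no path argument, to_csv returns the CSV text as a string.
with_index = df.to_csv()                # first column holds the index 0, 1
without_index = df.to_csv(index=False)  # only the real columns

# Reading the indexed version back produces a spurious extra column.
reloaded = pd.read_csv(io.StringIO(with_index))
print(list(reloaded.columns))  # ['Unnamed: 0', 'city', 'year_born']

clean = pd.read_csv(io.StringIO(without_index))
print(list(clean.columns))     # ['city', 'year_born']
```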
Inspecting data
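The inspection calls in this section are easiest to try on a small frame. This sketch uses a hypothetical DataFrame whose index labels start at 10 so that loc (which selects by index label) and iloc (which selects by position) visibly differ. With the default 0, 1, 2, ... index the two coincide, which is why data.loc[8] behaves like "the ninth row" there.

```python
import pandas as pd

# Hypothetical frame whose index labels do NOT match row positions.
df = pd.DataFrame(
    {"column_1": ["french", "english", "german", "french"],
     "year_born": [1990, 1985, 1992, 1990]},
    index=[10, 11, 12, 13],
)

print(df.shape)                 # (4, 2)
print(df.head(2))               # first two rows
print(df.loc[11])               # row whose LABEL is 11
print(df.iloc[1])               # row at POSITION 1 (the same row here)
print(df.loc[11, "column_1"])   # single cell: 'english'
print(df.loc[range(12, 14)])    # rows labelled 12 and 13
# df.loc[0] would raise KeyError here, because no row is labelled 0.
```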
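```python
data.shape                # (rows, columns) tuple
data.describe()           # summary statistics of numeric columns
data.head(3)              # first 3 rows
data.tail()               # last 5 rows (the default)
data.loc[8]               # row whose index label is 8
data.loc[8, 'column_1']   # one cell: row label 8, column 'column_1'
data.loc[range(4, 6)]     # rows labelled 4 and 5
```
Logical filtering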
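```python
data[data['column_1'] == 'french']
data[(data['column_1'] == 'french') & (data['year_born'] == 1990)]
data[(data['column_1'] == 'french') & (data['year_born'] == 1990) & ~(data['city'] == 'London')]
```
Combine conditions with & (AND), | (OR), and ~ (NOT), and wrap each comparison in parentheses: these operators bind more tightly than ==, so without parentheses Python would try to evaluate 'french' & data['year_born'] first.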
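A runnable version of these filters, on a hypothetical four-row frame that reuses the column names from the snippets. Naming each boolean mask first is a stylistic choice that sidesteps the precedence trap entirely, since every comparison is already evaluated before the masks are combined.

```python
import pandas as pd

data = pd.DataFrame({
    "column_1": ["french", "english", "french", "german"],
    "year_born": [1990, 1990, 1990, 1985],
    "city": ["Paris", "London", "London", "Lyon"],
})

# Each mask is a boolean Series aligned with data's rows.
is_french = data["column_1"] == "french"
born_1990 = data["year_born"] == 1990
in_london = data["city"] == "London"

and_rows = data[is_french & born_1990]               # french AND born in 1990
or_rows = data[is_french | born_1990]                # french OR born in 1990
not_rows = data[is_french & born_1990 & ~in_london]  # ...and NOT in London

print(and_rows.index.tolist())  # [0, 2]
print(or_rows.index.tolist())   # [0, 1, 2]
print(not_rows.index.tolist())  # [0]
```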
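To keep rows that match any of several values, use isin:
```python
data[data['column_1'].isin(['french', 'english'])]
```
Basic plotting
pandas draws its plots with matplotlib, so a Series or DataFrame can be plotted directly through its .plot() method.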
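```python
%matplotlib inline
data['numeric_column'].plot()   # line plot
data['numeric_column'].hist()   # histogram
```
Include the %matplotlib inline magic command when using Jupyter notebooks. In a plain Python script, call matplotlib.pyplot.show() to display the figures.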
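Outside a notebook, it often helps to hand pandas an explicit matplotlib Axes and save the figure. This is a minimal script-style sketch with a hypothetical numeric_column. It assumes the non-interactive "Agg" backend and writes the PNG to an in-memory buffer so that it runs on a machine without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw off-screen, no window needed
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({"numeric_column": [1.0, 3.5, 2.0, 4.5, 3.0, 5.0]})

fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))
data["numeric_column"].plot(ax=ax_line, title="Line plot")  # draw on the given Axes
data["numeric_column"].hist(ax=ax_hist, bins=3)
ax_hist.set_title("Histogram")

buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # in a real script: fig.savefig("plot.png") or plt.show()
plt.close(fig)
```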
Updating data
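A self-contained run of the two update patterns in this section, on a hypothetical four-row frame. Passing the row selector and the column name to a single loc[...] call is what makes the assignment land in data itself. Chained indexing such as data[mask]['column_1'] = ... may modify a temporary copy instead.

```python
import pandas as pd

data = pd.DataFrame({"column_1": ["french", "german", "french", "spanish"]})

# Label-based assignment of a single cell (row label 1, column 'column_1').
data.loc[1, "column_1"] = "english"

# Boolean-mask assignment: every row where the condition holds is updated at once.
data.loc[data["column_1"] == "french", "column_1"] = "French"

print(data["column_1"].tolist())  # ['French', 'english', 'French', 'spanish']
```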
```python
data.loc[8, 'column_1'] = 'english'                            # set one cell by label
data.loc[data['column_1'] == 'french', 'column_1'] = 'French'  # set all matching rows
```
Counting values
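A short sketch of value_counts on a hypothetical Series of languages, including the normalize=True option, which returns proportions instead of raw counts:

```python
import pandas as pd

languages = pd.Series(["french", "english", "french", "german", "french", "english"])

counts = languages.value_counts()                # absolute counts, most frequent first
shares = languages.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares)
```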
```python
data['column_1'].value_counts()   # count of each distinct value, most frequent first
```
Applying functions
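The three levels of "apply a function" in pandas differ in what the function receives. This sketch uses hypothetical data to contrast them: map sees each element of a Series, apply sees each column (or each row with axis=1), and the cell-wise method sees every single value. The version check reflects the rename of applymap to DataFrame.map in pandas 2.1.

```python
import pandas as pd

words = pd.Series(["chat", "chien", "oiseau"])
lengths = words.map(len)               # element-wise on a Series: 4, 5, 6

numbers = pd.DataFrame({"a": [1.234, 2.345], "b": [3.456, 4.567]})
col_sums = numbers.apply(sum)          # sum() called once per column
row_sums = numbers.apply(sum, axis=1)  # sum() called once per row


def truncate_2dp(x):
    """Drop everything after the second decimal place (truncation, not rounding)."""
    return int(x * 100) / 100


# Cell-wise: DataFrame.map in pandas >= 2.1, applymap in older versions.
if hasattr(pd.DataFrame, "map"):
    cells = numbers.map(truncate_2dp)
else:
    cells = numbers.applymap(truncate_2dp)
print(cells)
```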
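```python
data['column_1'].map(len)                                # length of each string
data['column_1'].map(len).map(lambda x: x / 100).plot()  # transform, then plot
data.apply(sum)                                          # one sum per column
data.applymap(lambda x: int(x * 100) / 100)              # truncate every cell to 2 decimals
```
map applies a function to each element of a Series, apply calls it once per column of a DataFrame (or once per row with axis=1), and applymap applies it to every individual cell. In pandas 2.1 and later, applymap is deprecated in favor of the equivalent DataFrame.map.
Progress bars with tqdm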
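```python
from tqdm.auto import tqdm

tqdm.pandas()  # registers progress_map, progress_apply, etc. on pandas objects
data['column_1'].progress_map(lambda x: x.count('e'))
```
progress_map behaves like map but displays a progress bar, which is useful for slow functions on large columns. Recent tqdm releases deprecate tqdm_notebook; tqdm.auto picks the notebook widget in Jupyter and a text bar elsewhere.
Correlation and scatter matrix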
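A minimal sketch of corr on a hypothetical frame that mixes a text column with numeric ones. The values are chosen so that the correlations are exactly +1 and -1. numeric_only=True tells pandas to leave the text column out instead of failing on it.

```python
import pandas as pd

data = pd.DataFrame({
    "city": ["Paris", "Lyon", "Nice", "Lille"],  # non-numeric: excluded below
    "height_cm": [160, 170, 180, 190],
    "weight_kg": [55, 65, 75, 85],
    "age": [40, 30, 20, 10],
})

corr = data.corr(numeric_only=True)  # pairwise Pearson correlation of numeric columns
print(corr.round(2))
```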
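```python
data.corr(numeric_only=True)           # pairwise correlations of numeric columns
data.corr(numeric_only=True).round(2)  # same matrix, rounded to 2 decimals
pd.plotting.scatter_matrix(data, figsize=(12, 8))
```
Since pandas 2.0, corr() raises an error when the DataFrame contains non-numeric columns unless numeric_only=True is passed. scatter_matrix draws a grid of pairwise scatter plots of the numeric columns, with histograms on the diagonal.
Advanced operations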
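The merge, group and iterate patterns in this section can be tried end to end on two small hypothetical frames. The sketch also shows the how= argument of merge and a vectorized dict(zip(...)) alternative to the iterrows loop for building a lookup dictionary. iterrows builds a new Series for every row, which is slow on large frames.

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloé", "Dev"],
    "country": ["FR", "UK", "FR", "IN"],
    "spend": [10, 20, 30, 40],
})
countries = pd.DataFrame({
    "country": ["FR", "UK"],
    "language": ["french", "english"],
})

# Inner join (the default): 'IN' has no match in countries, so Dev is dropped.
inner = people.merge(countries, on="country")
# Left join keeps every row of people; missing matches become NaN.
left = people.merge(countries, on="country", how="left")

# Total spend per country, returned as an ordinary DataFrame.
totals = people.groupby("country")["spend"].sum().reset_index()

# Vectorized alternative to the iterrows loop for a {key: value} lookup.
spend_by_name = dict(zip(people["name"], people["spend"]))
print(totals)
```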
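```python
# Join two DataFrames on the columns they share (an inner join by default)
data.merge(other_data, on=['column_1', 'column_2', 'column_3'])

# Sum column_2 within each group of column_1, as a regular DataFrame
data.groupby('column_1')['column_2'].sum().reset_index()

# Build a {column_1: column_2} dictionary row by row
dictionary = {}
for _, row in data.iterrows():
    dictionary[row['column_1']] = row['column_2']
```
Key takeaways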
Easy to use: abstracts complex calculations.
Intuitive: a DataFrame is a table of rows and columns, much like an Excel sheet.
Fast: vectorized operations built on NumPy handle large tables efficiently.
Pandas empowers data scientists to read, transform, visualize, and analyze data efficiently, making it a cornerstone of modern Python data workflows.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
