Getting Started with petl: Installation, Basic Operations, and Practical Examples
This article introduces the Python petl library for easy ETL tasks, explains how to install it via pip, and demonstrates core operations such as loading CSV data, viewing, filtering, sorting, converting, aggregating, joining, deduplicating, and performing basic statistical analysis with clear code examples.
petl Overview
petl (Python ETL) is a library designed to simplify data extraction, transformation, and loading. It offers spreadsheet-like functions for handling CSV, Excel, databases, and other sources.
Installation
Install petl using pip:
pip install petl
Basic Usage Examples
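The numbered examples that follow assume a file named example.csv with an age column. A minimal sketch for creating one with the standard library is shown here; the names and ages are hypothetical sample data, not part of the original article.

```python
import csv

# Hypothetical sample rows; the column names match the examples in this article.
rows = [
    ["name", "age"],
    ["Alice", 34],
    ["Bob", 27],
    ["Carol", 45],
]

# newline="" lets the csv module control line endings on every platform.
with open("example.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Reading the file back shows that every value comes back as a string.
with open("example.csv", newline="") as f:
    loaded = list(csv.reader(f))

print(loaded[1])
```

petl's fromcsv reads values the same way, as text, so a numeric column such as age needs an explicit conversion before arithmetic or numeric comparison.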
1. Load data from a CSV file:
from petl import fromcsv
table = fromcsv('example.csv')
2. View the first few rows:
from petl import head
print(head(table, 5)) # print the first 5 rows
3. Filter rows where a column meets a condition:
from petl import select
# fromcsv yields strings, so cast age to int before comparing
filtered_table = select(table, lambda rec: int(rec['age']) > 30) # rows with age > 30
4. Sort data:
from petl import convert, sort
ages = convert(table, 'age', int) # fromcsv yields strings; cast so the sort is numeric
sorted_table = sort(ages, 'age') # ascending
sorted_table_desc = sort(ages, 'age', reverse=True) # descending
5. Convert or map column values:
from petl import convert
converted_table = convert(table, 'age', lambda v: int(v) + 1) # cast to int, then increment age by 1
6. Output data to a new CSV file:
from petl import tocsv
tocsv(converted_table, 'output.csv')
Group Aggregation
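Aggregate sales data by product and sum the amount column (the amount values must already be numeric):
from petl import aggregate
# Each entry maps an output field name to a (source field, function) pair.
grouped_data = aggregate(sales_data, key='product', aggregation={'total_amount': ('amount', sum)})
Joining Tables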
Join two tables (e.g., orders and products) on a common column:
from petl import join
joined_table = join(orders, products, key='product_id')
Deduplication
Remove duplicate rows from a table:
from petl import distinct
unique_data = distinct(data_with_duplicates)
Statistical Analysis
Compute basic statistics such as mean, max, and min on a numeric column:
from petl import stats
summary = stats(numbers, 'value') # one pass over the column; returns a named tuple
mean_value = summary.mean
max_value = summary.max
min_value = summary.min