
Getting Started with petl: Installation, Basic Operations, and Practical Examples

This article introduces the Python petl library for easy ETL tasks, explains how to install it via pip, and demonstrates core operations such as loading CSV data, viewing, filtering, sorting, converting, aggregating, joining, deduplicating, and performing basic statistical analysis with clear code examples.

Test Development Learning Exchange

petl Overview

petl (Python ETL) is a library that simplifies data extraction, transformation, and loading. It offers spreadsheet-like functions for working with CSV files, Excel files, databases, and other sources.

Installation

Install petl using pip:

pip install petl

Basic Usage Examples

1. Load data from a CSV file:

from petl import fromcsv
table = fromcsv('example.csv')  # every value is loaded as a string

2. View the first few rows:

from petl import head
print(head(table, 5))  # display the first 5 rows

3. Filter rows where a column meets a condition:

from petl import convert, selectgt
typed_table = convert(table, 'age', int)  # CSV values are strings, so convert first
filtered_table = selectgt(typed_table, 'age', 30)  # rows with age > 30

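4. Sort data:

from petl import convert, sort
typed_table = convert(table, 'age', int)  # sort numerically rather than as text
sorted_table = sort(typed_table, 'age')  # ascending
sorted_table_desc = sort(typed_table, 'age', reverse=True)  # descending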

5. Convert or map column values:

from petl import convert
converted_table = convert(table, 'age', lambda v: int(v) + 1)  # parse age as an integer, then add 1

6. Output data to a new CSV file:

from petl import tocsv
tocsv(converted_table, 'output.csv')

Group Aggregation

Aggregate sales data by product and sum the amount column:

from petl import aggregate
# amount must already be numeric; the result has the fields 'product' and 'value'
grouped_data = aggregate(sales_data, key='product', aggregation=sum, value='amount')

Joining Tables

Join two tables (e.g., orders and products) on a common column:

from petl import join
joined_table = join(orders, products, key='product_id')

Deduplication

Remove duplicate rows from a table:

from petl import distinct
unique_data = distinct(data_with_duplicates)

Statistical Analysis

Compute basic statistics such as mean, max, and min on a numeric column:

from petl import stats
summary = stats(numbers, 'value')  # one pass over the column; returns a named tuple
mean_value = summary.mean
max_value = summary.max
min_value = summary.min