
Boost Python Data Loading Speed 10×: 5 Proven Methods

This article demonstrates five practical techniques for loading Excel and CSV data with pandas in Python, including smarter DataFrame construction, parallel processing with Joblib, and using pickle for faster storage, achieving speed improvements of up to three orders of magnitude.

Python Programming Learning Circle

5 Ways to Load Data in Python

As a Python user, I often need to load and store data from Excel or CSV files, but native Excel handling can be slow.

This article presents five methods to load data in Python, achieving up to three orders of magnitude speedup.
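All the benchmarks below read ten files named `Dummy 0` through `Dummy 9`. The article does not show how its test files were created, so here is a minimal setup sketch that generates files of the same shape (20,000 rows each; the column names and random contents are assumptions):

```python
import numpy as np
import pandas as pd

# Generate ten dummy files matching the benchmark's naming scheme.
rng = np.random.default_rng(0)
for i in range(10):
    df = pd.DataFrame(rng.integers(0, 100, size=(20_000, 5)),
                      columns=list("ABCDE"))
    df.to_csv(f"Dummy {i}.csv", index=False)
    try:
        df.to_excel(f"Dummy {i}.xlsx", index=False)  # requires openpyxl
    except ImportError:
        pass  # skip the Excel copies if no Excel engine is installed
```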

1. Load Excel files

The simplest approach uses pandas.read_excel, reading each file and concatenating it onto a running DataFrame. The benchmark takes about 53 seconds for ten 20,000‑row files.

import pandas as pd
import numpy as np
from joblib import Parallel, delayed
import time

start = time.time()
df = pd.read_excel("Dummy 0.xlsx")
for file_number in range(1, 10):
    # DataFrame.append returned a copy (and was removed in pandas 2.0),
    # so the result must be reassigned, not discarded.
    df = pd.concat([df, pd.read_excel(f"Dummy {file_number}.xlsx")])
end = time.time()
print("Excel:", end - start)
# >> Excel: 53.4

2. Load CSV files

Switching to pandas.read_csv dramatically speeds up loading: about 0.63 seconds for the same ten files, roughly 85 times faster.

start = time.time()
df = pd.read_csv("Dummy 0.csv")
for file_number in range(1, 10):
    # As above, reassign the result rather than discarding it.
    df = pd.concat([df, pd.read_csv(f"Dummy {file_number}.csv")])
end = time.time()
print("CSV:", end - start)
# >> CSV: 0.632

CSV files are usually larger than .xlsx, but the speed gain outweighs the size increase.
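If your data arrives as .xlsx, a one-time conversion pays off quickly, since every later load can use the fast pandas.read_csv path. A sketch of such a converter (the function name `excel_to_csv` is a hypothetical helper, not from the article; reading .xlsx requires an Excel engine such as openpyxl):

```python
import pandas as pd

def excel_to_csv(path: str) -> str:
    """Read an .xlsx file once and write a .csv next to it."""
    csv_path = path.rsplit(".", 1)[0] + ".csv"
    pd.read_excel(path).to_csv(csv_path, index=False)
    return csv_path
```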

3. Build smarter DataFrames

Load each file into a separate DataFrame, collect them in a list, and concatenate once at the end. Concatenating inside the loop copies every row already loaded on each iteration, so a single final pd.concat avoids that quadratic overhead on large DataFrames.

start = time.time()
df = []
for file_number in range(10):
    temp = pd.read_csv(f"Dummy {file_number}.csv")
    df.append(temp)
df = pd.concat(df, ignore_index=True)
end = time.time()
print("CSV2:", end - start)
# >> CSV2: 0.619

4. Parallelize CSV loading with Joblib

Joblib can load files in parallel, roughly doubling the speed on an 8‑core M1 Mac.

start = time.time()
def loop(file_number):
    return pd.read_csv(f"Dummy {file_number}.csv")
df = Parallel(n_jobs=-1, verbose=10)(delayed(loop)(i) for i in range(10))
df = pd.concat(df, ignore_index=True)
end = time.time()
print("CSV//:", end - start)
# >> CSV//: 0.386

5. Faster storage with Pickle

Saving DataFrames as pickle files speeds up loading to ~0.07 seconds, an 80 % reduction, though pickles are not human‑readable and may occupy more disk space.

start = time.time()
def loop(file_number):
    return pd.read_pickle(f"Dummy {file_number}.pickle")
df = Parallel(n_jobs=-1, verbose=10)(delayed(loop)(i) for i in range(10))
df = pd.concat(df, ignore_index=True)
end = time.time()
print("Pickle//:", end - start)
# >> Pickle//: 0.072

Use pickles when you need to reload the same data repeatedly or when the data will only be consumed by Python.
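To use this method you first need the pickle files on disk. A minimal round-trip sketch (file name `demo.pickle` is illustrative): DataFrame.to_pickle writes the file, and pd.read_pickle restores it exactly, dtypes included, which CSV does not guarantee.

```python
import numpy as np
import pandas as pd

# Save a DataFrame as a pickle, reload it, and confirm it comes back
# unchanged (column dtypes included, unlike a CSV round trip).
df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list("abc"))
df.to_pickle("demo.pickle")
restored = pd.read_pickle("demo.pickle")
assert restored.equals(df)
```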

Bonus: Parallel Excel loading

Joblib can also parallelize Excel loading, cutting load time from 50 seconds to about 13 seconds.

start = time.time()
def loop(file_number):
    return pd.read_excel(f"Dummy {file_number}.xlsx")
df = Parallel(n_jobs=-1, verbose=10)(delayed(loop)(i) for i in range(10))
df = pd.concat(df, ignore_index=True)
end = time.time()
print("Excel//:", end - start)
# >> Excel//: 13.45

These techniques let you load large tabular datasets in Python orders of magnitude faster.

Tags: pandas, pickle, joblib, data loading
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
