Common Python Mistakes in Data Science Projects and How to Avoid Them
This article outlines frequent Python pitfalls in data‑science workflows—such as neglecting virtual environments, overusing notebooks, hard‑coding absolute paths, ignoring warnings, avoiding list comprehensions, skipping type hints, writing unreadable pandas chains, not following PEP guidelines, and not using coding assistance tools—and provides practical solutions to improve code quality and productivity.
Applying software engineering best practices can improve the quality of data‑science projects, reducing errors, increasing reliability, and boosting coding efficiency.
1. Not using virtual environments
Isolating each project’s dependencies prevents package conflicts and eases deployment; tools such as Anaconda, Pipenv, or Docker can be used.
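As a minimal sketch of how a script could guard against running outside an isolated environment, the check below compares `sys.prefix` with `sys.base_prefix`. The function name `in_virtual_environment` is hypothetical, and the check only covers environments created with `venv` or `virtualenv`; it is not designed to detect conda environments or Docker containers.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv/virtualenv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix points at the base Python installation, so they differ.
    return sys.prefix != sys.base_prefix


if not in_virtual_environment():
    # Assumption: a warning on stderr is enough; a real project might exit.
    print("Warning: no venv detected; packages may install globally.",
          file=sys.stderr)
```

Calling this at the top of a setup or training script is one way to catch an accidentally global install early.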
2. Overusing Jupyter Notebooks
Notebooks are great for learning and quick analysis but lack IDE features; for long‑term, collaborative, deployable work, use IDEs like VS Code, PyCharm, or Spyder.
3. Using absolute paths instead of relative paths
Absolute paths hinder portability; set the project root as the working directory and use os.path.join with relative paths, as shown below.
import os

import pandas as pd

# Wrong way: absolute paths only work on one machine
excel_path1 = "C:\\Users\\abdelilah\\Desktop\\mysheet1.xlsx"
excel_path2 = "C:\\Users\\abdelilah\\Desktop\\mysheet2.xlsx"
mydf1 = pd.read_excel(excel_path1)
mydf2 = pd.read_excel(excel_path2)

# Correct way: paths relative to the project root
DATA_DIR = "data"
crime06_filename = "CrimeOneYearofData_2006.xlsx"
crime07_filename = "CrimeOneYearofData_2007.xlsx"
crime06_df = pd.read_excel(os.path.join(DATA_DIR, crime06_filename))
crime07_df = pd.read_excel(os.path.join(DATA_DIR, crime07_filename))
4. Ignoring warnings
Warnings such as Pandas’ SettingWithCopyWarning or DeprecationWarning indicate potential issues; understand their causes and decide which can be safely ignored.
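One possible way to investigate a warning before deciding whether to silence it is sketched below with the standard-library `warnings` module. The function `legacy_mean` is a hypothetical stand-in for any library call that emits a `DeprecationWarning`; the same pattern should apply to other warning categories.

```python
import warnings


def legacy_mean(values):
    """Hypothetical function that emits a DeprecationWarning."""
    warnings.warn(
        "legacy_mean is deprecated; use statistics.mean",
        DeprecationWarning,
        stacklevel=2,
    )
    return sum(values) / len(values)


# 1. Record warnings so they can be inspected instead of scrolling past.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = legacy_mean([1, 2, 3])

# 2. Escalate the warning to an error to get a traceback to its source.
raised = None
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        legacy_mean([1, 2, 3])
    except DeprecationWarning as exc:
        raised = str(exc)

# 3. Once understood and judged harmless, ignore only that category,
#    and only inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    quiet_result = legacy_mean([1, 2, 3])
```

Scoping the filter with `catch_warnings()` keeps the suppression local, so unrelated warnings elsewhere in the project still surface.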
5. Not using (or rarely using) list comprehensions
A list comprehension is usually more concise than an equivalent for loop that appends to a list, and often slightly faster; the example below reads every CSV file in a directory both ways.
import os

import pandas as pd

DATA_PATH = "data"
filename_list = os.listdir(DATA_PATH)

# Bad way: explicit loop with append
csv_list = []
for filename in filename_list:
    if filename.endswith(".csv"):
        csv_list.append(pd.read_csv(os.path.join(DATA_PATH, filename)))

# Recommended way: list comprehension
csv_list = [
    pd.read_csv(os.path.join(DATA_PATH, filename))
    for filename in filename_list
    if filename.endswith(".csv")
]
6. Not using type hints
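Type annotations improve IDE assistance and code readability. Standardized type hints arrived in Python 3.5 (PEP 484), and the interpreter does not enforce them at runtime; checking is left to the IDE or to external tools such as mypy.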
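The runtime behaviour of annotations can be checked directly. In the sketch below, `repeat_text` is a hypothetical function: its hints are stored on the function object, yet a call with a mismatched argument type still runs.

```python
def repeat_text(text: str, times: int) -> str:
    """Repeat `text` the given number of times."""
    return text * times


# The hints are stored as metadata on the function object.
hints = repeat_text.__annotations__

# Python does not validate arguments against the hints: passing a list
# where a str is annotated still executes, because list * int is valid.
unchecked = repeat_text([1, 2], 2)
```

A static type checker would be expected to flag the second call, which is the kind of error the hints are meant to expose before the code runs.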
# Without type hints
def mystery_combine(a, b, times):
    return (a + b) * times

# With type hints
def mystery_combine(a: str, b: str, times: int) -> str:
    return (a + b) * times
7. Unreadable pandas method chains
Break long method chains into separate lines within parentheses for better readability.
var_list = ["clicks", "time_spent"]
var_list_Q = [varname + "_Q" for varname in var_list]
# Unreadable
df_Q = df.groupby("id").rolling(window=3, min_periods=1, on="yearmonth")[var_list].mean().reset_index().rename(columns=dict(zip(var_list, var_list_Q)))
# Readable
df_Q = (
    df
    .groupby("id")
    .rolling(window=3, min_periods=1, on="yearmonth")[var_list]
    .mean()
    .reset_index()
    .rename(columns=dict(zip(var_list, var_list_Q)))
)
8. Not following PEP guidelines
PEP 8 provides a comprehensive style guide; adhering to it improves code consistency.
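As a small illustration of what PEP 8 conformance changes in practice, the sketch below shows the same hypothetical mean calculation written twice. Both functions are made up for this example and behave identically; only naming, spacing, and layout differ.

```python
# Not PEP 8 compliant: camelCase names, no spaces around operators,
# and a compound statement squeezed onto one line.
def calcMean(ValuesList):
    Total=0
    for V in ValuesList: Total+=V
    return Total/len(ValuesList)


# PEP 8 compliant: snake_case names, spaces around operators,
# one statement per line, and a docstring.
def calc_mean(values):
    """Return the arithmetic mean of a non-empty sequence."""
    total = 0
    for value in values:
        total += value
    return total / len(values)
```

Linters and auto-formatters can apply most of these conventions automatically, so the style does not have to be enforced by hand.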
9. Not using coding assistance tools
Tools such as Pylance, Kite, Tabnine, or GitHub Copilot can boost productivity through autocomplete and suggestions.