
Python Scripts for File Management, Data Processing, Automation, and More

This article collects practical Python snippets for file and directory management, data processing, network requests, automation, document handling, image manipulation, text analysis, system monitoring, visualization, data cleaning, logging, and web scraping, each with a brief explanation and a ready-to-run example.

Test Development Learning Exchange

File and Directory Management

Batch rename files in a directory based on a rule:

import os
for filename in os.listdir('.'):    
    if filename.endswith(".txt"):
        os.rename(filename, "new_" + filename)
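
One caveat: run the loop above twice and every file gets a second "new_" prefix. A pathlib-based sketch (the helper name add_prefix is my own) that skips files already renamed:

```python
from pathlib import Path

def add_prefix(directory, prefix="new_", suffix=".txt"):
    """Rename matching files in directory, skipping already-prefixed ones."""
    renamed = []
    for path in sorted(Path(directory).glob("*" + suffix)):
        if path.name.startswith(prefix):
            continue  # avoid double-prefixing on repeated runs
        target = path.with_name(prefix + path.name)
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

Returning the new names also makes the rename easy to log or test.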

Find large files exceeding a size threshold:

import os
for root, dirs, files in os.walk('.'):    
    for file in files:
        path = os.path.join(root, file)
        if os.path.getsize(path) > 1e6:  # larger than 1 MB
            print(path)

Copy only the folder structure without file contents:

import shutil
# A copy_function that does nothing: directories are created, file contents are skipped
shutil.copytree('source_folder', 'destination_folder', copy_function=lambda src, dst: None)
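
Note that copytree raises an error if the destination already exists (before Python 3.8's dirs_exist_ok flag). An os.walk sketch (copy_tree_structure is a hypothetical helper name) that recreates just the directory tree and tolerates existing directories:

```python
import os

def copy_tree_structure(src, dst):
    """Recreate the directory tree of src under dst, copying no files."""
    for root, dirs, _files in os.walk(src):
        rel = os.path.relpath(root, src)
        os.makedirs(os.path.join(dst, rel), exist_ok=True)
```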

Data Processing and Analysis

Merge multiple CSV files into one:

import pandas as pd
df_list = [pd.read_csv(f) for f in ['file1.csv', 'file2.csv']]
combined_df = pd.concat(df_list, ignore_index=True)
combined_df.to_csv('combined.csv', index=False)

Convert an Excel file to CSV:

import pandas as pd
excel_data = pd.read_excel('data.xlsx')
excel_data.to_csv('data.csv', index=False)

Remove duplicate lines from a text file:

with open('input.txt') as f_in, open('output.txt', 'w') as f_out:
    seen = set()
    for line in f_in:
        if line not in seen:
            f_out.write(line)
            seen.add(line)

Read and write JSON data:

import json
data = {'key': 'value'}
with open('data.json', 'w') as f:
    json.dump(data, f)
with open('data.json') as f:
    loaded_data = json.load(f)

Network Requests and API Interaction

Send a GET request and print the JSON response:

import requests
response = requests.get('https://api.example.com/data')
response.raise_for_status()  # fail fast on HTTP error codes
print(response.json())

Download a file from the internet and save it locally:

import requests
url = 'https://example.com/file.zip'
r = requests.get(url)
with open('file.zip', 'wb') as f:
    f.write(r.content)
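
For large files, r.content buffers the entire download in memory. A streaming sketch (download_file is an illustrative name) using requests' stream=True:

```python
import requests

def download_file(url, dest, chunk_size=8192):
    """Stream a download to disk in chunks instead of buffering it in memory."""
    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()
        with open(dest, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return dest
```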

Automation Tasks

Schedule a recurring job using APScheduler:

from apscheduler.schedulers.blocking import BlockingScheduler
sched = BlockingScheduler()
def job():
    print("Task executed!")
sched.add_job(job, 'interval', minutes=1)
sched.start()

Send an email automatically via SMTP:

import smtplib
from email.mime.text import MIMEText
msg = MIMEText('This is the body of the email.')
msg['Subject'] = 'Subject line'
msg['From'] = '[email protected]'
msg['To'] = '[email protected]'
with smtplib.SMTP('smtp.example.com') as server:
    server.login('user', 'pass')
    server.sendmail('[email protected]', ['[email protected]'], msg.as_string())

Document Processing

Create a simple Word document:

from docx import Document
doc = Document()
doc.add_paragraph('Hello World!')
doc.save('hello.docx')

Merge multiple PDF files into one:

from PyPDF2 import PdfMerger  # PdfFileMerger was deprecated in PyPDF2 3.0
merger = PdfMerger()
for pdf in ['file1.pdf', 'file2.pdf']:
    merger.append(pdf)
merger.write("merged.pdf")
merger.close()

Image Processing

Resize an image using Pillow:

from PIL import Image
img = Image.open('image.jpg')
resized_img = img.resize((800, 600))
resized_img.save('resized_image.jpg')
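
resize stretches to the exact dimensions and can distort the image. A sketch using Pillow's thumbnail, which shrinks while preserving aspect ratio and never upscales (make_thumbnail is my own helper name):

```python
from PIL import Image

def make_thumbnail(img, max_size=(800, 600)):
    """Return a copy shrunk to fit max_size, preserving aspect ratio."""
    thumb = img.copy()           # thumbnail() modifies in place, so work on a copy
    thumb.thumbnail(max_size)
    return thumb
```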

Text Analysis

Count word frequencies in a text file:

from collections import Counter
with open('text.txt') as f:
    words = f.read().split()
    word_counts = Counter(words)
    print(word_counts.most_common(10))
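
The count above is case-sensitive and treats "word" and "word." as different tokens. A variant (word_frequencies is an illustrative name) that normalizes case and strips punctuation with a regex:

```python
import re
from collections import Counter

def word_frequencies(text, top_n=10):
    """Case-insensitive word counts, ignoring punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)
```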

System Information

Get current CPU usage percentage:

import psutil
print(psutil.cpu_percent(interval=1))
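
psutil reports memory in the same style; a quick sketch:

```python
import psutil

# virtual_memory() returns total, available, percent used, and more
mem = psutil.virtual_memory()
print(f"Memory: {mem.percent}% used ({mem.available / 1e9:.1f} GB available)")
```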

Data Visualization

Plot a simple line chart with Matplotlib:

import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.plot(x, y)
plt.show()

Data Cleaning

Remove empty rows from a CSV file:

import pandas as pd
df = pd.read_csv('dirty_data.csv')
# how='all' drops only rows where every column is missing;
# the default dropna() would also remove rows with a single missing value
df.dropna(how='all', inplace=True)
df.to_csv('clean_data.csv', index=False)

Logging

Record log messages to a file using the logging module:

import logging
logging.basicConfig(filename='app.log', level=logging.INFO)
logging.info('This is an info message.')

Web Scraping

Fetch a webpage and extract all H1 titles with BeautifulSoup:

import requests
from bs4 import BeautifulSoup
url = 'http://example.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
titles = soup.find_all('h1')
for title in titles:
    print(title.text.strip())
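
The same extraction logic can be exercised without a network call by parsing a literal HTML string (extract_h1_titles is a hypothetical helper):

```python
from bs4 import BeautifulSoup

def extract_h1_titles(html):
    """Return the stripped text of every <h1> element in an HTML string."""
    soup = BeautifulSoup(html, 'html.parser')
    return [h1.get_text(strip=True) for h1 in soup.find_all('h1')]
```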
Tags: automation, data processing, scripting, web scraping, file management
Written by Test Development Learning Exchange