Reading Various File Formats and Databases in Python
This guide demonstrates how to read and write common file types—including TXT, CSV, JSON, XML, Excel, YAML, Parquet, and HDF5—as well as how to connect to SQLite databases and perform web scraping using Python's standard libraries and popular third‑party packages.
1. Text files (TXT)
Read an entire text file or iterate line by line using the built‑in open function.
with open('file.txt', 'r') as file:
    content = file.read()
print(content)
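Writing works the same way: open the file in 'w' mode to overwrite (or 'a' to append). The 'output.txt' path and sample lines below are placeholders for this sketch.

```python
# Write text: 'w' truncates/creates the file; use 'a' to append instead.
lines = ['first line', 'second line']
with open('output.txt', 'w') as file:
    file.write('Hello, world!\n')
    # writelines does not add newlines, so we append them ourselves
    file.writelines(line + '\n' for line in lines)
```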
with open('file.txt', 'r') as file:
    for line in file:
        print(line.strip())  # remove newline and surrounding whitespace

2. CSV files
Use the csv module or pandas to read CSV data.
import csv

with open('data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(', '.join(row))
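Writing CSV mirrors reading: csv.writer takes care of delimiters and quoting. The rows and the 'output.csv' path here are made up for illustration.

```python
import csv

# Hypothetical rows; the first is a header row.
rows = [['name', 'age'], ['Alice', '30'], ['Bob', '25']]
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(rows)  # write all rows at once
```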
import pandas as pd

df = pd.read_csv('data.csv')
print(df.head())

3. JSON files
Load and dump JSON data with the json module.
import json

with open('data.json', 'r') as jsonfile:
    data = json.load(jsonfile)
print(data)
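JSON can also be serialized to and parsed from strings with json.dumps and json.loads, which is handy for APIs and logging; the dictionary below is a made-up example.

```python
import json

# Round-trip a dict through a JSON string.
original = {'key': 'value', 'items': [1, 2, 3]}
text = json.dumps(original, indent=4)  # dict -> pretty-printed str
data = json.loads(text)                # str -> dict
```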
data = {'key': 'value'}
with open('data.json', 'w') as jsonfile:
    json.dump(data, jsonfile, indent=4)  # pretty-print with 4-space indentation

4. XML files
Parse XML using xml.etree.ElementTree or the more powerful lxml library.
import xml.etree.ElementTree as ET

tree = ET.parse('data.xml')
root = tree.getroot()
for child in root:
    print(child.tag, child.attrib)
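ElementTree can also build a document in memory and write it out. The 'catalog'/'book' tags and the 'output.xml' path below are purely illustrative.

```python
import xml.etree.ElementTree as ET

# Build a small tree, then serialize it to a file.
root = ET.Element('catalog')
book = ET.SubElement(root, 'book', attrib={'id': '1'})
title = ET.SubElement(book, 'title')
title.text = 'Example Title'

tree = ET.ElementTree(root)
tree.write('output.xml', encoding='utf-8', xml_declaration=True)
```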
from lxml import etree

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse('data.xml', parser)
root = tree.getroot()
for element in root.iter():
    print(element.tag, element.text)

5. Excel files
Read .xlsx files with openpyxl or pandas.
from openpyxl import load_workbook

wb = load_workbook('data.xlsx')
ws = wb.active
for row in ws.iter_rows(values_only=True):
    print(row)
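To write a workbook, openpyxl's Workbook class works in the other direction; the 'output.xlsx' path and sample rows here are placeholders.

```python
from openpyxl import Workbook

# Create a workbook with one sheet and append rows to it.
wb = Workbook()
ws = wb.active
ws.append(['name', 'score'])  # header row
ws.append(['Alice', 90])
wb.save('output.xlsx')
```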
import pandas as pd

df = pd.read_excel('data.xlsx')
print(df.head())

6. Database connection and queries
Interact with SQLite using the built-in sqlite3 module, or use SQLAlchemy for ORM-style access.
import sqlite3

conn = sqlite3.connect('example.db')
cursor = conn.cursor()
cursor.execute("SELECT * FROM table_name")
rows = cursor.fetchall()
for row in rows:
    print(row)
conn.close()
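When queries take user-supplied values, use ? placeholders rather than string formatting so sqlite3 escapes them safely. This sketch uses an in-memory database and a hypothetical users table.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
cursor = conn.cursor()
cursor.execute('CREATE TABLE users (name TEXT, age INTEGER)')

# Parameterized inserts: values are bound, never interpolated into the SQL.
cursor.executemany('INSERT INTO users VALUES (?, ?)',
                   [('Alice', 30), ('Bob', 25)])

cursor.execute('SELECT name FROM users WHERE age > ?', (26,))
names = [row[0] for row in cursor.fetchall()]
conn.commit()
conn.close()
```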
from sqlalchemy import create_engine, MetaData, Table

engine = create_engine('sqlite:///example.db')
metadata = MetaData()  # bind= was removed in SQLAlchemy 2.0
table = Table('table_name', metadata, autoload_with=engine)
with engine.connect() as connection:
    result = connection.execute(table.select())
    for row in result:
        print(row)

7. HTML and web scraping
Fetch a web page with requests and parse it using BeautifulSoup.
import requests
from bs4 import BeautifulSoup

url = 'http://example.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# get all <h1> titles
titles = soup.find_all('h1')
for title in titles:
    print(title.get_text())

8. YAML files
Read YAML safely with PyYAML.
import yaml

with open('data.yaml', 'r') as stream:
    try:
        data = yaml.safe_load(stream)
        print(data)
    except yaml.YAMLError as exc:
        print(exc)

9. Parquet files
Load Parquet data using pandas (read_parquet requires the pyarrow or fastparquet engine).
import pandas as pd

df = pd.read_parquet('data.parquet')
print(df.head())

10. HDF5 files
Access HDF5 datasets with pandas or h5py.
import pandas as pd

store = pd.HDFStore('data.h5')
df = store['dataset']
print(df.head())
store.close()  # release the file handle
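pandas can also write HDF5 via to_hdf and read it back with read_hdf; both rely on the PyTables package (pip install tables). The DataFrame and 'output.h5' path below are illustrative.

```python
import pandas as pd

# Write a DataFrame under the key 'dataset', then read it back.
df_out = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})
df_out.to_hdf('output.h5', key='dataset', mode='w')  # needs PyTables
df_back = pd.read_hdf('output.h5', 'dataset')
```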
import h5py

with h5py.File('data.h5', 'r') as f:
    dset = f['dataset']
    print(dset[...])  # read the full dataset into memory

Test Development Learning Exchange