Reading Various File Formats and Databases in Python
This guide demonstrates how to read and write common file types—including TXT, CSV, JSON, XML, Excel, YAML, Parquet, and HDF5—as well as how to connect to SQLite databases and perform web scraping using Python's standard libraries and popular third‑party packages.
1. Text files (TXT)
Read an entire text file or iterate line by line using the built‑in open function.
with open('file.txt', 'r') as file:
    content = file.read()
print(content)
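Writing works the same way: open the file in 'w' mode to overwrite (or 'a' to append). The 'output.txt' path and sample lines below are placeholders for this sketch.

```python
# Write text: 'w' truncates/creates the file; use 'a' to append instead.
lines = ['first line', 'second line']
with open('output.txt', 'w') as file:
    file.write('Hello, world!\n')
    # writelines does not add newlines, so we append them ourselves
    file.writelines(line + '\n' for line in lines)
```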
with open('file.txt', 'r') as file:
    for line in file:
        print(line.strip())  # remove newline and surrounding whitespace

2. CSV files
Use the csv module or pandas to read CSV data.
import csv

with open('data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(', '.join(row))
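Writing CSV mirrors reading: csv.writer takes care of delimiters and quoting. The rows and the 'output.csv' path here are made up for illustration.

```python
import csv

# Hypothetical rows; the first is a header row.
rows = [['name', 'age'], ['Alice', '30'], ['Bob', '25']]
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(rows)  # write all rows at once
```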
import pandas as pd

df = pd.read_csv('data.csv')
print(df.head())

3. JSON files
Load and dump JSON data with the json module.
import json

with open('data.json', 'r') as jsonfile:
    data = json.load(jsonfile)
print(data)
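JSON can also be serialized to and parsed from strings with json.dumps and json.loads, which is handy for APIs and logging; the dictionary below is a made-up example.

```python
import json

# Round-trip a dict through a JSON string.
original = {'key': 'value', 'items': [1, 2, 3]}
text = json.dumps(original, indent=4)  # dict -> pretty-printed str
data = json.loads(text)                # str -> dict
```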
data = {'key': 'value'}
with open('data.json', 'w') as jsonfile:
    json.dump(data, jsonfile, indent=4)  # pretty-print with 4-space indentation

4. XML files
Parse XML using xml.etree.ElementTree or the more powerful lxml library.
import xml.etree.ElementTree as ET

tree = ET.parse('data.xml')
root = tree.getroot()
for child in root:
    print(child.tag, child.attrib)
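ElementTree can also build a document in memory and write it out. The 'catalog'/'book' tags and the 'output.xml' path below are purely illustrative.

```python
import xml.etree.ElementTree as ET

# Build a small tree, then serialize it to a file.
root = ET.Element('catalog')
book = ET.SubElement(root, 'book', attrib={'id': '1'})
title = ET.SubElement(book, 'title')
title.text = 'Example Title'

tree = ET.ElementTree(root)
tree.write('output.xml', encoding='utf-8', xml_declaration=True)
```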
from lxml import etree

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse('data.xml', parser)
root = tree.getroot()
for element in root.iter():
    print(element.tag, element.text)

5. Excel files
Read .xlsx files with openpyxl or pandas.
from openpyxl import load_workbook

wb = load_workbook('data.xlsx')
ws = wb.active
for row in ws.iter_rows(values_only=True):
    print(row)
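To write a workbook, openpyxl's Workbook class works in the other direction; the 'output.xlsx' path and sample rows here are placeholders.

```python
from openpyxl import Workbook

# Create a workbook with one sheet and append rows to it.
wb = Workbook()
ws = wb.active
ws.append(['name', 'score'])  # header row
ws.append(['Alice', 90])
wb.save('output.xlsx')
```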
import pandas as pd

df = pd.read_excel('data.xlsx')
print(df.head())

6. Database connection and queries
Interact with SQLite using the built-in sqlite3 module, or use SQLAlchemy for ORM-style access.
import sqlite3

conn = sqlite3.connect('example.db')
cursor = conn.cursor()
cursor.execute("SELECT * FROM table_name")
rows = cursor.fetchall()
for row in rows:
    print(row)
conn.close()
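When queries take user-supplied values, use ? placeholders rather than string formatting so sqlite3 escapes them safely. This sketch uses an in-memory database and a hypothetical users table.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
cursor = conn.cursor()
cursor.execute('CREATE TABLE users (name TEXT, age INTEGER)')

# Parameterized inserts: values are bound, never interpolated into the SQL.
cursor.executemany('INSERT INTO users VALUES (?, ?)',
                   [('Alice', 30), ('Bob', 25)])

cursor.execute('SELECT name FROM users WHERE age > ?', (26,))
names = [row[0] for row in cursor.fetchall()]
conn.commit()
conn.close()
```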
from sqlalchemy import create_engine, MetaData, Table

engine = create_engine('sqlite:///example.db')
metadata = MetaData()  # bind= was removed in SQLAlchemy 2.0
table = Table('table_name', metadata, autoload_with=engine)
with engine.connect() as connection:
    result = connection.execute(table.select())
    for row in result:
        print(row)

7. HTML and web scraping
Fetch a web page with requests and parse it using BeautifulSoup.
import requests
from bs4 import BeautifulSoup

url = 'http://example.com'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# get all <h1> titles
titles = soup.find_all('h1')
for title in titles:
    print(title.get_text())

8. YAML files
Read YAML safely with PyYAML.
import yaml

with open('data.yaml', 'r') as stream:
    try:
        data = yaml.safe_load(stream)
        print(data)
    except yaml.YAMLError as exc:
        print(exc)

9. Parquet files
Load Parquet data using pandas (read_parquet requires the pyarrow or fastparquet engine).
import pandas as pd

df = pd.read_parquet('data.parquet')
print(df.head())

10. HDF5 files
Access HDF5 datasets with pandas or h5py.
import pandas as pd

store = pd.HDFStore('data.h5')
df = store['dataset']
print(df.head())
store.close()  # release the file handle
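pandas can also write HDF5 via to_hdf and read it back with read_hdf; both rely on the PyTables package (pip install tables). The DataFrame and 'output.h5' path below are illustrative.

```python
import pandas as pd

# Write a DataFrame under the key 'dataset', then read it back.
df_out = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})
df_out.to_hdf('output.h5', key='dataset', mode='w')  # needs PyTables
df_back = pd.read_hdf('output.h5', 'dataset')
```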
import h5py

with h5py.File('data.h5', 'r') as f:
    dset = f['dataset']
    print(dset[...])  # read the full dataset into memory

Test Development Learning Exchange