
Master Python File I/O: From pathlib to buffering and advanced tricks

This tutorial covers Python's comprehensive file I/O capabilities, including pathlib and os.path directory operations, tempfile usage, fnmatch pattern matching, open() modes and buffering, reading and writing techniques, fileinput streams, linecache random access, and practical code examples.

Python Programming Learning Circle

Jan 4, 2020

Python provides extensive I/O support through modules such as pathlib and os.path and the built-in open() function for reading and writing files.

Using pathlib to operate on directories

For detailed usage see the dedicated "Python pathlib module guide".

from pathlib import Path

# Current directory
p = Path('.')
# Iterate over the files and sub-directories directly inside it
for x in p.iterdir():
    print(x)
# Parent directory
p = Path('../')
# All .py files under the parent directory, searched recursively
for x in p.glob('**/*.py'):
    print(x)
# A specific directory (example path)
p = Path('g:/publish/codes')
# Files with a specific name anywhere under that directory
for x in p.glob('**/Path_test1.py'):
    print(x)
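
The calls above depend on whatever happens to be on disk. As a minimal sketch that runs anywhere, the following builds a throwaway directory tree and applies the same iterdir() and glob() calls to it. The helper name list_python_files and the file names are made up for illustration.

```python
import tempfile
from pathlib import Path


def list_python_files(root):
    """Return the sorted relative paths of every .py file under root."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.glob('**/*.py'))


# Build a small temporary tree so the example does not depend on local files.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / 'pkg').mkdir()
    (base / 'main.py').write_text('print("hi")\n', encoding='utf-8')
    (base / 'pkg' / 'util.py').write_text('', encoding='utf-8')
    (base / 'notes.txt').write_text('todo\n', encoding='utf-8')

    # iterdir() lists only the direct children of the directory.
    top_level = sorted(p.name for p in base.iterdir())
    # glob('**/*.py') also descends into sub-directories.
    py_files = list_python_files(base)

print(top_level)  # ['main.py', 'notes.txt', 'pkg']
print(py_files)   # ['main.py', 'pkg/util.py']
```

The results are sorted because neither iterdir() nor glob() guarantees an order.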

Using os.path to operate on directories

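The os.path module provides functions such as exists(), getctime(), getmtime(), getatime(), and getsize() for directory and file queries. The time functions return seconds since the epoch; getctime() reports the creation time on Windows but the last metadata change on Unix.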

import os
import time
print(os.path.abspath("abc.txt"))          # D:\Learning\Python Project\abc.txt
print(os.path.commonprefix(['/usr/lib', '/usr/local/lib']))  # /usr/l (character-by-character prefix)
print(os.path.commonpath(['/usr/lib', '/usr/local/lib']))    # \usr on Windows, /usr on POSIX
print(os.path.dirname('abc/xyz/README.txt'))               # abc/xyz
print(os.path.exists('abc/xyz/README.txt'))                # False
print(time.ctime(os.path.getatime('os.path_test.py')))
print(time.ctime(os.path.getmtime('os.path_test.py')))
print(time.ctime(os.path.getctime('os.path_test.py')))
print(os.path.getsize('os.path_test.py'))                 # 2105
print(os.path.isfile('os.path_test.py'))                  # True
print(os.path.isdir('os.path_test.py'))                   # False
print(os.path.samefile('os.path_test.py', './os.path_test.py'))  # True
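
The sample output above comes from one particular machine. The sketch below runs the same kind of queries against a temporary file so the results are predictable. The file name sample.txt is an assumption, and posixpath is used only so that the commonpath() result does not depend on the operating system.

```python
import os
import posixpath
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('hello')

    info = {
        'exists': os.path.exists(path),
        'is_file': os.path.isfile(path),
        'is_dir': os.path.isdir(path),
        'size': os.path.getsize(path),          # size in bytes
        'basename': os.path.basename(path),
        'same_dir': os.path.samefile(os.path.dirname(path), tmp),
    }
    mtime = os.path.getmtime(path)              # seconds since the epoch (float)

# The temporary directory is gone once the with block ends.
still_exists = os.path.exists(path)

# commonprefix() compares characters; commonpath() compares path components.
char_prefix = os.path.commonprefix(['/usr/lib', '/usr/local/lib'])
path_prefix = posixpath.commonpath(['/usr/lib', '/usr/local/lib'])

print(info)
print(still_exists)   # False
print(char_prefix)    # /usr/l
print(path_prefix)    # /usr
```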

Filename matching with fnmatch

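The fnmatch module supports Unix shell-style wildcards: *, ?, [seq], and [!seq]. Key functions include fnmatch, fnmatchcase, filter, and translate. fnmatch() follows the operating system's case rules (case-insensitive on Windows), whereas fnmatchcase() is always case-sensitive.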

from pathlib import Path
import fnmatch
# Iterate over the entries in the current directory
for file in Path('.').iterdir():
    # Match names ending with _test.PY
    # (case-insensitive only on case-insensitive systems such as Windows)
    if fnmatch.fnmatch(file.name, '*_test.PY'):
        print(file)

names = ['a.py', 'b.py', 'c.py', 'd.py']
sub = fnmatch.filter(names, '[a-c].py')
print(sub)  # ['a.py', 'b.py', 'c.py']

print(fnmatch.translate('?.py'))   # (?s:.\.py)\Z
print(fnmatch.translate('[ac].py'))  # (?s:[ac]\.py)\Z
print(fnmatch.translate('[a-c].py')) # (?s:[a-c]\.py)\Z
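
Since the case behaviour of fnmatch() depends on the platform, a portable alternative is to normalize case yourself and use fnmatchcase(). The following is a small sketch of that idea, with made-up file names. It also shows the [!seq] wildcard, which the examples above do not exercise.

```python
import fnmatch

names = ['a_test.py', 'B_TEST.PY', 'notes.txt', 'c.py']

# fnmatchcase() compares case-sensitively on every platform.
exact = [n for n in names if fnmatch.fnmatchcase(n, '*_test.py')]

# For a portable case-insensitive match, lower-case the name first.
loose = [n for n in names if fnmatch.fnmatchcase(n.lower(), '*_test.py')]

# [!seq] matches any single character that is NOT in seq.
not_abc = fnmatch.filter(['a.py', 'b.py', 'd.py'], '[!a-c].py')

print(exact)    # ['a_test.py']
print(loose)    # ['a_test.py', 'B_TEST.PY']
print(not_abc)  # ['d.py']
```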

Opening files

The built-in open(file, mode='r', buffering=-1) opens a file and returns a file object; only the first argument (the file path) is required. File objects expose attributes such as closed, mode, and name.

# Open a file with the default mode (read, text)
f = open('main')
print(f.encoding)   # cp936 (platform default; varies by system)
print(f.mode)       # r
print(f.closed)     # False
print(f.name)       # main
f.close()

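File opening modes:

r   read only (default); the file must exist
w   write only; creates the file or truncates an existing one
x   exclusive creation; fails if the file already exists
a   append; creates the file if needed, and writes go to the end
b   binary mode (combined with another mode, e.g. rb)
t   text mode (default)
+   open for updating, i.e. reading and writing (e.g. r+, w+, a+)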

Buffering

Because I/O devices are slower than memory, buffering improves performance. The third argument of open(), buffering, controls it: 0 disables buffering (binary mode only), 1 selects line buffering (text mode only), an integer greater than 1 sets the buffer size in bytes, and a negative value (the default, -1) uses the default buffering policy.
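
The effect of the buffering argument can be observed by checking the file size on disk before and after a flush. This is a minimal sketch under the assumption that a 3-byte write fits comfortably inside an 8192-byte buffer; the file name is made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'buffered.bin')

    # buffering=0 (binary mode only): every write goes straight to the OS.
    with open(path, 'wb', buffering=0) as raw:
        raw.write(b'abc')
        size_unbuffered = os.path.getsize(path)

    # Text mode refuses buffering=0 and raises ValueError.
    text_mode_rejected = False
    try:
        open(path, 'w', buffering=0)
    except ValueError:
        text_mode_rejected = True

    # With a buffer, small writes stay in memory until flush() or close().
    with open(path, 'wb', buffering=8192) as buffered:
        buffered.write(b'xyz')
        size_before_flush = os.path.getsize(path)
        buffered.flush()
        size_after_flush = os.path.getsize(path)

print(size_unbuffered)     # 3
print(text_mode_rejected)  # True
print(size_before_flush)   # 0
print(size_after_flush)    # 3
```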

Reading files

The read() method returns bytes when the file is opened in binary mode (b) and a str otherwise. It accepts an optional size argument that limits how much is read. readline() reads a single line (optionally at most n characters of it), and readlines() returns a list of all remaining lines.

# Read a file character by character
with open("read_test.py", 'r') as f:
    while True:
        ch = f.read(1)
        if not ch:      # an empty string means end of file
            break
        print(ch, end='')

# Read the entire file at once
with open("test.txt", 'r') as f:
    print(f.read())

# Binary read with explicit decode
with open("read.py", 'rb') as f:
    print(f.read().decode('utf-8'))

# Using codecs for a specific encoding
# (the built-in open("read.py", encoding='utf-8') works as well)
import codecs
with codecs.open("read.py", 'r', 'utf-8') as f:
    while True:
        ch = f.read(1)
        if not ch:
            break
        print(ch, end='')
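
The examples above only use read(). To illustrate how read(size), readline(), and readlines() share one file position, here is a short sketch against a temporary three-line file (the file name and contents are made up).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'lines.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('first\nsecond\nthird\n')

    with open(path, encoding='utf-8') as f:
        head = f.read(3)              # at most 3 characters
        rest_of_line = f.readline()   # continues to the end of the current line
        partial = f.readline(3)       # at most 3 characters of the next line
        remaining = f.readlines()     # everything that is left, split into lines
        at_eof = f.read()             # empty string once the end is reached

print(repr(head))          # 'fir'
print(repr(rest_of_line))  # 'st\n'
print(repr(partial))       # 'sec'
print(remaining)           # ['ond\n', 'third\n']
print(repr(at_eof))        # ''
```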

Reading multiple input streams with fileinput

The fileinput module merges several input streams into one sequence of lines. Functions such as filename(), fileno(), filelineno(), isfirstline(), and isstdin() report on the stream currently being read, while nextfile() and close() control it.

import fileinput
for line in fileinput.input(files=('info.txt', 'test.txt')):
    print(fileinput.filename(), fileinput.filelineno(), line, end='')
fileinput.close()
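
To make the difference between the per-file and the cumulative line counters visible, the sketch below reads two temporary files and records filelineno(), lineno(), and isfirstline() for each line. The file names are assumptions, and the context-manager form is used so that close() is called automatically.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, 'a.txt')
    second = os.path.join(tmp, 'b.txt')
    for path, text in ((first, 'a1\na2\n'), (second, 'b1\n')):
        with open(path, 'w', encoding='utf-8') as f:
            f.write(text)

    records = []
    with fileinput.input(files=(first, second)) as stream:
        for line in stream:
            records.append((
                os.path.basename(stream.filename()),
                stream.filelineno(),   # line number within the current file
                stream.lineno(),       # cumulative line number across all files
                stream.isfirstline(),  # True on the first line of each file
                line.rstrip('\n'),
            ))

for record in records:
    print(record)
# ('a.txt', 1, 1, True, 'a1')
# ('a.txt', 2, 2, False, 'a2')
# ('b.txt', 1, 3, True, 'b1')
```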

File iterator

File objects are iterable; a for loop can traverse lines directly, and list() converts the iterator to a list.

import codecs
with codecs.open("for_file.py", 'r', 'utf-8') as f:
    for line in f:
        print(line, end='')
# list() reads every line of the file into a list
with codecs.open("for_file.py", 'r', 'utf-8') as f:
    print(list(f))

Pipe input

Standard input sys.stdin can be used with pipes. The syntax cmd1 | cmd2 | cmd3 passes the output of one command as the input to the next.

import sys, re
mailPattern = r'([a-z0-9]*[-_]?[a-z0-9]+)*@([a-z0-9]*[-_]?[a-z0-9]+)+' \
    + r'[\.][a-z]{2,3}([\.][a-z]{2})?'
text = sys.stdin.read()
for e in re.finditer(mailPattern, text, re.I):
    print(str(e.span()) + "-->" + e.group())
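
A script meant to sit in a pipeline is easier to test when the processing lives in a function that accepts any text stream. The following is a minimal sketch of that pattern: number_lines is a hypothetical helper, and io.StringIO stands in for sys.stdin so the example runs without a real pipe.

```python
import io


def number_lines(stream):
    """Yield each line of a text stream prefixed with its 1-based number."""
    for number, line in enumerate(stream, start=1):
        yield f'{number}: {line.rstrip()}'


# In a real pipeline the argument would be sys.stdin,
# e.g. `cmd1 | python number_lines.py`; StringIO imitates that input here.
fake_stdin = io.StringIO('alpha\nbeta\n')
numbered = list(number_lines(fake_stdin))

print(numbered)  # ['1: alpha', '2: beta']
```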

Random line access with linecache

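The linecache module caches the lines of files, primarily Python source files, so that any line can be fetched by number. It decodes files as UTF-8 unless the file declares another encoding, and getline() returns an empty string instead of raising an exception when the file or the line does not exist.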

import linecache, random
print(linecache.getline(random.__file__, 3))
print(linecache.getline('linecache_test.py', 3))
print(linecache.getline('utf_text.txt', 2))
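
Because linecache keeps lines in memory, a file that changes on disk is not re-read until the cache is checked. The sketch below, using a made-up temporary file, shows getline() for an existing line, a missing line, and a missing file, followed by checkcache() after the file has been rewritten.

```python
import linecache
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'numbers.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('one\ntwo\nthree\n')

    second = linecache.getline(path, 2)    # line numbers start at 1
    beyond = linecache.getline(path, 99)   # no such line: empty string
    missing = linecache.getline(os.path.join(tmp, 'missing.txt'), 1)

    # Rewrite the file, then ask linecache to drop stale entries.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('uno\ndos\ntres\ncuatro\n')
    linecache.checkcache(path)
    refreshed = linecache.getline(path, 2)

linecache.clearcache()  # release everything that was cached

print(repr(second))     # 'two\n'
print(repr(beyond))     # ''
print(repr(missing))    # ''
print(repr(refreshed))  # 'dos\n'
```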

Writing files

Modes r+, w, w+, a, a+, and x allow writing. Opening with w or w+ truncates the file immediately. The file pointer can be moved with seek(offset, whence) and queried with tell().

import os

# Write using text mode
# In text mode, write '\n': Python translates it to os.linesep on output.
# Writing os.linesep itself would produce '\r\r\n' on Windows.
with open('x.txt', 'w+', encoding='utf-8') as f:
    f.write('我爱Python\n')
    f.writelines(('zhaoyang\n',
                  'blog\n',
                  'csdn\n',
                  'net\n'))

# Write using binary mode with explicit encoding
# Binary mode performs no newline translation, so os.linesep is written as-is.
with open('y.txt', 'wb+') as f:
    f.write(('我爱Python' + os.linesep).encode('utf-8'))
    f.writelines((('zhaoyang' + os.linesep).encode('utf-8'),
                  ('blog' + os.linesep).encode('utf-8'),
                  ('csdn' + os.linesep).encode('utf-8'),
                  ('net' + os.linesep).encode('utf-8')))

# Append mode preserves existing content
with open('z.txt', 'a+', encoding='utf-8') as f:
    f.write('我爱Python\n')
    f.writelines(('zhaoyang\n',
                  'blog\n',
                  'csdn\n',
                  'net\n'))
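
How the write modes differ is easiest to see by applying them to the same file in turn. This is a small sketch using a temporary file with a made-up name: w truncates, a appends, and r+ overwrites in place starting at the beginning.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'log.txt')

    with open(path, 'w', encoding='utf-8') as f:
        f.write('old\n')
    with open(path, 'w', encoding='utf-8') as f:    # truncates 'old\n' first
        f.write('new\n')
    with open(path, 'a', encoding='utf-8') as f:    # keeps content, adds at the end
        f.writelines(['more\n', 'lines\n'])         # writelines() adds no newlines itself
    with open(path, 'r+', encoding='utf-8') as f:   # no truncation, pointer at offset 0
        f.write('NEW')                              # overwrites the first three characters

    with open(path, encoding='utf-8') as f:
        content = f.read()

print(repr(content))  # 'NEW\nmore\nlines\n'
```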

File pointer operations

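Use seek(offset, whence) to reposition the pointer and tell() to obtain its current offset. The whence argument is 0 (start of the file, the default), 1 (current position), or 2 (end of the file).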

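with open('filept_test.py', 'rb') as f:
    print(f.tell())   # 0
    f.seek(3)         # absolute position (whence=0, the default)
    print(f.tell())   # 3
    print(f.read(1))  # b'o' (depends on the file's contents)
    print(f.tell())   # 4
    f.seek(5)
    print(f.tell())   # 5
    f.seek(5, 1)      # move forward 5 bytes from the current position
    print(f.tell())   # 10
    f.seek(-10, 2)    # move to 10 bytes before the end
    print(f.tell())   # file size minus 10
    print(f.read(1))  # b'd' (depends on the file's contents)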
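
The bytes printed above depend on what filept_test.py contains. The following sketch repeats the same moves on a temporary file with known contents (16 bytes, chosen for illustration) and uses the named constants os.SEEK_CUR and os.SEEK_END in place of 1 and 2. The file is opened in binary mode because text-mode files only support seeking relative to the start.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.bin')
    with open(path, 'wb') as f:
        f.write(b'0123456789abcdef')    # 16 bytes

    with open(path, 'rb') as f:
        f.seek(3)                       # absolute offset 3
        byte_at_3 = f.read(1)           # reading advances the pointer to 4
        f.seek(5, os.SEEK_CUR)          # 4 + 5 = 9
        pos_after_relative = f.tell()
        f.seek(-4, os.SEEK_END)         # 16 - 4 = 12
        pos_from_end = f.tell()
        tail = f.read()                 # the last four bytes

print(byte_at_3)           # b'3'
print(pos_after_relative)  # 9
print(pos_from_end)        # 12
print(tail)                # b'cdef'
```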
