Using pytesseract for Image‑to‑Text Conversion with Python
This tutorial introduces OCR basics, explains the Tesseract engine, and demonstrates how to install and use the Python pytesseract library to convert images into editable text with just a few lines of code, including practical tips for handling file paths and language settings.
Have you ever run into this situation? You are responsible for organizing some documents, but only images of them exist and the original Word files are missing, so the text in the images has to be turned back into Word documents.
If so, don't miss the handy tool pytesseract! It converts images to text by recognizing the characters in the picture.
OCR stands for Optical Character Recognition, which in plain language means translating an image into text.
Tesseract is an open-source OCR engine with a history of more than 30 years: it began as proprietary software at HP Labs, was open-sourced in 2005, and was developed further under Google's sponsorship from 2006 to 2018. It is widely regarded as one of the most accurate open-source OCR systems available.
Beyond its high precision, Tesseract is highly flexible; it can be trained to recognize any font style (as long as the style remains consistent) and any Unicode character. The pytesseract module we will use is simply the Python wrapper for Tesseract.
Now let’s try it out.
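Step 1: Install the modules
pip3 install pillow
pip3 install pytesseract
Note that pytesseract only wraps the Tesseract engine, so the tesseract executable itself must also be installed separately and be reachable on your PATH.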
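Because pytesseract calls the tesseract executable behind the scenes, it can help to confirm that the executable is visible before running any OCR code. The following is a minimal sketch using only the standard library; the helper name find_tesseract is hypothetical and not part of pytesseract.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    location = find_tesseract()
    if location is None:
        # pytesseract would fail here, because it cannot launch the engine.
        print("tesseract was not found on PATH")
    else:
        print(f"tesseract found at {location}")
```

If the engine is installed somewhere that is not on PATH, pytesseract also lets you point to it explicitly by assigning the full path to pytesseract.pytesseract.tesseract_cmd.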
Step 2: Write the program
from PIL import Image
import pytesseract
img = Image.open('test1.png')
text = pytesseract.image_to_string(img, lang='eng')
print(text)
The first two lines import the modules we just installed. PIL (provided by the pillow package) includes the Image module, whose open function reads image files. The filename test1.png is a relative path, which Python resolves against the current working directory, so place the image in the same directory as the script and run the script from that directory.
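If the script may be launched from a different directory, one option is to build the image path from the script's own location instead of relying on the working directory. This is a small sketch with pathlib; the helper name image_beside_script is an assumption made for illustration.

```python
from pathlib import Path


def image_beside_script(script_file, image_name):
    """Return the absolute path of image_name in the folder containing script_file."""
    return Path(script_file).resolve().parent / image_name


# In a real script, pass __file__ so the result no longer depends on
# the directory the script was started from, for example:
#   img = Image.open(image_beside_script(__file__, "test1.png"))
```

Image.open accepts a Path object as well as a string, so the returned value can be passed to it directly.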
All the complex recognition and conversion work is encapsulated inside pytesseract; you only need to know how to call it. The image_to_string function also accepts a lang keyword argument. Tesseract falls back to English when it is omitted, and it can be set to any language whose trained data is installed.
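Tesseract can also apply several languages to one image when their codes are joined with a plus sign, for example eng+chi_sim for English plus Simplified Chinese. The sketch below shows one way to build that string and pass it through lang; build_lang and image_to_text are hypothetical helper names, and the trained data for each requested language is assumed to be installed.

```python
def build_lang(codes):
    """Join Tesseract language codes, e.g. ["eng", "chi_sim"] -> "eng+chi_sim"."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def image_to_text(path, codes=("eng",)):
    """Run OCR on the image at path using the given language codes."""
    # Imported here so build_lang stays usable without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=build_lang(codes))
```

A call such as image_to_text('test1.png', ['eng', 'chi_sim']) would then recognize mixed English and Chinese text, provided both language packs are available to Tesseract.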
With just five lines of code you can convert an image to text, which is very convenient.
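Besides solving everyday small problems, pytesseract is useful in web‑scraping scenarios where simple text captchas appear. Clean, undistorted captchas can often be read directly, while noisy or distorted ones usually need image preprocessing first and may still be misread.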
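For a simple captcha, a common first step is to convert the image to grayscale and binarize it before handing it to Tesseract. The following is only a rough sketch under stated assumptions: the threshold of 128 is an arbitrary starting value, the helper names are hypothetical, and the captcha is assumed to be a single line of text.

```python
def threshold_table(threshold=128):
    """Lookup table mapping gray levels 0-255 to pure black (0) or white (255)."""
    return [255 if level >= threshold else 0 for level in range(256)]


def read_simple_captcha(path, threshold=128):
    """Binarize the image at path and return the text Tesseract recognizes."""
    # Imported here so threshold_table stays usable without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        gray = img.convert("L")  # 8-bit grayscale
        binary = gray.point(threshold_table(threshold))
    # --psm 7 asks Tesseract to treat the image as a single line of text.
    return pytesseract.image_to_string(binary, config="--psm 7").strip()
```

The threshold usually has to be tuned per captcha style, and heavily distorted captchas may not be readable this way at all.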
- END -
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.