How to Crack Image CAPTCHAs with Python: From Noise Reduction to OCR
This guide walks through the complete process of recognizing image CAPTCHAs using Python, covering graphics fundamentals, noise reduction, grayscale conversion, binarization, image segmentation, and OCR with PIL and pytesser, complete with installation steps and code examples.
Recognizing image CAPTCHAs is a fundamental skill for web crawling and involves computer graphics, machine learning, computer vision, and artificial intelligence.
The core of computer graphics is representing and processing geometric elements (points, lines, surfaces) and non‑geometric attributes (color, shading). CAPTCHA cracking mainly uses 2‑D elements and color‑difference analysis.
Common tools and techniques include:
Support Vector Machine (SVM)
OpenCV
Image editing software (Photoshop, GIMP)
Python Imaging Library (PIL)
PIL installation
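On Debian/Ubuntu:
<code>sudo apt-get install python-imaging</code>
python-imaging is the legacy package name. Recent releases ship Pillow for Python 3 as python3-pil.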
On other Linux distributions, set up a build environment first and then run easy_install PIL or pip install Pillow. Pillow is the maintained fork of PIL and is still imported as PIL. On Windows, download the 32‑bit installer from the official PIL site or the 64‑bit Pillow wheels from Christoph Gohlke's site.
General CAPTCHA recognition workflow
Image denoising
Image segmentation (cutting)
Text extraction
1. Denoising
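Convert the color image to grayscale. PIL's convert('L') applies the ITU-R 601-2 luma transform, L = R·299/1000 + G·587/1000 + B·114/1000. This is a weighted brightness, not the plain channel average used as the intensity component of the HSI color space: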
<code>from PIL import Image

# Load the CAPTCHA and convert it to 8-bit grayscale (mode 'L')
im = Image.open('7039.jpg')
imgry = im.convert('L')
imgry.show()</code>
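To show how the two definitions of "gray" differ, here is a minimal pure-Python sketch that works on plain RGB tuples instead of a PIL image. The helper names (luma, hsi_intensity, to_gray) are hypothetical. The 2-D list input is also an assumption: you could build it from list(im.getdata()) and im.size. Pillow's own C code may round slightly differently from this float version.

```python
# Sketch: two ways to reduce an RGB pixel to a single gray level.
# luma() uses the ITU-R 601-2 weights documented for Pillow's convert('L');
# Pillow computes this in fixed point, so results may differ by about one level.
def luma(rgb):
    r, g, b = rgb
    return round(r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000)


def hsi_intensity(rgb):
    # The I component of HSI is the plain mean of the three channels.
    r, g, b = rgb
    return round((r + g + b) / 3)


def to_gray(pixels, weigh=luma):
    # pixels: hypothetical 2-D list (rows) of (R, G, B) tuples.
    return [[weigh(p) for p in row] for row in pixels]


red, green = (255, 0, 0), (0, 255, 0)
print(luma(red), hsi_intensity(red))      # 76 85
print(luma(green), hsi_intensity(green))  # 150 85
```

Pure red and pure green get the same HSI intensity but very different luma. This difference matters when a CAPTCHA separates text from background only by hue.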
2. Binarization
Apply a fixed threshold (e.g., 140) to create a binary image:
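<code>threshold = 140
table = []
for i in range(256):
    # Gray levels below the threshold become 0 (black), the rest 1 (white)
    if i < threshold:
        table.append(0)
    else:
        table.append(1)

# Map every gray level through the table, producing a 1-bit image
out = imgry.point(table, '1')
out.show()</code>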
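Isolated specks often survive a fixed threshold. The sketch below applies the same threshold rule as the table above to a plain 2-D list of gray levels (0 = ink, 1 = background). It then removes dark pixels that have too few dark 8-neighbours. The neighbour rule and the min_neighbours parameter are illustrative assumptions, not part of the original article. Tune them to the noise in your CAPTCHAs.

```python
# Sketch: fixed-threshold binarization plus removal of isolated dark pixels.
# Input: 2-D list of gray levels 0-255. Output: 0 = ink, 1 = background.
def binarize(gray, threshold=140):
    return [[0 if v < threshold else 1 for v in row] for row in gray]


def remove_isolated(binary, min_neighbours=1):
    # A dark pixel with fewer than `min_neighbours` dark 8-neighbours is
    # treated as noise (an assumption of this sketch) and set to background.
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 0:
                continue
            dark = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] == 0:
                        dark += 1
            if dark < min_neighbours:
                out[y][x] = 1
    return out


gray = [
    [200, 200, 200, 200, 200],
    [200, 50, 200, 200, 200],   # lone dark pixel: noise
    [200, 200, 200, 60, 60],    # 2x2 dark block: part of a character
    [200, 200, 200, 60, 60],
]
clean = remove_isolated(binarize(gray))
```

Checking neighbours against the unmodified input keeps the result independent of scan order.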
3. Segmentation
Segmentation is the most challenging step, especially when characters touch or overlap. Simple CAPTCHAs with clearly separated characters can often be passed to OCR without explicit cutting. Advanced segmentation methods are beyond the scope of this guide.
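For characters separated by blank columns, one simple cutting method is vertical projection: count the ink pixels in each column and split wherever the count drops to zero. The original article does not use this technique. The sketch below assumes a binary 2-D list (0 = ink, 1 = background). Touching characters will come out as a single slice.

```python
# Sketch: split a binary image into character slices by vertical projection.
# Assumes 0 = ink, 1 = background, and at least one blank column between
# characters. Spans are (left, right) with `right` exclusive.
def column_ink(binary):
    h, w = len(binary), len(binary[0])
    return [sum(1 for y in range(h) if binary[y][x] == 0) for x in range(w)]


def split_columns(binary, min_width=1):
    spans, start = [], None
    for x, ink in enumerate(column_ink(binary)):
        if ink and start is None:
            start = x                      # entering a character
        elif not ink and start is not None:
            if x - start >= min_width:     # drop slivers narrower than min_width
                spans.append((start, x))
            start = None
    width = len(binary[0])
    if start is not None and width - start >= min_width:
        spans.append((start, width))       # character touching the right edge
    return spans


binary = [
    [1, 0, 1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0, 1, 1],
]
print(split_columns(binary))  # [(1, 2), (4, 6)]
```

With PIL, each span could then be cut out as `im.crop((left, 0, right, im.size[1]))`.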
4. OCR with pytesser
pytesser is a Python wrapper for Google’s Tesseract OCR engine.
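Wrappers of this kind typically save the image, run the tesseract command-line program on it, and read back the text file it writes. The sketch below illustrates that general approach under stated assumptions. It is not pytesser's source code. It assumes a `tesseract` executable on PATH and Python 3.7+. The helper names build_command and ocr_file are hypothetical. It relies on the documented CLI form `tesseract <image> <outputbase> [-l <lang>]`, which writes `<outputbase>.txt`.

```python
# Sketch of a minimal Tesseract CLI wrapper (not pytesser's actual code).
import os
import shutil
import subprocess
import tempfile


def build_command(image_path, output_base, lang=None):
    # `tesseract <image> <outputbase>` writes <outputbase>.txt;
    # `-l <lang>` selects trained data from tessdata, e.g. "eng" or "chi_sim".
    cmd = ["tesseract", image_path, output_base]
    if lang:
        cmd += ["-l", lang]
    return cmd


def ocr_file(image_path, lang=None):
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "out")
        # check=True raises CalledProcessError if tesseract fails
        subprocess.run(build_command(image_path, base, lang),
                       check=True, capture_output=True)
        with open(base + ".txt", encoding="utf-8") as fh:
            return fh.read()
```

Writing to a temporary directory avoids leaving scratch files behind.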
Installation steps:
Download and install PIL (or Pillow) if not present.
Download pytesser, extract it into your project or Python’s site‑packages directory, and add it to PYTHONPATH.
Download the Tesseract OCR engine and replace the tessdata folder inside pytesser with the one that comes with the engine.
Edit pytesser.py so that it imports Image with from PIL import Image, and add an empty __init__.py to the pytesser directory so it can be imported as a package.
Example usage:
<code>from PIL import Image
from pytesser import pytesser

image = Image.open('7039.jpg')
# OCR directly from a file on disk
print(pytesser.image_file_to_string('7039.jpg'))
# OCR from an already loaded PIL image
print(pytesser.image_to_string(image))</code>
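Raw OCR output usually needs tidying. Tesseract output tends to end with newlines, and look-alike characters get confused. The sketch below assumes a short, digits-only CAPTCHA such as the 7039 example. It adds a hypothetical post-processing step that is not part of pytesser. The CONFUSIONS map is an illustrative assumption to adjust to your own character set.

```python
# Sketch: clean raw OCR text for a digits-only CAPTCHA (assumed format).
CONFUSIONS = {"O": "0", "o": "0", "D": "0", "I": "1", "l": "1", "|": "1",
              "Z": "2", "S": "5", "s": "5", "B": "8", "g": "9"}
DIGITS = "0123456789"


def clean_digits(raw, expected_len=None):
    # Drop whitespace, map look-alike letters to digits, discard the rest.
    chars = [CONFUSIONS.get(c, c) for c in raw if not c.isspace()]
    digits = "".join(c for c in chars if c in DIGITS)
    if expected_len is not None and len(digits) != expected_len:
        return None  # report a failed read instead of guessing
    return digits


print(clean_digits("7O39\n\n"))  # 7039
```

Returning None on a length mismatch lets the caller fetch a new CAPTCHA rather than submit a wrong answer.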
pytesser can also be used for other languages, including Chinese, as long as the matching Tesseract language data is present in the tessdata folder.
Source: j_hao104 – my.oschina.net/jhao104/blog/647326
