
Solving Image Captchas in Selenium Automation with Python and OCR

This tutorial demonstrates how to use Python's urllib to download captcha images, apply pytesseract OCR for text extraction, and integrate the result into Selenium scripts to automate the entry of image captchas during web testing.

Test Development Learning Exchange

Selenium makes browser automation convenient, but image captchas, which are designed to require a human, interrupt otherwise unattended test runs. This article shows one way around that in a test environment: a Python script downloads the captcha image, recognizes its text with OCR, and types the result into the page.

First, use Python's built‑in urllib.request module to fetch the captcha image and save it locally:

import urllib.request
def download_image(url):
    image_name = "./captcha.png"  # define local path and filename
    urllib.request.urlretrieve(url, image_name)  # download the captcha image
    return image_name
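
The function above relies on urlretrieve, which takes no timeout argument, so a stalled captcha endpoint can hang the test run. A minimal sketch of a variant follows, assuming a 10-second timeout is acceptable and that an empty response should count as a failure; the `timeout` parameter and the empty-body check are illustrative additions, not part of the original script:

```python
import urllib.request


def download_image(url, image_name="./captcha.png", timeout=10):
    """Fetch url and save the response body to image_name."""
    # urlopen accepts a timeout in seconds; urlretrieve does not.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    # Assumption: an empty body means the captcha endpoint failed.
    if not data:
        raise ValueError("empty response for " + url)
    with open(image_name, "wb") as f:
        f.write(data)
    return image_name
```

Calling download_image(image_url) still returns the local file path, so it can stand in for the shorter version without changing the later steps.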

With the image saved, use the pytesseract library, a Python wrapper around the Tesseract OCR engine, to extract the characters. Both the Python package and the engine itself must be installed:

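# Install pytesseract and Pillow (Pillow provides the PIL module used below)
# In a Jupyter notebook, prefix these commands with "!"
pip install pytesseract pillow
# Install the Tesseract OCR engine (Debian/Ubuntu; Windows users must download the installer manually)
sudo apt-get install tesseract-ocr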
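
Because pytesseract calls the tesseract executable rather than bundling the engine, it can help to confirm the binary is reachable before running any OCR. The check below is a small sketch using only the standard library; `find_tesseract` is a hypothetical helper name, and it assumes the executable is called `tesseract`:

```python
import shutil


def find_tesseract(binary="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(binary)


path = find_tesseract()
if path is None:
    print("tesseract not found on PATH; install it or point pytesseract at it")
else:
    print("tesseract found at", path)
```

If the engine is installed somewhere outside PATH, as is common on Windows, pytesseract can be pointed at it by setting pytesseract.pytesseract.tesseract_cmd to the full path of the executable.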

Then apply OCR to the saved image:

import pytesseract
from PIL import Image

def handle_image(image_path):
    image = Image.open(image_path)  # open the captcha image
    image = image.convert('L')     # convert to grayscale
    threshold = 127               # binary threshold
    image = image.point(lambda x: 0 if x < threshold else 255)  # binarize
    result = pytesseract.image_to_string(image)  # extract text
    return result.strip()  # drop surrounding whitespace so send_keys does not type a newline
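
OCR output for a captcha often contains stray spaces or punctuation, and sometimes nothing usable at all. The sketch below shows one way to sanitize the recognized string before typing it into the page. It assumes the captcha uses only ASCII letters and digits and, optionally, has a known length; both are assumptions about the target site, and `clean_captcha_text` is a hypothetical helper name:

```python
import re


def clean_captcha_text(raw, expected_length=None):
    """Keep only ASCII letters and digits; return None if the result looks wrong."""
    # Drop whitespace, punctuation, and control characters.
    text = re.sub(r"[^A-Za-z0-9]", "", raw)
    if not text:
        return None
    # Assumption: the site uses fixed-length captchas when expected_length is given.
    if expected_length is not None and len(text) != expected_length:
        return None
    return text
```

A None result signals that the recognition probably failed, so the test can request a fresh captcha instead of submitting a guess.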

Finally, feed the recognized text back into the web page using Selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By

url = "http://www.example.com"  # test page URL
driver = webdriver.Chrome()    # launch Chrome
driver.get(url)                # open the page
image_url = "http://www.example.com/getCaptcha"  # captcha image URL
# Download and process the captcha
captcha = handle_image(download_image(image_url))
# Input the captcha via Selenium (find_element_by_xpath was removed in Selenium 4.3)
captcha_input = driver.find_element(By.XPATH, "//input[@id='captcha']")
captcha_input.send_keys(captcha)
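
One caveat with downloading the captcha through urllib: that request does not share the browser's cookies, so on many sites the server returns a different captcha from the one displayed in the Selenium session. A possible workaround, sketched below, is to screenshot the captcha element directly in the browser. `capture_captcha` is a hypothetical helper, and the XPath for the image element must be adapted to the page under test:

```python
def capture_captcha(driver, image_xpath, image_name="./captcha.png"):
    """Save a screenshot of the captcha image element as rendered in the browser."""
    # "xpath" is the string value of Selenium's By.XPATH constant.
    element = driver.find_element("xpath", image_xpath)
    # WebElement.screenshot writes a PNG file and returns False on an I/O error.
    if not element.screenshot(image_name):
        raise RuntimeError("could not save captcha screenshot to " + image_name)
    return image_name
```

With a live driver, handle_image(capture_captcha(driver, "//img[@id='captcha-img']")) would replace the download step; the id captcha-img is only an illustrative placeholder.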

Together, these snippets show a practical way to automate image captcha entry in Selenium tests. The fixed threshold and default Tesseract settings suit simple, clean captchas; sites with noisier or distorted images may require customized recognition strategies.
