Building a CNN for Captcha Recognition: From Data Prep to Deployment

This article walks through building a convolutional neural network in Python to label and recognize image captchas, covering data collection, preprocessing, model coding, training on GPU/CPU, testing accuracy, and deploying the model as a Flask API.

Python Crawling & Data Mining

Sep 18, 2021

Introduction

This tutorial shares a practical project that uses a convolutional neural network (CNN) to label and recognize image captchas. It builds on three previous posts that introduced the problem and covered data collection, preprocessing, and character image segmentation.

Background Knowledge

The author briefly explains the motivation behind CNNs and the difference between traditional feature descriptors (e.g., SIFT) and deep learning, then lists useful learning resources such as mathematics fundamentals, OpenCV tutorials, and PyTorch courses.

Data Preparation

The prepared data consists of a training set of over 500 images, a test set of about 30 images, and a small prediction set, stored in src_img, test_src_img, and usage_src_img respectively. Each captcha is split into individual character images before training; the entry point calls split_image_dir and split_test_image:

if __name__ == '__main__':
    split_image_dir(SRC_IMG_DIR)
    split_test_image()

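The dataset class VerCodeDataset loads each character image, normalizes it, and assigns a class index. Each subdirectory of the image directory is named after the character it contains, and the class index is the position of that character's ASCII code in the labels list, which covers the digits 2‑9 and the letters A‑Z: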

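import os

import torch
from skimage import io  # assumed source of io.imread
from torch.utils.data import Dataset
from torchvision import transforms

# ASCII codes of the 34 classes; a character's class index is its position here.
labels = []
# 2-9
for i in range(8):
    labels.append(50 + i)
# A-Z
for i in range(26):
    labels.append(65 + i)

class VerCodeDataset(Dataset):
    def __init__(self, image_dir="./letter_template/"):
        # Each subdirectory is named after the character its images show.
        l = os.listdir(image_dir)
        self.data = []
        self.label = []
        for d in l:
            fs = os.listdir("{}{}".format(image_dir, d))
            for f in fs:
                fup = "{}{}/{}".format(image_dir, d, f)
                # Scale pixels to [0, 1], then standardize each image.
                t = torch.from_numpy(io.imread(fup)).float() / 255
                norl = transforms.Normalize(t.mean(), t.std())
                self.data.append(norl(t.reshape(1, 40, 40)))
                self.label.append(labels.index(ord(d)))

    # Not shown in the original excerpt; DataLoader needs both methods.
    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index], self.label[index]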
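
The mapping between characters and class indices can be sketched on its own. The helper names char_to_index and index_to_char are hypothetical and do not come from the original project; the labels list is rebuilt the same way as in the code above.

```python
# Rebuild the 34-entry label table: ASCII codes for '2'-'9' and 'A'-'Z'.
labels = [ord("2") + i for i in range(8)] + [ord("A") + i for i in range(26)]


def char_to_index(ch: str) -> int:
    """Return the class index for a character (hypothetical helper)."""
    return labels.index(ord(ch))


def index_to_char(idx: int) -> str:
    """Return the character for a class index (hypothetical helper)."""
    return chr(labels[idx])


if __name__ == "__main__":
    print(len(labels))         # 34
    print(char_to_index("2"))  # 0
    print(char_to_index("A"))  # 8
    print(index_to_char(33))   # Z
```

Characters outside the table, such as 0 and 1, raise a ValueError in char_to_index because list.index cannot find them.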

CNN Model Definition

The network is a simple feed‑forward CNN defined in net_train.py:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, dropout=0.1):
        super(Net, self).__init__()
        self.dropout = nn.Dropout(dropout)
        self.conv1 = nn.Conv2d(1, 10, 5)   # first conv layer: 1 -> 10 channels, 5x5 kernel
        self.conv2 = nn.Conv2d(10, 25, 5)  # second conv layer: 10 -> 25 channels, 5x5 kernel
        self.fc1 = nn.Linear(1 * 25 * 7 * 7, 120)  # 25 channels of 7x7 after two conv + pool stages
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 34)      # 34 classes: 2‑9 + A‑Z

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = self.dropout(x)
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        x = self.dropout(x)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
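
The input size of fc1 (25 * 7 * 7) follows from the 40×40 input and the two conv + pool stages. The arithmetic can be checked with a small sketch; conv_out and pool_out are hypothetical helper names, and the formulas assume no padding and a pooling stride equal to the pooling kernel, which matches the defaults used in the model above.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a square convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, kernel: int) -> int:
    """Output side length of max pooling with stride equal to the kernel size."""
    return size // kernel


side = 40                              # input characters are 40x40
side = pool_out(conv_out(side, 5), 2)  # conv1 (5x5) -> 36, pool -> 18
side = pool_out(conv_out(side, 5), 2)  # conv2 (5x5) -> 14, pool -> 7
flat_features = 25 * side * side       # 25 channels come out of conv2

print(side, flat_features)  # 7 1225
```

If the input size or kernel sizes change, fc1 has to be resized to match the new flat_features value.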

Training and Testing

The training script detects whether a CUDA‑compatible GPU is available and falls back to CPU if not. Example training configuration:

data amount: 2286 images (40×40, single channel)
batch_size: 50
epoch: 200

Device              Training time
GTX 1070 Ti         25 s
AMD R7 4750U PRO    4 min

Training finishes in about 25 seconds on the GPU and around 4 minutes on the CPU, roughly a tenfold difference. The loss converges to about 0.0016 after 200 epochs.
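
A minimal sketch of the device fallback together with a short training loop is given here. The random stand-in data, the one-layer stand-in model, the loss function, the optimizer, and the learning rate are all assumptions for illustration; the article does not state which loss or optimizer the author used.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Use the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in data: 100 random 40x40 single-channel images with random labels.
torch.manual_seed(0)
images = torch.randn(100, 1, 40, 40)
targets = torch.randint(0, 34, (100,))
loader = DataLoader(TensorDataset(images, targets), batch_size=50, shuffle=True)

# Stand-in model; the article's Net class would be used here instead.
model = nn.Sequential(nn.Flatten(), nn.Linear(40 * 40, 34)).to(device)
criterion = nn.CrossEntropyLoss()                         # assumed loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # assumed optimizer

for epoch in range(2):  # the article trains for 200 epochs
    running_loss = 0.0
    for batch_images, batch_targets in loader:
        # Move each batch to the same device as the model.
        batch_images = batch_images.to(device)
        batch_targets = batch_targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(batch_images), batch_targets)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch}: mean loss {running_loss / len(loader):.4f}")
```

Because both the model and every batch are moved with .to(device), the same script runs unchanged on either kind of hardware.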

On the test set (152 characters), the model reaches 97% accuracy.
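
The reported figure appears to be counted per character. Since a captcha is only solved when every character is right, whole-captcha accuracy can be lower. The sketch below shows both measures; the function names are hypothetical and the sample strings are made up for illustration, not taken from the author's test set.

```python
def char_accuracy(predicted, expected):
    """Fraction of characters predicted correctly, position by position."""
    pairs = [
        (p, e)
        for pred, exp in zip(predicted, expected)
        for p, e in zip(pred, exp)
    ]
    return sum(p == e for p, e in pairs) / len(pairs)


def captcha_accuracy(predicted, expected):
    """Fraction of captchas where every character is correct."""
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


# Made-up predictions, for illustration only.
expected = ["A2B3", "XY45", "QRS7"]
predicted = ["A2B3", "XY46", "QRS7"]

print(char_accuracy(predicted, expected))     # 11 of 12 characters correct
print(captcha_accuracy(predicted, expected))  # 2 of 3 captchas correct
```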

Deployment with Flask

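The trained model is wrapped in a Flask API. The endpoint accepts a JSON body whose filePathList field holds a list of image file paths and returns the recognized characters for each path; a path that does not exist yields an empty string. Core endpoint implementation: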

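import os

from flask import Flask, jsonify, request

app = Flask(__name__)

# CODE_SUCCESS, MSG_SUCCESS, and usage_model come from the author's project (not shown here).

@app.route('/recognize/path', methods=['POST'])
def recognize_path():
    filePathList = request.json['filePathList']
    code = CODE_SUCCESS
    msg = MSG_SUCCESS
    data = []
    for filePath in filePathList:
        if not os.path.exists(filePath):
            # Keep the output aligned with the input: one entry per path.
            print('File not found:', filePath)
            data.append('')
            continue
        labels = usage_model.usage(filePath)
        data.append(''.join(labels))
    result = {'code': code, 'msg': msg, 'data': data}
    return jsonify(result)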

Running the Flask app and sending requests via Postman or a web page demonstrates successful batch predictions, with all five sample captchas being correctly recognized.
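
As an alternative to Postman, a request can be built from Python's standard library. This is a sketch under assumptions: the URL uses Flask's default development host and port, the file names are hypothetical, and build_request is an illustrative helper rather than part of the original project.

```python
import json
import urllib.request

# Assumed address: Flask's default development host and port.
URL = "http://127.0.0.1:5000/recognize/path"


def build_request(file_paths):
    """Build a JSON POST request for the /recognize/path endpoint."""
    body = json.dumps({"filePathList": file_paths}).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical file names; replace with real captcha image paths.
req = build_request(["./usage_src_img/1.png", "./usage_src_img/2.png"])
print(req.get_method(), req.full_url)

# With the server running, send it like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The Content-Type header matters here because the endpoint reads request.json, which Flask only parses for JSON requests.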

Conclusion

The complete workflow, from data collection and annotation through CNN model coding, training, and testing to deployment as a web service, shows that a modestly sized CNN can achieve high accuracy on captcha recognition with short training times. The author encourages readers to experiment further by expanding the dataset, tweaking the model architecture, or integrating the service into larger applications.
