Building a CNN for Captcha Recognition: From Data Prep to Deployment
This article walks through building a convolutional neural network in Python to label and recognize image captchas, covering data collection, preprocessing, model coding, training on GPU/CPU, testing accuracy, and deploying the model as a Flask API.
Introduction
In this tutorial the author shares a practical project that uses a convolutional neural network (CNN) to label and recognize image captchas. It builds on three previous posts that introduced the problem and covered data collection, preprocessing, and character image segmentation.
Background Knowledge
The author briefly explains the motivation behind CNNs and the difference between traditional feature descriptors (e.g., SIFT) and deep learning, then lists useful learning resources such as mathematics fundamentals, OpenCV tutorials, and PyTorch courses.
Data Preparation
Prepared datasets include a training set of over 500 images, a test set of about 30 images, and a small prediction set. The images are stored in src_img (training), test_src_img (testing), and usage_src_img (prediction). Character segmentation is performed with a split_image_dir function:
if __name__ == '__main__':
    split_image_dir(SRC_IMG_DIR)
    split_test_image()

The dataset class VerCodeDataset loads images, normalizes them, and assigns label indices for digits 2‑9 and letters A‑Z:
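import os
import torch
from skimage import io  # assumed source of io.imread; the excerpt omits its imports
from torch.utils.data import Dataset
from torchvision import transforms

# Class labels stored as character codes: index 0-7 -> '2'-'9', index 8-33 -> 'A'-'Z'
labels = []
# 2-9
for i in range(8):
    labels.append(50 + i)
# A-Z
for i in range(26):
    labels.append(65 + i)

class VerCodeDataset(Dataset):
    def __init__(self, image_dir="./letter_template/"):
        # Each subdirectory is named after the character its images show
        l = os.listdir(image_dir)
        self.data = []
        self.label = []
        for d in l:
            fs = os.listdir("{}{}".format(image_dir, d))
            for f in fs:
                fup = "{}{}/{}".format(image_dir, d, f)
                # Scale pixels to [0, 1], then standardize each image by its own mean and std
                t = torch.from_numpy(io.imread(fup)).float() / 255
                norl = transforms.Normalize(t.mean(), t.std())
                self.data.append(norl(t.reshape(1, 40, 40)))
                # The label is the index of the folder's character in `labels`
                self.label.append(labels.index(ord(d)))

    # __len__ and __getitem__ are not shown in the excerpt; a DataLoader needs both
    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.label[idx]

CNN Model Definition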
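The network's 34 output classes follow the label list built in the dataset code above. As a minimal sketch of how a class index relates to its character, the snippet below rebuilds that list and converts in both directions. The helper names char_to_index and index_to_char are hypothetical and do not appear in the original project.

```python
# Rebuild the label list from the dataset code: character codes for '2'-'9', then 'A'-'Z'.
labels = [50 + i for i in range(8)] + [65 + i for i in range(26)]


def char_to_index(ch):
    """Hypothetical helper: class index for a folder name such as 'A'."""
    return labels.index(ord(ch))


def index_to_char(idx):
    """Hypothetical helper: character for a predicted class index."""
    return chr(labels[idx])


print(len(labels))          # 34
print(char_to_index('2'))   # 0
print(char_to_index('A'))   # 8
print(index_to_char(33))    # Z
```

The digits 0 and 1 have no class in this list, so a folder named after either would raise a ValueError in labels.index.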
The network is a simple feed‑forward CNN defined in net_train.py:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, dropout=0.1):
        super(Net, self).__init__()
        self.dropout = nn.Dropout(dropout)
        self.conv1 = nn.Conv2d(1, 10, 5)   # first conv layer: 1 -> 10 channels, 5x5 kernel
        self.conv2 = nn.Conv2d(10, 25, 5)  # second conv layer: 10 -> 25 channels, 5x5 kernel
        self.fc1 = nn.Linear(1 * 25 * 7 * 7, 120)  # 25 channels of 7x7 after both conv + pool stages
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 34)  # 34 classes: 2-9 + A-Z

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))  # 40x40 -> 36x36 -> 18x18
        x = self.dropout(x)
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))  # 18x18 -> 14x14 -> 7x7
        x = self.dropout(x)
        x = x.view(-1, self.num_flat_features(x))  # flatten to (batch, 25 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # raw class scores (logits)
        return x

    def num_flat_features(self, x):
        # Product of all dimensions except the batch dimension
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

Training and Testing
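Before training, the 25 * 7 * 7 input size of fc1 in the Net definition above can be checked by hand. The sketch below applies the standard output-size formula for a convolution without dilation, and for max pooling whose stride equals its kernel size (the default for F.max_pool2d). The function names conv_out and pool_out are illustrative only.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Output side length of max pooling with stride equal to the kernel size."""
    return size // kernel


side = 40                                # input characters are 40x40
side = pool_out(conv_out(side, 5), 2)    # conv1 (5x5) -> 36, pool (2x2) -> 18
side = pool_out(conv_out(side, 5), 2)    # conv2 (5x5) -> 14, pool (2x2) -> 7
flat_features = 25 * side * side         # conv2 produces 25 channels

print(side, flat_features)               # 7 1225
```

If the character images were resized, or a kernel size changed, the first argument of fc1 would have to change to match the new value of flat_features.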
The training script detects whether a CUDA‑compatible GPU is available and falls back to CPU if not. Example training configuration:
data amount: 2286 segmented character images (40×40, single channel)
batch_size: 50
epochs: 200
Device              Time
GTX 1070TI          25s
AMD R7 4750U PRO    4min

Training on GPU finishes in about 25 seconds, while CPU takes around 4 minutes. The loss converges to ~0.0016 after 200 epochs.
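To put these numbers in perspective, the configuration can be turned into a count of optimizer steps. This is only a back-of-the-envelope sketch. It assumes the data is fed in batches that keep the final partial batch, as a PyTorch DataLoader does by default with drop_last=False. The article does not show the actual loader.

```python
import math

num_samples = 2286      # character images, from the configuration above
batch_size = 50
epochs = 200

# Assumption: the last, smaller batch of each epoch is kept rather than dropped.
batches_per_epoch = math.ceil(num_samples / batch_size)
total_steps = batches_per_epoch * epochs

gpu_seconds, cpu_seconds = 25, 4 * 60    # reported wall-clock times
print(batches_per_epoch)                 # 46
print(total_steps)                       # 9200
print(cpu_seconds / gpu_seconds)         # 9.6
```

Under these assumptions the GPU run is roughly ten times faster than the CPU run, for a little over nine thousand small batches in total.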
On the test set (152 segmented characters), the model reaches 97% accuracy. The original post also includes sample test result images.
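As a minimal sketch of how a per-character accuracy figure like this can be computed, the function below compares predicted and expected characters position by position. The name char_accuracy and the toy data are hypothetical. They are not the author's evaluation code or test set.

```python
def char_accuracy(predicted, expected):
    """Fraction of characters predicted correctly, compared position by position."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Hypothetical toy data, not the article's test set: 9 of 10 characters match.
expected = list("A7KQ93BXZ4")
predicted = list("A7KQ93BXZ9")
print(char_accuracy(predicted, expected))   # 0.9
```

Note that this measures single characters. A captcha only counts as solved when every one of its characters is right, so whole-captcha accuracy will be lower than the per-character figure.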
Deployment with Flask
The trained model is wrapped in a Flask API that accepts a JSON body whose filePathList field is a list of image file paths, and returns the recognized characters for each path. Core endpoint implementation:
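import os
from flask import Flask, jsonify, request

app = Flask(__name__)
# usage_model, CODE_SUCCESS and MSG_SUCCESS come from the author's project and are not shown in this excerpt

@app.route('/recognize/path', methods=['POST'])
def recognize_path():
    filePathList = request.json['filePathList']
    code = CODE_SUCCESS
    msg = MSG_SUCCESS
    data = []
    for filePath in filePathList:
        # Missing files yield an empty string so results stay aligned with the inputs
        if not os.path.exists(filePath):
            print('File not found:', filePath)
            data.append('')
            continue
        # usage() returns the predicted characters for one captcha image
        labels = usage_model.usage(filePath)
        data.append(''.join(labels))
    result = {'code': code, 'msg': msg, 'data': data}
    return jsonify(result)

Running the Flask app and sending requests via Postman or a web page demonstrates successful batch predictions, with all five sample captchas being correctly recognized.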
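The same request can also be sent from a script instead of Postman. The sketch below uses only the Python standard library and builds the JSON body that the recognize_path endpoint reads. It assumes the app is running locally on Flask's default port 5000. The file names are hypothetical, and the helper names build_request and recognize are not part of the original project.

```python
import json
import urllib.request

# Assumption: the Flask app runs locally on Flask's default port 5000.
API_URL = "http://127.0.0.1:5000/recognize/path"


def build_request(file_paths, url=API_URL):
    """Build a POST request whose JSON body matches what recognize_path reads."""
    body = json.dumps({"filePathList": list(file_paths)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def recognize(file_paths, url=API_URL, timeout=10):
    """Send the request and return the 'data' list (one string per input path)."""
    with urllib.request.urlopen(build_request(file_paths, url), timeout=timeout) as resp:
        result = json.loads(resp.read().decode("utf-8"))
    return result["data"]


# Hypothetical paths; they must exist on the machine running the Flask app.
req = build_request(["usage_src_img/sample1.png", "usage_src_img/sample2.png"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Because the endpoint receives paths rather than image bytes, the client and the server need to see the same file system. Calling recognize(...) only works while the service is running.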
Conclusion
The complete workflow, from data collection and annotation through CNN model coding, training, and testing to deployment as a web service, shows that a modestly sized CNN can achieve high accuracy on captcha recognition with relatively short training time. The author encourages readers to experiment further by expanding the dataset, tweaking the model architecture, or integrating the service into larger applications.
Python Crawling & Data Mining