Turn Screenshots into Editable Text Instantly with TextShot – A Python OCR Tool
TextShot is a Python-based OCR utility that captures a region of the screen and converts the image into editable text using the Tesseract engine. This article covers its optional language parameters, installation, hotkey integration, and image preprocessing for better accuracy.
In day-to-day office work, you often need to turn text in an image into editable text. TextShot, a new tool released by GitHub user ianzhao05, lets you select a screen region and instantly get editable text from it via OCR.
Usage
Running textshot.py opens an overlay on the screen; draw a rectangle around the area you want to extract, and the tool outputs the recognized text. You can specify recognition languages with an optional command-line argument: for example, python textshot.py eng+fra uses English as the primary language and French as the secondary one. The default language is English.
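The article does not show TextShot's internals, so the sketch below only illustrates how a language argument like eng+fra can be turned into the lang value Tesseract expects. It assumes the recognized text comes from the pytesseract wrapper (pytesseract.image_to_string(image, lang=...)); parse_lang and recognize are hypothetical helper names, not TextShot's code. Tokens are joined and whitespace is stripped, so a mistyped eng + fra still becomes the single token eng+fra.

```python
from __future__ import annotations

import re
import sys

DEFAULT_LANG = "eng"  # English is the documented default


def parse_lang(argv: list[str]) -> str:
    """Join CLI tokens into one Tesseract language string such as 'eng+fra'.

    Hypothetical helper: both ['eng+fra'] and ['eng', '+', 'fra'] give 'eng+fra'.
    """
    joined = re.sub(r"\s+", "", "".join(argv))
    langs = [code for code in joined.split("+") if code]  # drop empty pieces
    return "+".join(langs) if langs else DEFAULT_LANG


def recognize(image, lang: str) -> str:
    """OCR a PIL image; assumes pytesseract is installed and tesseract is on PATH."""
    import pytesseract  # imported lazily so parse_lang works without it

    return pytesseract.image_to_string(image, lang=lang).strip()


if __name__ == "__main__":
    print(parse_lang(sys.argv[1:]))
```

Running the file as python sketch.py eng + fra prints eng+fra.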
For convenience, you can bind TextShot to a hotkey. On Windows, an AutoHotkey script works well, and the repository provides a sample, textshot.ahk. On Ubuntu, add a custom shortcut in Keyboard Settings whose command is /usr/bin/python3 <path-to-textshot.py>. If you installed the dependencies in a virtual environment, use that environment's Python interpreter instead of /usr/bin/python3, otherwise the shortcut will not find the required packages.
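To get the right interpreter path for the Ubuntu shortcut, a small helper can build the command line from sys.executable, which is the Python interpreter running the snippet. Run it from the activated virtual environment and it yields that environment's interpreter. shortcut_command is a hypothetical name; shlex.join quotes paths that contain spaces.

```python
import os
import shlex
import sys


def shortcut_command(script_path: str, python: str = sys.executable) -> str:
    """Build a command line for a desktop keyboard shortcut.

    By default it uses the interpreter running this code, so calling it from an
    activated virtual environment picks up that environment's python.
    """
    return shlex.join([python, os.path.abspath(script_path)])


if __name__ == "__main__":
    # Run from the TextShot directory, then paste the output into the shortcut's command field.
    print(shortcut_command("textshot.py"))
```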
Installation
Install Python 3.
Clone the TextShot repository and cd into the directory.
(Optional) Create and activate a virtual environment, e.g., python -m venv .venv followed by source .venv/bin/activate (or .venv\Scripts\activate on Windows).
Install required packages with pip install -r requirements.txt.
Install Google’s Tesseract OCR engine (https://github.com/tesseract-ocr/tesseract) and add its installation directory to your system PATH so that the tesseract command can be found.
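A quick way to confirm the last step is to look up tesseract on PATH and ask it for its version. This is a minimal standard-library sketch; tesseract_version is a hypothetical helper. Some Tesseract releases print the version banner to stderr rather than stdout, so both streams are checked.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version(cmd: str = "tesseract") -> Optional[str]:
    """Return the first line of `<cmd> --version`, or None if cmd is not on PATH."""
    path = shutil.which(cmd)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    # Check stdout first, then stderr, since the banner location varies by release.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = tesseract_version()
    print(version or "tesseract not found on PATH; check the installation and PATH setting")
```

If Tesseract is installed but you prefer not to change PATH, pytesseract (assuming that is the wrapper in use) can be pointed at the binary directly by setting pytesseract.pytesseract.tesseract_cmd.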
Tesseract Overview
Tesseract is the leading open‑source OCR engine for printed text. Originally developed by Hewlett‑Packard in the 1980s, it was open‑sourced in 2005 and has been sponsored by Google since 2006. While it works well under controlled conditions, noisy or poorly pre‑processed images can degrade its accuracy.
Since version 4, Tesseract includes a neural-network recognition engine based on LSTM (long short-term memory, a type of recurrent neural network), which significantly improves recognition accuracy. It supports Unicode (UTF‑8), can recognize more than 100 languages, and offers multiple output formats, such as plain text, searchable PDF, and TSV. For the best results, the input image should be of high quality.
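Of these output formats, TSV is the easiest to post-process: tesseract screenshot.png out -l eng tsv writes out.tsv, and pytesseract.image_to_data returns the same table as a string. Each row carries a confidence value, so low-confidence words can be filtered out. The sketch below assumes the standard TSV header; confident_words and the 60 cut-off are illustrative choices, not part of TextShot.

```python
from __future__ import annotations

import csv
import io


def confident_words(tsv_text: str, min_conf: float = 60.0) -> list[str]:
    """Return recognized words whose confidence is at least min_conf.

    Expects Tesseract's TSV columns: level, page_num, block_num, par_num,
    line_num, word_num, left, top, width, height, conf, text.
    Non-word rows (page, block, line) have conf -1 and empty text.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row.get("conf") or -1)
        if text and conf >= min_conf:
            words.append(text)
    return words
```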
Common Image‑Preprocessing Steps
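Invert the image if it shows light text on a dark background; Tesseract 4 and later work best with dark text on a light background.
Resize so the text is large enough; Tesseract works best on images of at least 300 DPI.
Binarize (convert to pure black and white).
Remove noise such as speckles and scanning artifacts.
Rotate / deskew so that text lines are horizontal.
Crop excess borders, leaving a small white margin around the text.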
These operations can be performed with OpenCV or directly via NumPy in Python.
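Several of these steps can be sketched with NumPy alone: inversion, rescaling, binarization using Otsu's method (a standard global-threshold technique), and cropping with a white border. Noise removal and deskewing are left out because they are much simpler with OpenCV (for example cv2.medianBlur and cv2.warpAffine). The mean-brightness test for inversion, the 2x nearest-neighbor upscale, and the 10-pixel margin are illustrative assumptions, not settings taken from TextShot.

```python
import numpy as np


def invert_if_dark(gray: np.ndarray) -> np.ndarray:
    """Heuristic: a mostly dark image is assumed to hold light text, so invert it."""
    return 255 - gray if gray.mean() < 128 else gray


def upscale(gray: np.ndarray, factor: int = 2) -> np.ndarray:
    """Nearest-neighbor upscale by an integer factor (repeat rows, then columns)."""
    return gray.repeat(factor, axis=0).repeat(factor, axis=1)


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the gray level t that maximizes between-class variance (Otsu)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    w0 = np.cumsum(hist)               # pixel count with value <= t
    w1 = hist.sum() - w0               # pixel count with value > t
    s0 = np.cumsum(hist * levels)      # intensity sum with value <= t
    mu0 = s0 / np.maximum(w0, 1)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2
    return int(np.argmax(between))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Pixels above the Otsu threshold become white (255), the rest black (0)."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)


def crop_with_margin(binary: np.ndarray, margin: int = 10) -> np.ndarray:
    """Crop to the bounding box of black pixels, then pad with a white border."""
    ys, xs = np.nonzero(binary == 0)
    if ys.size == 0:  # no dark pixels found: leave the image unchanged
        return binary
    cropped = binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return np.pad(cropped, margin, mode="constant", constant_values=255)


def preprocess(gray: np.ndarray, factor: int = 2) -> np.ndarray:
    """Invert, upscale, binarize, then crop a 2-D uint8 grayscale array."""
    return crop_with_margin(binarize(upscale(invert_if_dark(gray), factor)))
```

The result can be handed back to Tesseract as an image, e.g. via Pillow's Image.fromarray(preprocess(gray)).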
Chinese OCR Projects
Chinese OCR use cases include ID card and train ticket recognition, as well as more advanced scenarios such as translating a line of text from a book in real time. A popular open‑source Chinese OCR project is chineseocr, which combines YOLOv3 for text detection and CRNN for text recognition and has garnered over 2.5K GitHub stars.
A lighter-weight alternative built on top of chineseocr is chineseocr_lite, available at https://github.com/ouyanghuiyu/chineseocr_lite.
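CRNN recognizers output a score for every character class at each horizontal time step, and are commonly decoded with CTC (Connectionist Temporal Classification). The sketch below shows generic greedy (best-path) CTC decoding: take the most likely class per step, merge consecutive repeats, then drop the blank symbol. It is an illustration of the technique, not code from chineseocr; the blank-at-index-0 convention, the one_hot helper, and the toy alphabet are assumptions.

```python
from __future__ import annotations

from typing import Sequence


def ctc_greedy_decode(probs: Sequence[Sequence[float]], alphabet: str, blank: int = 0) -> str:
    """Best-path CTC decoding.

    probs[t][k] is the score of class k at time step t, alphabet[k] is the
    character for class k, and class `blank` is the CTC blank.
    """
    chars = []
    prev = blank
    for step in probs:
        best = max(range(len(step)), key=step.__getitem__)  # argmax for this step
        # Emit only when the class changes and is not blank; a blank between
        # two identical classes lets genuinely repeated characters survive.
        if best != blank and best != prev:
            chars.append(alphabet[best])
        prev = best
    return "".join(chars)


def one_hot(path: Sequence[int], num_classes: int) -> list[list[float]]:
    """Hypothetical helper: turn a class-index path into one-hot score rows."""
    return [[1.0 if k == c else 0.0 for k in range(num_classes)] for c in path]


if __name__ == "__main__":
    alphabet = "-中文识别"  # toy alphabet; index 0 is the blank placeholder
    probs = one_hot([1, 1, 0, 2, 2, 0, 3, 4], len(alphabet))
    print(ctc_greedy_decode(probs, alphabet))  # prints 中文识别
```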
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
