Integrate Tess4J OCR into Spring Boot: Step‑by‑Step Guide
This tutorial walks you through setting up a Spring Boot project with Tess4J, adding required dependencies, configuring language data, implementing an OCR service and REST controller, and testing both local file and remote URL image recognition, all with complete code examples.
Background
Extracting text from images is increasingly used for data entry and automation. Tess4J, a Java JNA wrapper for the Tesseract OCR engine, exposes that engine through a Java API and integrates readily into a Spring Boot application.
Part 1: Environment Setup
Before starting, ensure you have the following:
JDK 1.8 or higher
Maven
A recent Spring Boot version (note that Spring Boot 3.x requires JDK 17 or higher)
Tess4J version 4.x or higher
Part 2: Add Dependency
Include the Tess4J dependency in your pom.xml:
<dependencies>
    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>4.5.4</version>
    </dependency>
    <!-- other dependencies -->
</dependencies>
Make sure the version matches your development environment.
Part 3: Add Tessdata Language Pack
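Download the language data files (*.traineddata), for example from:
https://gitcode.com/tesseract-ocr/tessdata/tree/main
or a Baidu Cloud link (password: 8v8u).
Place the files you need, such as chi_sim.traineddata for Simplified Chinese, in a tessdata directory. Part 4 passes that directory to setDatapath.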
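Tesseract looks for a file named <lang>.traineddata in the data directory for each requested language, so a missing file is a common cause of OCR failures. The sketch below checks for those files before OCR is attempted. TessdataCheck and missingLanguages are hypothetical names chosen for illustration, not part of Tess4J, and the helper assumes that multiple languages are joined with + (for example chi_sim+eng).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tess4J: reports the languages whose
// <lang>.traineddata file is missing from the given tessdata directory.
final class TessdataCheck {
    private TessdataCheck() {}

    static List<String> missingLanguages(Path tessdataDir, String languages) {
        List<String> missing = new ArrayList<>();
        // Several languages can be requested at once, joined with '+'.
        for (String lang : languages.split("\\+")) {
            String trimmed = lang.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            Path dataFile = tessdataDir.resolve(trimmed + ".traineddata");
            if (!Files.isRegularFile(dataFile)) {
                missing.add(trimmed);
            }
        }
        return missing;
    }
}
```

Calling TessdataCheck.missingLanguages(Paths.get("<your_tessdata_path>"), "chi_sim") at application startup and failing fast on a non-empty result gives a clearer error than one raised in the middle of a request.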
Part 4: Create OCR Service Class
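import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.springframework.stereotype.Service;

@Service
public class OcrService {
    public String recognizeText(File imageFile) throws TesseractException {
        Tesseract tesseract = new Tesseract();
        // Directory containing the *.traineddata files
        tesseract.setDatapath("<your_tessdata_path>");
        // Simplified Chinese; requires chi_sim.traineddata in that directory
        tesseract.setLanguage("chi_sim");
        return tesseract.doOCR(imageFile);
    }

    public String recognizeTextFromUrl(String imageUrl) throws Exception {
        // A unique temp file keeps concurrent requests from overwriting each other
        Path tempFile = Files.createTempFile("ocr-", ".jpg");
        try (InputStream in = new URL(imageUrl).openStream()) {
            Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
            return recognizeText(tempFile.toFile());
        } finally {
            Files.deleteIfExists(tempFile);
        }
    }
}
The recognizeText(File imageFile) method performs OCR on a local file, while recognizeTextFromUrl(String imageUrl) first downloads the remote image to a temporary file, runs OCR on it, and then deletes the file.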
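The download step can also be isolated into a small helper that writes each image to its own temporary file and keeps the extension from the URL path. That matters if the image reader is chosen by file extension, which is an assumption worth verifying for your Tess4J version. ImageDownloader, suffixOf, and downloadToTempFile are hypothetical names used for illustration, and the ".img" fallback suffix is arbitrary.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

// Hypothetical helper: downloads an image to a unique temp file whose
// suffix is taken from the URL path.
final class ImageDownloader {
    private ImageDownloader() {}

    // Returns a suffix such as ".png" from the URL path, or ".img" when the
    // path has no short alphanumeric extension.
    static String suffixOf(URL url) {
        String path = url.getPath();
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot > slash && dot < path.length() - 1) {
            String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
            if (ext.matches("[a-z0-9]{1,5}")) {
                return "." + ext;
            }
        }
        return ".img";
    }

    // The caller is responsible for deleting the returned file after OCR.
    static Path downloadToTempFile(URL url) throws IOException {
        Path target = Files.createTempFile("ocr-", suffixOf(url));
        try (InputStream in = url.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            // Do not leave an empty or partial file behind on failure.
            Files.deleteIfExists(target);
            throw e;
        }
        return target;
    }
}
```

Restricting the suffix to a short alphanumeric string keeps characters from the remote URL out of the local file name.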
Part 5: Build REST Controller
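import java.io.File;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
@RequestMapping("/api/ocr")
public class OcrController {
    private static final Logger log = LoggerFactory.getLogger(OcrController.class);
    private final OcrService ocrService;

    public OcrController(OcrService ocrService) {
        this.ocrService = ocrService;
    }

    @PostMapping("/upload")
    public ResponseEntity<String> uploadImage(@RequestParam("file") MultipartFile file) {
        File convFile = null;
        try {
            // Keep only the extension of the client-supplied name; using the full name
            // as a path invites collisions and path traversal. ".png" is an arbitrary fallback.
            String name = file.getOriginalFilename();
            int dot = (name == null) ? -1 : name.lastIndexOf('.');
            String suffix = (dot >= 0) ? name.substring(dot) : ".png";
            convFile = File.createTempFile("ocr-upload-", suffix);
            file.transferTo(convFile);
            return ResponseEntity.ok(ocrService.recognizeText(convFile));
        } catch (Exception e) {
            log.error("OCR failed for uploaded file", e);
            return ResponseEntity.badRequest().body("Recognition error: " + e.getMessage());
        } finally {
            // Remove the temporary copy once OCR has finished
            if (convFile != null && !convFile.delete()) {
                log.warn("Could not delete temporary file {}", convFile);
            }
        }
    }

    @GetMapping("/recognize-url")
    public ResponseEntity<String> recognizeFromUrl(@RequestParam("imageUrl") String imageUrl) {
        try {
            return ResponseEntity.ok(ocrService.recognizeTextFromUrl(imageUrl));
        } catch (Exception e) {
            log.error("OCR failed for image URL {}", imageUrl, e);
            return ResponseEntity.badRequest().body("URL recognition error: " + e.getMessage());
        }
    }
}
The controller provides two endpoints: /api/ocr/upload for local image uploads and /api/ocr/recognize-url for processing images from a URL.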
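Because /api/ocr/recognize-url makes the server fetch whatever URL the caller supplies, it is worth validating that URL before downloading anything. The sketch below shows one possible guard, an allowlist of hosts combined with a scheme check. ImageUrlPolicy and the host names are hypothetical, and the check does not cover redirects or DNS-level tricks, so treat it as a starting point rather than a complete defense.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical guard for the recognize-url endpoint: accepts only http(s)
// URLs whose host is on an explicit allowlist.
final class ImageUrlPolicy {
    private final Set<String> allowedHosts = new HashSet<>();

    ImageUrlPolicy(String... hosts) {
        for (String host : hosts) {
            allowedHosts.add(host.toLowerCase(Locale.ROOT));
        }
    }

    boolean isAllowed(String imageUrl) {
        if (imageUrl == null) {
            return false;
        }
        URI uri;
        try {
            uri = new URI(imageUrl);
        } catch (URISyntaxException e) {
            return false; // not a syntactically valid URI
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return false; // relative URIs and host-less schemes such as file:
        }
        boolean httpLike = scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https");
        return httpLike && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
    }
}
```

A controller could hold an instance such as new ImageUrlPolicy("images.example.com") and return a 400 response when isAllowed is false, before calling the OCR service.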
Part 6: Testing
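Local test: send a POST request to /api/ocr/upload with the image in the multipart field named file.
[Figure: local test result]
Remote test: send a GET request to /api/ocr/recognize-url with the image address in the imageUrl query parameter.
[Figure: remote test result]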
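The remote endpoint can also be exercised from code instead of a browser or API tool. The sketch below builds the request URL with the imageUrl parameter percent-encoded and reads the response using only JDK 8 classes. OcrClientSketch is a hypothetical name, and the base URL http://localhost:8080 used in the usage note assumes Spring Boot's default port.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical test client for the /api/ocr/recognize-url endpoint.
final class OcrClientSketch {
    private OcrClientSketch() {}

    // Builds the endpoint URL; the image URL must be encoded because it
    // contains characters such as ':' '/' '?' and '&'.
    static String recognizeUrlEndpoint(String baseUrl, String imageUrl) {
        try {
            return baseUrl + "/api/ocr/recognize-url?imageUrl="
                    + URLEncoder.encode(imageUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Sends a GET request and returns the response body as text,
    // including the error body for 4xx/5xx responses.
    static String get(String endpoint) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5_000);
        conn.setReadTimeout(60_000); // OCR on large images can be slow
        try {
            InputStream body = conn.getResponseCode() < 400
                    ? conn.getInputStream()
                    : conn.getErrorStream();
            if (body == null) {
                return "";
            }
            try (InputStream in = body; ByteArrayOutputStream out = new ByteArrayOutputStream()) {
                byte[] buffer = new byte[4096];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

With the application running, OcrClientSketch.get(OcrClientSketch.recognizeUrlEndpoint("http://localhost:8080", "<image_url>")) returns the recognized text or the error message produced by the controller.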
Conclusion
Following these steps gives you a Spring Boot service capable of recognizing text from both local and remote images. Adjust configurations such as language packs as needed for multilingual scenarios. While OCR still has room for improvement, Tess4J provides a solid starting point.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
