Integrating Tess4J OCR into a Spring Boot Backend Service
This guide demonstrates how to integrate Tess4J OCR into a Spring Boot application, covering environment setup, Maven dependencies, adding language data, creating an OCR service class, building REST endpoints for local and remote image processing, and testing the solution.
Overview – This article explains how to integrate Tess4J, a Java wrapper for the Tesseract OCR engine, into a Spring Boot project to recognize text from both local and remote images.
Background – Demand for automated text extraction from images keeps growing. Tess4J meets it for Java applications by exposing the Tesseract engine through a simple Java API, so OCR can be called directly from application code.
Part 1: Environment Setup
Ensure the following tools are installed before starting:
JDK 1.8 or higher
Maven
Spring Boot (2.x if you stay on JDK 1.8; Spring Boot 3.x requires JDK 17 or higher)
Tess4J version 4.x or newer
Part 2: Add Dependencies
Add the Tess4J dependency to your pom.xml:
<dependencies>
<dependency>
<groupId>net.sourceforge.tess4j</groupId>
<artifactId>tess4j</artifactId>
<version>4.5.4</version>
</dependency>
<!-- other dependencies -->
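</dependencies>
Next, download the language data you need (for example, chi_sim.traineddata for Simplified Chinese) from the official tessdata repository or the Baidu Cloud link provided by the author. Place it in a tessdata folder that the application can read.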
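A missing or misnamed .traineddata file typically surfaces only when the first OCR call fails. The following is a minimal, JDK-only sketch of a startup check. TessdataCheck and missingLanguages are hypothetical names and are not part of Tess4J. The code relies on two Tesseract conventions: each language has one <lang>.traineddata file, and several languages are combined with +, as in chi_sim+eng.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: verifies that every language in a Tesseract language spec
// has a matching <lang>.traineddata file in the tessdata folder.
final class TessdataCheck {
    private TessdataCheck() {
    }

    /** Returns the languages from specs like "chi_sim" or "chi_sim+eng" whose model file is missing. */
    static List<String> missingLanguages(Path tessdataDir, String languageSpec) {
        List<String> missing = new ArrayList<>();
        for (String part : languageSpec.split("\\+")) {
            String lang = part.trim();
            if (lang.isEmpty()) {
                continue; // skip empty entries produced by stray "+" separators
            }
            if (!Files.isRegularFile(tessdataDir.resolve(lang + ".traineddata"))) {
                missing.add(lang);
            }
        }
        return missing;
    }
}
```

You could call it from a startup hook, passing your tessdata folder and "chi_sim", and fail fast when the returned list is non-empty. A misconfigured language then shows up as an explicit configuration error instead of a failed OCR request.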
Part 3: Create OCR Service Class
Implement a Spring service that wraps Tess4J calls:
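import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.springframework.stereotype.Service;

@Service
public class OcrService {
    public String recognizeText(File imageFile) throws TesseractException {
        // A new instance per call, so concurrent requests never share one native engine handle
        Tesseract tesseract = new Tesseract();
        // Folder containing chi_sim.traineddata (placeholder path, replace with your own)
        tesseract.setDatapath("/path/to/tessdata");
        tesseract.setLanguage("chi_sim");
        return tesseract.doOCR(imageFile);
    }
    public String recognizeTextFromUrl(String imageUrl) throws Exception {
        // Unique temp file per request (same .jpg naming as before), so parallel downloads cannot clash
        Path tempFile = Files.createTempFile("ocr-download-", ".jpg");
        try {
            // Close the network stream as soon as the image has been copied to disk
            try (InputStream in = new URL(imageUrl).openStream()) {
                Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
            }
            return recognizeText(tempFile.toFile());
        } finally {
            Files.deleteIfExists(tempFile); // remove the downloaded image after OCR
        }
    }
}
The recognizeText method processes a local file. recognizeTextFromUrl downloads a remote image into a temporary file, runs OCR on it, and deletes the file afterwards.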
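As written, recognizeTextFromUrl opens any URL the caller supplies, including file: URLs that point at the server's own disk. It also downloads without timeouts or a size limit. The sketch below shows one way to tighten this using only the JDK. RemoteImageFetcher and its methods are hypothetical names, and the size and timeout limits are values you would choose yourself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical hardening helper for the remote-image path (JDK only).
final class RemoteImageFetcher {
    private RemoteImageFetcher() {
    }

    /** Parses the URL and rejects any scheme other than http/https (e.g. file: or jar:). */
    static URL requireHttpUrl(String imageUrl) throws MalformedURLException {
        URL url = new URL(imageUrl);
        String protocol = url.getProtocol();
        if (!"http".equalsIgnoreCase(protocol) && !"https".equalsIgnoreCase(protocol)) {
            throw new IllegalArgumentException("Only http and https image URLs are accepted");
        }
        return url;
    }

    /** Copies the resource into target with connect/read timeouts and a byte cap; returns bytes written. */
    static long download(URL url, Path target, long maxBytes, int timeoutMillis) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        byte[] buffer = new byte[8192];
        long total = 0;
        try (InputStream in = conn.getInputStream();
             OutputStream out = Files.newOutputStream(target)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
                if (total > maxBytes) {
                    // Stop reading as soon as the cap is exceeded instead of buffering the whole body
                    throw new IOException("Image is larger than " + maxBytes + " bytes");
                }
                out.write(buffer, 0, n);
            }
        }
        return total;
    }
}
```

Inside recognizeTextFromUrl, the copy step could become RemoteImageFetcher.download(RemoteImageFetcher.requireHttpUrl(imageUrl), target, 10 * 1024 * 1024, 10_000), where target is the file the image is saved to. Here 10 MB and 10 seconds are only example values. The scheme check does not stop http requests to internal hosts; blocking those would need an additional host allow-list.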
Part 4: Build REST Controller
Create endpoints for uploading an image or providing a URL:
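import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.ResponseEntity;
import org.springframework.util.StringUtils;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

@RestController
@RequestMapping("/api/ocr")
public class OcrController {
    private static final Logger log = LoggerFactory.getLogger(OcrController.class);
    private final OcrService ocrService;
    public OcrController(OcrService ocrService) {
        this.ocrService = ocrService;
    }
    @PostMapping("/upload")
    public ResponseEntity<String> uploadImage(@RequestParam("file") MultipartFile file) {
        Path tempFile = null;
        try {
            // Save under a generated name (not the client-supplied one) to avoid collisions
            // and path traversal; only the original extension is kept
            String ext = StringUtils.getFilenameExtension(file.getOriginalFilename());
            tempFile = Files.createTempFile("ocr-upload-", ext != null ? "." + ext : ".tmp");
            try (InputStream in = file.getInputStream()) {
                Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
            }
            return ResponseEntity.ok(ocrService.recognizeText(tempFile.toFile()));
        } catch (Exception e) {
            log.error("OCR failed for uploaded file", e);
            return ResponseEntity.badRequest().body("Recognition error: " + e.getMessage());
        } finally {
            if (tempFile != null) {
                try {
                    Files.deleteIfExists(tempFile);
                } catch (IOException ignored) {
                    // best-effort cleanup of the temporary upload
                }
            }
        }
    }
    @GetMapping("/recognize-url")
    public ResponseEntity<String> recognizeFromUrl(@RequestParam("imageUrl") String imageUrl) {
        try {
            return ResponseEntity.ok(ocrService.recognizeTextFromUrl(imageUrl));
        } catch (Exception e) {
            log.error("OCR failed for URL {}", imageUrl, e);
            return ResponseEntity.badRequest().body("URL recognition error: " + e.getMessage());
        }
    }
}
Part 5: Testing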
To test local-image recognition, send a multipart POST request with the image in the file field to /api/ocr/upload. To test remote-image recognition, send a GET request to /api/ocr/recognize-url with a publicly accessible image URL in the imageUrl query parameter.
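The recognize-url endpoint takes the image URL as a query parameter, so the value must be percent-encoded. Otherwise, characters such as ? and & in the image URL would be read as part of the endpoint's own query string. Below is a minimal JDK-only client sketch. It assumes the application listens on Spring Boot's default port 8080 and that the response body is UTF-8. OcrClient is a hypothetical helper, not part of the service.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical test client for GET /api/ocr/recognize-url (works on JDK 8).
final class OcrClient {
    private OcrClient() {
    }

    /** Builds the endpoint URL, percent-encoding the image URL so its own ? and & survive. */
    static String recognizeUrlEndpoint(String baseUrl, String imageUrl) throws UnsupportedEncodingException {
        return baseUrl + "/api/ocr/recognize-url?imageUrl=" + URLEncoder.encode(imageUrl, "UTF-8");
    }

    /** Sends the request and returns the body: recognized text, or the error message on 4xx/5xx. */
    static String recognize(String baseUrl, String imageUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(recognizeUrlEndpoint(baseUrl, imageUrl)).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "text/plain");
        try {
            int status = conn.getResponseCode();
            // Error responses (e.g. the controller's 400 body) are only readable via getErrorStream()
            InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream();
            if (in == null) {
                return "HTTP " + status;
            }
            try (InputStream body = in) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = body.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

With the application running, OcrClient.recognize("http://localhost:8080", imageUrl) returns the recognized text. If the endpoint answers with 400, it returns the "URL recognition error: ..." message instead.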
Conclusion
Following these steps gives you a functional Spring Boot service capable of OCR on both local and remote images. Adjust the tessdata path and language settings as needed for multilingual scenarios.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.