Integrating Tess4J OCR into a Spring Boot Application

This guide walks through setting up a Spring Boot project, adding Tess4J dependencies, configuring language data, implementing an OCR service class, exposing REST endpoints for local and remote image recognition, and testing the OCR functionality end‑to‑end.

Code Ape Tech Column

In this tutorial we explore how to integrate Tess4J, a Java wrapper for the Tesseract OCR engine, into a Spring Boot application to recognize text from both local and remote images.

Background : As image‑based text extraction becomes increasingly important for data entry and automation, Tess4J provides a powerful interface for OCR in Java applications. Integrating it with Spring Boot enables a clean, service‑oriented solution.

Part 1 – Environment Setup : You need a JDK (1.8 or later), Maven, a Spring Boot version that supports your JDK (Spring Boot 3.x requires Java 17, while 2.x runs on Java 8), and Tess4J 4.x or newer.

Part 2 – Add Dependency (pom.xml):

<dependencies>
    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>4.5.4</version>
    </dependency>
    <!-- other dependencies -->
</dependencies>

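Choose a Tess4J version that is compatible with your JDK. Tess4J calls the native Tesseract library, so that library must also be available on the machine that runs the application.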

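Part 3 – Add Tessdata Language Pack : Download the required language files (e.g., chi_sim.traineddata) from the tessdata repository (https://gitcode.com/tesseract-ocr/tessdata/tree/main) or from the Baidu Cloud link given in the original article, and place them in a directory the application can read. That directory is the tessdata path configured in the OCR service.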
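A missing language file usually shows up only when the first OCR call fails, so it can help to check the directory up front. The following is a minimal sketch using only the JDK; the class name TessdataCheck and the method missingLanguages are hypothetical and not part of Tess4J. It assumes the Tesseract convention of joining several language codes with "+" (for example "chi_sim+eng").

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class TessdataCheck {

    /**
     * Returns the language codes from a spec such as "chi_sim+eng"
     * whose <code>.traineddata</code> file is not present in tessdataDir.
     */
    public static List<String> missingLanguages(Path tessdataDir, String languageSpec) {
        List<String> missing = new ArrayList<>();
        for (String lang : languageSpec.split("\\+")) {
            // Each language is expected as <code>.traineddata in the tessdata directory
            Path model = tessdataDir.resolve(lang + ".traineddata");
            if (!Files.isRegularFile(model)) {
                missing.add(lang);
            }
        }
        return missing;
    }
}
```

Calling TessdataCheck.missingLanguages(Paths.get("<your tessdata directory>"), "chi_sim") at startup and logging a non-empty result gives a clearer error than a failure deep inside the OCR call.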

Part 4 – Create OCR Service Class :

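import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.springframework.stereotype.Service;

@Service
public class OcrService {

    public String recognizeText(File imageFile) throws TesseractException {
        Tesseract tesseract = new Tesseract();
        // Directory that contains the .traineddata files (here: chi_sim.traineddata)
        tesseract.setDatapath("<your tessdata directory>");
        tesseract.setLanguage("chi_sim");
        return tesseract.doOCR(imageFile);
    }

    public String recognizeTextFromUrl(String imageUrl) throws Exception {
        // A unique temp file keeps concurrent requests from overwriting each other
        Path tempFile = Files.createTempFile("ocr-", ".jpg");
        // try-with-resources closes the stream even if the copy or the OCR call fails
        try (InputStream in = new URL(imageUrl).openStream()) {
            Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
            return recognizeText(tempFile.toFile());
        } finally {
            Files.deleteIfExists(tempFile);
        }
    }
}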

The recognizeText(File) method handles OCR for a local file, while recognizeTextFromUrl(String) downloads a remote image before processing.
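The download step can be sketched on its own with JDK classes only. The example below keeps the extension found in the URL path instead of always saving the image as .jpg, on the assumption (worth verifying for your Tess4J version) that the library may use the file extension to choose an image reader. The class ImageDownloader and both method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ImageDownloader {

    /** Returns the extension of the URL path including the dot, or the fallback if there is none. */
    public static String extensionOf(String imageUrl, String fallback) {
        String path = URI.create(imageUrl).getPath();
        if (path == null) {
            return fallback;
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        // Accept only a dot inside the last path segment that is followed by at least one character
        if (dot > slash && dot < path.length() - 1) {
            return path.substring(dot);
        }
        return fallback;
    }

    /** Downloads the image to a uniquely named temp file; the caller should delete it after use. */
    public static Path downloadToTempFile(String imageUrl) throws IOException {
        Path target = Files.createTempFile("ocr-", extensionOf(imageUrl, ".jpg"));
        try (InputStream in = new URL(imageUrl).openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

The returned path can be handed to the service as recognizeText(path.toFile()). A production version would likely also set connect and read timeouts and cap the download size, which this sketch leaves out.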

Part 5 – Build REST Controller :

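import java.io.File;

import org.springframework.http.ResponseEntity;
import org.springframework.util.StringUtils;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
@RequestMapping("/api/ocr")
public class OcrController {

    private final OcrService ocrService;

    // Constructor injection
    public OcrController(OcrService ocrService) {
        this.ocrService = ocrService;
    }

    @PostMapping("/upload")
    public ResponseEntity<String> uploadImage(@RequestParam("file") MultipartFile file) {
        File convFile = null;
        try {
            // Keep only the extension of the client-supplied name; using the full name
            // as a path would let a crafted file name escape the temp directory
            String ext = StringUtils.getFilenameExtension(file.getOriginalFilename());
            convFile = File.createTempFile("ocr-upload-", ext != null ? "." + ext : null);
            file.transferTo(convFile);
            String result = ocrService.recognizeText(convFile);
            return ResponseEntity.ok(result);
        } catch (Exception e) {
            e.printStackTrace();
            return ResponseEntity.badRequest().body("Recognition error: " + e.getMessage());
        } finally {
            // Remove the temp copy whether or not recognition succeeded
            if (convFile != null) {
                convFile.delete();
            }
        }
    }

    @GetMapping("/recognize-url")
    public ResponseEntity<String> recognizeFromUrl(@RequestParam("imageUrl") String imageUrl) {
        try {
            String result = ocrService.recognizeTextFromUrl(imageUrl);
            return ResponseEntity.ok(result);
        } catch (Exception e) {
            e.printStackTrace();
            return ResponseEntity.badRequest().body("URL recognition error: " + e.getMessage());
        }
    }
}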

The controller exposes two endpoints: /api/ocr/upload for local file uploads and /api/ocr/recognize-url for processing images from a URL.

Part 6 – Testing : Use a tool such as Postman or curl. For local images, POST the file to /api/ocr/upload as multipart form data with the field name file. For remote images, send GET /api/ocr/recognize-url?imageUrl=YOUR_IMAGE_URL, with the image URL percent-encoded. Screenshots in the original article illustrate successful local and remote OCR results.
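The remote endpoint can also be exercised from plain Java. The sketch below uses only JDK 8 classes; the class name OcrClient, its method names, and the base URL http://localhost:8080 in the usage note are assumptions for illustration. It percent-encodes the image URL before placing it in the query string, since an unencoded "?" or "&" inside imageUrl would otherwise be parsed as part of the request's own query.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class OcrClient {

    /** Builds the request URL for the recognize-url endpoint with an encoded imageUrl parameter. */
    public static String recognizeUrl(String baseUrl, String imageUrl) {
        try {
            return baseUrl + "/api/ocr/recognize-url?imageUrl=" + URLEncoder.encode(imageUrl, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    /** Sends a GET request and returns the response body (also for error responses). */
    public static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        try {
            int status = conn.getResponseCode();
            InputStream body = status < 400 ? conn.getInputStream() : conn.getErrorStream();
            if (body == null) {
                return "";
            }
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(body, StandardCharsets.UTF_8))) {
                return reader.lines().collect(Collectors.joining("\n"));
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

With the application running locally, OcrClient.get(OcrClient.recognizeUrl("http://localhost:8080", "YOUR_IMAGE_URL")) returns either the recognized text or the error message produced by the controller.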

Conclusion : Following these steps gives you a functional Spring Boot service capable of OCR on both local and remote images. Adjust the language pack and configuration as needed for multilingual scenarios, and consider further optimizations for production use.
