
Integrating Tess4J OCR into a Spring Boot Backend Service

This guide demonstrates how to integrate Tess4J OCR into a Spring Boot application, covering environment setup, Maven dependencies, adding language data, creating an OCR service class, building REST endpoints for local and remote image processing, and testing the solution.

Top Architect

Overview – This article explains how to integrate Tess4J, a Java wrapper for the Tesseract OCR engine, into a Spring Boot project to recognize text from both local and remote images.

Background – Automated text extraction from images is an increasingly common requirement, and Tess4J gives Java applications a straightforward interface to the Tesseract engine.

Part 1: Environment Setup

Ensure the following tools are installed before starting:

JDK 1.8 or higher

Maven

A recent version of Spring Boot (2.x if you stay on JDK 1.8; Spring Boot 3.x requires JDK 17 or higher)

Tess4J version 4.x or newer

Part 2: Add Dependencies

Add the Tess4J dependency to your pom.xml:

<dependencies>
    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>4.5.4</version>
    </dependency>
    <!-- other dependencies -->
</dependencies>

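Download the language data you need (e.g., chi_sim.traineddata for Simplified Chinese) from the official tessdata repository or the provided Baidu Cloud link, and place it in the tessdata folder that the setDatapath call in Part 3 will point to.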
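Before wiring Tesseract into the service, it can help to confirm that the downloaded files sit where the engine will look for them. The sketch below is a plain-JDK check that assumes the conventional `<lang>.traineddata` file naming; the class `TessdataCheck` and its method `missingLanguages` are hypothetical helpers, not part of Tess4J.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Tess4J): reports which language files are absent.
final class TessdataCheck {
    private TessdataCheck() {}

    /** Returns the language codes whose <code>.traineddata file is missing from tessdataDir. */
    static List<String> missingLanguages(Path tessdataDir, List<String> languages) {
        List<String> missing = new ArrayList<>();
        for (String lang : languages) {
            Path file = tessdataDir.resolve(lang + ".traineddata");
            if (!Files.isRegularFile(file)) {
                missing.add(lang);
            }
        }
        return missing;
    }
}
```

Calling this once at startup with the languages you plan to pass to setLanguage gives a clear error message early, instead of a failure on the first OCR request.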

Part 3: Create OCR Service Class

Implement a Spring service that wraps Tess4J calls:

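import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.springframework.stereotype.Service;

@Service
public class OcrService {
    public String recognizeText(File imageFile) throws TesseractException {
        Tesseract tesseract = new Tesseract();
        // Set the path to the tessdata folder (optional for English)
        // NOTE: the path value is missing in the source; supply your own tessdata directory here
        tesseract.setDatapath("");
        tesseract.setLanguage("chi_sim");
        return tesseract.doOCR(imageFile);
    }

    public String recognizeTextFromUrl(String imageUrl) throws Exception {
        // A unique temp file keeps concurrent requests from overwriting each other's download;
        // the .jpg suffix mirrors the original "downloaded.jpg" name
        Path tempFile = Files.createTempFile("ocr-download-", ".jpg");
        try (InputStream in = new URL(imageUrl).openStream()) {
            Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
            return recognizeText(tempFile.toFile());
        } finally {
            // Remove the download whether or not OCR succeeded
            Files.deleteIfExists(tempFile);
        }
    }
}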

The recognizeText method processes a local file, while recognizeTextFromUrl downloads a remote image before performing OCR.
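One point worth considering for the URL path: recognizeTextFromUrl hands the caller's string straight to java.net.URL, which also understands non-web schemes such as file:. The following is a minimal sketch of a guard that accepts only absolute http and https URLs. The class `ImageUrlValidator` is a hypothetical addition, not part of the original service, and a production service would likely need stricter rules (for example, blocking internal hosts).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical guard (not in the original service): accept only absolute http/https URLs.
final class ImageUrlValidator {
    private ImageUrlValidator() {}

    static boolean isAllowed(String imageUrl) {
        if (imageUrl == null) {
            return false;
        }
        try {
            URI uri = new URI(imageUrl.trim());
            String scheme = uri.getScheme();
            boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            // Require a host so strings like "http:///path" are rejected as well
            return web && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Malformed input (for example, containing spaces) is rejected
            return false;
        }
    }
}
```

The service could call `ImageUrlValidator.isAllowed(imageUrl)` before opening the stream and throw an IllegalArgumentException when it returns false.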

Part 4: Build REST Controller

Create endpoints for uploading an image or providing a URL:

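import java.io.File;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController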
@RequestMapping("/api/ocr")
public class OcrController {
    private final OcrService ocrService;

    public OcrController(OcrService ocrService) {
        this.ocrService = ocrService;
    }

    @PostMapping("/upload")
    public ResponseEntity<String> uploadImage(@RequestParam("file") MultipartFile file) {
        try {
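            // Keep only the base name so a client-supplied path cannot escape the temp directory
            String safeName = new File(file.getOriginalFilename()).getName();
            File convFile = new File(System.getProperty("java.io.tmpdir"), safeName);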
            file.transferTo(convFile);
            String result = ocrService.recognizeText(convFile);
            return ResponseEntity.ok(result);
        } catch (Exception e) {
            e.printStackTrace();
            return ResponseEntity.badRequest().body("Recognition error: " + e.getMessage());
        }
    }

    @GetMapping("/recognize-url")
    public ResponseEntity<String> recognizeFromUrl(@RequestParam("imageUrl") String imageUrl) {
        try {
            String result = ocrService.recognizeTextFromUrl(imageUrl);
            return ResponseEntity.ok(result);
        } catch (Exception e) {
            e.printStackTrace();
            return ResponseEntity.badRequest().body("URL recognition error: " + e.getMessage());
        }
    }
}
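The upload endpoint derives its temporary file from the client-supplied name, so it can be worth checking that name before running OCR. The sketch below extracts a lower-cased extension and compares it against an allowlist. The class `UploadNames` is hypothetical, and the set of extensions is an assumption for illustration rather than a statement of what Tess4J supports.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not in the original controller): screens upload file names.
final class UploadNames {
    // Assumed allowlist for illustration; adjust to the formats your deployment handles.
    private static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("png", "jpg", "jpeg", "tif", "tiff", "bmp"));

    private UploadNames() {}

    /** Returns the lower-cased extension of the base name, or "" if there is none. */
    static String extension(String originalFilename) {
        if (originalFilename == null) {
            return "";
        }
        // Drop any directory part, handling both '/' and '\' separators
        int slash = Math.max(originalFilename.lastIndexOf('/'), originalFilename.lastIndexOf('\\'));
        String base = originalFilename.substring(slash + 1);
        int dot = base.lastIndexOf('.');
        // No dot, a leading dot (hidden file), or a trailing dot all mean "no extension"
        if (dot <= 0 || dot == base.length() - 1) {
            return "";
        }
        return base.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    static boolean isSupported(String originalFilename) {
        return ALLOWED.contains(extension(originalFilename));
    }
}
```

In uploadImage, a call such as `UploadNames.isSupported(file.getOriginalFilename())` could return a 400 response early for unexpected file types.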

Part 5: Testing

Test the service locally by uploading an image file via the /api/ocr/upload endpoint, and test remote image processing using /api/ocr/recognize-url with a publicly accessible image URL.
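When testing the recognize-url endpoint, the image URL travels as a query parameter and therefore needs percent-encoding. The following sketch builds the request URL with the JDK's URLEncoder. The class `OcrRequestUrls` is hypothetical, and the base URL used in the usage note assumes the application runs locally on port 8080.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical test helper: builds the GET URL for the recognize-url endpoint.
final class OcrRequestUrls {
    private OcrRequestUrls() {}

    /** Appends the endpoint path and the percent-encoded imageUrl parameter to baseUrl. */
    static String recognizeUrl(String baseUrl, String imageUrl) {
        try {
            String encoded = URLEncoder.encode(imageUrl, "UTF-8");
            return baseUrl + "/api/ocr/recognize-url?imageUrl=" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM
            throw new IllegalStateException(e);
        }
    }
}
```

For example, `OcrRequestUrls.recognizeUrl("http://localhost:8080", "https://example.com/a.jpg")` yields a URL that can be pasted into a browser or an HTTP client.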

Conclusion

Following these steps gives you a functional Spring Boot service capable of OCR on both local and remote images. Adjust the tessdata path and language settings as needed for multilingual scenarios.
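For multilingual scenarios, Tesseract accepts several language codes joined with a plus sign (for example, chi_sim+eng), and that combined string can be passed to setLanguage. The sketch below builds such a string from a list while dropping blanks and duplicates; the class `OcrLanguages` is a hypothetical helper, not part of Tess4J.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of Tess4J): builds a combined language string.
final class OcrLanguages {
    private OcrLanguages() {}

    /** Joins language codes with '+', preserving order and skipping blanks and duplicates. */
    static String join(List<String> languages) {
        Set<String> unique = new LinkedHashSet<>();
        for (String lang : languages) {
            if (lang != null && !lang.trim().isEmpty()) {
                unique.add(lang.trim());
            }
        }
        if (unique.isEmpty()) {
            throw new IllegalArgumentException("At least one language code is required");
        }
        return String.join("+", unique);
    }
}
```

Each code in the result still needs its own traineddata file in the tessdata folder.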
