Implement OCR in Spring Boot with Tess4J for Image Text Recognition

This guide shows how to integrate the open‑source Tesseract OCR engine into a Spring Boot application using the Tess4J Java wrapper, covering Chinese language data setup, Maven dependency configuration, bean creation, service implementation, and a unit test to verify image text extraction.

The Dominant Programmer

Tesseract is an open‑source optical character recognition (OCR) engine that converts image text into machine‑readable strings and supports multiple languages. Tess4J provides a Java API wrapper around Tesseract, enabling direct calls from Java code.

Because a default Tesseract installation does not include Chinese language data, the Simplified Chinese trained data file (chi_sim.traineddata) must be downloaded from the official tessdata repository and placed in a local directory, e.g., D:/tessdata.
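A missing trained data file is an easy mistake to make at this step, so it can help to check the directory before wiring anything into Spring. The following is a minimal sketch using only the JDK; the class name TrainedDataCheck and the method missingLanguages are hypothetical helpers introduced here for illustration, not part of Tess4J. It assumes language codes are joined with "+" (for example "chi_sim+eng"), which is the form Tesseract accepts for multiple languages.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of Tess4J, shown only to illustrate the check.
public class TrainedDataCheck {

    /**
     * Returns the language codes whose "<code>.traineddata" file is missing
     * from dataDir. Codes are separated by '+', e.g. "chi_sim+eng".
     */
    public static List<String> missingLanguages(Path dataDir, String languages) {
        List<String> missing = new ArrayList<>();
        for (String code : languages.split("\\+")) {
            String trimmed = code.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore empty segments such as a trailing '+'
            }
            if (!Files.isRegularFile(dataDir.resolve(trimmed + ".traineddata"))) {
                missing.add(trimmed);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // D:/tessdata is the example directory used in this guide
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "D:/tessdata");
        List<String> missing = missingLanguages(dataDir, "chi_sim");
        if (missing.isEmpty()) {
            System.out.println("All trained data files found in " + dataDir);
        } else {
            System.out.println("Missing trained data for: " + missing);
        }
    }
}
```

Running the class with the data directory as its only argument reports which files, if any, still need to be downloaded.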

1. Add the Tess4J dependency to the Spring Boot project's pom.xml:

<dependency>
  <groupId>net.sourceforge.tess4j</groupId>
  <artifactId>tess4j</artifactId>
  <version>4.5.4</version>
</dependency>

2. Configure the trained‑data path in application.yml:

# Path to the trained data folder
tess4j:
  datapath: D:/tessdata

3. Create a Spring bean that initializes Tesseract with the data path and language settings:

import net.sourceforge.tess4j.Tesseract;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TesseractOcrConfiguration {
    @Value("${tess4j.datapath}")
    private String dataPath;

    @Bean
    public Tesseract tesseract() {
        Tesseract tesseract = new Tesseract();
        tesseract.setDatapath(dataPath); // set trained data folder
        tesseract.setLanguage("chi_sim"); // Chinese simplified
        return tesseract;
    }
}

4. Define a service interface for OCR operations:

import java.io.InputStream;

public interface IOcrService {
    String recognizeText(InputStream sbs);
}

5. Implement the service using the injected Tesseract bean:

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;

@Service
public class OcrServiceImpl implements IOcrService {
    @Autowired
    private Tesseract tesseract;

    @Override
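    public String recognizeText(InputStream sbs) {
        try {
            // ImageIO.read returns null when no registered reader can decode the stream
            BufferedImage bufferedImage = ImageIO.read(sbs);
            if (bufferedImage == null) {
                return null;
            }
            return tesseract.doOCR(bufferedImage);
        } catch (IOException | TesseractException e) {
            // Kept simple for the demo: print the failure and signal it with null
            e.printStackTrace();
            return null;
        }
    }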
}

6. Write a unit test to verify the OCR flow:

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

@RunWith(SpringRunner.class)
// RuoYiApplication is the main application class of the project used here; substitute your own
@SpringBootTest(classes = RuoYiApplication.class, webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
public class Tess4JOcrTest {
    @Autowired
    private IOcrService iOcrService;

    @Test
    public void ocrLocalPng() {
        // try-with-resources closes the stream once recognition finishes
        try (InputStream inputStream = new FileInputStream("D:/tess4j.png")) {
            String result = iOcrService.recognizeText(inputStream);
            System.out.println(result);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Place any PNG image (e.g., a screenshot) at the specified path and run the test. The console will print the recognized text, though accuracy may vary. The same approach can be extended to scenarios such as front‑end image uploads where the back‑end returns OCR results.
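Since accuracy may vary with the input image, one optional step that is sometimes tried is converting the image to grayscale before passing it to the engine. The sketch below uses only the JDK and is not part of the original setup; OcrPreprocessor and toGrayscale are hypothetical names, and whether the conversion actually improves results depends on your images.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper: an optional preprocessing step, not required by Tess4J.
public class OcrPreprocessor {

    /** Returns a grayscale copy of the image; the source image is left unchanged. */
    public static BufferedImage toGrayscale(BufferedImage source) {
        BufferedImage gray = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D graphics = gray.createGraphics();
        try {
            // Drawing onto a TYPE_BYTE_GRAY image performs the color conversion
            graphics.drawImage(source, 0, 0, null);
        } finally {
            graphics.dispose();
        }
        return gray;
    }
}
```

In the service from step 5, this would mean calling tesseract.doOCR(OcrPreprocessor.toGrayscale(bufferedImage)) instead of passing the decoded image directly.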
