How to Integrate Tess4j OCR into a Spring Boot 3 Application

This guide explains the fundamentals of OCR, introduces Tesseract and its Java wrapper Tess4j, shows how to download language data files and configure a Spring Boot 3 project with Maven dependencies and YAML settings, and provides test code for Chinese, English, and mixed‑language image recognition.

Architecture Digest

Mar 26, 2026

What is Tess4j

OCR

Optical Character Recognition (OCR) converts printed, handwritten, or image‑based text into editable and searchable digital formats.

Tesseract OCR

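Tesseract is an open‑source OCR engine originally developed at HP. HP open-sourced it in 2005, Google sponsored its development from 2006, and it is now maintained by the open-source community in the tesseract-ocr project on GitHub. It is one of the most widely used open-source OCR engines.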

Tess4j

Tess4j is a Java wrapper for the Tesseract engine, providing a simple API for Java applications to perform OCR.

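Easy integration: A simple API for Java projects.

Cross‑platform: Works on Windows, macOS, and Linux. The Tess4j JAR bundles the native Tesseract libraries for Windows; on Linux and macOS, Tesseract itself usually has to be installed separately, for example through the system package manager.

Rich functionality: Exposes the major Tesseract features, such as multi‑language recognition and the use of custom‑trained language data.

Active community: Maintained and updated by the open‑source community.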
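The "simple API" point shows up in ITesseract.doOCR. That method accepts a java.io.File, which is how the tests later in this guide use it, and it also has an overload that takes a java.awt.image.BufferedImage. The second form lets an application clean up an image in plain Java before recognition. The sketch below shows one such preprocessing step and assumes that grayscale conversion plus a fixed threshold suits the input images. The class name OcrPreprocessor and the threshold value are illustrative choices, not part of Tess4j. Tesseract also binarizes images internally, so any benefit depends on the images.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper: simple grayscale + fixed-threshold preprocessing before OCR.
public class OcrPreprocessor {

    /** Returns a grayscale copy of the image (one 8-bit channel). */
    static BufferedImage toGray(BufferedImage src) {
        BufferedImage gray = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        g.drawImage(src, 0, 0, null); // Java2D converts the colors to gray levels
        g.dispose();
        return gray;
    }

    /**
     * Returns a black-and-white copy: pixels whose gray level is below the
     * threshold (0-255) become black, all others become white.
     */
    static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage gray = toGray(src);
        BufferedImage bw = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        int black = Color.BLACK.getRGB();
        int white = Color.WHITE.getRGB();
        for (int y = 0; y < gray.getHeight(); y++) {
            for (int x = 0; x < gray.getWidth(); x++) {
                int level = gray.getRaster().getSample(x, y, 0); // 0 = black ... 255 = white
                bw.setRGB(x, y, level < threshold ? black : white);
            }
        }
        return bw;
    }
}
```

For example, iTesseract.doOCR(OcrPreprocessor.binarize(ImageIO.read(file), 128)) would run recognition on the thresholded image instead of the original file. Whether this improves accuracy is worth checking against the unprocessed result.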

Download language data files

Official repository: https://github.com/tesseract-ocr/tessdata

Download chi_sim.traineddata for Chinese and eng.traineddata for English. Place the files in a directory referenced by the application, for example F:/HeiMaTouTiao/tessdata.
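If a setup script should fetch the files instead of downloading them by hand, the JDK's java.net.http.HttpClient is sufficient. The sketch below is a minimal example that assumes the raw-file URL pattern https://github.com/tesseract-ocr/tessdata/raw/main/<lang>.traineddata. The branch name main is an assumption and should be checked against the repository. TessdataDownloader is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for fetching .traineddata files from the tessdata repository.
public class TessdataDownloader {

    // Assumed raw-download URL pattern for the tessdata repository (branch "main").
    static final String BASE_URL = "https://github.com/tesseract-ocr/tessdata/raw/main/";

    /** Builds the download URL for a language code such as "chi_sim" or "eng". */
    static URI downloadUri(String language) {
        return URI.create(BASE_URL + language + ".traineddata");
    }

    /** Downloads {language}.traineddata into dataDir, skipping files that already exist. */
    static Path download(String language, Path dataDir) throws IOException, InterruptedException {
        Files.createDirectories(dataDir);
        Path target = dataDir.resolve(language + ".traineddata");
        if (Files.exists(target)) {
            return target;
        }
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL) // GitHub answers raw links with a redirect
                .build();
        HttpRequest request = HttpRequest.newBuilder(downloadUri(language)).GET().build();
        HttpResponse<InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        // Write to a ".part" file first, so an interrupted download never
        // leaves a truncated .traineddata file under the final name.
        Path partial = dataDir.resolve(language + ".traineddata.part");
        try (InputStream body = response.body()) {
            if (response.statusCode() != 200) {
                throw new IOException("HTTP " + response.statusCode() + " for " + request.uri());
            }
            Files.copy(body, partial, StandardCopyOption.REPLACE_EXISTING);
        }
        Files.move(partial, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }
}
```

Calling TessdataDownloader.download("chi_sim", Path.of("F:/HeiMaTouTiao/tessdata")), and then the same call for "eng", would populate the directory that application.yml points to.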

Integrate Tess4j into a Spring Boot project

Add Maven dependency

<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.1.1</version>
</dependency>

Configure application.yml

server:
  port: 11014

tess4j:
  data-path: F:/HeiMaTouTiao/tessdata
  chinese-train-data: chi_sim
  english-train-data: eng

Create configuration class

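package cn.edu.scau.config;

import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Configuration;

// Binds the tess4j.* keys in application.yml to the fields below; Spring Boot's
// relaxed binding maps data-path to dataPath, chinese-train-data to chineseTrainData, etc.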
@Configuration
@ConfigurationProperties(prefix = "tess4j")
public class Tess4jConfiguration {
    private String dataPath;
    private String chineseTrainData;
    private String englishTrainData;

    public String getDataPath() { return dataPath; }
    public void setDataPath(String dataPath) { this.dataPath = dataPath; }
    public String getChineseTrainData() { return chineseTrainData; }
    public void setChineseTrainData(String chineseTrainData) { this.chineseTrainData = chineseTrainData; }
    public String getEnglishTrainData() { return englishTrainData; }
    public void setEnglishTrainData(String englishTrainData) { this.englishTrainData = englishTrainData; }
}

Test image recognition

Chinese

import cn.edu.scau.config.Tess4jConfiguration;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import java.io.File;

@SpringBootTest
public class Tess4jApplicationTests {
    @Autowired
    private Tess4jConfiguration tess4jConfiguration;

    @Test
    public void testChinese() throws TesseractException {
        long start = System.currentTimeMillis();
        ITesseract iTesseract = new Tesseract();
        iTesseract.setDatapath(tess4jConfiguration.getDataPath());
        iTesseract.setLanguage(tess4jConfiguration.getChineseTrainData());
        File file = new File("F:/HeiMaTouTiao/tessdata/CaiXuKun-Chinese.png");
        String result = iTesseract.doOCR(file);
        long end = System.currentTimeMillis();
        System.err.println("Time elapsed: " + (end - start) + "ms");
        System.out.println(result);
    }
}

Result: OCR on the Chinese image succeeds; execution time is logged.
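Depending on the image and the model, output from chi_sim may contain spaces between individual Chinese characters, which makes the text harder to search and compare. When that happens, a small post-processing step can remove the spaces. The sketch below assumes that only spaces, tabs, and ideographic spaces (U+3000) sitting directly between two Han characters should be removed. Line breaks and the spacing around Latin text are kept. OcrTextCleaner is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical post-processing helper for Chinese OCR output.
public class OcrTextCleaner {

    // Horizontal whitespace (space, tab, U+3000) with a Han character on both sides.
    // The lookarounds check the neighbors without consuming them, so runs like
    // "A B C" (all Han) are handled in a single pass.
    private static final Pattern SPACE_BETWEEN_HAN =
            Pattern.compile("(?<=\\p{IsHan})[ \\t\\u3000]+(?=\\p{IsHan})");

    /** Removes whitespace that OCR output may contain between adjacent Chinese characters. */
    static String joinHanCharacters(String ocrText) {
        return SPACE_BETWEEN_HAN.matcher(ocrText).replaceAll("");
    }
}
```

In the Chinese test, printing OcrTextCleaner.joinHanCharacters(result) instead of result would show the cleaned text. Spaces next to English words stay unchanged, so the mixed-language output is unaffected apart from spaces between Chinese characters.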

English

@Test
public void testEnglish() throws TesseractException {
    long start = System.currentTimeMillis();
    ITesseract iTesseract = new Tesseract();
    iTesseract.setDatapath(tess4jConfiguration.getDataPath());
    iTesseract.setLanguage(tess4jConfiguration.getEnglishTrainData());
    File file = new File("F:/HeiMaTouTiao/tessdata/CaiXuKun-English.png");
    String result = iTesseract.doOCR(file);
    long end = System.currentTimeMillis();
    System.err.println("Time elapsed: " + (end - start) + "ms");
    System.out.println(result);
}

Result: English OCR produces the expected text with similar performance.
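The tests time recognition with System.currentTimeMillis(). That method follows the system clock and can jump if the clock is adjusted, whereas the JDK documents System.nanoTime() as the method for measuring elapsed time. The helper below is a small sketch of that approach. Stopwatch, Timed, and ThrowingSupplier are hypothetical names. The measured time covers everything doOCR does, which may include loading the trained data, so it is not a pure recognition time.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical timing helper based on System.nanoTime().
public class Stopwatch {

    /** A task that returns a value and may throw a checked exception of type E. */
    @FunctionalInterface
    interface ThrowingSupplier<T, E extends Exception> {
        T get() throws E;
    }

    /** The task's result together with the elapsed time in milliseconds. */
    record Timed<T>(T value, long elapsedMillis) { }

    /** Runs the task once and measures its duration with System.nanoTime(). */
    static <T, E extends Exception> Timed<T> time(ThrowingSupplier<T, E> task) throws E {
        long start = System.nanoTime();
        T value = task.get();
        long elapsedNanos = System.nanoTime() - start;
        return new Timed<>(value, TimeUnit.NANOSECONDS.toMillis(elapsedNanos));
    }
}
```

Inside testEnglish, for example, Stopwatch.time(() -> iTesseract.doOCR(file)) returns the OCR text and the elapsed milliseconds together. E is then inferred as TesseractException, so the test's existing throws clause still covers it.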

Mixed Chinese and English

@Test
public void testChineseAndEnglish() throws TesseractException {
    long start = System.currentTimeMillis();
    ITesseract iTesseract = new Tesseract();
    iTesseract.setDatapath(tess4jConfiguration.getDataPath());
    // Use Chinese trained data for mixed content
    iTesseract.setLanguage(tess4jConfiguration.getChineseTrainData());
    File file = new File("F:/HeiMaTouTiao/tessdata/ParagraphWithChineseAndEnglish.png");
    String result = iTesseract.doOCR(file);
    long end = System.currentTimeMillis();
    System.err.println("Time elapsed: " + (end - start) + "ms");
    System.out.println(result);
}

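Result: In this test, the Chinese trained data alone was enough to recognize the mixed text, because the chi_sim model also handles the Latin characters that appear in it. It is not the only option, however: Tesseract also accepts several languages joined with a plus sign, such as chi_sim+eng.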
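Tesseract can load several language models at once when their codes are joined with a plus sign, as in chi_sim+eng. The helper below is a hypothetical sketch that builds such a string from the configured codes and rejects empty or malformed entries. Each listed code still needs its own .traineddata file in the data path.

```java
import java.util.List;

// Hypothetical helper for building Tesseract's multi-language string.
public class TessLanguages {

    /** Joins language codes into Tesseract's multi-language form, e.g. "chi_sim+eng". */
    static String combine(List<String> languages) {
        if (languages.isEmpty()) {
            throw new IllegalArgumentException("At least one language code is required");
        }
        for (String lang : languages) {
            // A code must be non-blank and must not already contain the separator.
            if (lang.isBlank() || lang.contains("+")) {
                throw new IllegalArgumentException("Invalid language code: '" + lang + "'");
            }
        }
        return String.join("+", languages);
    }
}
```

In testChineseAndEnglish, iTesseract.setLanguage(TessLanguages.combine(List.of(tess4jConfiguration.getChineseTrainData(), tess4jConfiguration.getEnglishTrainData()))) would request chi_sim+eng using the application.yml values. Whether this beats chi_sim alone depends on the images, so it is worth comparing both on representative samples.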

Precautions

The language data files must have the .traineddata suffix, and the file name prefix (e.g., chi_sim, eng) must match the language identifier used in the Java code.
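A missing or misnamed file is otherwise noticed only when recognition runs, sometimes as a low-level native error rather than a clear message. For that reason, it can help to check the data path up front, for example at application startup. The sketch below assumes the language string uses Tesseract's plus-separated form. TessdataCheck is a hypothetical helper, not part of Tess4j.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical startup check for the files referenced by setLanguage.
public class TessdataCheck {

    /**
     * Returns the language codes from a Tesseract language string (e.g. "chi_sim+eng")
     * whose {code}.traineddata file is not present directly in dataPath.
     */
    static List<String> missingLanguages(Path dataPath, String languageSpec) {
        List<String> missing = new ArrayList<>();
        for (String part : languageSpec.split("\\+")) {
            String lang = part.trim();
            Path file = dataPath.resolve(lang + ".traineddata");
            if (!Files.isRegularFile(file)) {
                missing.add(lang);
            }
        }
        return missing;
    }
}
```

For instance, a startup check could throw an IllegalStateException whenever missingLanguages(Path.of(tess4jConfiguration.getDataPath()), tess4jConfiguration.getChineseTrainData()) returns a non-empty list. The application then fails fast with the name of the missing file instead of failing inside doOCR.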

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
