How to Seamlessly Integrate Tess4j OCR into a Spring Boot Application

This tutorial walks through the fundamentals of OCR, explains how to download the required Tesseract traineddata files, shows how to add Tess4j as a Maven dependency and configure Spring Boot with custom properties, and provides complete Java test code for Chinese, English, and mixed‑language image recognition. Each test prints the elapsed recognition time, and a closing note covers the file‑naming requirements for traineddata files.

Java Companion

Mar 22, 2026

1. OCR and Tesseract

OCR (Optical Character Recognition) converts printed, handwritten or image‑based text into editable, searchable text. The typical OCR pipeline consists of:

Image preprocessing – acquisition, binarization, denoising, rotation correction, segmentation.

Text detection – locating text lines, words or character boundaries.

Feature extraction – extracting visual features for each character.

Character recognition – matching extracted features against known character models.

Post‑processing – proofreading, formatting and layout analysis to improve readability.
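To make the preprocessing stage more concrete, here is a minimal sketch of binarization with a fixed threshold, using only the Java standard library. The class name Binarizer and the threshold of 128 used in the usage note are illustrative assumptions, not part of Tess4j or of this tutorial's project; Tesseract also performs its own internal thresholding, so a step like this is optional.

```java
import java.awt.image.BufferedImage;

public class Binarizer {
    /** Returns a black-and-white copy of src using a fixed brightness threshold (0-255). */
    public static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived brightness (ITU-R BT.601 weights).
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                // Pixels darker than the threshold become black, all others white.
                out.setRGB(x, y, luma < threshold ? 0x000000 : 0xFFFFFF);
            }
        }
        return out;
    }
}
```

Calling Binarizer.binarize(image, 128) on an image loaded with ImageIO.read would produce a pure black-and-white copy. A fixed threshold is the simplest choice; unevenly lit scans usually need an adaptive method instead.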

Tesseract OCR is an open‑source engine originally developed at HP and open‑sourced in 2005. Google developed it from 2006 to 2018, and it is now maintained by its open‑source community. It is one of the most accurate and widely used open‑source OCR engines.

Tess4j is a JNA‑based Java wrapper for Tesseract that exposes the engine through a simple Java API. Its main advantages are:

Easy integration via a straightforward API.

Cross‑platform support (Windows, macOS, Linux).

Full Tesseract functionality including multilingual recognition and custom training.

Active open‑source community.

2. Download traineddata files

Traineddata files are hosted at https://github.com/tesseract-ocr/tessdata. For the demonstration the following files are required:

Simplified Chinese: chi_sim.traineddata

English: eng.traineddata

Copy the files to a directory with a known absolute path, e.g. F:/HeiMaTouTiao/tessdata, and reference this path in the Spring Boot configuration.

3. Integrate Tess4j into a Spring Boot project

3.1 Maven dependency

<dependency>
  <groupId>net.sourceforge.tess4j</groupId>
  <artifactId>tess4j</artifactId>
  <version>4.1.1</version>
</dependency>

3.2 Application properties (application.yml)

server:
  port: 11014

tess4j:
  data-path: F:/HeiMaTouTiao/tessdata
  chinese-train-data: chi_sim
  english-train-data: eng

3.3 Configuration class

import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Configuration;

@Configuration
@ConfigurationProperties(prefix = "tess4j")
public class Tess4jConfiguration {
    private String dataPath;
    private String chineseTrainData;
    private String englishTrainData;

    public String getDataPath() { return dataPath; }
    public void setDataPath(String dataPath) { this.dataPath = dataPath; }

    public String getChineseTrainData() { return chineseTrainData; }
    public void setChineseTrainData(String chineseTrainData) { this.chineseTrainData = chineseTrainData; }

    public String getEnglishTrainData() { return englishTrainData; }
    public void setEnglishTrainData(String englishTrainData) { this.englishTrainData = englishTrainData; }
}

4. Test image recognition

4.1 Chinese text

4.1.1 Test code

import cn.edu.scau.config.Tess4jConfiguration;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import java.io.File;

@SpringBootTest
public class Tess4jApplicationTests {
    @Autowired
    private Tess4jConfiguration tess4jConfiguration;

    @Test
    public void testChinese() throws TesseractException {
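        // Start timing before engine setup so the figure covers initialization plus recognition.
        long start = System.currentTimeMillis();
        ITesseract iTesseract = new Tesseract();
        // Directory that contains the .traineddata files (tess4j.data-path).
        iTesseract.setDatapath(tess4jConfiguration.getDataPath());
        // Language identifier; must match the traineddata file name prefix (chi_sim).
        iTesseract.setLanguage(tess4jConfiguration.getChineseTrainData());
        // Sample image to recognize; replace with the path to your own image.
        File file = new File("F:/HeiMaTouTiao/tessdata/CaiXuKun-Chinese.png");
        String result = iTesseract.doOCR(file);
        long end = System.currentTimeMillis();
        System.err.println("Time elapsed: " + (end - start) + " ms");
        System.out.println(result);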
    }
}

4.2 English text

4.2.1 Test code

import cn.edu.scau.config.Tess4jConfiguration;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import java.io.File;

@SpringBootTest
public class Tess4jApplicationTests {
    @Autowired
    private Tess4jConfiguration tess4jConfiguration;

    @Test
    public void testEnglish() throws TesseractException {
        long start = System.currentTimeMillis();
        ITesseract iTesseract = new Tesseract();
        iTesseract.setDatapath(tess4jConfiguration.getDataPath());
        iTesseract.setLanguage(tess4jConfiguration.getEnglishTrainData());
        File file = new File("F:/HeiMaTouTiao/tessdata/CaiXuKun-English.png");
        String result = iTesseract.doOCR(file);
        long end = System.currentTimeMillis();
        System.err.println("Time elapsed: " + (end - start) + " ms");
        System.out.println(result);
    }
}

4.3 Mixed Chinese‑English text

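When the image contains both Chinese and English characters, the test below uses the Chinese traineddata alone, because it also covers Latin characters. Tesseract also accepts several language identifiers joined with a plus sign, such as chi_sim+eng, which loads both models; this can improve recognition of the English portions at the cost of extra processing time.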
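Tesseract's language setting can name more than one model by joining identifiers with a plus sign, provided every corresponding traineddata file is in the data path. The helper below is a small sketch of building such a string from the configured values; the class name LanguageSpec and its combine method are hypothetical and not part of Tess4j.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class LanguageSpec {
    /**
     * Joins language identifiers with '+', skipping null or blank entries
     * and duplicates, e.g. ("chi_sim", "eng") becomes "chi_sim+eng".
     */
    public static String combine(String... languages) {
        return Arrays.stream(languages)
                .filter(l -> l != null && !l.trim().isEmpty())
                .map(String::trim)
                .distinct()
                .collect(Collectors.joining("+"));
    }
}
```

In the tests of this tutorial, the result could be passed to the engine as iTesseract.setLanguage(LanguageSpec.combine(tess4jConfiguration.getChineseTrainData(), tess4jConfiguration.getEnglishTrainData())).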

4.3.1 Test code

import cn.edu.scau.config.Tess4jConfiguration;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import java.io.File;

@SpringBootTest
public class Tess4jApplicationTests {
    @Autowired
    private Tess4jConfiguration tess4jConfiguration;

    @Test
    public void testChineseAndEnglish() throws TesseractException {
        long start = System.currentTimeMillis();
        ITesseract iTesseract = new Tesseract();
        iTesseract.setDatapath(tess4jConfiguration.getDataPath());
        iTesseract.setLanguage(tess4jConfiguration.getChineseTrainData());
        File file = new File("F:/HeiMaTouTiao/tessdata/ParagraphWithChineseAndEnglish.png");
        String result = iTesseract.doOCR(file);
        long end = System.currentTimeMillis();
        System.err.println("Time elapsed: " + (end - start) + " ms");
        System.out.println(result);
    }
}

5. Important notes

The traineddata files must retain the .traineddata suffix, and the filename prefix (e.g., chi_sim, eng) must match the language identifier supplied to setLanguage in the Java code.
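The naming rule can be checked before the engine runs. The following sketch derives the expected file names from a language string and reports which ones are missing from the data directory; TraineddataCheck and its methods are hypothetical helpers written for illustration, and they assume the plus‑separated language syntax that Tesseract accepts.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class TraineddataCheck {
    /** Maps a language string such as "chi_sim+eng" to the file names it requires. */
    public static List<String> requiredFiles(String language) {
        List<String> files = new ArrayList<>();
        for (String lang : language.split("\\+")) {
            if (!lang.trim().isEmpty()) {
                files.add(lang.trim() + ".traineddata");
            }
        }
        return files;
    }

    /** Returns the required file names that do not exist as regular files in dataDir. */
    public static List<String> missingFiles(Path dataDir, String language) {
        List<String> missing = new ArrayList<>();
        for (String fileName : requiredFiles(language)) {
            if (!Files.isRegularFile(dataDir.resolve(fileName))) {
                missing.add(fileName);
            }
        }
        return missing;
    }
}
```

A startup check could call TraineddataCheck.missingFiles(Paths.get(tess4jConfiguration.getDataPath()), "chi_sim") and fail fast with a readable message when the returned list is not empty, instead of waiting for the first recognition request to fail.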
