How to Build a Spring Boot File Upload Service with Elasticsearch Text Extraction and Search
This guide walks through creating a Spring Boot backend that accepts PDF, Word, and TXT uploads, extracts their text using Elasticsearch's ingest-attachment plugin, stores metadata in MySQL, and provides fuzzy search and highlighted results via Elasticsearch queries.
Requirement
The product needs a feature that lets users upload PDF, Word, or TXT files, run fuzzy searches on file names or file content, and view the files online.
Environment
Backend: Spring Boot + MyBatis-Plus + MySQL + Elasticsearch
Search engine: Elasticsearch 7.9.3 with Kibana UI
Implementation Steps
1. Set up the environment
Elasticsearch and Kibana installation is omitted; ensure the Java Elasticsearch client version matches the server version.
2. File content recognition
Install the Ingest Attachment Processor Plugin to extract text from attachments:
elasticsearch-plugin install ingest-attachment
When using Docker, install the plugin inside the container:
docker exec -it es bash
cd bin/
elasticsearch-plugin install ingest-attachment
After installation, restart Elasticsearch.
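After the restart, one way to confirm that the plugin is active is to read the _cat/plugins endpoint, which lists installed plugins per node. The following is a minimal sketch using only the JDK HTTP client (Java 11+). The class name PluginCheck and the base URL passed in are assumptions for illustration, and a secured cluster would also need an Authorization header, which is not shown.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of the original project.
final class PluginCheck {

    // Builds a GET request for the _cat/plugins endpoint of the given node.
    static HttpRequest catPluginsRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_cat/plugins"))
                .GET()
                .build();
    }

    // Returns true if any output line names the plugin in its second column.
    // Each line of the default output has the form "<node> <plugin> <version>".
    static boolean hasPlugin(String catOutput, String plugin) {
        return catOutput.lines()
                .map(line -> line.trim().split("\\s+"))
                .anyMatch(cols -> cols.length >= 2 && cols[1].equals(plugin));
    }

    // Sends the request and checks the response body (requires a reachable node).
    static boolean isInstalled(String baseUrl, String plugin)
            throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(catPluginsRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        return hasPlugin(response.body(), plugin);
    }
}
```

Calling PluginCheck.isInstalled("http://127.0.0.1:9200", "ingest-attachment") against a running node should return true once the plugin is loaded.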
3. Create an ingest pipeline
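The pipeline extracts the attachment's text from the Base64-encoded content field and then removes that raw field. The Java service later in this article refers to the pipeline by the name attachment, so register it under that name (in Kibana Dev Tools: PUT _ingest/pipeline/attachment).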
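If Kibana is not at hand, the pipeline can also be registered from Java with a plain HTTP PUT to the _ingest/pipeline endpoint. This is a minimal sketch assuming Java 11+ and an unsecured node. The class name PipelineSetup is hypothetical, and the pipeline name is passed in by the caller.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of the original project.
final class PipelineSetup {

    // The pipeline definition from this step, written as a compact JSON string.
    static final String DEFINITION = "{"
            + "\"description\":\"Extract attachment information\","
            + "\"processors\":["
            + "{\"attachment\":{\"field\":\"content\",\"ignore_missing\":true}},"
            + "{\"remove\":{\"field\":\"content\"}}"
            + "]}";

    // Builds the PUT request that creates or replaces the named pipeline.
    static HttpRequest putPipeline(String baseUrl, String name) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_ingest/pipeline/" + name))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(DEFINITION))
                .build();
    }
}
```

Passing the result of PipelineSetup.putPipeline("http://127.0.0.1:9200", "attachment") to HttpClient.send registers the pipeline. The body sent is the same definition shown next: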
{
  "description": "Extract attachment information",
  "processors": [
    {"attachment": {"field": "content", "ignore_missing": true}},
    {"remove": {"field": "content"}}
  ]
}
4. Define the index mapping
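The mapping for the fileinfo index specifies field types and a custom analyzer, my_ana, that uses Jieba for Chinese tokenization. The jieba_index tokenizer is not bundled with Elasticsearch, so this assumes a Jieba analysis plugin is installed and that the stopword and synonym files exist at the given paths, relative to the Elasticsearch config directory.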
{
  "mappings": {
    "properties": {
      "id": {"type": "keyword"},
      "fileName": {"type": "text", "analyzer": "my_ana"},
      "contentType": {"type": "text", "analyzer": "my_ana"},
      "fileUrl": {"type": "text"},
      "attachment": {
        "properties": {
          "content": {"type": "text", "analyzer": "my_ana"}
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {"type": "stop", "stopwords_path": "stopword/stopwords.txt"},
        "jieba_synonym": {"type": "synonym", "synonyms_path": "synonym/synonyms.txt"}
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": ["lowercase", "jieba_stop", "jieba_synonym"]
        }
      }
    }
  }
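}
Note: Content searches must target the attachment.content field, which is where the pipeline stores the extracted text, and that field needs an analyzer that suits the content (my_ana here); otherwise searches against the file content will not find the expected matches.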
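Before indexing real documents, it can help to check how the analyzer splits a sample phrase by calling the index's _analyze API. The sketch below only builds the request with the JDK HTTP client (Java 11+). The class name AnalyzerCheck is hypothetical, the index and analyzer names are supplied by the caller, and the small quote helper is not a full JSON encoder.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of the original project.
final class AnalyzerCheck {

    // Escapes backslashes and double quotes and wraps the text in quotes.
    // Enough for simple test phrases; control characters are not handled.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Body for the _analyze API: which analyzer to run and on which text.
    static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":" + quote(analyzer) + ",\"text\":" + quote(text) + "}";
    }

    // Builds POST <baseUrl>/<index>/_analyze with the JSON body above.
    static HttpRequest analyzeRequest(String baseUrl, String index,
                                      String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(analyzeBody(analyzer, text)))
                .build();
    }
}
```

Sending AnalyzerCheck.analyzeRequest("http://127.0.0.1:9200", "fileinfo", "my_ana", "进口红酒") returns the tokens the analyzer produces, which shows quickly whether the tokenizer, stopword, and synonym settings took effect.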
5. Test the pipeline
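Convert a file to Base64 and index it through the pipeline. The content field must carry the Base64-encoded file bytes, and the other field names should match the mapping and the entity class (fileName, fileType, contentType). In the sample values, 进口红酒 means "imported red wine" and 文章 means "article":
PUT /fileinfo/_doc/1?pipeline=attachment
{
  "id": "1",
  "fileName": "进口红酒",
  "fileType": "pdf",
  "contentType": "文章",
  "content": "<Base64-encoded file content>"
}
Use an online Base64 converter (e.g., https://www.zhangxinxu.com/sp/base64.html) to produce the value for the content field.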
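The Base64 value can also be produced locally, which avoids pasting file contents into a website. The following is a minimal sketch using only the JDK. The class name Base64Content and its method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper, not part of the original project.
final class Base64Content {

    // Encodes raw file bytes with the standard Base64 alphabet.
    static String encode(byte[] fileBytes) {
        return Base64.getEncoder().encodeToString(fileBytes);
    }

    // Reads the whole file into memory and encodes it; suitable for test files,
    // not for very large uploads.
    static String encodeFile(Path file) throws IOException {
        return encode(Files.readAllBytes(file));
    }

    // Wraps the encoded text in the smallest document the pipeline can process.
    // Standard Base64 output contains only A-Z, a-z, 0-9, '+', '/' and '=',
    // so it needs no JSON escaping.
    static String minimalDocument(String base64) {
        return "{\"content\":\"" + base64 + "\"}";
    }
}
```

For example, Base64Content.encode applied to the UTF-8 bytes of "hello" yields "aGVsbG8=", and encodeFile does the same for a file on disk. The resulting text goes into the content field of the test document.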
6. Query uploaded files
Search the indexed documents and view highlighted matches.
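The request itself is not shown in this step. As a rough sketch, a query body equivalent to the Java search service later in this article could be assembled as below and sent to the fileinfo index's _search endpoint. The class name SearchBody is hypothetical, the quote helper is not a full JSON encoder, and the minimum_should_match setting is an assumption added so that the keyword has to match at least one of the two fields.

```java
// Hypothetical helper, not part of the original project.
final class SearchBody {

    // Escapes backslashes and double quotes and wraps the text in quotes.
    // Enough for simple keywords; control characters are not handled.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a bool query that matches the keyword as a prefix query on the
    // file name or on the extracted content, with highlighting on both fields.
    static String build(String keyword) {
        String kw = quote(keyword);
        return "{\"query\":{\"bool\":{"
                + "\"should\":["
                + "{\"match_bool_prefix\":{\"fileName\":" + kw + "}},"
                + "{\"match_bool_prefix\":{\"attachment.content\":" + kw + "}}"
                + "],\"minimum_should_match\":1}},"
                + "\"highlight\":{"
                + "\"pre_tags\":[\"<span style='color:red'>\"],"
                + "\"post_tags\":[\"</span>\"],"
                + "\"fields\":{\"fileName\":{},\"attachment.content\":{}}}}";
    }
}
```

A response to a query of this kind has the following shape (abridged):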
{
  "took": 861,
  "hits": {
    "total": {"value": 5, "relation": "eq"},
    "hits": [
      {
        "_source": {
          "fileName": "测试_20220809164145A002.docx",
          "attachment": {"content": "内容"},
          "fileUrl": "http://localhost:8092/fileInfo/profile/upload/fileInfo/2022/08/09/测试_20220809164145A002.docx",
          "contentType": "文章",
          "fileType": "docx"
        }
      }
      // ... other hits ...
    ]
  }
}
Code Overview
The following snippets illustrate the main components.
YAML configuration
# Data source configuration
spring:
  devtools:
    restart:
      enabled: true
  elasticsearch:
    rest:
      url: 127.0.0.1
      uris: 127.0.0.1:9200
      connection-timeout: 1000
      read-timeout: 3000
      username: elastic
      password: 123456
Elasticsearch client bean
package com.yj.rselasticsearch.domain.config;

import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.time.Duration;

@Configuration
public class ElasticsearchConfig {

    // Host name only; the port (9200) and scheme are fixed in the builder below.
    @Value("${spring.elasticsearch.rest.url}")
    private String edUrl;

    @Value("${spring.elasticsearch.rest.username}")
    private String userName;

    @Value("${spring.elasticsearch.rest.password}")
    private String password;

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        // Basic-auth credentials for the cluster
        BasicCredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(userName, password));
        RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
                new HttpHost(edUrl, 9200, "http"))
                .setHttpClientConfigCallback(httpClientBuilder -> {
                    httpClientBuilder.disableAuthCaching();
                    // Keep idle connections alive for five minutes
                    httpClientBuilder.setKeepAliveStrategy((response, context) -> Duration.ofMinutes(5).toMillis());
                    return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
                })
        );
        return client;
    }
}
Entity class
package com.yj.common.core.domain.entity;

import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

import java.util.Date;

// createIndex = false: the index and its mapping are created manually (see step 4).
@Document(indexName = "fileinfo", createIndex = false)
public class FileInfo {

    @Field(name = "id", type = FieldType.Integer)
    private Integer id;

    @Field(name = "fileName", type = FieldType.Text, analyzer = "jieba_index", searchAnalyzer = "jieba_index")
    private String fileName;

    @Field(name = "fileType", type = FieldType.Keyword)
    private String fileType;

    @Field(name = "contentType", type = FieldType.Text)
    private String contentType;

    // Carries the Base64 payload on upload and the highlighted text in search results.
    @Field(name = "attachment.content", type = FieldType.Text, analyzer = "jieba_index", searchAnalyzer = "jieba_index")
    private String content;

    @Field(name = "fileUrl", type = FieldType.Text)
    private String fileUrl;

    private Date createTime;

    private Date updateTime;

    // Getters and setters omitted for brevity.
}
Controller for file upload
package com.yj.rselasticsearch.controller;

import com.yj.common.core.controller.BaseController;
import com.yj.common.core.domain.AjaxResult;
import com.yj.common.core.domain.entity.FileInfo;
import com.yj.rselasticsearch.service.FileInfoService;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

import javax.annotation.Resource;

@RestController
@RequestMapping("/fileInfo")
public class FileInfoController extends BaseController {

    @Resource
    private FileInfoService fileInfoService;

    @PutMapping("uploadFile")
    public AjaxResult uploadFile(String contentType, MultipartFile file) {
        return fileInfoService.uploadFileInfo(contentType, file);
    }
}
Service implementation (upload & index)
package com.yj.rselasticsearch.service.impl;

import com.alibaba.fastjson.JSON;
import com.yj.common.core.domain.AjaxResult;
import com.yj.common.utils.file.FileUploadUtils;
import com.yj.rselasticsearch.domain.entity.FileInfo;
import com.yj.rselasticsearch.mapper.FileInfoMapper;
import com.yj.rselasticsearch.service.FileInfoService;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile;

import javax.annotation.Resource;
import java.util.Base64;

@Service
public class FileInfoServiceImpl implements FileInfoService {

    @Autowired
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient client;

    @Resource
    private FileInfoMapper fileInfoMapper;

    @Override
    public AjaxResult uploadFileInfo(String contentType, MultipartFile file) {
        // Validate parameters
        if (contentType == null || file == null) {
            return AjaxResult.error("Request parameters must not be empty");
        }
        try {
            // Read the bytes before the upload utility stores the file: once a
            // MultipartFile has been transferred, its content may no longer be readable.
            byte[] bytes = file.getBytes();

            String filePath = "/upload/fileInfo"; // simplified path
            String fileName = FileUploadUtils.upload(filePath, file);
            String extension = fileName.substring(fileName.lastIndexOf('.') + 1);
            String url = "http://localhost:8092" + "/fileInfo/" + fileName;

            // Store the file metadata in MySQL
            FileInfo fileInfo = new FileInfo();
            fileInfo.setFileName(fileName);
            fileInfo.setFileType(extension);
            fileInfo.setFileUrl(url);
            fileInfo.setContentType(contentType);
            fileInfoMapper.insertSelective(fileInfo);

            // Attach the Base64 content and index the document through the pipeline
            fileInfo.setContent(Base64.getEncoder().encodeToString(bytes));
            IndexRequest request = new IndexRequest("fileinfo");
            request.source(JSON.toJSONString(fileInfo), XContentType.JSON);
            request.setPipeline("attachment");
            client.index(request, RequestOptions.DEFAULT);
            return AjaxResult.success(fileInfo);
        } catch (Exception e) {
            return AjaxResult.error(e.getMessage());
        }
    }
}
Search service (highlight & suggestion)
package com.yj.rselasticsearch.service.impl;

import com.baomidou.mybatisplus.core.metadata.IPage;
import com.yj.rselasticsearch.domain.dto.WarningInfoDto;
import com.yj.rselasticsearch.domain.entity.FileInfo;
import com.yj.rselasticsearch.service.ElasticsearchService; // assumed package, mirroring FileInfoService
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.SearchHit;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.stereotype.Service;

import java.util.*;
import java.util.stream.Collectors;

@Service
public class ElasticsearchServiceImpl implements ElasticsearchService {

    @Autowired
    private ElasticsearchRestTemplate elasticsearchRestTemplate;

    // Returns up to nine distinct, highlighted file names as search suggestions.
    @Override
    public List<String> getAssociationalWordOther(WarningInfoDto dto) {
        // minimumShouldMatch(1): next to a must clause, should clauses are optional
        // by default, so without it the keyword would only influence scoring.
        BoolQueryBuilder qb = QueryBuilders.boolQuery()
                .should(QueryBuilders.matchBoolPrefixQuery("fileName", dto.getKeyword()))
                .must(QueryBuilders.termsQuery("contentType", dto.getContentType()))
                .minimumShouldMatch(1);
        NativeSearchQuery query = new NativeSearchQueryBuilder()
                .withQuery(qb)
                .withHighlightFields(new HighlightBuilder.Field("fileName"))
                .withHighlightBuilder(new HighlightBuilder().preTags("<span style='color:red'>").postTags("</span>"))
                .build();
        SearchHits<FileInfo> hits = elasticsearchRestTemplate.search(query, FileInfo.class);
        List<String> suggestions = new ArrayList<>();
        for (SearchHit<FileInfo> hit : hits) {
            Map<String, List<String>> hl = hit.getHighlightFields();
            if (hl.get("fileName") != null) {
                suggestions.add(hl.get("fileName").get(0));
            }
        }
        return suggestions.stream().distinct().limit(9).collect(Collectors.toList());
    }

    // Pages through matches on file name or extracted content, with highlights applied.
    @Override
    public IPage<FileInfo> queryHighLightWordOther(WarningInfoDto dto) {
        // The DTO page index is 1-based; Spring Data pages are 0-based.
        Pageable pageable = PageRequest.of(dto.getPageIndex() - 1, dto.getPageSize());
        BoolQueryBuilder qb = QueryBuilders.boolQuery()
                .should(QueryBuilders.matchBoolPrefixQuery("fileName", dto.getKeyword()))
                .should(QueryBuilders.matchBoolPrefixQuery("attachment.content", dto.getKeyword()))
                .must(QueryBuilders.termsQuery("contentType", dto.getContentType()))
                .minimumShouldMatch(1); // require a keyword match in at least one field
        NativeSearchQuery query = new NativeSearchQueryBuilder()
                .withQuery(qb)
                .withHighlightFields(new HighlightBuilder.Field("fileName"), new HighlightBuilder.Field("attachment.content"))
                .withHighlightBuilder(new HighlightBuilder().preTags("<span style='color:red'>").postTags("</span>"))
                .withPageable(pageable)
                .build();
        SearchHits<FileInfo> hits = elasticsearchRestTemplate.search(query, FileInfo.class);
        List<FileInfo> results = new ArrayList<>();
        for (SearchHit<FileInfo> hit : hits) {
            Map<String, List<String>> hl = hit.getHighlightFields();
            FileInfo fi = hit.getContent();
            // Replace the plain values with the first highlighted fragment, if any
            if (hl.get("fileName") != null) {
                fi.setFileName(hl.get("fileName").get(0));
            }
            if (hl.get("attachment.content") != null) {
                fi.setContent(hl.get("attachment.content").get(0));
            }
            results.add(fi);
        }
        IPage<FileInfo> page = new com.baomidou.mybatisplus.extension.plugins.pagination.Page<>();
        page.setCurrent(dto.getPageIndex());
        page.setSize(dto.getPageSize());
        page.setTotal(hits.getTotalHits());
        page.setRecords(results);
        return page;
    }
}
The implementation demonstrates uploading files, converting them to Base64, indexing them through Elasticsearch's attachment pipeline, and running fuzzy (prefix-based), highlighted searches over both file names and extracted content.
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.
