Tagged articles
Java Backend Technology
Feb 1, 2025 · Backend Development

Unlock Apache Tika: Extract Text, Metadata, and Detect Sensitive Data in Java

This article introduces Apache Tika, a Java library that parses many file formats, extracts text and metadata, and supports OCR and language detection. It then shows how to integrate Tika with Spring Boot to automatically detect sensitive information such as ID numbers, credit card numbers, and phone numbers.

Apache Tika · File Parsing · Metadata Extraction
Lobster Programming
Nov 1, 2024 · Backend Development

How to Parse PDFs and Extract Metadata with Apache Tika and Spring Boot

This guide explains Apache Tika's document parsing capabilities, shows how to download and run the Tika app, demonstrates extracting text and metadata from a PDF, and provides step-by-step instructions for integrating Tika into a Spring Boot project with full code examples.

Apache Tika · Document Processing · Java