How to Build a High‑Performance Distributed Log Query System with Lucene, Ignite, and Log4j2
This article presents a design for a transparent, flexible, and low‑resource distributed logging solution that uses Lucene for indexing, Apache Ignite for service and compute grids, and a custom Log4j2 appender, enabling fast, unified log queries across clustered applications.
Background
Applications typically record logs by calling a logging API and configuring the output in files such as log4j2.xml. The logs are written to files on each server, so viewing them means logging in to every node by hand, which is cumbersome for clustered deployments.
Enterprise applications often do not need deep log analysis; most required data can be obtained from normal storage like databases.
Goals
The solution must be transparent to applications (developers continue to log as usual), highly flexible (configurable query dimensions such as keywords, time ranges, business metrics), provide a unified query interface across the cluster, deliver high query performance, consume minimal CPU and memory, and be simple to deploy without complex configuration.
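To make the "configurable query dimensions" and "unified query interface" goals concrete, here is a minimal sketch in plain Java. It is not the article's implementation: `LogEntry`, `LogQuery`, `searchNode`, and `searchCluster` are hypothetical names, and in-memory lists stand in for the per-node indexes. Each node filters its own entries by level, keyword, and time range, and the caller merges the hits into one time-ordered result.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: filter per node, then merge the hits for the caller. */
final class LogQuerySketch {

    /** One log record; the fields are illustrative, not the article's schema. */
    static final class LogEntry {
        final long timeMillis;
        final String level;
        final String content;

        LogEntry(long timeMillis, String level, String content) {
            this.timeMillis = timeMillis;
            this.level = level;
            this.content = content;
        }
    }

    /** Query dimensions; a null level or keyword means "do not filter on it". */
    static final class LogQuery {
        final String level;
        final String keyword;
        final long fromMillis; // inclusive
        final long toMillis;   // exclusive

        LogQuery(String level, String keyword, long fromMillis, long toMillis) {
            this.level = level;
            this.keyword = keyword;
            this.fromMillis = fromMillis;
            this.toMillis = toMillis;
        }

        boolean matches(LogEntry e) {
            if (e.timeMillis < fromMillis || e.timeMillis >= toMillis) return false;
            if (level != null && !level.equals(e.level)) return false;
            return keyword == null || e.content.contains(keyword);
        }
    }

    /** Stand-in for a search against one node's local index. */
    static List<LogEntry> searchNode(List<LogEntry> nodeEntries, LogQuery q) {
        List<LogEntry> hits = new ArrayList<>();
        for (LogEntry e : nodeEntries) {
            if (q.matches(e)) hits.add(e);
        }
        return hits;
    }

    /** Collects hits from every node and orders them newest first. */
    static List<LogEntry> searchCluster(List<List<LogEntry>> nodes, LogQuery q) {
        List<LogEntry> merged = new ArrayList<>();
        for (List<LogEntry> node : nodes) {
            merged.addAll(searchNode(node, q));
        }
        merged.sort(Comparator.comparingLong((LogEntry e) -> e.timeMillis).reversed());
        return merged;
    }

    public static void main(String[] args) {
        List<LogEntry> nodeA = List.of(
            new LogEntry(1000L, "ERROR", "order 42 failed"),
            new LogEntry(2000L, "INFO", "order 43 created"));
        List<LogEntry> nodeB = List.of(
            new LogEntry(1500L, "ERROR", "order 44 failed"),
            new LogEntry(9000L, "ERROR", "order 45 failed"));

        LogQuery q = new LogQuery("ERROR", "failed", 0L, 5000L);
        for (LogEntry e : searchCluster(List.of(nodeA, nodeB), q)) {
            System.out.println(e.timeMillis + " " + e.level + " " + e.content);
        }
    }
}
```

In a real deployment the per-node step would run where the index lives and only the hits would travel back to the caller, which is what keeps the query cost low on the requesting side.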
Architecture
After evaluating requirements and existing open‑source options, the chosen stack combines Lucene, Apache Ignite, and Log4j2. The overall architecture is illustrated below:
Key Technologies
Apache Ignite
Ignite provides a high‑performance, integrated, hybrid in‑memory platform that makes distributed caching, computation, and storage transparent to developers, reducing the complexity of building, testing, and deploying distributed applications.
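Ignite Service Grid
The Ignite Service Grid offers straightforward distributed RPC: a service is deployed on cluster nodes and invoked from anywhere in the cluster through a proxy. Defining a service is simple: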
public interface MyCounterService {
    int get() throws CacheException;
}
Implementation:
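public class MyCounterServiceImpl implements Service, MyCounterService {
    // Depending on the Ignite version, Service may also require the
    // init, execute, and cancel lifecycle methods to be implemented.
    @Override public int get() {
        return 0;
    }
}
Deployment: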
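// Restrict deployment to the nodes that host the cache named "myCounterService".
ClusterGroup cacheGrp = ignite.cluster().forCacheNodes("myCounterService");
IgniteServices svcs = ignite.services(cacheGrp);
// Deploy one service instance on each node in the group.
svcs.deployNodeSingleton("myCounterService", new MyCounterServiceImpl());
Invocation: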
MyCounterService cntrSvc = ignite.services()
    .serviceProxy("myCounterService", MyCounterService.class, false);
System.out.println("value : " + cntrSvc.get());
Ignite Compute Grid
IgniteCompute enables distributed tasks using a MapReduce‑like model. The solution uses a ComputeTask to execute jobs on cluster nodes:
// Requires java.util.ArrayList, java.util.Collection, java.util.List, and the
// org.apache.ignite.compute classes ComputeJob, ComputeJobAdapter,
// ComputeJobResult, and ComputeTaskSplitAdapter.
IgniteCompute compute = ignite.compute();
int cnt = compute.execute(CharacterCountTask.class, "Hello Grid Enabled World!");
System.out.println(">>> Total number of characters in the phrase is '" + cnt + "'.");

private static class CharacterCountTask extends ComputeTaskSplitAdapter<String, Integer> {
    // Map step: create one job per word; the grid distributes the jobs across nodes.
    @Override public Collection<? extends ComputeJob> split(int gridSize, String arg) {
        String[] words = arg.split(" ");
        List<ComputeJob> jobs = new ArrayList<>(words.length);
        for (final String word : words) {
            jobs.add(new ComputeJobAdapter() {
                @Override public Object execute() {
                    System.out.println(">>> Printing '" + word + "' from compute job.");
                    return word.length();
                }
            });
        }
        return jobs;
    }

    // Reduce step: sum the per-word lengths returned by the jobs.
    @Override public Integer reduce(List<ComputeJobResult> results) {
        int sum = 0;
        for (ComputeJobResult res : results)
            sum += res.<Integer>getData();
        return sum;
    }
}
Custom Log4j2 LuceneAppender
The custom appender makes logging transparent, highly flexible, and performant. Configuration is done in log4j2.xml:
<Lucene name="luceneAppender" ignoreExceptions="true" target="target/lucene/index" expiryTime="1296000">
    <IndexField name="logId" pattern="$${ctx:logId}" />
    <IndexField name="time" pattern="%d{UNIX_MILLIS}" type="LongField"/>
    <IndexField name="level" pattern="%-5level" />
    <IndexField name="content" pattern="%class{36} %L %M - %msg%xEx%n" />
</Lucene>
The target attribute specifies the index location, expiryTime defines the index TTL, and each IndexField maps a log attribute to an index field.
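The following sketch shows one way an index TTL like expiryTime could be applied to the time field, which the sample configuration stores as epoch milliseconds. It rests on an assumption the article does not confirm: that expiryTime is expressed in seconds, which would make 1296000 equal to 15 days. `ExpirySketch`, `cutoffMillis`, and `retain` are hypothetical names, and a list of timestamps stands in for the indexed documents.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a TTL sweep over the "time" field (epoch milliseconds). */
final class ExpirySketch {

    /** Entries with a timestamp below this cutoff are treated as expired. */
    static long cutoffMillis(Instant now, long expirySeconds) {
        return now.minus(Duration.ofSeconds(expirySeconds)).toEpochMilli();
    }

    /** Returns the timestamps that survive the sweep. */
    static List<Long> retain(List<Long> timesMillis, Instant now, long expirySeconds) {
        long cutoff = cutoffMillis(now, expirySeconds);
        List<Long> kept = new ArrayList<>();
        for (long t : timesMillis) {
            if (t >= cutoff) kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-31T00:00:00Z");
        long expirySeconds = 1_296_000L; // value from the sample config; unit assumed to be seconds

        long fresh = Instant.parse("2024-01-20T00:00:00Z").toEpochMilli();
        long stale = Instant.parse("2024-01-10T00:00:00Z").toEpochMilli();

        System.out.println(Duration.ofSeconds(expirySeconds).toDays() + " days");
        System.out.println(retain(List.of(fresh, stale), now, expirySeconds).size() + " entry kept");
    }
}
```

Against a real index the same cutoff would drive a range delete on the time field rather than a list filter.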
Lucene Analyzer
For log content, a KeywordAnalyzer is chosen because it keeps each field value as a single, unmodified token, preserves case, and meets the 256‑character limit for keywords. This fits the requirement of exact matching for structured fields; fuzzy matching for free‑text log messages is then expressed as pattern queries, such as wildcards, over the whole token.
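The matching semantics of a single-token field can be modeled without Lucene. The sketch below is plain Java, not Lucene code: `exactMatch` approximates what a term lookup does against a value indexed as one token, and `wildcardMatch` approximates a wildcard query with `*` and `?`. Both method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical model of matching against a field indexed as one unmodified token. */
final class KeywordMatchSketch {

    /** Exact match: the query must equal the whole stored value, including case. */
    static boolean exactMatch(String storedTerm, String query) {
        return storedTerm.equals(query);
    }

    /** Glob match: '*' is any run of characters, '?' is exactly one character. */
    static boolean wildcardMatch(String storedTerm, String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Quote everything else so regex metacharacters match literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // DOTALL lets '*' span the line break that a %n pattern appends.
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(storedTerm).matches();
    }

    public static void main(String[] args) {
        String content = "com.example.OrderService 42 create - connection timeout\n";

        System.out.println(exactMatch("ERROR", "ERROR"));         // same case, whole value
        System.out.println(exactMatch("ERROR", "error"));         // case differs
        System.out.println(wildcardMatch(content, "*timeout*"));  // substring via wildcards
        System.out.println(wildcardMatch(content, "timeout"));    // no wildcards, must equal whole value
    }
}
```

One consequence worth checking in a real setup: the `%-5level` pattern pads levels to five characters, so if the appender stores the padded value unchanged, an exact query for `INFO` would need the trailing space.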
Advantages and Disadvantages
Advantages
Low resource consumption: logging behaves like standard logging; only Lucene index files use disk space, which can be expired.
Simple deployment: only a proxy module and optional query UI are needed; no extra servers.
Strong flexibility: logging and query behavior are fully configurable via log4j2.xml and UI settings.
Easy learning curve: developers only need to master Ignite and Lucene.
Disadvantages
Requires development effort: the solution is not plug‑and‑play; a custom UI and integration are needed.
Depends on Ignite cluster deployment: applications must run within an Ignite cluster and may need careful grouping to avoid interference.
Other Related Solutions
Alternatives such as the Elastic Stack or Flume provide richer features but are heavier, require more resources, and involve higher development and operational costs.
Applicable Scenarios
The approach suits enterprise‑grade, cluster‑deployed software where low deployment cost and full control are desired. It scales to many applications and large clusters, making it a viable alternative to heavyweight ELK customizations for modest logging needs.
Conclusion
The proposed stack offers a concise, code‑light solution to the long‑standing challenge of distributed log querying, while showcasing the broader potential of embedded in‑memory compute platforms like Ignite for various data‑collection scenarios.
ITFLY8 Architecture Home