
Optimizing Apache Solr Schema and Configuration for Performance

This article explains how to tune Apache Solr by configuring caches, SolrCloud, commit strategies, dynamic fields, indexed versus stored fields, copyField, filter queries, and faceting to achieve maximum search performance for production workloads.

Architects Research Society

May 20, 2022


1. Configure Caches

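Solr caches belong to a specific index searcher (a SolrIndexSearcher instance) and live only as long as that searcher. When a commit opens a new searcher, its caches start empty unless they are autowarmed from the previous searcher's caches, so choosing cache sizes and autowarming settings is crucial for performance.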

Configure filterCache

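The filterCache stores, for each filter query (fq), the unordered set of documents that match it, so a repeated filter does not have to be re-executed. When a new searcher opens, entries can be autowarmed (pre-computed) from the previous searcher's cache. Example configuration: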

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0" />

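class: the SolrCache implementation (solr.LRUCache or solr.FastLRUCache; Solr 8.3 and later also offer solr.CaffeineCache, which replaces both in Solr 9)
size: the maximum number of entries
initialSize: the initial capacity of the cache
autowarmCount: the number of entries to pre-populate from the old searcher's cache (0 disables autowarming)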
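To make size and autowarmCount concrete, here is a toy Python model of an LRU cache with those two knobs. It only sketches the semantics and is not Solr's implementation; the class ToyLRUCache, its methods, and the sample filter strings are hypothetical.

```python
from collections import OrderedDict


class ToyLRUCache:
    """Toy model of a Solr-style LRU cache (hypothetical, not Solr's code)."""

    def __init__(self, size, autowarm_count=0):
        self.size = size                      # plays the role of the size attribute
        self.autowarm_count = autowarm_count  # plays the role of autowarmCount
        self.entries = OrderedDict()          # least recently used entry first
        self.hits = 0
        self.lookups = 0

    def get(self, key, compute):
        """Return the cached value for key, computing and caching it on a miss."""
        self.lookups += 1
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)     # mark as most recently used
            return self.entries[key]
        value = compute(key)
        self.entries[key] = value
        if len(self.entries) > self.size:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value

    def warm_from(self, old_cache, compute):
        """Autowarm: re-run the old cache's most recently used keys
        against the new searcher (represented here by `compute`)."""
        count = min(self.autowarm_count, self.size)
        if count <= 0:
            return  # autowarmCount=0: the new cache starts cold
        for key in list(old_cache.entries)[-count:]:
            self.entries[key] = compute(key)


filter_cache = ToyLRUCache(size=2, autowarm_count=1)
for fq in ["type:book", "type:dvd", "type:book", "inStock:true"]:
    filter_cache.get(fq, lambda key: f"docset for {key}")
print(filter_cache.hits, list(filter_cache.entries))  # 1 ['type:book', 'inStock:true']

new_cache = ToyLRUCache(size=2, autowarm_count=1)
new_cache.warm_from(filter_cache, lambda key: f"docset for {key}")
print(list(new_cache.entries))  # ['inStock:true']
```

The trade-off the model shows: a larger autowarmCount makes the first queries after a commit faster, but every warmed entry must be recomputed, so opening a new searcher takes longer.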

Configure queryResultCache and documentCache

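The queryResultCache stores the ordered list of document IDs (a DocList) returned by previous searches, keyed by query, filters, and sort; the documentCache stores the Lucene Document objects (stored fields) fetched for results. Both pay off in read-heavy workloads, such as a blog that is read far more often than it is updated.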

<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0" />
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0" />

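The documentCache cannot be autowarmed, because internal Lucene document IDs change when the index changes, so keep its autowarmCount at 0. For write-heavy workloads with frequent soft commits, every commit opens a new searcher and discards these caches before they accumulate many hits; in that case keep them small or consider disabling them.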
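The effect of frequent commits on cache usefulness can be illustrated with a small Python simulation. It assumes an unbounded cache that is simply discarded whenever a new searcher opens with autowarmCount="0"; the function name and the query stream are made up for illustration.

```python
def simulated_hit_ratio(queries, commit_every=0):
    """Fraction of cache hits for a stream of queries when the cache is
    discarded every `commit_every` queries (0 means it is never discarded).

    Simplified model: unbounded cache, no autowarming.
    """
    cache = set()
    hits = 0
    for i, query in enumerate(queries):
        if commit_every and i > 0 and i % commit_every == 0:
            cache.clear()  # a (soft) commit opened a new searcher with empty caches
        if query in cache:
            hits += 1
        else:
            cache.add(query)
    return hits / len(queries) if queries else 0.0


workload = ["q1", "q2"] * 50  # 100 requests alternating between two queries
for every in (0, 10, 2):
    print(every, simulated_hit_ratio(workload, every))
```

In this toy workload the hit ratio falls from 0.98 with no commits to 0.8 with a commit every 10 requests, and to 0.0 with a commit every 2 requests.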

2. Configure SolrCloud

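SolrCloud provides fault-tolerant, highly available clusters: each collection is split into shards, each shard is served by several replicas, and cluster state is coordinated through Apache ZooKeeper. SolrCloud manages its replicas itself and does not require configuring master-slave replication; that model (called leader/follower in newer Solr releases) belongs to standalone, non-SolrCloud deployments and is configured through the ReplicationHandler. In such a setup, the confFiles parameter in the master section of the ReplicationHandler in the master's solrconfig.xml lists the configuration files to copy to slaves, and an entry of the form source:target renames the file on the slave:

<str name="confFiles">solrconfig_slave.xml:solrconfig.xml,x.xml,y.xml</str>

Refer to the Solr Reference Guide for details.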
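The confFiles value is a comma-separated list in which an entry of the form source:target is saved under the target name on the slave. A small Python sketch of that interpretation (the function name is hypothetical; Solr performs this parsing itself):

```python
def parse_conf_files(value):
    """Map each master-side config file to the name it gets on the slave."""
    mapping = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        source, _, target = entry.partition(":")
        mapping[source] = target or source  # no ":" means the name is kept
    return mapping


print(parse_conf_files("solrconfig_slave.xml:solrconfig.xml,x.xml,y.xml"))
```

This renaming lets the master keep a slave-specific solrconfig_slave.xml next to its own solrconfig.xml, while slaves receive it under the standard name.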

3. Configure Commits

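A commit controls when indexed data becomes durable and searchable. A hard commit (commit=true) flushes index segments to stable storage and, by default, opens a new searcher. A soft commit (softCommit=true) opens a new searcher so that changes become visible quickly, without fsyncing index files to stable storage, which enables near-real-time (NRT) search. Updates that have not yet been hard-committed are protected by the transaction log, if it is enabled.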

AutoCommit

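autoCommit issues hard commits automatically, triggered by maxDocs (the number of updates since the last commit) or maxTime (milliseconds since the oldest uncommitted update), whichever is reached first. With openSearcher set to false, these hard commits make data durable without opening a new searcher, so visibility is left to soft commits (for example, autoSoftCommit). Example: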

<autoCommit>
  <maxDocs>20000</maxDocs>
  <maxTime>50000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
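To see how the two triggers interact, the following Python sketch computes when a simplified autoCommit policy would fire for a given stream of document arrivals. It is a model under stated assumptions (the timer starts at the oldest uncommitted update, and whichever limit is hit first fires a commit that resets both), not Solr's scheduler; the function name is hypothetical.

```python
def autocommit_points(add_times_ms, max_docs, max_time_ms):
    """Times (ms) at which a simplified maxDocs/maxTime autoCommit would fire.

    add_times_ms: arrival time of each added document, in ascending order.
    The timer starts at the oldest uncommitted add; whichever limit is
    reached first triggers the commit, which resets both counters.
    """
    commits = []
    pending = 0
    first_pending = None
    for t in add_times_ms:
        # maxTime expired before this add arrived
        if pending and t >= first_pending + max_time_ms:
            commits.append(first_pending + max_time_ms)
            pending, first_pending = 0, None
        pending += 1
        if first_pending is None:
            first_pending = t
        # maxDocs reached by this add
        if pending >= max_docs:
            commits.append(t)
            pending, first_pending = 0, None
    if pending:  # the timer eventually fires for leftover documents
        commits.append(first_pending + max_time_ms)
    return commits


# The example configuration: maxDocs=20000, maxTime=50000 ms.
print(autocommit_points([0, 1000, 2000], max_docs=20000, max_time_ms=50000))
```

With a trickle of updates the time limit dominates (here a single commit at 50000 ms); during a bulk load, maxDocs would fire first.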

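For bulk imports or migrations, relaxing or disabling autoCommit and issuing a single explicit commit at the end can speed up indexing; the trade-off is that the transaction log keeps growing until the next hard commit.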
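One way to apply this from a client is to send documents in batches without commit parameters and request a single hard commit with the last batch. The Python sketch below only builds the requests and performs no network I/O; the base URL, collection name, and batch size are assumptions, while the /update handler and commit=true are standard Solr update parameters.

```python
import json
from urllib.parse import urlencode


def bulk_update_requests(base_url, collection, docs, batch_size):
    """Build (url, json_body) pairs for a bulk load.

    No batch asks for a commit except the last one, which requests a single
    explicit hard commit (commit=true) once everything has been sent.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    planned = []
    for n, batch in enumerate(batches):
        url = f"{base_url}/{collection}/update"
        if n == len(batches) - 1:
            url += "?" + urlencode({"commit": "true"})
        planned.append((url, json.dumps(batch)))
    return planned


docs = [{"id": str(i)} for i in range(5)]
for url, body in bulk_update_requests("http://localhost:8983/solr", "products", docs, 2):
    print(url, body)
```

Each body is a JSON array of documents and would be posted with Content-Type: application/json.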

4. Configure Dynamic Fields

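Dynamic fields allow wildcard field names, reducing the need to define every field explicitly. In the example, any field whose name ends in _fieldname is indexed as a multivalued, stored boolean (field names are safest when they use only letters, digits, and underscores):

<dynamicField name="*_fieldname" type="boolean" multiValued="true" stored="true" />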

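Use them judiciously: Lucene keeps per-field metadata for every distinct field name in the index, so letting clients create an unbounded number of dynamic field names increases memory use and index overhead.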
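How a field name is resolved can be sketched in Python: an explicitly defined field wins; otherwise the longest matching dynamic pattern is used, with schema order breaking ties, as described in Solr's example schema. The function and field names are hypothetical.

```python
def resolve_field(name, explicit_fields, dynamic_patterns):
    """Return the field definition a field name maps to, or None.

    dynamic_patterns are in schema order; each has one '*' at the start or end.
    """
    if name in explicit_fields:
        return name  # explicit <field> definitions take precedence
    # Longest pattern first; sorted() is stable, so equal lengths keep schema order.
    for pattern in sorted(dynamic_patterns, key=len, reverse=True):
        if pattern.startswith("*") and name.endswith(pattern[1:]):
            return pattern
        if pattern.endswith("*") and name.startswith(pattern[:-1]):
            return pattern
    return None


patterns = ["*_s", "*_fieldname", "attr_*"]
for field in ["id", "active_fieldname", "attr_color", "title_s", "price"]:
    print(field, "->", resolve_field(field, {"id"}, patterns))
```

A name such as "price" that matches nothing would be rejected by Solr at indexing time unless a catch-all pattern exists.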

5. Configure Indexed vs Stored Fields

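Set indexed="true" for fields that must be searchable, filterable, sortable, or faceted (unless docValues serves those needs), and indexed="false" for fields that are only returned in results. Conversely, stored="false" suits fields that are searched but never returned. Example of a retrieval-only field: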

<field name="foo" type="int" stored="true" indexed="false"/>

Skipping indexing for retrieval-only fields shrinks the index and reduces indexing and re-indexing time.
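When schema definitions are generated programmatically, this choice can be encoded directly. A Python sketch using the standard library's ElementTree; the helper name and its searched/returned flags are hypothetical, and the sketch rejects a field that could be neither searched nor returned, since such a field (without docValues) serves no purpose.

```python
import xml.etree.ElementTree as ET


def field_element(name, field_type, searched, returned):
    """Return a schema <field> element whose indexed/stored flags follow usage."""
    if not (searched or returned):
        raise ValueError(f"field {name!r} would be neither searchable nor retrievable")
    attrs = {
        "name": name,
        "type": field_type,
        "indexed": "true" if searched else "false",  # build index structures only if queried
        "stored": "true" if returned else "false",   # keep the original value only if returned
    }
    return ET.tostring(ET.Element("field", attrs), encoding="unicode")


print(field_element("foo", "int", searched=False, returned=True))
```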

6. Configure copyField

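copyField copies the values of source fields into a destination field, often to build a catch-all search field. Values are copied from the original input before analysis, copies are not chained (a destination is never copied onward), and the destination must be multiValued if it can receive more than one value. Example, copying every field ending in _abcd into wxyz: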

<copyField source="*_abcd" dest="wxyz"/>
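The effect of such a rule on an incoming document can be sketched in Python. This is a simplified model, assuming only leading or trailing "*" wildcards; the function name and the sample document are hypothetical.

```python
def apply_copy_fields(doc, rules):
    """Return a copy of doc with copyField rules applied.

    doc: field name -> value or list of values (the raw input).
    rules: (source_pattern, dest) pairs; a source may start or end with '*'.
    Only original input fields are read, so one copy never feeds another.
    """
    def matches(pattern, field):
        if pattern.startswith("*"):
            return field.endswith(pattern[1:])
        if pattern.endswith("*"):
            return field.startswith(pattern[:-1])
        return field == pattern

    result = {k: (list(v) if isinstance(v, list) else [v]) for k, v in doc.items()}
    for source, dest in rules:
        for field, value in doc.items():
            if matches(source, field):
                values = value if isinstance(value, list) else [value]
                result.setdefault(dest, []).extend(values)  # dest behaves as multiValued
    return result


doc = {"id": "1", "title_abcd": "Fast search", "tags_abcd": ["solr", "lucene"], "price": 5}
print(apply_copy_fields(doc, [("*_abcd", "wxyz")])["wxyz"])
```

Because two source fields (one of them multivalued) feed wxyz, the destination collects three values, which is why it must be declared multiValued.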

7. Use Filter Queries (fq)

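A filter query (fq) restricts the result set without affecting relevance scoring, and each fq is cached in the filterCache independently of the main query (q), so filters reused across requests are cheap. The example below shows the parameters of a POST request to a collection's /select handler, in the option structure of a PHP HTTP client such as Guzzle (form_params for the POST body, query for URL parameters):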

POST
{
 "form_params": {
   "fq": "id:1234",
   "fl": "abc cde",
   "wt": "json"
 },
 "query": {
   "q": "*:*"
 }
}
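The same request can be expressed as plain URL parameters. This Python sketch builds a /select URL in which each filter becomes its own fq parameter, so each one gets its own filterCache entry and can be reused independently; the host, collection name, and filter values are assumptions for illustration, and no request is sent.

```python
from urllib.parse import urlencode


def select_url(base_url, collection, q, filters=(), fl=None):
    """Build a Solr /select URL with q, one fq per filter, and optional fl."""
    params = [("q", q)]
    # Separate fq parameters are cached separately in the filterCache,
    # unlike one combined filter such as "id:1234 AND type:book".
    params += [("fq", f) for f in filters]
    if fl:
        params.append(("fl", fl))
    params.append(("wt", "json"))
    return f"{base_url}/{collection}/select?{urlencode(params)}"


print(select_url("http://localhost:8983/solr", "products", "*:*",
                 ["id:1234", "type:book"], fl="abc,cde"))
```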

8. Use Faceting

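Faceting returns, alongside the search results, the number of matching documents for each distinct value of a field, which is useful for aggregation and navigation filters. In the example request below, facet.mincount=1 omits values with zero matches and facet.limit=-1 returns every value, which can be expensive on high-cardinality fields: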

{
 "form_params": {
   "fq": "fieldName:value",
   "fl": "fieldName",
   "facet": "true",
   "facet.mincount": 1,
   "facet.limit": -1,
   "facet.field": "fieldName",
   "wt": "json"
 },
 "query": {
   "q": "*:*"
 }
}
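In Solr's default JSON response format, facet_counts.facet_fields lists each field's values and counts in one flat array that alternates value and count (the layout is controlled by the json.nl parameter). A Python sketch that turns it into pairs; the function name is hypothetical and the sample response is hand-written for illustration, not real output:

```python
def facet_pairs(response, field):
    """Return [(value, count), ...] for one facet field of a parsed Solr JSON response."""
    flat = response["facet_counts"]["facet_fields"][field]
    # Flat layout: [value1, count1, value2, count2, ...]
    return list(zip(flat[0::2], flat[1::2]))


# Hand-written sample shaped like a facet response (illustrative values only).
sample = {"facet_counts": {"facet_fields": {"fieldName": ["value", 12, "other", 3]}}}
print(facet_pairs(sample, "fieldName"))
```

With facet.mincount=1, values with zero matches never appear in this list.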

Conclusion

When deploying Solr to production, tuning caches and commit settings, designing the schema carefully, and using filter queries and faceting are essential for good performance; the right values depend on the specific application workload.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
