Databases 38 min read

Why We Switched from SQL Server to Elasticsearch: A Real‑World Scaling Journey

This article shares a practical experience of replacing a massive SQL Server data store with Elasticsearch, covering the motivations, architectural design, performance advantages, indexing mechanisms, data synchronization strategies, implementation details, and operational considerations for handling tens of billions of records.

Java Interview Crash Guide
Java Interview Crash Guide
Java Interview Crash Guide
Why We Switched from SQL Server to Elasticsearch: A Real‑World Scaling Journey

Preface

I was once interviewed for a technical position at Anta, and the interviewer asked: "If you want to use a new technology but the leader refuses, what would you do?" Even if you haven't faced this question in an interview, you will encounter similar collaboration challenges at work.

From my perspective, I answer with four points:

Justified Reasoning : In software engineering, technology selection must be based on concrete problem scenarios; otherwise it is reckless.

Trust : Leaders may reject new technology for many reasons—lack of understanding, conservatism, or not being technical. Building trust is essential before proposing a solution.

Do Your Part : Gather comprehensive data, present it clearly, and provide quantitative evidence to strengthen the proposal.

People Are Complex : No one can stay completely objective; therefore, learn to manage upwards and maintain good collaboration.

Background

Our company has used SQL Server as the primary storage for many years, now reaching tens of millions of rows. Some non‑core business tables have been compressed to keep the size stable, but compression reduces scalability.

Examples:

We store a user's last accessed work record and total reading duration, but not detailed reading logs. Adding a lottery feature that requires per‑chapter reading time would be impossible with the current schema.

Operations need to analyze user preferences and popular works, which the current design cannot satisfy.

Search queries like LIKE '%keyword%' trigger full table scans. With 420,000 rows a full scan is still acceptable on a powerful server, but under high concurrency or when crawlers hit the site, resource consumption skyrockets.

Relational databases provide ACID guarantees but lack horizontal scalability. In massive data scenarios, a single instance cannot solve the problem.

NoSQL fills this gap. Martin Fowler’s "The Essentials of NoSQL" classifies NoSQL into document, graph, column‑family, and key‑value stores. Search engines also meet the four NoSQL characteristics: easy scaling, high performance on large data, flexible data models, and high availability.

We evaluated MongoDB and Elasticsearch; Elasticsearch was chosen for three reasons:

Our operations team is more familiar with Elasticsearch.

Elastic Stack’s reporting tools meet our statistical reporting needs.

Our workload is read‑heavy and near‑real‑time, which Elasticsearch handles well.

Elasticsearch Pros and Cons

Pros

Description

Horizontal scalability

Can run as a single node or a cluster; shards are automatically balanced.

Speed

Indexes are sharded; multiple servers process queries in parallel.

Diverse search

Supports full‑text search, synonym handling, relevance ranking, complex analytics, and near‑real‑time processing.

High availability

Replica mechanism ensures the cluster works even if a node fails.

Out‑of‑the‑box

Easy deployment with RESTful API and SDKs for many languages.

The main drawback is high memory consumption.

Why Elasticsearch Is Fast

Memory reads : Elasticsearch relies on the operating system’s filesystem cache.

Multiple indexes : Uses inverted index and doc values.

Cluster sharding : Distributes load across nodes.

Memory Reads

Elasticsearch (built on Lucene) caches data structures in the OS filesystem cache. Queries benefit from this cache, but sufficient RAM is required to hold the cached data.

Example: three 64 GB servers (total 192 GB). If each node allocates 32 GB JVM heap, the remaining 32 GB per node is available for filesystem cache, totaling 96 GB. With 1 TB of data, only about 10 % can be cached, the rest must hit disk.

Performance comparison (highlighted): memory read is ~200× faster than mechanical disk and ~30× faster than SSD.

Inverted Index

Lucene stores all indexed information as an inverted index, mapping terms to the documents that contain them. This structure enables fast full‑text search.

Example documents:

Elasticsearch Server (doc 1)

Masterring Elasticsearch (doc 2)

Apache Solr 4 Cookbook (doc 3)

After indexing, the term‑to‑document mapping allows rapid retrieval.

Doc Values

For aggregations, sorting, and grouping, a column‑oriented “doc values” structure is more efficient than the inverted index. Doc values are stored per field and support fast numeric and keyword operations.

Cluster Sharding

Elasticsearch can form a cluster of multiple nodes, distributing shards to balance execution pressure. Queries are executed in parallel across primary shards.

Query Process

Elasticsearch queries consist of two phases: the “scatter” phase (parallel sub‑queries sent to each shard) and the “gather” phase (collect, merge, sort, and return results).

Pagination Depth Trap

Deep pagination multiplies the amount of data read: pageSize × pageIndex × shardCount. Elasticsearch limits total returned hits to 10,000. Use search_after or scroll for deep navigation.

Elasticsearch vs. RDBMS

Elasticsearch

RDBMS

Index

Table

Document

Row

Field

Column

Mapping

Table schema

Before version 7.0, Elasticsearch had a type concept (type = table). After 7.0, type defaults to _doc and will be removed in 8.0.

Server Selection

Official docs recommend the JVM heap not exceed 32 GB and no more than half of the physical memory. For 64 GB servers we set a 32 GB heap; the remaining memory is used for the filesystem cache.

Scenario Selection

Write scenario : User reading logs generate massive data (70 GB per month, >1.1 billion rows).

Read scenario : Future lottery feature needs to query reading duration and chapter count.

Search scenario : Existing LIKE queries cause performance issues; we need full‑text and phonetic search.

Design Solution

Common Design

We wrap Elasticsearch in a .NET 5 WebAPI to hide technical details (time zones, analyzers, type conversion) and expose coarse‑grained read/write APIs. This follows the "application‑database" pattern described by Martin Fowler.

Reading‑Record Details

We use RabbitMQ to decouple writes: the platform API publishes messages, and an asynchronous consumer writes them to Elasticsearch. The consumer creates monthly indices (e.g., userviewrecord-2021-12) for easier management.

Reading record flow
Reading record flow

Search Implementation

We adopt CQRS: the existing relational data is synchronized to Elasticsearch, and a dedicated search API replaces the original platform API.

Data sync can be push‑based (CDC, Debezium, Kafka) or pull‑based (scheduled jobs). Push offers real‑time updates but adds middleware complexity; pull is simpler but may cause data gaps. We chose pull with Quartz.NET, performing full sync every four hours (≈2 minutes for 420 k rows).

Entity Definitions

All entities inherit from an abstract ElasticsearchEntity that provides Id and a microsecond‑precision Timestamp to avoid missing records during search_after pagination.

public abstract class ElasticsearchEntity
{
    private Guid? _id;
    public Guid Id { get => _id ??= Guid.NewGuid(); set => _id = value; }
    private long? _timestamp;
    [Number(Name = "timestamp")]
    public long Timestamp { get => _timestamp ??= DateTime.Now.DateTimeToTimestampOfMicrosecond(); set => _timestamp = value; }
}

Reading‑record entity:

[ElasticsearchType(RelationName = "user_view_duration")]
public class UserViewDuration : ElasticsearchEntity
{
    [Number(Name = "entity_id")]
    public long EntityId { get; set; }
    [Number(Name = "entity_type")]
    public long EntityType { get; set; }
    [Number(Name = "charpter_id")]
    public long CharpterId { get; set; }
    [Number(Name = "user_id")]
    public long UserId { get; set; }
    [Date(Name = "create_datetime")]
    public DateTimeOffset CreateDateTime { get; set; }
    [Number(Name = "duration")]
    public long Duration { get; set; }
    [Ip(Name = "Ip")]
    public string Ip { get; set; }
}

Search‑key entity (document‑type for works):

[ElasticsearchType(RelationName = "search_key")]
public class SearchKey : ElasticsearchEntity
{
    [Number(Name = "key_id")]
    public int KeyId { get; set; }
    [Number(Name = "entity_id")]
    public int EntityId { get; set; }
    [Number(Name = "entity_type")]
    public int EntityType { get; set; }
    [Text(Name = "key_name")]
    public string KeyName { get; set; }
    [Number(Name = "weight")]
    public int Weight { get; set; }
    [Boolean(Name = "is_subsidiary")]
    public bool IsSubsidiary { get; set; }
    [Date(Name = "active_date")]
    public DateTimeOffset? ActiveDate { get; set; }
    [Number(Name = "sys_tag_id")]
    public List<int> SysTagId { get; set; }
}

Asynchronous Writing

The consumer receives UserViewDurationMessage from RabbitMQ, maps it to the entity, and writes to the monthly index:

public class UserViewDurationConsumer : BaseConsumer<UserViewDurationMessage>
{
    private readonly ElasticClient _elasticClient;
    public UserViewDurationConsumer(ElasticClient elasticClient) => _elasticClient = elasticClient;
    public override void Excute(UserViewDurationMessage msg)
    {
        var document = msg.MapTo<UserViewDuration>();
        var result = _elasticClient.Create(document, a => a.Index($"{typeof(UserViewDuration).GetRelationName()}-{msg.CreateDateTime:yyyy-MM}")).GetApiResult();
        if (result.Failed) LoggerHelper.WriteToFile(result.Message);
    }
}

Search API

Reading‑record query supports filtering by user, entity, chapter, and time range, using search_after for pagination and sorting by timestamp descending.

[HttpGet]
[Route("record")]
public ApiResult<List<UserMarkRecordGetRecordResponse>> GetRecord([FromQuery] UserViewDurationRecordGetRequest request)
{
    var must = new List<Func<QueryContainerDescriptor<UserViewDuration>, QueryContainer>>();
    if (request.UserId.HasValue) must.Add(a => a.Term(t => t.Field(f => f.UserId).Value(request.UserId.Value)));
    // ... other filters omitted for brevity ...
    var result = _elasticClient.Search<UserViewDuration>(s => s
        .Index($"{typeof(UserViewDuration).GetRelationName()}-{dateTime}")
        .Size(request.Size)
        .Query(q => q.Bool(b => b.Must(must)))
        .SearchAfterTimestamp(request.Timestamp)
        .Sort(srt => srt.Field(f => f.Timestamp, SortOrder.Descending)));
    var apiResult = result.GetApiResult<UserViewDuration, List<UserMarkRecordGetRecordResponse>>();
    return apiResult.Success ? ApiResult<List<UserMarkRecordGetRecordResponse>>.IsSuccess(apiResult.Data) : ApiResult<List<UserMarkRecordGetRecordResponse>>.IsFailed();
}

Search‑Key Sync Job

Quartz.NET job reads batches of SearchKey rows from SQL Server, enriches them with tag data, and bulk‑writes to a new index. After completion, an alias is switched atomically to the new index and the old index is deleted.

public class SearchKeySynchronousJob : BaseJob
{
    public override void Execute()
    {
        var maxId = 0;
        var size = 1000;
        string indexName = "";
        while (true)
        {
            var batch = sm.searchKey.GetList(size, maxId);
            if (!batch.Any()) break;
            var entityIds = batch.Select(a => a.EntityID).Distinct().ToList();
            var sysTagRecord = rm.Novel.GetSysTagRecord(entityIds);
            var items = batch.Select(a => new SearchKeyPostItem
            {
                Weight = a.Weight,
                EntityType = a.EntityType,
                EntityId = a.EntityID,
                IsSubsidiary = a.IsSubsidiary ?? false,
                KeyName = a.KeyName,
                ActiveDate = a.ActiveDate,
                SysTagId = sysTagRecord.Where(c => c.EntityID == a.EntityID).Select(c => c.SysTagID).ToList(),
                KeyID = a.KeyID
            }).ToList();
            var postResult = new SearchKeyPostRequest { IndexName = indexName, Items = items }.Excute();
            if (postResult.Success)
            {
                indexName = (string)postResult.Data;
                maxId = batch.Max(a => a.KeyID);
            }
        }
        var renameResult = new SearchKeyRenameRequest { IndexName = indexName }.Excute();
    }
}

Search‑Key API

Batch insert creates the index (if not provided) with a custom mapping that defines a pinyin analyzer alongside the standard analyzer for key_name. The index name includes a timestamp for uniqueness.

[HttpPost]
public ApiResult Post(SearchKeyPostRequest request)
{
    if (!request.Items.Any()) return ApiResult.IsFailed("No data");
    var indexName = string.IsNullOrWhiteSpace(request.IndexName)
        ? $"{typeof(SearchKey).GetRelationName()}-{DateTime.Now:yyyyMMddHHmmss}" : request.IndexName;
    if (string.IsNullOrWhiteSpace(request.IndexName))
    {
        var createResult = _elasticClient.Indices.Create(indexName, a => a
            .Map<SearchKey>(m => m.AutoMap().Properties(p => p
                .Custom(new TextProperty
                {
                    Name = "key_name",
                    Analyzer = "standard",
                    Fields = new Properties(new Dictionary<PropertyName, IProperty>
                    {
                        { "pinyin", new TextProperty { Analyzer = "pinyin" } },
                        { "standard", new TextProperty { Analyzer = "standard" } }
                    })
                }))));
        if (!createResult.IsValid) return ApiResult.IsFailed("Index creation failed");
    }
    var docs = request.Items.MapTo<List<SearchKey>>();
    var result = _elasticClient.BulkAll(indexName, docs);
    return result ? ApiResult.IsSuccess(indexName) : ApiResult.IsFailed();
}

The alias‑switch endpoint adds the new index to the searchkey alias, removes the old index from the alias, and finally deletes the old index.

[HttpPut]
public ApiResult Rename(SearchKeyRanameRequest request)
{
    var aliasName = typeof(SearchKey).GetRelationName();
    var getAliasResult = _elasticClient.Indices.GetAlias(aliasName);
    var bulkAlias = new BulkAliasRequest
    {
        Actions = new List<IAliasAction>
        {
            new AliasAddDescriptor().Index(request.IndexName).Alias(aliasName)
        }
    };
    if (getAliasResult.IsValid)
    {
        foreach (var idx in getAliasResult.Indices.Keys)
            bulkAlias.Actions.Add(new AliasRemoveDescriptor().Index(idx.Name).Alias(aliasName));
    }
    var result = _elasticClient.Indices.BulkAlias(bulkAlias);
    if (getAliasResult.IsValid)
    {
        foreach (var idx in getAliasResult.Indices.Keys)
            _elasticClient.Indices.Delete(idx);
    }
    return result?.ApiCall.Success == true ? ApiResult.IsSuccess() : ApiResult.IsFailed();
}

Search Query

The search endpoint combines should (OR) and must (AND) clauses. Setting minimumShouldMatch = 1 ensures that at least one should condition matches, avoiding unwanted results.

[HttpPost]
[Route("search")]
public ApiResult<List<SearchKeyGetResponse>> Get(SearchKeyGetRequest request)
{
    var should = new List<Func<QueryContainerDescriptor<SearchKey>, QueryContainer>>();
    int minimumShouldMatch = 0;
    if (!string.IsNullOrWhiteSpace(request.KeyName))
    {
        should.Add(a => a.MatchPhrase(m => m.Field("key_name.pinyin").Query(request.KeyName)));
        should.Add(a => a.MatchPhrase(m => m.Field("key_name.standard").Query(request.KeyName)));
        minimumShouldMatch = 1;
    }
    var must = new List<Func<QueryContainerDescriptor<SearchKey>, QueryContainer>>
    {
        a => a.Range(r => r.Field(f => f.Weight).GreaterThanOrEquals(0))
    };
    // additional filters omitted for brevity
    var sort = request.Sort == ESearchKey.Sort.Weight
        ? new SortDescriptor<SearchKey>().Field(f => f.Weight, SortOrder.Descending)
        : new SortDescriptor<SearchKey>().Field(f => f.ActiveDate, SortOrder.Descending);
    var result = _elasticClient.Search<SearchKey>(s => s
        .Index(typeof(SearchKey).GetRelationName())
        .From(request.Size * request.Page)
        .Size(request.Size)
        .Query(q => q.Bool(b => b.Should(should).Must(must).MinimumShouldMatch(minimumShouldMatch)))
        .Sort(srt => sort));
    var apiResult = result.GetApiResult<SearchKey, List<SearchKeyGetResponse>>();
    return apiResult.Success ? apiResult : ApiResult<List<SearchKeyGetResponse>>.IsSuccess("Empty result");
}

APM Monitoring

We use Elastic APM + Kibana (v7.4) for observability, similar to the SkyWalking setup we used previously.

Elastic APM dashboard
Elastic APM dashboard

Conclusion

The migration to Elasticsearch was smooth and lossless, and the system now handles massive read/write workloads with high performance and scalability.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

search engineElasticsearchNoSQLdatabase scaling
Java Interview Crash Guide
Written by

Java Interview Crash Guide

Dedicated to sharing Java interview Q&A; follow and reply "java" to receive a free premium Java interview guide.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.