Why We Switched from SQL Server to Elasticsearch: A Real‑World Scaling Journey
This article shares a practical experience of replacing a massive SQL Server data store with Elasticsearch, covering the motivations, architectural design, performance advantages, indexing mechanisms, data synchronization strategies, implementation details, and operational considerations for handling tens of billions of records.
Preface
I was once interviewed for a technical position at Anta, and the interviewer asked: "If you want to use a new technology but the leader refuses, what would you do?" Even if you haven't faced this question in an interview, you will encounter similar collaboration challenges at work.
From my perspective, I answer with four points:
Justified Reasoning : In software engineering, technology selection must be based on concrete problem scenarios; otherwise it is reckless.
Trust : Leaders may reject new technology for many reasons—lack of understanding, conservatism, or not being technical. Building trust is essential before proposing a solution.
Do Your Part : Gather comprehensive data, present it clearly, and provide quantitative evidence to strengthen the proposal.
People Are Complex : No one can stay completely objective; therefore, learn to manage upwards and maintain good collaboration.
Background
Our company has used SQL Server as the primary storage for many years, now reaching tens of millions of rows. Some non‑core business tables have been compressed to keep the size stable, but compression reduces scalability.
Examples:
We store a user's last accessed work record and total reading duration, but not detailed reading logs. Adding a lottery feature that requires per‑chapter reading time would be impossible with the current schema.
Operations need to analyze user preferences and popular works, which the current design cannot satisfy.
Search queries like LIKE '%keyword%' trigger full table scans. With 420,000 rows a full scan is still acceptable on a powerful server, but under high concurrency or when crawlers hit the site, resource consumption skyrockets.
Relational databases provide ACID guarantees but lack horizontal scalability. In massive data scenarios, a single instance cannot solve the problem.
NoSQL fills this gap. Martin Fowler’s "The Essentials of NoSQL" classifies NoSQL into document, graph, column‑family, and key‑value stores. Search engines also meet the four NoSQL characteristics: easy scaling, high performance on large data, flexible data models, and high availability.
We evaluated MongoDB and Elasticsearch; Elasticsearch was chosen for three reasons:
Our operations team is more familiar with Elasticsearch.
Elastic Stack’s reporting tools meet our statistical reporting needs.
Our workload is read‑heavy and near‑real‑time, which Elasticsearch handles well.
Elasticsearch Pros and Cons
Pros
Description
Horizontal scalability
Can run as a single node or a cluster; shards are automatically balanced.
Speed
Indexes are sharded; multiple servers process queries in parallel.
Diverse search
Supports full‑text search, synonym handling, relevance ranking, complex analytics, and near‑real‑time processing.
High availability
Replica mechanism ensures the cluster works even if a node fails.
Out‑of‑the‑box
Easy deployment with RESTful API and SDKs for many languages.
The main drawback is high memory consumption.
Why Elasticsearch Is Fast
Memory reads : Elasticsearch relies on the operating system’s filesystem cache.
Multiple indexes : Uses inverted index and doc values.
Cluster sharding : Distributes load across nodes.
Memory Reads
Elasticsearch (built on Lucene) caches data structures in the OS filesystem cache. Queries benefit from this cache, but sufficient RAM is required to hold the cached data.
Example: three 64 GB servers (total 192 GB). If each node allocates 32 GB JVM heap, the remaining 32 GB per node is available for filesystem cache, totaling 96 GB. With 1 TB of data, only about 10 % can be cached, the rest must hit disk.
Performance comparison (highlighted): memory read is ~200× faster than mechanical disk and ~30× faster than SSD.
Inverted Index
Lucene stores all indexed information as an inverted index, mapping terms to the documents that contain them. This structure enables fast full‑text search.
Example documents:
Elasticsearch Server (doc 1)
Masterring Elasticsearch (doc 2)
Apache Solr 4 Cookbook (doc 3)
After indexing, the term‑to‑document mapping allows rapid retrieval.
Doc Values
For aggregations, sorting, and grouping, a column‑oriented “doc values” structure is more efficient than the inverted index. Doc values are stored per field and support fast numeric and keyword operations.
Cluster Sharding
Elasticsearch can form a cluster of multiple nodes, distributing shards to balance execution pressure. Queries are executed in parallel across primary shards.
Query Process
Elasticsearch queries consist of two phases: the “scatter” phase (parallel sub‑queries sent to each shard) and the “gather” phase (collect, merge, sort, and return results).
Pagination Depth Trap
Deep pagination multiplies the amount of data read: pageSize × pageIndex × shardCount. Elasticsearch limits total returned hits to 10,000. Use search_after or scroll for deep navigation.
Elasticsearch vs. RDBMS
Elasticsearch
RDBMS
Index
Table
Document
Row
Field
Column
Mapping
Table schema
Before version 7.0, Elasticsearch had a type concept (type = table). After 7.0, type defaults to _doc and will be removed in 8.0.
Server Selection
Official docs recommend the JVM heap not exceed 32 GB and no more than half of the physical memory. For 64 GB servers we set a 32 GB heap; the remaining memory is used for the filesystem cache.
Scenario Selection
Write scenario : User reading logs generate massive data (70 GB per month, >1.1 billion rows).
Read scenario : Future lottery feature needs to query reading duration and chapter count.
Search scenario : Existing LIKE queries cause performance issues; we need full‑text and phonetic search.
Design Solution
Common Design
We wrap Elasticsearch in a .NET 5 WebAPI to hide technical details (time zones, analyzers, type conversion) and expose coarse‑grained read/write APIs. This follows the "application‑database" pattern described by Martin Fowler.
Reading‑Record Details
We use RabbitMQ to decouple writes: the platform API publishes messages, and an asynchronous consumer writes them to Elasticsearch. The consumer creates monthly indices (e.g., userviewrecord-2021-12) for easier management.
Search Implementation
We adopt CQRS: the existing relational data is synchronized to Elasticsearch, and a dedicated search API replaces the original platform API.
Data sync can be push‑based (CDC, Debezium, Kafka) or pull‑based (scheduled jobs). Push offers real‑time updates but adds middleware complexity; pull is simpler but may cause data gaps. We chose pull with Quartz.NET, performing full sync every four hours (≈2 minutes for 420 k rows).
Entity Definitions
All entities inherit from an abstract ElasticsearchEntity that provides Id and a microsecond‑precision Timestamp to avoid missing records during search_after pagination.
public abstract class ElasticsearchEntity
{
private Guid? _id;
public Guid Id { get => _id ??= Guid.NewGuid(); set => _id = value; }
private long? _timestamp;
[Number(Name = "timestamp")]
public long Timestamp { get => _timestamp ??= DateTime.Now.DateTimeToTimestampOfMicrosecond(); set => _timestamp = value; }
}Reading‑record entity:
[ElasticsearchType(RelationName = "user_view_duration")]
public class UserViewDuration : ElasticsearchEntity
{
[Number(Name = "entity_id")]
public long EntityId { get; set; }
[Number(Name = "entity_type")]
public long EntityType { get; set; }
[Number(Name = "charpter_id")]
public long CharpterId { get; set; }
[Number(Name = "user_id")]
public long UserId { get; set; }
[Date(Name = "create_datetime")]
public DateTimeOffset CreateDateTime { get; set; }
[Number(Name = "duration")]
public long Duration { get; set; }
[Ip(Name = "Ip")]
public string Ip { get; set; }
}Search‑key entity (document‑type for works):
[ElasticsearchType(RelationName = "search_key")]
public class SearchKey : ElasticsearchEntity
{
[Number(Name = "key_id")]
public int KeyId { get; set; }
[Number(Name = "entity_id")]
public int EntityId { get; set; }
[Number(Name = "entity_type")]
public int EntityType { get; set; }
[Text(Name = "key_name")]
public string KeyName { get; set; }
[Number(Name = "weight")]
public int Weight { get; set; }
[Boolean(Name = "is_subsidiary")]
public bool IsSubsidiary { get; set; }
[Date(Name = "active_date")]
public DateTimeOffset? ActiveDate { get; set; }
[Number(Name = "sys_tag_id")]
public List<int> SysTagId { get; set; }
}Asynchronous Writing
The consumer receives UserViewDurationMessage from RabbitMQ, maps it to the entity, and writes to the monthly index:
public class UserViewDurationConsumer : BaseConsumer<UserViewDurationMessage>
{
private readonly ElasticClient _elasticClient;
public UserViewDurationConsumer(ElasticClient elasticClient) => _elasticClient = elasticClient;
public override void Excute(UserViewDurationMessage msg)
{
var document = msg.MapTo<UserViewDuration>();
var result = _elasticClient.Create(document, a => a.Index($"{typeof(UserViewDuration).GetRelationName()}-{msg.CreateDateTime:yyyy-MM}")).GetApiResult();
if (result.Failed) LoggerHelper.WriteToFile(result.Message);
}
}Search API
Reading‑record query supports filtering by user, entity, chapter, and time range, using search_after for pagination and sorting by timestamp descending.
[HttpGet]
[Route("record")]
public ApiResult<List<UserMarkRecordGetRecordResponse>> GetRecord([FromQuery] UserViewDurationRecordGetRequest request)
{
var must = new List<Func<QueryContainerDescriptor<UserViewDuration>, QueryContainer>>();
if (request.UserId.HasValue) must.Add(a => a.Term(t => t.Field(f => f.UserId).Value(request.UserId.Value)));
// ... other filters omitted for brevity ...
var result = _elasticClient.Search<UserViewDuration>(s => s
.Index($"{typeof(UserViewDuration).GetRelationName()}-{dateTime}")
.Size(request.Size)
.Query(q => q.Bool(b => b.Must(must)))
.SearchAfterTimestamp(request.Timestamp)
.Sort(srt => srt.Field(f => f.Timestamp, SortOrder.Descending)));
var apiResult = result.GetApiResult<UserViewDuration, List<UserMarkRecordGetRecordResponse>>();
return apiResult.Success ? ApiResult<List<UserMarkRecordGetRecordResponse>>.IsSuccess(apiResult.Data) : ApiResult<List<UserMarkRecordGetRecordResponse>>.IsFailed();
}Search‑Key Sync Job
Quartz.NET job reads batches of SearchKey rows from SQL Server, enriches them with tag data, and bulk‑writes to a new index. After completion, an alias is switched atomically to the new index and the old index is deleted.
public class SearchKeySynchronousJob : BaseJob
{
public override void Execute()
{
var maxId = 0;
var size = 1000;
string indexName = "";
while (true)
{
var batch = sm.searchKey.GetList(size, maxId);
if (!batch.Any()) break;
var entityIds = batch.Select(a => a.EntityID).Distinct().ToList();
var sysTagRecord = rm.Novel.GetSysTagRecord(entityIds);
var items = batch.Select(a => new SearchKeyPostItem
{
Weight = a.Weight,
EntityType = a.EntityType,
EntityId = a.EntityID,
IsSubsidiary = a.IsSubsidiary ?? false,
KeyName = a.KeyName,
ActiveDate = a.ActiveDate,
SysTagId = sysTagRecord.Where(c => c.EntityID == a.EntityID).Select(c => c.SysTagID).ToList(),
KeyID = a.KeyID
}).ToList();
var postResult = new SearchKeyPostRequest { IndexName = indexName, Items = items }.Excute();
if (postResult.Success)
{
indexName = (string)postResult.Data;
maxId = batch.Max(a => a.KeyID);
}
}
var renameResult = new SearchKeyRenameRequest { IndexName = indexName }.Excute();
}
}Search‑Key API
Batch insert creates the index (if not provided) with a custom mapping that defines a pinyin analyzer alongside the standard analyzer for key_name. The index name includes a timestamp for uniqueness.
[HttpPost]
public ApiResult Post(SearchKeyPostRequest request)
{
if (!request.Items.Any()) return ApiResult.IsFailed("No data");
var indexName = string.IsNullOrWhiteSpace(request.IndexName)
? $"{typeof(SearchKey).GetRelationName()}-{DateTime.Now:yyyyMMddHHmmss}" : request.IndexName;
if (string.IsNullOrWhiteSpace(request.IndexName))
{
var createResult = _elasticClient.Indices.Create(indexName, a => a
.Map<SearchKey>(m => m.AutoMap().Properties(p => p
.Custom(new TextProperty
{
Name = "key_name",
Analyzer = "standard",
Fields = new Properties(new Dictionary<PropertyName, IProperty>
{
{ "pinyin", new TextProperty { Analyzer = "pinyin" } },
{ "standard", new TextProperty { Analyzer = "standard" } }
})
}))));
if (!createResult.IsValid) return ApiResult.IsFailed("Index creation failed");
}
var docs = request.Items.MapTo<List<SearchKey>>();
var result = _elasticClient.BulkAll(indexName, docs);
return result ? ApiResult.IsSuccess(indexName) : ApiResult.IsFailed();
}The alias‑switch endpoint adds the new index to the searchkey alias, removes the old index from the alias, and finally deletes the old index.
[HttpPut]
public ApiResult Rename(SearchKeyRanameRequest request)
{
var aliasName = typeof(SearchKey).GetRelationName();
var getAliasResult = _elasticClient.Indices.GetAlias(aliasName);
var bulkAlias = new BulkAliasRequest
{
Actions = new List<IAliasAction>
{
new AliasAddDescriptor().Index(request.IndexName).Alias(aliasName)
}
};
if (getAliasResult.IsValid)
{
foreach (var idx in getAliasResult.Indices.Keys)
bulkAlias.Actions.Add(new AliasRemoveDescriptor().Index(idx.Name).Alias(aliasName));
}
var result = _elasticClient.Indices.BulkAlias(bulkAlias);
if (getAliasResult.IsValid)
{
foreach (var idx in getAliasResult.Indices.Keys)
_elasticClient.Indices.Delete(idx);
}
return result?.ApiCall.Success == true ? ApiResult.IsSuccess() : ApiResult.IsFailed();
}Search Query
The search endpoint combines should (OR) and must (AND) clauses. Setting minimumShouldMatch = 1 ensures that at least one should condition matches, avoiding unwanted results.
[HttpPost]
[Route("search")]
public ApiResult<List<SearchKeyGetResponse>> Get(SearchKeyGetRequest request)
{
var should = new List<Func<QueryContainerDescriptor<SearchKey>, QueryContainer>>();
int minimumShouldMatch = 0;
if (!string.IsNullOrWhiteSpace(request.KeyName))
{
should.Add(a => a.MatchPhrase(m => m.Field("key_name.pinyin").Query(request.KeyName)));
should.Add(a => a.MatchPhrase(m => m.Field("key_name.standard").Query(request.KeyName)));
minimumShouldMatch = 1;
}
var must = new List<Func<QueryContainerDescriptor<SearchKey>, QueryContainer>>
{
a => a.Range(r => r.Field(f => f.Weight).GreaterThanOrEquals(0))
};
// additional filters omitted for brevity
var sort = request.Sort == ESearchKey.Sort.Weight
? new SortDescriptor<SearchKey>().Field(f => f.Weight, SortOrder.Descending)
: new SortDescriptor<SearchKey>().Field(f => f.ActiveDate, SortOrder.Descending);
var result = _elasticClient.Search<SearchKey>(s => s
.Index(typeof(SearchKey).GetRelationName())
.From(request.Size * request.Page)
.Size(request.Size)
.Query(q => q.Bool(b => b.Should(should).Must(must).MinimumShouldMatch(minimumShouldMatch)))
.Sort(srt => sort));
var apiResult = result.GetApiResult<SearchKey, List<SearchKeyGetResponse>>();
return apiResult.Success ? apiResult : ApiResult<List<SearchKeyGetResponse>>.IsSuccess("Empty result");
}APM Monitoring
We use Elastic APM + Kibana (v7.4) for observability, similar to the SkyWalking setup we used previously.
Conclusion
The migration to Elasticsearch was smooth and lossless, and the system now handles massive read/write workloads with high performance and scalability.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java Interview Crash Guide
Dedicated to sharing Java interview Q&A; follow and reply "java" to receive a free premium Java interview guide.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
