Tagged articles
20 articles
James' Growth Diary
May 10, 2026 · Artificial Intelligence

Syncing Vectors with Changing Documents: Add, Update, Delete Made Simple

This article walks through why keeping a vector store consistent with a mutable knowledge base is challenging, explains the three failure points, introduces hash‑based incremental syncing, shows idempotent add, proper update and soft‑delete workflows, covers embedding model upgrades, and presents a production‑grade event‑driven architecture with common pitfalls and remedies.
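As a rough illustration of the hash‑based incremental syncing mentioned in this summary, the sketch below compares content hashes against previously stored ones to decide which documents need adding, re‑embedding, or deleting. It is a minimal sketch, not the article's implementation: the names `content_hash` and `plan_sync` and the dictionary layout are assumptions made for the example.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(documents: dict[str, str], stored_hashes: dict[str, str]):
    """Compare current documents with the hashes recorded at the last sync.

    documents:     doc_id -> current text (hypothetical knowledge base)
    stored_hashes: doc_id -> hash saved when the vector was last written
    Returns three lists of doc ids: to add, to update, to delete.
    """
    to_add, to_update = [], []
    for doc_id, text in documents.items():
        digest = content_hash(text)
        if doc_id not in stored_hashes:
            to_add.append(doc_id)          # never embedded before
        elif stored_hashes[doc_id] != digest:
            to_update.append(doc_id)       # content changed, re-embed
        # identical hash: nothing to do, which keeps re-runs idempotent
    # ids that still have vectors but no longer exist in the source
    to_delete = [doc_id for doc_id in stored_hashes if doc_id not in documents]
    return to_add, to_update, to_delete
```

Running the plan twice over unchanged input yields empty add and update lists, which is the property that makes the sync safe to retry.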

Hash Deduplication · Incremental Sync · LangChain
0 likes · 17 min read
Architect's Guide
May 9, 2026 · Databases

Alibaba’s Open‑Source DataX: Fast, Easy Offline Data Synchronization

This article introduces Alibaba’s open‑source DataX tool, explains its framework‑plugin architecture for heterogeneous database sync, walks through Linux installation, job configuration, full‑ and incremental MySQL synchronization, and shares performance results and practical tips.

DataX · ETL · Incremental Sync
0 likes · 15 min read
Java Companion
Jan 20, 2026 · Backend Development

How to Integrate Spring Boot with Third‑Party APIs: HTTP Clients, Sync Strategies, and Code Samples

This article explains how to connect Spring Boot to external services by choosing the appropriate HTTP client (RestTemplate, Feign, WebClient), configuring beans, implementing service methods, and applying various data‑synchronization techniques such as full sync, UPSERT, incremental sync, webhook callbacks, and message‑queue based replication.

Incremental Sync · Message Queue · Spring Boot
0 likes · 20 min read
Mingyi World Elasticsearch
Sep 24, 2025 · Big Data

How 3 Simple Tweaks Doubled Elasticsearch Scan Performance on 40M Docs

The article details a real‑world case of scanning over 40 million Elasticsearch documents, identifies four performance bottlenecks, and presents three concrete optimizations—_source filtering, precise index targeting, and batch‑size tuning—that together cut processing time in half and raise CPU utilization from 25% to 85%.

Batch Size Tuning · Elasticsearch · Incremental Sync
0 likes · 8 min read
Su San Talks Tech
May 29, 2025 · Big Data

How to Sync Massive MySQL Data with Alibaba DataX – Step‑by‑Step Guide

Written for a 50‑million‑row project struggling with inaccurate reports and cross‑database operations, this guide explains why mysqldump and simple file‑based storage methods fall short, introduces Alibaba’s open‑source DataX middleware, and details its architecture, installation, and step‑by‑step configuration for full and incremental MySQL data synchronization.

DataX · ETL · Incremental Sync
0 likes · 14 min read
Java Backend Technology
May 21, 2025 · Big Data

Master DataX: Fast Offline Data Sync for MySQL without mysqldump

This guide explains how to use Alibaba's open‑source DataX tool to perform high‑performance offline synchronization between heterogeneous MySQL databases, covering installation, framework design, job configuration, full‑ and incremental sync, and practical command‑line examples.

Big Data · DataX · ETL
0 likes · 15 min read
Java Tech Enthusiast
May 13, 2025 · Big Data

Using Alibaba DataX 3.0 for MySQL Data Synchronization: Installation, Configuration, and Incremental Sync

This article introduces Alibaba DataX 3.0, explains its architecture and role‑based design, walks through Linux installation, JDK setup, MySQL preparation, and provides step‑by‑step examples of full‑load and incremental data synchronization between two MySQL instances using JSON job configurations and command‑line execution.

DataX · ETL · Incremental Sync
0 likes · 14 min read
dbaplus Community
Apr 10, 2025 · Databases

How to Seamlessly Migrate MongoDB to New Data Stores Without Downtime

This article presents a complete, step‑by‑step plan for migrating a legacy MongoDB‑based system to alternative data stores—including MySQL, Elasticsearch, and JD’s JImKV—while ensuring zero service interruption through careful scope analysis, DAO refactoring, dual‑write synchronization, gray‑release rollout, and robust monitoring and rollback mechanisms.

Data Migration · Database Refactoring · Incremental Sync
0 likes · 7 min read
macrozheng
Sep 27, 2024 · Big Data

Master DataX: Efficient Offline Data Sync for Heterogeneous Sources

This guide walks through the challenges of synchronizing massive datasets across heterogeneous databases, introduces Alibaba's open‑source DataX tool, explains its framework‑plugin architecture, and provides step‑by‑step instructions—including environment setup, installation, job configuration, and both full and incremental MySQL synchronization—complete with code examples and performance metrics.

Big Data · Data Integration · DataX
0 likes · 15 min read
MaGe Linux Operations
Apr 28, 2023 · Big Data

How to Sync 50 Million Rows Efficiently with Alibaba’s DataX

This guide explains why traditional mysqldump and file‑based methods fail for massive cross‑database sync, introduces Alibaba’s open‑source DataX middleware, details its framework and plugin architecture, walks through installation on Linux, shows how to configure MySQL source and target, and demonstrates both full and incremental data synchronization with practical JSON job examples.

DataX · ETL · Incremental Sync
0 likes · 14 min read
Code Ape Tech Column
Jan 28, 2023 · Big Data

Using Alibaba DataX for Offline Data Synchronization and Incremental Sync

This article introduces Alibaba DataX, explains its architecture and role in offline heterogeneous data synchronization, provides step‑by‑step Linux installation, demonstrates full‑load and incremental MySQL‑to‑MySQL sync with JSON job templates, and shares practical tips for handling large data volumes.

Data Integration · DataX · ETL
0 likes · 15 min read
DataFunSummit
Dec 1, 2022 · Big Data

City Data Acquisition Platform: Architecture, Core Technologies, and Incremental Synchronization Strategies

This article presents an overview of a smart city unified perception platform, detailing its modular architecture, solutions for multi-source heterogeneity, incremental synchronization strategies, and real-time API data collection, while discussing extensibility and practical implementation considerations.

Big Data · Data Platform · Incremental Sync
0 likes · 20 min read
vivo Internet Technology
Mar 9, 2022 · Big Data

Incremental Synchronization of Massive HBase Data to a Data Warehouse: Solution Overview and Performance Evaluation

The paper proposes a generic, timeRange‑based incremental extraction method for synchronizing tens of billions of HBase rows to a data warehouse. It shows that the method avoids full‑table scans, automatically detects schema changes, and delivers significantly lower latency than Hive mapping or timestamp‑based approaches; the method has since been integrated into a unified big‑data platform.

Big Data · HBase · Incremental Sync
0 likes · 8 min read
Wukong Talks Architecture
Oct 26, 2021 · Backend Development

Eureka Client Incremental Registry Synchronization Mechanism

This article explains how Eureka clients periodically fetch incremental registry updates, the underlying 30‑second synchronization interval, the server’s recentlyChangedQueue data structure, the merging process of delta and full registries, and hash‑based consistency checks to ensure client‑server registry alignment.
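The delta‑merge and hash‑check flow described in this summary can be pictured with a small model. The sketch below is a simplification under stated assumptions, not Eureka's actual code: the registry is reduced to a mapping of instance id to status, and `registry_hash`, `apply_delta`, and `fetch_full` are hypothetical names chosen for the example.

```python
from collections import Counter


def registry_hash(registry: dict[str, str]) -> str:
    """Summarize a registry as per-status instance counts, e.g. 'DOWN_1_UP_2_'."""
    counts = Counter(registry.values())
    return "".join(f"{status}_{n}_" for status, n in sorted(counts.items()))


def apply_delta(registry, delta, server_hash, fetch_full):
    """Merge a delta into the local registry and verify it against the server.

    registry:    instance_id -> status held by the client
    delta:       list of (action, instance_id, status) changes since last fetch
    server_hash: hash the server computed over its own full registry
    fetch_full:  callable returning the full registry (fallback path)
    """
    merged = dict(registry)
    for action, instance_id, status in delta:
        if action in ("ADDED", "MODIFIED"):
            merged[instance_id] = status
        elif action == "DELETED":
            merged.pop(instance_id, None)
    # If the merged view disagrees with the server, the delta was incomplete
    # or stale, so discard it and fall back to a full fetch.
    if registry_hash(merged) != server_hash:
        return fetch_full()
    return merged
```

The hash comparison is what lets the client trust cheap delta fetches most of the time while still recovering from a missed update.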

Incremental Sync · Java · eureka
0 likes · 10 min read
Big Data Technology & Architecture
Jun 6, 2021 · Big Data

Understanding Data Warehouses: Concepts, Architecture, Modeling, and Governance

This article provides a comprehensive overview of data warehouses, explaining their purpose, differences from databases, OLTP vs OLAP, traditional versus internet data warehouse models, layered architecture, modeling theories, metric dictionaries, date dimensions, naming conventions, data governance, and incremental synchronization techniques with practical SQL examples.

Big Data · Data Governance · ETL
0 likes · 24 min read
Meituan Technology Team
Apr 27, 2017 · Operations

Incremental File Synchronization Using Optimized rsync and zsync Techniques

The article proposes an incremental synchronization scheme for Meituan’s Rhino Cloud Disk that shifts rsync‑style signature and delta generation to the client, leverages zsync‑like HTTP range fetching, and discusses challenges such as block fragmentation, low locality in binary files, server‑side merge overhead, and future optimizations like variable‑size chunking and Rabin fingerprint‑based content‑defined chunking.

Incremental Sync · cloud storage · file synchronization
0 likes · 11 min read