Tagged articles
72 articles
DataFunSummit
May 20, 2026 · Big Data

How Kuaishou’s Real‑Time Data Lake Boosts AI and BI Architecture

The article explains how Kuaishou partnered with Apache Hudi to overhaul its ODS‑based data lake, addressing latency, storage cost, and complexity for AI and BI workloads, detailing the evolution from mysql‑to‑hive to mysql‑to‑hudi 1.0 and 2.0, the resulting performance gains, cost savings, and future roadmap.

AI · BI · Big Data
0 likes · 20 min read
Shuge Unlimited
Apr 29, 2026 · Databases

Milvus Storage Tuning in Practice: 25× Query Speedup and Three Tricks to Cut Memory Usage by Half

This article walks through Milvus 2.3‑2.6.x storage optimizations—Mmap, tiered storage, and clustering compaction—explaining their principles, configuration hierarchy, benchmark results, and concrete deployment templates that together can boost query performance up to 25‑fold while halving memory consumption.

Milvus · Storage Optimization · clustering compaction
0 likes · 24 min read
dbaplus Community
Apr 25, 2026 · Backend Development

From Zero to One: Complete Architecture Design for a Billion‑Scale Short‑Video System

This article dissects the end‑to‑end architecture of a billion‑scale short‑video platform, detailing layered design, core services such as upload, transcoding, recommendation, interaction, storage, and the key challenges of massive video storage, high‑concurrency streaming, low‑latency playback, and real‑time recommendation reliability.

Microservices · Storage Optimization · System Architecture
0 likes · 19 min read
DataFunSummit
Sep 23, 2025 · Artificial Intelligence

How PCache Supercharges Large‑Scale AI Training Storage Performance

This talk explores large‑scale AI training storage challenges and presents PCache, a high‑performance, cloud‑native caching system that optimizes metadata, read/write paths, deployment, and high‑availability, delivering significant throughput gains and cost savings for massive model training workloads.

AI training · PCache · Storage Optimization
0 likes · 25 min read
DataFunSummit
Sep 2, 2025 · Big Data

How Xiaomi Cuts Costs and Boosts Performance with Cloud‑Native Data Lake Architecture

Xiaomi’s engineers explain how they tackled data‑lake challenges—small files, metadata latency, and multi‑cloud costs—by combining compact storage, Gravitino‑based metadata governance, Iceberg and Paimon formats, and JuiceFS abstraction, achieving lower storage expenses, faster queries, and a roadmap toward intelligent, real‑time, multimodal lakehouses.

Big Data · Data Lake · Storage Optimization
0 likes · 14 min read
Su San Talks Tech
Sep 1, 2025 · Backend Development

Build a Scalable Short‑Video System: Architecture, Storage, and Real‑Time Recommendations

This article dissects the architecture of a modern short‑video backend, covering layered system design, core services such as video production, distribution, interaction, storage strategies, real‑time and offline recommendation engines, high‑concurrency streaming solutions, and practical techniques for cost control, scalability, and fault tolerance.

Backend Architecture · Storage Optimization · high concurrency
0 likes · 22 min read
MaGe Linux Operations
Aug 30, 2025 · Operations

Master RAID Configuration & Performance: From Beginner to Pro

This comprehensive guide walks you through RAID fundamentals, hardware and software configuration, performance tuning, cost‑benefit analysis, fault diagnosis, and real‑world case studies, providing actionable commands and best‑practice recommendations to help you boost storage performance and reliability by up to 300%.

Hardware · RAID · Storage Optimization
0 likes · 23 min read
DataFunSummit
Jul 20, 2025 · Big Data

How Beike Scaled to 600 PB: The Evolution of a Data‑Fusion Architecture

This article details Beike's data‑fusion architecture evolution, covering industry trends, multi‑stage Hadoop upgrades, storage cost optimization with erasure coding, remote shuffle integration, GPU‑centric training stability, and future hybrid‑cloud strategies, while also sharing organizational and operational lessons learned.

AI · Data Architecture · Hadoop
0 likes · 16 min read
Alibaba Cloud Big Data AI Platform
Jul 15, 2025 · Big Data

How MaxCompute’s Append DeltaTable Transforms BigQuery Migration

This article details the complex migration of a leading Southeast Asian tech group's data warehouse from Google BigQuery to Alibaba Cloud MaxCompute, outlining challenges such as storage format differences, SQL compatibility, and performance tuning, and explains how the new Append DeltaTable format with dynamic bucketing and incremental reclustering resolves these issues.

Big Data · Data Migration · Data Warehouse
0 likes · 19 min read
Alibaba Cloud Infrastructure
Jul 11, 2025 · Cloud Native

How Alibaba Cloud’s AI Infra Innovations Are Transforming Kubernetes Workloads

This article summarizes Alibaba Cloud’s key technical contributions at KubeCon China 2025, covering AI‑focused Kubernetes optimizations, Argo Workflows enhancements, storage strategies for large models, Fluid’s data orchestration, multi‑tenant security, and the RoleBasedGroup framework for PD‑separated AI inference.

AI Infrastructure · Argo Workflows · Fluid
0 likes · 20 min read
360 Zhihui Cloud Developer
Jun 6, 2025 · Fundamentals

How Erasure Coding Cuts Storage Costs in Ozone: A Deep Dive

This article explains how Erasure Coding (EC) improves data reliability and dramatically reduces storage overhead in Ozone by leveraging hot‑cold data characteristics, intelligent tiering, dynamic EC ratios, and repair throttling, while also discussing performance trade‑offs and limitations.

Data Reliability · Ozone · Storage Optimization
0 likes · 9 min read
Kuaishou Tech
May 28, 2025 · Databases

Optimizing Kuaishou's Photo Object Storage: Reducing Size and Boosting Cache Hit Rate

This article details how Kuaishou dramatically cut storage costs and improved cache efficiency for its core Photo data object by cleaning up redundant JSON fields, applying selective serialization, and performing large‑scale data cleaning, achieving a 25% size reduction, a 2% cache‑hit increase, and multi‑hundred‑TB savings.

Cache Hit Rate · Kuaishou · Photo Object
0 likes · 20 min read
Baidu Intelligent Cloud Tech Hub
Sep 25, 2024 · Big Data

How Cold‑Hot Data Separation Boosts Cost Efficiency in Baidu Palo for Apache Doris

This article explains the principles, configuration steps, monitoring metrics, leader selection, data migration granularity, compaction, invalid data cleanup, and cache mechanisms of cold‑hot data separation in Baidu Intelligent Cloud's Palo for Apache Doris, illustrating how tiered storage reduces costs while maintaining query performance.

Apache Doris · Data Tiering · Palo
0 likes · 21 min read
Data Thinking Notes
Jul 4, 2024 · Big Data

How Active Metadata Revolutionizes Data Governance and Cuts Costs

This article examines the growing challenges of data management—such as asset discoverability, architectural rigidity, development quality, and rising resource costs—and presents a comprehensive data‑governance framework that leverages standards, agile architecture, development isolation, and active‑metadata‑driven lifecycle evaluation to improve efficiency, reduce expenses, and enable intelligent, automated data back‑filling.

Big Data · Data Governance · Storage Optimization
0 likes · 17 min read
DataFunSummit
Jun 3, 2024 · Big Data

Data Governance and Active Metadata Practices at JD Retail

The article outlines JD Retail's data management challenges—including asset awareness, architectural agility, development quality, and rising resource costs—and presents a comprehensive data governance framework that leverages data standards, agile architecture, development isolation, resource optimization, and active metadata to achieve intelligent lifecycle evaluation, automated back‑fill, and future‑oriented data fabric improvements.

Data Governance · Data Lifecycle · Storage Optimization
0 likes · 18 min read
JD Tech
Feb 21, 2024 · Operations

Storage Model Optimization and Performance Testing for Hot SKU Inventory Pre‑occupancy

This article explores practical performance testing and tuning techniques, focusing on storage model optimization and call‑chain analysis to improve hot‑SKU inventory pre‑occupancy throughput, presenting detailed pressure‑testing scenarios, results, cache‑layer redesign, and strategies for identifying and mitigating system bottlenecks.

Load Testing · Performance Testing · Storage Optimization
0 likes · 15 min read
ITPUB
Jan 20, 2024 · Databases

How to Cut MySQL Storage Costs by 50%: A Systematic Approach

This article outlines a comprehensive, data‑driven methodology for reducing MySQL storage expenses—including background analysis, challenge identification, a nine‑grid systematic framework, benefit estimation, safety and stability verification, gray‑scale rollout, and rollback strategies—demonstrating over 50% disk space savings in a large‑scale billing system.

Data Safety · Database Cost Reduction · Gray Deployment
0 likes · 15 min read
Didi Tech
Jan 9, 2024 · Big Data

Introducing Apache Pulsar: Technical Benefits and Solutions for Didi Big Data Messaging System

Apache Pulsar, a cloud‑native distributed messaging platform, solves Didi Big Data’s DKafka bottlenecks by separating compute and storage, using sequential log writes, heterogeneous disks, multi‑level caching, bundle‑based load balancing and automatic scaling, dramatically improving stability while introducing richer monitoring complexity.

Apache Pulsar · Cluster Management · DKafka
0 likes · 17 min read
dbaplus Community
Jan 7, 2024 · Databases

How to Cut MySQL Storage Costs by Over 50%: A Practical Framework

This article presents a systematic, nine‑grid method for reducing MySQL storage expenses—including table compression, JSON field serialization, and hot‑cold data separation—while quantifying benefits, ensuring data safety, and validating system stability through staged testing and SRE metrics.

Database Cost Reduction · Performance Testing · Storage Optimization
0 likes · 13 min read
JD Retail Technology
Dec 28, 2023 · Databases

Methods and Practices for Reducing MySQL Database Storage Costs

This article outlines the background, challenges, systematic methods, benefit calculations, data‑safety and stability checks, verification steps, rollback strategies, and gray‑deployment practices for lowering MySQL storage expenses in large‑scale billing systems while maintaining system reliability.

Data Safety · Database Cost Reduction · Gray Deployment
0 likes · 12 min read
Tencent Cloud Developer
Sep 20, 2023 · Operations

Storage Governance and Optimization Practices for Meeting Control Systems

The article explains how a meeting control system tackled severe storage pressure from high concurrent traffic by introducing a proxy layer, multi‑active disaster‑recovery, identity‑based data isolation, dynamic‑static key separation, multi‑level caching, overload protection, sharding with dual‑write migration, and extensive monitoring to meet 100k QPS and ensure reliability.

Storage Optimization · database sharding · disaster recovery
0 likes · 49 min read
iQIYI Technical Product Team
Aug 25, 2023 · Big Data

Venus Log Platform Architecture Evolution: From ELK to Data Lake

iQIYI's Venus log platform migrated from an Elasticsearch‑Kibana architecture to an Iceberg‑based data lake queried with Trino, cutting storage and compute costs by over 70%, improving stability by 85%, and handling billions of daily log entries for a write‑heavy, rarely queried workload.

Big Data · Elasticsearch · Iceberg
0 likes · 22 min read
Baidu Intelligent Cloud Tech Hub
Aug 22, 2023 · Fundamentals

How Baidu’s CDS Uses Erasure Coding to Cut Storage Costs and I/O Amplification

This article explains Baidu Intelligent Cloud's block storage (CDS) architecture, comparing fault‑tolerance methods, detailing the challenges of large‑scale erasure‑coded storage, and describing Baidu's two‑layer append‑engine solution that reduces I/O amplification while keeping costs low.

I/O amplification · Storage Optimization · append engine
0 likes · 15 min read
Didi Tech
May 26, 2023 · Big Data

Design and Optimization of Didi's Spatial‑Temporal Supply‑Demand System

Didi’s redesigned Spatial‑Temporal Supply‑Demand System replaces a single‑Redis bottleneck with a multi‑cluster routing layer, semantic sharding, multi‑level caching and delayed queues, achieving higher horizontal scalability, fault isolation, ~30 % latency reduction, increased cache hit rates, fewer query nodes, and faster, code‑free feature configuration.

Configuration Management · Distributed Systems · Golang
0 likes · 19 min read
DataFunTalk
May 22, 2023 · Big Data

Alibaba Cloud Data Lake: Unified Metadata and Storage Management Practices

This article explains Alibaba Cloud's data lake architecture, unified metadata services, storage management optimizations, and format handling techniques, illustrating how lakehouse concepts, multi‑engine support, and lifecycle policies enable efficient, secure, and cost‑effective big data processing in the cloud.

Big Data · Cloud Services · Data Lake
0 likes · 22 min read
Coolpad Technology Team
Apr 27, 2023 · Cloud Computing

EROFS Cluster Mode Analysis in Linux Kernel 6.x

This article analyzes the EROFS cluster modes (INFLIGHT, HOOKED, FOLLOWED, FOLLOWED_NOINPLACE) in Linux kernel 6.x, explaining how they determine whether in-place I/O can be used based on the current status of pclusters in the chain.

Cluster Modes · EROFS · In-place I/O
0 likes · 6 min read
Data Thinking Notes
Apr 19, 2023 · Big Data

How Bilibili Transformed Big Data Governance: From Reactive Storage Management to Proactive Multi‑Dimensional Control

This article details Bilibili's evolution of big data governance, describing the early data growth challenges, the launch of the "Wanglou" project, the development of asset metadata and governance indicator frameworks, storage cost reduction strategies, scoring models, and the shift from passive, single‑point fixes to proactive, multi‑dimensional governance across the organization.

Big Data · Bilibili · Cost Management
0 likes · 22 min read
Bilibili Tech
Apr 11, 2023 · Big Data

Bilibili Big Data Governance: From Reactive Storage Management to Proactive Multi‑Dimensional Governance

Bilibili’s exabyte‑scale big‑data platform, after rapid growth created fragmented ownership and costly storage, launched the Wanglou project to build a metadata‑driven, indicator‑based governance framework that cut storage use by half, introduced compliance scoring and automation, and now plans to extend proactive, multi‑dimensional governance to compute, traffic and lake‑house resources.

Bilibili · Data Governance · Storage Optimization
0 likes · 21 min read
DeWu Technology
Feb 15, 2023 · Backend Development

E-commerce Product Ranking System Migration: Technical Implementation and Storage Optimization

The article describes how an e‑commerce product ranking system was migrated to the new “Liao Yue” platform, decoupling it from the search module and introducing fresh metrics and Elasticsearch‑based sorting. Separating B‑end and C‑end data cut storage costs by 60%, and a gray‑scale rollout with dual‑read validation and rollback safeguards completed the migration in two weeks with zero failures, yielding a closed‑loop system that iterates faster.

Backend Development · Elasticsearch · Storage Optimization
0 likes · 15 min read
Architecture Digest
Aug 14, 2022 · Big Data

Replacing Classic Data Warehouse Dimensional Model with a Single Wide Table: Architecture, Benefits, and Challenges

This article analyzes the shift from traditional multi‑layer data warehouse dimensional modeling to a single-layer wide‑table approach, detailing business drivers, technical architecture, storage and query performance gains, as well as the development, maintenance, and operational challenges involved.

Storage Optimization · Wide Table · query-performance
0 likes · 10 min read
vivo Internet Technology
Jun 29, 2022 · Big Data

Lossless Image Compression Overview and Lepton Optimization for Large‑Scale Storage

The article explains JPEG’s lossy fundamentals, introduces Lepton’s lossless layer and its optimizations—such as arithmetic coding and multithreaded Huffman switching—and describes how vivo’s hybrid physical‑server and Kubernetes deployment achieves roughly 22 % storage reduction across petabytes of JPEG images despite high CPU demands.

Huffman coding · JPEG · Lepton
0 likes · 13 min read
Baidu Geek Talk
Jun 15, 2022 · Big Data

Replacing Classic Data Warehouse with a One‑Layer Wide Table Model: Architecture, Benefits, and Challenges

The article proposes replacing the traditional multi‑layered data‑warehouse architecture (ODS‑DWD‑DWS‑ADS) with a single, column‑store wide‑table per business theme, achieving roughly 30 % storage savings and faster queries, while acknowledging higher ETL complexity, back‑tracking costs, and production timing challenges.

Big Data · Data Warehouse · ETL
0 likes · 11 min read
Zuoyebang Tech Team
May 13, 2022 · Operations

Build a Scalable, Cost‑Effective Log Retrieval System Without Elasticsearch

This article explains how to design a high‑performance, low‑cost log retrieval architecture that avoids Elasticsearch by partitioning logs into time‑based chunks, indexing only metadata, using multi‑tier storage (local, remote, archive), and orchestrating queries through GD‑Search, Local‑Search, Remote‑Search and Log‑Manager components.

Distributed Systems · Storage Optimization · cost efficiency
0 likes · 14 min read
ByteDance Data Platform
Apr 27, 2022 · Big Data

How ByteDance Built a Scalable Data Catalog: Key Technologies and Future Plans

ByteDance’s Data Catalog article details the system’s unified metadata model, standardized ingestion connectors, search optimization techniques, lineage capabilities, and storage layer enhancements, highlighting key technical designs, performance improvements, and future work to advance data governance and asset utilization.

Data Catalog · Data Lineage · Storage Optimization
0 likes · 12 min read
DeWu Technology
Apr 18, 2022 · Artificial Intelligence

Warehouse Storage Location Recommendation: Architecture, Recall, and Ranking Strategies

The article outlines DeWu’s warehouse‑management recommendation system, which combines an online‑near‑line‑offline architecture to quickly recall viable shelf slots and rank them by space utilization, travel time, and sales potential, enabling automated, constraint‑aware placement that cuts picking time and inventory costs.

AI · Big Data · Storage Optimization
0 likes · 16 min read
Bilibili Tech
Mar 30, 2022 · Big Data

HDFS Architecture, Optimizations, and Future Plans at Bilibili

Bilibili’s HDFS now runs a three‑tier architecture—access, metadata, and data layers—enhanced with a custom MergeFS router, observer NameNode, dynamic load balancing, fast‑failover pipelines, and storage‑aware policies, while future work targets transparent erasure coding, tiered data routing, lock refinements, and a Hadoop 3.x migration.

Big Data · Distributed File System · HDFS
0 likes · 22 min read
dbaplus Community
Jan 12, 2022 · Big Data

How ClickHouse Powers YiBei's Scalable Advertising Data Platform

This article details YiBei's advertising data platform built on ClickHouse, covering business requirements, why ClickHouse was chosen over Druid, storage engine and compression choices, real‑time and offline ingestion pipelines, partitioning, ZooKeeper bottlenecks, atomic data replacement, and testing and release strategies for a high‑throughput, low‑latency ad analytics system.

Advertising · Lambda architecture · Real-Time
0 likes · 28 min read
Programmer DD
Dec 22, 2021 · Backend Development

Designing a Space‑Efficient Read/Unread Store for Large Group Chats

This article examines how to model read/unread status for group chat messages using bitmap techniques, evaluates memory costs of naive approaches, and presents a compact storage schema that handles member joins, exits, and scalability while drastically reducing per‑message overhead.

Bitmap · Storage Optimization · backend design
0 likes · 7 min read
Big Data Technology Architecture
Jul 15, 2021 · Big Data

Building Data Lake Solutions with Iceberg and Object Storage: Architecture, Write/Read Processes, and Storage Optimization

This article presents a comprehensive overview of using Apache Iceberg with object storage to construct scalable data lake solutions, covering lake architecture, Iceberg table organization, Flink‑based write and read workflows, catalog abstractions, object storage versus HDFS comparisons, append‑upload and atomic‑commit challenges, a demonstration setup, and ideas for storage optimization.

Catalog · Flink · Iceberg
0 likes · 16 min read
Big Data Technology Architecture
Mar 25, 2021 · Big Data

Implementing Erasure Coding in HDFS: Migration, Testing, and Data Lifecycle Management at JD

This article details JD's end‑to‑end implementation of HDFS erasure coding, covering the migration from replication to EC, the three‑phase upgrade and rollback process, comprehensive automated testing, a custom data‑lifecycle management system for hot‑warm‑cold data, and multi‑layer integrity safeguards to achieve significant storage cost reduction while maintaining reliability.

Data Lifecycle · HDFS · Storage Optimization
0 likes · 17 min read
JD Tech
Mar 20, 2021 · Big Data

Implementing Erasure Coding in HDFS: Migration Strategy, Testing Framework, and Data Lifecycle Management

This article details JD's practical experience migrating HDFS to erasure coding, covering the decision between upgrade and porting, the step‑by‑step upgrade and rollback procedures, automated testing, a custom data‑lifecycle management system for hot‑warm‑cold data, and comprehensive data‑integrity safeguards to achieve significant storage cost reductions while maintaining production reliability.

Cluster Upgrade · Data Lifecycle Management · HDFS
0 likes · 17 min read
Tencent Cloud Developer
Dec 7, 2020 · Big Data

Searchable Snapshots in Elasticsearch 7.10: Features, Usage, and Future Outlook

Elasticsearch 7.10 adds searchable snapshots, letting users query indices stored directly in remote repositories such as S3 or COS, which halves storage costs, decouples storage from compute, supports manual mounting and ILM cold‑phase policies, and promises future full storage‑compute separation without local caching.

Big Data · Data Tiering · Elasticsearch
0 likes · 12 min read
Java Backend Technology
Sep 26, 2020 · Databases

Master Redis Bitmaps: SETBIT, GETBIT, BITCOUNT, BITOP Explained

This article introduces Redis's advanced bitmap capabilities, detailing the SETBIT, GETBIT, BITCOUNT, and BITOP commands, their syntax, underlying SDS data structure, performance characteristics, storage calculations, and practical use cases such as user sign‑in tracking and online status monitoring.

Storage Optimization · bit operations · performance
0 likes · 8 min read
Didi Tech
Mar 31, 2020 · Big Data

Elasticsearch Version Upgrade: Architecture, Challenges, and Performance Optimization at Didi

Over seven months, Didi’s Elasticsearch team upgraded more than 30 clusters, 2,000 nodes, and 4 PB of data from version 2.3.3 to 6.6.1, overcoming protocol and mapping incompatibilities with a multi‑version Arius Gateway, a custom Java SDK, ECM, and AMS. The upgrade saved 1 PB of storage, decommissioned 400 machines, raised query speed by 40%, raised write throughput by 30%, and cut CPU use by 10%, for an estimated cost reduction of roughly 800,000 per month.

Elasticsearch · Performance Optimization · Storage Optimization
0 likes · 18 min read
Architects' Tech Alliance
Jul 28, 2019 · Big Data

Alluxio: A Virtual Distributed File System for Unified Big Data Access and Cost‑Effective Storage

The article explains how Alluxio, a memory‑speed virtual distributed file system, acts as a virtual data lake to unify access to structured and unstructured big‑data across heterogeneous storage systems, offering on‑demand fast local access, intelligent caching, reduced storage costs, and enterprise‑grade security and fault tolerance.

Alluxio · Big Data · Data Lake
0 likes · 15 min read
Architects' Tech Alliance
Jul 20, 2019 · Industry Insights

Why Continuous Data Protection Is the Future of Enterprise Backup

The article analyzes the evolution of data protection—from manual copies and scripts to snapshots, Continuous Data Protection (CDP), and Copy Data Management (CDM)—highlighting their technical mechanisms, benefits such as near‑zero RPO, implementation models, vendor landscape, and key considerations for selecting the right solution in modern cloud‑centric environments.

Backup Strategies · Continuous Data Protection · Copy Data Management
0 likes · 20 min read
58 Tech
Mar 27, 2019 · Databases

OpenTSDB Architecture, Data Model, Storage Optimizations, and Practical Use Cases

This article introduces OpenTSDB as a distributed, scalable time‑series database built on HBase, explains its architecture, data model, and storage optimizations, presents real‑world monitoring use cases, analyzes performance issues caused by high‑cardinality tags, and details the solution steps taken to restore query speed.

HBase · OpenTSDB · Storage Optimization
0 likes · 9 min read
iQIYI Technical Product Team
Jan 4, 2019 · Artificial Intelligence

Building a Deep Learning Training Platform on Cloud: Challenges, Runonce Service, and Storage Optimization

iQIYI built a cloud‑based deep‑learning training platform called Jarvis, replacing the initial Runonce service, by containerizing GPU tasks, adopting Ceph S3 storage with FUSE, optimizing data pipelines, and addressing compute, storage, and networking challenges to improve scalability and reduce GPU idle time.

AI training · Deep Learning · GPU computing
0 likes · 9 min read
Architects' Tech Alliance
Nov 5, 2018 · Big Data

Alluxio as a Virtual Distributed File System for Data Lake Solutions

The article explains how Alluxio provides a virtual distributed file system that acts as a "virtual data lake," enabling unified, high‑performance access to structured and unstructured data across heterogeneous storage back‑ends while reducing storage costs through intelligent caching and eliminating the need for permanent data copies.

Alluxio · Big Data · Data Lake
0 likes · 16 min read
JD Tech
Sep 20, 2018 · Big Data

Optimizing Local Storage Systems for Large‑Scale Hadoop HDFS Clusters

This article explains the architecture of Hadoop HDFS, identifies performance bottlenecks in page cache and metadata handling on DataNodes, and presents four practical optimization techniques—including cache‑buffer separation, barrier disabling, directory restructuring, and real‑time monitoring—demonstrating significant throughput and latency improvements in large‑scale clusters.

HDFS · Hadoop · Linux kernel
0 likes · 14 min read
Architects' Tech Alliance
Aug 7, 2018 · Operations

Lustre Performance Optimization Guide

This article provides a comprehensive guide to optimizing Lustre, the leading open‑source parallel file system for high‑performance computing, covering network bandwidth, stripe settings, client configuration, RAID choices, small‑file handling, and practical system commands to improve aggregate I/O performance.

HPC · Lustre · Storage Optimization
0 likes · 8 min read
MaGe Linux Operations
Aug 3, 2017 · Operations

How to Boost Linux Server Performance by Tuning I/O Schedulers

This guide explains why Linux I/O scheduler selection matters for virtualized servers, compares deadline, CFQ, noop and anticipatory schedulers, and shows how to configure them globally or per‑disk to improve storage performance in modern data‑center environments.

I/O scheduler · Linux · SAN
0 likes · 6 min read
Architects' Tech Alliance
Jul 14, 2017 · Industry Insights

How a New ‘Non‑Balance’ Wear‑Leveling Algorithm Can Triple SSD Lifespan

The article explains the background of flash‑memory wear‑leveling, reviews common garbage‑collection strategies, compares classic algorithms such as Greedy, Cost‑Benefit, CAT and CICL, and introduces the Non‑Balance method that evaluates real block endurance to extend SSD life up to three times.

Garbage Collection · SSD · Storage Optimization
0 likes · 11 min read
dbaplus Community
Nov 14, 2016 · Databases

Why Oracle Log File Sync Bottlenecks Appear and How to Eliminate Them

During high‑concurrency flash‑sale events, Oracle’s log file sync became a performance bottleneck; the article analyzes storage, OS, and Oracle Disk Manager factors, presents AWR metrics, demonstrates tuning steps—including disabling adaptive log file sync and enabling ODM—and shows measurable latency reductions.

Database Performance · Log File Sync · ODM
0 likes · 18 min read
Art of Distributed System Architecture Design
Apr 25, 2016 · Backend Development

Twitter’s Media Platform: Scaling Image Uploads and Storage Optimizations

The talk by Twitter engineer Henna Kermani outlines how the Media Platform’s decoupled, resumable upload pipeline, handle‑based storage, TTL policies, on‑demand processing, and Progressive JPEG adoption enabled it to process 3,000 images per second while cutting storage and compute costs and improving operational flexibility.

Backend Engineering · Media Platform · Storage Optimization
0 likes · 4 min read
21CTO
Mar 15, 2016 · Databases

How Database Compression Boosts Performance While Cutting Storage Costs

This article examines why storage capacity limits IT systems, explains how database compression reduces disk usage and I/O time, discusses various dictionary‑based compression methods, related operations and commands, and evaluates compression ratios and overall impact on system performance.

DB2 · I/O reduction · Storage Optimization
0 likes · 11 min read
Baidu Tech Salon
Apr 30, 2014 · Backend Development

Logical Coupling, Service Layer Design, and Distributed System Architecture for Large-Scale Web Applications

The article examines the inevitability of service coupling in large‑scale web applications and proposes a two‑dimensional architecture that separates business and logic layers, uses internal data stores, introduces a naming‑and‑location service, selects appropriate transport and RPC protocols, and automates operations with health checks, load balancing, and failover to achieve continuous reliability.

Backend Architecture · Distributed Systems · Java
0 likes · 29 min read