Tagged articles

Storage Optimization

72 articles · Page 1 of 1

May 20, 2026 · Big Data

How Kuaishou’s Real‑Time Data Lake Boosts AI and BI Architecture

The article explains how Kuaishou partnered with Apache Hudi to overhaul its ODS‑based data lake, addressing latency, storage cost, and complexity for AI and BI workloads, detailing the evolution from mysql‑to‑hive to mysql‑to‑hudi 1.0 and 2.0, the resulting performance gains, cost savings, and future roadmap.

AI · BI · Big Data

0 likes · 20 min read

Shuge Unlimited

Apr 29, 2026 · Databases

Milvus Storage Tuning in Practice: 25× Query Speedup and Three Tricks to Cut Memory Usage by Half

This article walks through Milvus 2.3‑2.6.x storage optimizations—Mmap, tiered storage, and clustering compaction—explaining their principles, configuration hierarchy, benchmark results, and concrete deployment templates that together can boost query performance up to 25‑fold while halving memory consumption.

Milvus · Performance Tuning · Storage Optimization

0 likes · 24 min read

dbaplus Community

Apr 25, 2026 · Backend Development

From Zero to One: Complete Architecture Design for a Billion‑Scale Short‑Video System

This article dissects the end‑to‑end architecture of a billion‑scale short‑video platform, detailing layered design, core services such as upload, transcoding, recommendation, interaction, storage, and the key challenges of massive video storage, high‑concurrency streaming, low‑latency playback, and real‑time recommendation reliability.

High concurrency · Microservices · Storage Optimization

0 likes · 19 min read

Java Tech Enthusiast

Nov 20, 2025 · Databases

When to Store NULL vs Default Values in MySQL: Row Format and Storage Implications

This article explains how MySQL stores rows, compares the four InnoDB row formats, and evaluates the trade‑offs of using NULL versus explicit default values for nullable columns, covering storage overhead, indexing behavior, and best‑practice design recommendations.

Database Design · InnoDB · NULL

0 likes · 7 min read

Su San Talks Tech

Nov 19, 2025 · Databases

When Should MySQL Store NULL vs a Default Value? A Deep Dive into Row Formats and Storage Impact

This article explains how MySQL stores rows using different InnoDB row formats, illustrates the internal layout of variable‑length columns and hidden fields, and compares the trade‑offs of defining columns as NOT NULL versus allowing NULL values, offering practical guidance for database design.

Database Design · InnoDB · NULL

0 likes · 8 min read

DataFunSummit

Sep 23, 2025 · Artificial Intelligence

How PCache Supercharges Large‑Scale AI Training Storage Performance

This talk explores large‑scale AI training storage challenges and presents PCache, a high‑performance, cloud‑native caching system that optimizes metadata, read/write paths, deployment, and high‑availability, delivering significant throughput gains and cost savings for massive model training workloads.

AI training · Caching · PCache

0 likes · 25 min read

DataFunSummit

Sep 2, 2025 · Big Data

How Xiaomi Cuts Costs and Boosts Performance with Cloud‑Native Data Lake Architecture

Xiaomi’s engineers explain how they tackled data‑lake challenges—small files, metadata latency, and multi‑cloud costs—by combining compact storage, Gravitino‑based metadata governance, Iceberg and Paimon formats, and JuiceFS abstraction, achieving lower storage expenses, faster queries, and a roadmap toward intelligent, real‑time, multimodal lakehouses.

Big Data · Data Lake · Multi-Cloud

0 likes · 14 min read

Su San Talks Tech

Sep 1, 2025 · Backend Development

Build a Scalable Short‑Video System: Architecture, Storage, and Real‑Time Recommendations

This article dissects the architecture of a modern short‑video backend, covering layered system design, core services such as video production, distribution, interaction, storage strategies, real‑time and offline recommendation engines, high‑concurrency streaming solutions, and practical techniques for cost control, scalability, and fault tolerance.

High concurrency · Storage Optimization · backend-architecture

0 likes · 22 min read

MaGe Linux Operations

Aug 30, 2025 · Operations

Master RAID Configuration & Performance: From Beginner to Pro

This comprehensive guide walks you through RAID fundamentals, hardware and software configuration, performance tuning, cost‑benefit analysis, fault diagnosis, and real‑world case studies, providing actionable commands and best‑practice recommendations to help you boost storage performance and reliability by up to 300%.

Hardware · RAID · Software

0 likes · 23 min read

DataFunSummit

Jul 20, 2025 · Big Data

How Beike Scaled to 600 PB: The Evolution of a Data‑Fusion Architecture

This article details Beike's data‑fusion architecture evolution, covering industry trends, multi‑stage Hadoop upgrades, storage cost optimization with erasure coding, remote shuffle integration, GPU‑centric training stability, and future hybrid‑cloud strategies, while also sharing organizational and operational lessons learned.

AI · Cloud Computing · Data Architecture

0 likes · 16 min read

Alibaba Cloud Big Data AI Platform

Jul 15, 2025 · Big Data

How MaxCompute’s Append DeltaTable Transforms BigQuery Migration

This article details the complex migration of a leading Southeast Asian tech group's data warehouse from Google BigQuery to Alibaba Cloud MaxCompute, outlining challenges such as storage format differences, SQL compatibility, and performance tuning, and explains how the new Append DeltaTable format with dynamic bucketing and incremental reclustering resolves these issues.

Big Data · Data Migration · Data Warehouse

0 likes · 19 min read

Alibaba Cloud Infrastructure

Jul 11, 2025 · Cloud Native

How Alibaba Cloud’s AI Infra Innovations Are Transforming Kubernetes Workloads

This article summarizes Alibaba Cloud’s key technical contributions at KubeCon China 2025, covering AI‑focused Kubernetes optimizations, Argo Workflows enhancements, storage strategies for large models, Fluid’s data orchestration, multi‑tenant security, and the RoleBasedGroup framework for PD‑separated AI inference.

AI Infrastructure · Argo Workflows · Fluid

0 likes · 20 min read

360 Zhihui Cloud Developer

Jun 6, 2025 · Fundamentals

How Erasure Coding Cuts Storage Costs in Ozone: A Deep Dive

This article explains how Erasure Coding (EC) improves data reliability and dramatically reduces storage overhead in Ozone by leveraging hot‑cold data characteristics, intelligent tiering, dynamic EC ratios, and repair throttling, while also discussing performance trade‑offs and limitations.

Data Reliability · Ozone · Storage Optimization

0 likes · 9 min read

Kuaishou Tech

May 28, 2025 · Databases

Optimizing Kuaishou's Photo Object Storage: Reducing Size and Boosting Cache Hit Rate

This article details how Kuaishou dramatically cut storage costs and improved cache efficiency for its core Photo data object by cleaning up redundant JSON fields, applying selective serialization, and performing large‑scale data cleaning, achieving a 25% size reduction, a 2% cache‑hit increase, and multi‑hundred‑TB savings.

Cache Hit Rate · Kuaishou · Photo Object

0 likes · 20 min read

Baidu Intelligent Cloud Tech Hub

Sep 25, 2024 · Big Data

How Cold‑Hot Data Separation Boosts Cost Efficiency in Baidu Palo for Apache Doris

This article explains the principles, configuration steps, monitoring metrics, leader selection, data migration granularity, compaction, invalid data cleanup, and cache mechanisms of cold‑hot data separation in Baidu Intelligent Cloud's Palo for Apache Doris, illustrating how tiered storage reduces costs while maintaining query performance.

Apache Doris · Data Tiering · Palo

0 likes · 21 min read

Data Thinking Notes

Jul 4, 2024 · Big Data

How Active Metadata Revolutionizes Data Governance and Cuts Costs

This article examines the growing challenges of data management—such as asset discoverability, architectural rigidity, development quality, and rising resource costs—and presents a comprehensive data‑governance framework that leverages standards, agile architecture, development isolation, and active‑metadata‑driven lifecycle evaluation to improve efficiency, reduce expenses, and enable intelligent, automated data back‑filling.

Big Data · Data Governance · Storage Optimization

0 likes · 17 min read

DataFunSummit

Jun 3, 2024 · Big Data

Data Governance and Active Metadata Practices at JD Retail

The article outlines JD Retail's data management challenges—including asset awareness, architectural agility, development quality, and rising resource costs—and presents a comprehensive data governance framework that leverages data standards, agile architecture, development isolation, resource optimization, and active metadata to achieve intelligent lifecycle evaluation, automated back‑fill, and future‑oriented data fabric improvements.

Data Governance · Data Lifecycle · Storage Optimization

0 likes · 18 min read

JD Tech

Feb 21, 2024 · Operations

Storage Model Optimization and Performance Testing for Hot SKU Inventory Pre‑occupancy

This article explores practical performance testing and tuning techniques, focusing on storage model optimization and call‑chain analysis to improve hot‑SKU inventory pre‑occupancy throughput, presenting detailed pressure‑testing scenarios, results, cache‑layer redesign, and strategies for identifying and mitigating system bottlenecks.

Caching · Storage Optimization · System Bottleneck

0 likes · 15 min read

ITPUB

Jan 20, 2024 · Databases

How to Cut MySQL Storage Costs by 50%: A Systematic Approach

This article outlines a comprehensive, data‑driven methodology for reducing MySQL storage expenses—including background analysis, challenge identification, a nine‑grid systematic framework, benefit estimation, safety and stability verification, gray‑scale rollout, and rollback strategies—demonstrating over 50% disk space savings in a large‑scale billing system.

Data Safety · Database Cost Reduction · Gray Deployment

0 likes · 15 min read

Didi Tech

Jan 9, 2024 · Big Data

Introducing Apache Pulsar: Technical Benefits and Solutions for Didi Big Data Messaging System

Apache Pulsar, a cloud‑native distributed messaging platform, addresses the bottlenecks of Didi Big Data’s DKafka by separating compute from storage and by using sequential log writes, heterogeneous disks, multi‑level caching, bundle‑based load balancing, and automatic scaling. The result is markedly better stability, at the cost of more complex monitoring.

Apache Pulsar · DKafka · Messaging System

0 likes · 17 min read

dbaplus Community

Jan 7, 2024 · Databases

How to Cut MySQL Storage Costs by Over 50%: A Practical Framework

This article presents a systematic, nine‑grid method for reducing MySQL storage expenses—including table compression, JSON field serialization, and hot‑cold data separation—while quantifying benefits, ensuring data safety, and validating system stability through staged testing and SRE metrics.

Database Cost Reduction · Risk Management · Storage Optimization

0 likes · 13 min read

JD Retail Technology

Dec 28, 2023 · Databases

Methods and Practices for Reducing MySQL Database Storage Costs

This article outlines the background, challenges, systematic methods, benefit calculations, data‑safety and stability checks, verification steps, rollback strategies, and gray‑deployment practices for lowering MySQL storage expenses in large‑scale billing systems while maintaining system reliability.

Data Safety · Database Cost Reduction · Gray Deployment

0 likes · 12 min read

Tencent Cloud Developer

Sep 20, 2023 · Operations

Storage Governance and Optimization Practices for Meeting Control Systems

The article explains how a meeting control system tackled severe storage pressure from high concurrent traffic by introducing a proxy layer, multi‑active disaster‑recovery, identity‑based data isolation, dynamic‑static key separation, multi‑level caching, overload protection, sharding with dual‑write migration, and extensive monitoring to meet 100k QPS and ensure reliability.

Disaster Recovery · Redis · Storage Optimization

0 likes · 49 min read

Liangxu Linux

Sep 18, 2023 · Operations

Why Your Disk Shows 30% Free Space Yet Won’t Write Files – Boost Inodes with ext4 -T small

The article explains why a Linux server can run out of inodes despite having plenty of free disk space, demonstrates how creating an ext4 filesystem with the -T small option dramatically increases inode count, and discusses the performance trade‑offs and quick fixes for inode exhaustion.

Filesystem · Storage Optimization · ext4

0 likes · 6 min read

iQIYI Technical Product Team

Aug 25, 2023 · Big Data

Venus Log Platform Architecture Evolution: From ELK to Data Lake

The Venus log platform at iQIYI migrated from an Elasticsearch‑Kibana architecture to an Iceberg‑based data lake queried with Trino, cutting storage and compute costs by over 70%, boosting stability by 85%, and efficiently supporting billions of daily logs in a write‑heavy, low‑query workload.

Big Data · Elasticsearch · Iceberg

0 likes · 22 min read

Baidu Intelligent Cloud Tech Hub

Aug 22, 2023 · Fundamentals

How Baidu’s CDS Uses Erasure Coding to Cut Storage Costs and I/O Amplification

This article explains Baidu Intelligent Cloud's block storage (CDS) architecture, comparing fault‑tolerance methods, detailing the challenges of large‑scale erasure‑coded storage, and describing Baidu's two‑layer append‑engine solution that reduces I/O amplification while keeping costs low.

I/O amplification · Storage Optimization · append engine

0 likes · 15 min read

Didi Tech

May 26, 2023 · Big Data

Design and Optimization of Didi's Spatial‑Temporal Supply‑Demand System

Didi’s redesigned Spatial‑Temporal Supply‑Demand System replaces a single‑Redis bottleneck with a multi‑cluster routing layer, semantic sharding, multi‑level caching and delayed queues, achieving higher horizontal scalability, fault isolation, ~30 % latency reduction, increased cache hit rates, fewer query nodes, and faster, code‑free feature configuration.

Storage Optimization · configuration management · distributed systems

0 likes · 19 min read

DataFunTalk

May 22, 2023 · Big Data

Alibaba Cloud Data Lake: Unified Metadata and Storage Management Practices

This article explains Alibaba Cloud's data lake architecture, unified metadata services, storage management optimizations, and format handling techniques, illustrating how lakehouse concepts, multi‑engine support, and lifecycle policies enable efficient, secure, and cost‑effective big data processing in the cloud.

Big Data · Cloud Services · Data Lake

0 likes · 22 min read

Coolpad Technology Team

Apr 27, 2023 · Cloud Computing

EROFS Cluster Mode Analysis in Linux Kernel 6.x

This article analyzes the EROFS cluster modes (INFLIGHT, HOOKED, FOLLOWED, FOLLOWED_NOINPLACE) in Linux kernel 6.x, explaining how they determine whether in-place I/O can be used based on the current status of pclusters in the chain.

Cloud Computing · Cluster Modes · EROFS

0 likes · 6 min read

Data Thinking Notes

Apr 19, 2023 · Big Data

How Bilibili Transformed Big Data Governance: From Reactive Storage Management to Proactive Multi‑Dimensional Control

This article details Bilibili's evolution of big data governance, describing the early data growth challenges, the launch of the "Wanglou" project, the development of asset metadata and governance indicator frameworks, storage cost reduction strategies, scoring models, and the shift from passive, single‑point fixes to proactive, multi‑dimensional governance across the organization.

Big Data · Bilibili · Data Governance

0 likes · 22 min read

Bilibili Tech

Apr 11, 2023 · Big Data

Bilibili Big Data Governance: From Reactive Storage Management to Proactive Multi‑Dimensional Governance

Bilibili’s exabyte‑scale big‑data platform, after rapid growth created fragmented ownership and costly storage, launched the Wanglou project to build a metadata‑driven, indicator‑based governance framework that cut storage use by half, introduced compliance scoring and automation, and now plans to extend proactive, multi‑dimensional governance to compute, traffic and lake‑house resources.

Bilibili · Data Governance · Storage Optimization

0 likes · 21 min read

DeWu Technology

Feb 15, 2023 · Backend Development

E-commerce Product Ranking System Migration: Technical Implementation and Storage Optimization

The article describes how an e‑commerce product ranking system was migrated to the new “Liao Yue” platform: the system was decoupled from the search module, gained new metrics and Elasticsearch‑based sorting, and had its storage optimized by separating B‑end and C‑end data, which cut costs by 60%. The migration used a gray‑scale rollout, dual‑read validation, and rollback safeguards, finished in two weeks with zero failures, and left a closed‑loop system that iterates faster.

Backend Development · Elasticsearch · High concurrency

0 likes · 15 min read

Architecture Digest

Aug 14, 2022 · Big Data

Replacing Classic Data Warehouse Dimensional Model with a Single Wide Table: Architecture, Benefits, and Challenges

This article analyzes the shift from traditional multi‑layer data warehouse dimensional modeling to a single-layer wide‑table approach, detailing business drivers, technical architecture, storage and query performance gains, as well as the development, maintenance, and operational challenges involved.

Storage Optimization · Wide Table · query performance

0 likes · 10 min read

vivo Internet Technology

Jun 29, 2022 · Big Data

Lossless Image Compression Overview and Lepton Optimization for Large‑Scale Storage

The article explains JPEG’s lossy fundamentals, introduces Lepton’s lossless layer and its optimizations—such as arithmetic coding and multithreaded Huffman switching—and describes how vivo’s hybrid physical‑server and Kubernetes deployment achieves roughly 22 % storage reduction across petabytes of JPEG images despite high CPU demands.

Huffman coding · JPEG · Lepton

0 likes · 13 min read

Baidu Geek Talk

Jun 15, 2022 · Big Data

Replacing Classic Data Warehouse with a One‑Layer Wide Table Model: Architecture, Benefits, and Challenges

The article proposes replacing the traditional multi‑layered data‑warehouse architecture (ODS‑DWD‑DWS‑ADS) with a single, column‑store wide‑table per business theme, achieving roughly 30 % storage savings and faster queries, while acknowledging higher ETL complexity, back‑tracking costs, and production timing challenges.

Big Data · Data Warehouse · ETL

0 likes · 11 min read

Zuoyebang Tech Team

May 13, 2022 · Operations

Build a Scalable, Cost‑Effective Log Retrieval System Without Elasticsearch

This article explains how to design a high‑performance, low‑cost log retrieval architecture that avoids Elasticsearch by partitioning logs into time‑based chunks, indexing only metadata, using multi‑tier storage (local, remote, archive), and orchestrating queries through GD‑Search, Local‑Search, Remote‑Search and Log‑Manager components.

Storage Optimization · cost efficiency · distributed systems

0 likes · 14 min read

ByteDance Data Platform

Apr 27, 2022 · Big Data

How ByteDance Built a Scalable Data Catalog: Key Technologies and Future Plans

ByteDance’s Data Catalog article details the system’s unified metadata model, standardized ingestion connectors, search optimization techniques, lineage capabilities, and storage layer enhancements, highlighting key technical designs, performance improvements, and future work to advance data governance and asset utilization.

Data Catalog · Storage Optimization · data lineage

0 likes · 12 min read

DeWu Technology

Apr 18, 2022 · Artificial Intelligence

Warehouse Storage Location Recommendation: Architecture, Recall, and Ranking Strategies

The article outlines DeWu’s warehouse‑management recommendation system, which combines an online‑near‑line‑offline architecture to quickly recall viable shelf slots and rank them by space utilization, travel time, and sales potential, enabling automated, constraint‑aware placement that cuts picking time and inventory costs.

AI · Big Data · Ranking

0 likes · 16 min read

Bilibili Tech

Mar 30, 2022 · Big Data

HDFS Architecture, Optimizations, and Future Plans at Bilibili

Bilibili’s HDFS now runs a three‑tier architecture—access, metadata, and data layers—enhanced with a custom MergeFS router, observer NameNode, dynamic load balancing, fast‑failover pipelines, and storage‑aware policies, while future work targets transparent erasure coding, tiered data routing, lock refinements, and a Hadoop 3.x migration.

Big Data · Distributed File System · HDFS

0 likes · 22 min read

Java Architect Essentials

Feb 27, 2022 · Backend Development

Designing Efficient Read/Unread Tracking for Group Chat Messages Using Bitmaps

The article analyzes the memory overhead of naïvely storing per‑user read/unread lists for group chat messages and proposes a bitmap‑based scheme with user‑to‑map ID mapping, quit flags, and compact storage that can reduce per‑message space by over 95 percent.

Storage Optimization · backend · bitmap

0 likes · 6 min read

dbaplus Community

Jan 12, 2022 · Big Data

How ClickHouse Powers YiBei's Scalable Advertising Data Platform

This article details YiBei's advertising data platform built on ClickHouse, covering business requirements, why ClickHouse was chosen over Druid, storage engine and compression choices, real‑time and offline ingestion pipelines, partitioning, Zookeeper bottlenecks, atomic data replacement, and testing and release strategies for a high‑throughput, low‑latency ad analytics system.

Advertising · Lambda architecture · Real-time

0 likes · 28 min read

Programmer DD

Dec 22, 2021 · Backend Development

Designing a Space‑Efficient Read/Unread Store for Large Group Chats

This article examines how to model read/unread status for group chat messages using bitmap techniques, evaluates memory costs of naive approaches, and presents a compact storage schema that handles member joins, exits, and scalability while drastically reducing per‑message overhead.

Storage Optimization · backend design · bitmap

0 likes · 7 min read

Selected Java Interview Questions

Oct 15, 2021 · Backend Development

Understanding Message Queues: Concepts, Models, and Storage in MQ, Kafka, and RocketMQ

This article explains the fundamentals of message queues, describing how MQ, Kafka, and RocketMQ handle asynchronous processing, traffic shaping, and decoupling through topics, partitions, consumer groups, and storage mechanisms such as sequential writes and page‑cache optimization.

Message Queue · RocketMQ · Storage Optimization

0 likes · 10 min read

Big Data Technology Architecture

Jul 15, 2021 · Big Data

Building Data Lake Solutions with Iceberg and Object Storage: Architecture, Write/Read Processes, and Storage Optimization

This article presents a comprehensive overview of using Apache Iceberg with object storage to construct scalable data lake solutions, covering lake architecture, Iceberg table organization, Flink‑based write and read workflows, catalog abstractions, object storage versus HDFS comparisons, append‑upload and atomic‑commit challenges, a demonstration setup, and ideas for storage optimization.

Catalog · Flink · Iceberg

0 likes · 16 min read

Big Data Technology Architecture

Mar 25, 2021 · Big Data

Implementing Erasure Coding in HDFS: Migration, Testing, and Data Lifecycle Management at JD

This article details JD's end‑to‑end implementation of HDFS erasure coding, covering the migration from replication to EC, the three‑phase upgrade and rollback process, comprehensive automated testing, a custom data‑lifecycle management system for hot‑warm‑cold data, and multi‑layer integrity safeguards to achieve significant storage cost reduction while maintaining reliability.

Data Lifecycle · HDFS · Storage Optimization

0 likes · 17 min read

JD Tech

Mar 20, 2021 · Big Data

Implementing Erasure Coding in HDFS: Migration Strategy, Testing Framework, and Data Lifecycle Management

This article details JD's practical experience migrating HDFS to erasure coding, covering the decision between upgrade and porting, the step‑by‑step upgrade and rollback procedures, automated testing, a custom data‑lifecycle management system for hot‑warm‑cold data, and comprehensive data‑integrity safeguards to achieve significant storage cost reductions while maintaining production reliability.

Data Lifecycle Management · HDFS · Storage Optimization

0 likes · 17 min read

Tencent Cloud Developer

Dec 7, 2020 · Big Data

Searchable Snapshots in Elasticsearch 7.10: Features, Usage, and Future Outlook

Elasticsearch 7.10 adds searchable snapshots, letting users query indices stored directly in remote repositories such as S3 or COS, which halves storage costs, decouples storage from compute, supports manual mounting and ILM cold‑phase policies, and promises future full storage‑compute separation without local caching.

Big Data · Data Tiering · Elasticsearch

0 likes · 12 min read

Java Backend Technology

Sep 26, 2020 · Databases

Master Redis Bitmaps: SETBIT, GETBIT, BITCOUNT, BITOP Explained

This article introduces Redis's advanced bitmap capabilities, detailing the SETBIT, GETBIT, BITCOUNT, and BITOP commands, their syntax, underlying SDS data structure, performance characteristics, storage calculations, and practical use cases such as user sign‑in tracking and online status monitoring.

Redis · Storage Optimization · bit operations

0 likes · 8 min read

Didi Tech

Mar 31, 2020 · Big Data

Elasticsearch Version Upgrade: Architecture, Challenges, and Performance Optimization at Didi

Over seven months, Didi’s Elasticsearch team upgraded more than 30 clusters, 2,000 nodes, and 4 PB of data from version 2.3.3 to 6.6.1, overcoming protocol and mapping incompatibilities with a multi‑version Arius Gateway, a custom Java SDK, ECM, and AMS. The upgrade saved 1 PB of storage, decommissioned 400 machines, raised query speed by 40 % and write throughput by 30 %, and cut CPU usage by 10 %, for an estimated cost reduction of 800,000 per month (given as “80 w”, i.e. 80 万; the currency is not stated).

Elasticsearch · Performance Optimization · Storage Optimization

0 likes · 18 min read

Tencent Cloud Middleware

Mar 6, 2020 · Operations

Choosing the Right Disk Strategy for High‑Throughput Kafka Clusters

This article examines how to select and configure disk solutions—single‑disk, multi‑directory, RAID, and LVM—for Apache Kafka deployments, comparing performance, cost, scalability, and reliability to help operators build stable, high‑throughput messaging infrastructures.

Big Data · Cloud Computing · Disk Design

0 likes · 16 min read

Architects' Tech Alliance

Jul 28, 2019 · Big Data

Alluxio: A Virtual Distributed File System for Unified Big Data Access and Cost‑Effective Storage

The article explains how Alluxio, a memory‑speed virtual distributed file system, acts as a virtual data lake to unify access to structured and unstructured big‑data across heterogeneous storage systems, offering on‑demand fast local access, intelligent caching, reduced storage costs, and enterprise‑grade security and fault tolerance.

Alluxio · Big Data · Caching

0 likes · 15 min read

Architects' Tech Alliance

Jul 20, 2019 · Industry Insights

Why Continuous Data Protection Is the Future of Enterprise Backup

The article analyzes the evolution of data protection—from manual copies and scripts to snapshots, Continuous Data Protection (CDP), and Copy Data Management (CDM)—highlighting their technical mechanisms, benefits such as near‑zero RPO, implementation models, vendor landscape, and key considerations for selecting the right solution in modern cloud‑centric environments.

Backup Strategies · Continuous Data Protection · Copy Data Management

0 likes · 20 min read

dbaplus Community

Apr 25, 2019 · Big Data

Cutting Hadoop Storage Costs: Replication, Compression, Tiering & Erasure Coding

This article shares practical strategies used in a multi‑petabyte Hadoop environment to slash storage expenses, covering reduced replication, selective compression formats, tiered storage policies, and erasure coding, while weighing trade‑offs in reliability, performance, and operational complexity.

HDFS · Hadoop · Storage Optimization

0 likes · 10 min read

58 Tech

Mar 27, 2019 · Databases

OpenTSDB Architecture, Data Model, Storage Optimizations, and Practical Use Cases

This article introduces OpenTSDB as a distributed, scalable time‑series database built on HBase, explains its architecture, data model, and storage optimizations, presents real‑world monitoring use cases, analyzes performance issues caused by high‑cardinality tags, and details the solution steps taken to restore query speed.

HBase · OpenTSDB · Storage Optimization

0 likes · 9 min read

iQIYI Technical Product Team

Jan 4, 2019 · Artificial Intelligence

Building a Deep Learning Training Platform on Cloud: Challenges, Runonce Service, and Storage Optimization

iQIYI built a cloud‑based deep‑learning training platform called Jarvis, replacing the initial Runonce service, by containerizing GPU tasks, adopting Ceph S3 storage with FUSE, optimizing data pipelines, and addressing compute, storage, and networking challenges to improve scalability and reduce GPU idle time.

AI training · GPU computing · Storage Optimization

0 likes · 9 min read

Architects' Tech Alliance

Nov 5, 2018 · Big Data

Alluxio as a Virtual Distributed File System for Data Lake Solutions

The article explains how Alluxio provides a virtual distributed file system that acts as a "virtual data lake," enabling unified, high‑performance access to structured and unstructured data across heterogeneous storage back‑ends while reducing storage costs through intelligent caching and eliminating the need for permanent data copies.

Alluxio · Big Data · Caching

0 likes · 16 min read

Tencent Cloud Developer

Nov 2, 2018 · Operations

Mastering Elasticsearch: Practical Tuning Strategies for Performance and Cost

This article shares a detailed, experience‑driven guide on Elasticsearch tuning, covering data model fundamentals, storage cost reductions, cluster stability tricks, performance‑boosting settings, and custom kernel improvements, all illustrated with real‑world diagrams and Q&A insights.

Cluster stability · Operations · Storage Optimization

0 likes · 15 min read

JD Tech

Sep 20, 2018 · Big Data

Optimizing Local Storage Systems for Large‑Scale Hadoop HDFS Clusters

This article explains the architecture of Hadoop HDFS, identifies performance bottlenecks in page cache and metadata handling on DataNodes, and presents four practical optimization techniques—including cache‑buffer separation, barrier disabling, directory restructuring, and real‑time monitoring—demonstrating significant throughput and latency improvements in large‑scale clusters.

HDFS · Hadoop · Linux kernel

0 likes · 14 min read

Efficient Ops

Sep 9, 2018 · Operations

How to Evaluate and Optimize System I/O Performance: Models, Tools, and Best Practices

This article explains how to assess I/O capabilities, choose appropriate evaluation tools, monitor key performance metrics, and apply targeted optimization techniques for both disk and network I/O to improve system throughput and latency.

IO performance · Monitoring Tools · Storage Optimization

0 likes · 18 min read

Architects' Tech Alliance

Aug 7, 2018 · Operations

Lustre Performance Optimization Guide

This article provides a comprehensive guide to optimizing Lustre, the leading open‑source parallel file system for high‑performance computing, covering network bandwidth, stripe settings, client configuration, RAID choices, small‑file handling, and practical system commands to improve aggregate I/O performance.

HPC · Lustre · Performance Tuning

0 likes · 8 min read

Beike Product & Technology

Mar 9, 2018 · Big Data

Design and Implementation of Transparent Compression for Hadoop Using ZFS

The article presents a comprehensive solution for reducing Hadoop cluster storage consumption by applying ZFS‑based transparent compression and data‑governance techniques, detailing the technical background, design choices, implementation steps, performance optimizations, and observed storage savings.

Big Data · Data Governance · Hadoop

0 likes · 12 min read

Ctrip Technology

Feb 28, 2018 · Big Data

Using Alluxio to Mitigate HDFS Maintenance Impact on Real-Time Jobs in Ctrip's Big Data Platform

The article explains how Ctrip's big‑data platform introduced Alluxio to isolate real‑time Spark Streaming jobs from HDFS NameNode maintenance, reduce NameNode pressure, improve Spark SQL performance, and provide a unified storage layer across multiple HDFS clusters.

Alluxio · Big Data · Data Lake

0 likes · 9 min read

MaGe Linux Operations

Aug 3, 2017 · Operations

How to Boost Linux Server Performance by Tuning I/O Schedulers

This guide explains why Linux I/O scheduler selection matters for virtualized servers, compares deadline, CFQ, noop and anticipatory schedulers, and shows how to configure them globally or per‑disk to improve storage performance in modern data‑center environments.

I/O scheduler · Linux · SAN

0 likes · 6 min read

Architects' Tech Alliance

Jul 14, 2017 · Industry Insights

How a New ‘Non‑Balance’ Wear‑Leveling Algorithm Can Triple SSD Lifespan

The article explains the background of flash‑memory wear‑leveling, reviews common garbage‑collection strategies, compares classic algorithms such as Greedy, Cost‑Benefit, CAT and CICL, and introduces the Non‑Balance method that evaluates real block endurance to extend SSD life up to three times.

Garbage Collection · SSD · Storage Optimization

0 likes · 11 min read

Alibaba Cloud Developer

Jan 24, 2017 · Fundamentals

How Alibaba Engineers Custom Processors and SSDs for World‑Scale Data Centers

This article examines Alibaba's strategies for designing and deploying custom x86 processors, SSDs, high‑performance storage solutions, and next‑generation chassis to meet the massive, diverse demands of its global data‑center infrastructure while optimizing cost, energy efficiency, and scalability.

Alibaba · Data Center · SSD

0 likes · 17 min read

dbaplus Community

Nov 14, 2016 · Databases

Why Oracle Log File Sync Bottlenecks Appear and How to Eliminate Them

During high‑concurrency flash‑sale events, Oracle’s log file sync became a performance bottleneck; the article analyzes storage, OS, and Oracle Disk Manager factors, presents AWR metrics, demonstrates tuning steps—including disabling adaptive log file sync and enabling ODM—and shows measurable latency reductions.

Database Performance · Log File Sync · ODM

0 likes · 18 min read

Art of Distributed System Architecture Design

Apr 25, 2016 · Backend Development

Twitter’s Media Platform: Scaling Image Uploads and Storage Optimizations

The talk by Twitter engineer Henna Kermani outlines how the Media Platform’s decoupled, resumable upload pipeline, handle‑based storage, TTL policies, on‑demand processing, and Progressive JPEG adoption enabled it to process 3,000 images per second while cutting storage and compute costs and improving operational flexibility.

Backend Engineering · Image Upload · Media Platform

0 likes · 4 min read

21CTO

Mar 15, 2016 · Databases

How Database Compression Boosts Performance While Cutting Storage Costs

This article examines why storage capacity limits IT systems, explains how database compression reduces disk usage and I/O time, discusses various dictionary‑based compression methods, related operations and commands, and evaluates compression ratios and overall impact on system performance.

DB2 · I/O reduction · Performance Tuning

0 likes · 11 min read

Tencent TDS Service

Mar 3, 2016 · Mobile Development

Why Android Phones Slow Down: Flash Fragmentation, Write Amplification, and TRIM Solutions

This article investigates how long‑term use of Android devices leads to storage fragmentation and write amplification, explains the underlying NAND flash mechanisms, and evaluates TRIM‑based solutions such as discard and fstrim for restoring I/O performance.

Android · Storage Optimization · flash memory

0 likes · 14 min read

Architects' Tech Alliance

Jan 26, 2016 · Fundamentals

Understanding Data Deduplication: Definitions, Classifications, and Its Relationship with Compression

This article explains data deduplication technology, its definition, various classification schemes based on execution time, block size, granularity, and location, and compares it with data compression, highlighting how both techniques can be combined to maximize storage savings.

Hashing · Storage Optimization · backup

0 likes · 6 min read

Art of Distributed System Architecture Design

Jul 10, 2015 · Big Data

Improving Hive Storage Efficiency: From RCFile to ORCFile at Facebook

Facebook’s data warehouse, storing over 300 PB and growing by 600 TB daily, transitioned from the RCFile format to an optimized ORCFile implementation, achieving 5‑8× better compression and up to three‑fold faster write performance while maintaining high read efficiency.

Big Data · Facebook · Hive

0 likes · 14 min read

Baidu Tech Salon

Apr 30, 2014 · Backend Development

Logical Coupling, Service Layer Design, and Distributed System Architecture for Large-Scale Web Applications

The article examines the inevitability of service coupling in large‑scale web applications and proposes a two‑dimensional architecture that separates business and logic layers, uses internal data stores, introduces a naming‑and‑location service, selects appropriate transport and RPC protocols, and automates operations with health checks, load balancing, and failover to achieve continuous reliability.

Java · Naming Service · Operations

0 likes · 29 min read
