Tag

Erasure Coding

360 Zhihui Cloud Developer
Jun 6, 2025 · Fundamentals

How Erasure Coding Cuts Storage Costs in Ozone: A Deep Dive

This article explains how Erasure Coding (EC) improves data reliability and dramatically reduces storage overhead in Ozone by leveraging hot‑cold data characteristics, intelligent tiering, dynamic EC ratios, and repair throttling, while also discussing performance trade‑offs and limitations.

Data Reliability · Erasure Coding · Ozone
9 min read

DataFunSummit
Jan 16, 2025 · Big Data

Zhihu Big Data Cost‑Reduction Practices: FinOps, Erasure Coding, ZSTD Compression, Spark Auto‑Tuning, and Remote Shuffle Service

This article details Zhihu's comprehensive cost‑reduction and efficiency‑boosting initiatives for its big‑data platform, covering FinOps‑driven financial operations, hybrid‑cloud architecture, cost allocation models, operational monitoring, and technical optimizations such as erasure coding, ZSTD compression, Spark auto‑tuning, and a remote shuffle service.

Big Data · Cloud Cost Management · Cost Optimization
22 min read

Baidu Tech Salon
Nov 8, 2024 · Cloud Computing

Design and Evolution of Baidu Canghai Storage Unified Technology Stack

Baidu Canghai Storage’s unified technology stack—comprising a meta‑aware distributed metadata layer, a hybrid single‑node‑distributed namespace, and an online erasure‑coding data layer—delivers AI‑driven, high‑performance, low‑cost, ZB‑scale cloud storage by modularizing metadata, namespace, and data services for object, file, and block workloads.

Baidu · Erasure Coding · Metadata Architecture
16 min read

Baidu Geek Talk
Nov 6, 2024 · Cloud Computing

Baidu Canghai Storage Unified Technology Base: Architecture and Evolution of Metadata, Namespace, and Data Layers

Baidu’s Canghai Storage unifies metadata, hierarchical namespace, and data layers into a Meta‑Aware, three‑generation architecture that scales to trillions of metadata items and zettabyte‑scale data, using a distributed transactional KV store, single‑machine‑distributed namespace, and online erasure‑coding micro‑services to deliver high performance, low cost, and seamless scalability.

Big Data · Erasure Coding · NewSQL
18 min read

DataFunTalk
Aug 30, 2023 · Big Data

Design and Implementation of Baidu Cloud Block Storage EC System for Large‑Scale Data

This article presents Baidu Cloud's block storage architecture, comparing replication and erasure‑coding fault‑tolerance methods, detailing the challenges of applying EC to mutable block data, and describing a two‑layer append‑engine solution with selective 3‑replica caching, cost‑benefit compaction, and performance optimizations for low‑cost, high‑throughput storage.

Append Engine · Big Data · Compaction
14 min read

vivo Internet Technology
Jun 7, 2023 · Big Data

Erasure Coding Technology in the Evolution of Vivo Storage Systems

Combining academic advances and industry practice, the article surveys erasure‑coding techniques, then details Vivo’s optimized storage stack—enhancing Reed‑Solomon with bit‑matrix scheduling, parallel cross‑AZ repair, LRC and MSR layers, and intermediate‑result optimization—to achieve high reliability while minimizing bandwidth and storage overhead.

Data Redundancy · Distributed Storage · Erasure Coding
48 min read

Bilibili Tech
Mar 14, 2023 · Big Data

Bilibili HDFS Erasure Coding Strategy and Implementation

Bilibili reduced petabyte‑scale storage costs by back‑porting erasure‑coding patches to its HDFS 2.8.4 cluster, deploying a parallel EC‑enabled cluster, adding a data‑proxy service, intelligent routing and block‑checking, and automating cold‑data migration, while noting write overhead and planning native acceleration.

Big Data · Data Reliability · Erasure Coding
14 min read

DataFunSummit
Feb 12, 2023 · Big Data

Applying Erasure Coding in HDFS: Strategies, Performance, and Repair Techniques

This article explains how Zhihu adopted HDFS erasure coding to reduce storage costs, outlines cold‑hot file tiering policies, describes the EC conversion workflow and the custom EC Worker tool, and details methods for detecting and repairing damaged EC files in a Hadoop environment.

Big Data · Erasure Coding · HDFS
16 min read

DataFunTalk
Jul 4, 2022 · Big Data

Apache Ozone: Architecture, Advantages, and New Features Overcoming HDFS Limitations

This article explains the shortcomings of HDFS at large scale, describes the Federation and Scaling approaches, and details how Apache Ozone redesigns metadata storage, introduces container abstraction, object semantics, and new features such as optimized OM, streaming writes, erasure coding, and RocksDB consolidation to improve scalability and performance.

Apache Ozone · Erasure Coding · HDFS
11 min read

Bilibili Tech
May 20, 2022 · Backend Development

Design and Implementation of Bilibili Object Storage Service (BOSS): Architecture, Topology, Metadata, Erasure Coding, and Scaling

The article chronicles Bilibili’s 13‑day development of BOSS, a custom object storage service, detailing how it replaces MySQL‑based routing and ID generation with replicated etcd or Raft KV stores, models metadata via protobuf, adopts erasure coding and a Bitcask‑style engine, and implements safe delete, replica repair, and horizontal scaling for a resilient large‑scale system.

Erasure Coding · backend architecture · distributed systems
28 min read

Kuaishou Big Data
Oct 28, 2021 · Big Data

How Kuaishou Cut Object Storage Costs by 50% with LRC Erasure Coding

Kuaishou cut its massive object‑storage costs in half by redesigning its architecture around HBase indexing, HDFS large‑file storage, MemoryCache, and a cross‑IDC LRC erasure‑coding warm layer that preserves disaster recovery while dynamically moving data through hot, warm, and cold tiers.

Big Data · Erasure Coding · Kuaishou
12 min read

DataFunTalk
Aug 11, 2021 · Big Data

OPPO CBFS: Architecture and Key Technologies of a Scalable Data Lake Storage System

This article introduces OPPO's self‑developed data lake storage system CBFS, covering the fundamentals of data lake storage, the multi‑layer CBFS architecture, its core technologies such as metadata management and erasure coding, and future directions for large‑scale, low‑cost data analytics.

CBFS · Erasure Coding · big data storage
14 min read

58 Tech
May 28, 2021 · Big Data

Practical Upgrade Experience of Hadoop 3.2.1 in 58.com Data Platform: HDFS, YARN, and MR3

This article details the end‑to‑end upgrade of a 5000‑node Hadoop 2.6.0 cluster to Hadoop 3.2.1 at 58.com, covering HDFS migration, RBF and EC adoption, YARN federation and rolling upgrades, MR3 integration, extensive compatibility testing, and operational lessons learned for large‑scale big‑data platforms.

Big Data · Cluster Upgrade · Erasure Coding
19 min read

Big Data Technology Architecture
Mar 25, 2021 · Big Data

Implementing Erasure Coding in HDFS: Migration, Testing, and Data Lifecycle Management at JD

This article details JD's end‑to‑end implementation of HDFS erasure coding, covering the migration from replication to EC, the three‑phase upgrade and rollback process, comprehensive automated testing, a custom data‑lifecycle management system for hot‑warm‑cold data, and multi‑layer integrity safeguards to achieve significant storage cost reduction while maintaining reliability.

Big Data · Erasure Coding · HDFS
17 min read

JD Tech
Mar 20, 2021 · Big Data

Implementing Erasure Coding in HDFS: Migration Strategy, Testing Framework, and Data Lifecycle Management

This article details JD's practical experience migrating HDFS to erasure coding, covering the decision between upgrade and porting, the step‑by‑step upgrade and rollback procedures, automated testing, a custom data‑lifecycle management system for hot‑warm‑cold data, and comprehensive data‑integrity safeguards to achieve significant storage cost reductions while maintaining production reliability.

Cluster Upgrade · Data Lifecycle Management · Erasure Coding
17 min read

Architects' Tech Alliance
Jan 25, 2021 · Fundamentals

Ceph Storage Architecture Overview and Detailed Technical Features

This article provides a comprehensive technical overview of Red Hat Ceph, covering its distributed object storage design, cluster architecture, storage pools, authentication, placement groups, CRUSH algorithm, I/O operations, replication, erasure coding, internal management tasks, high availability, client interfaces, data striping, and encryption mechanisms.

CRUSH · Ceph · Data Striping
42 min read

Didi Tech
Jan 22, 2021 · Big Data

Erasure Coding Practice in HDFS at Didi: Principles, Implementation, and Lessons Learned

Didi migrated HDFS to Hadoop 3.2 and implemented erasure coding—using XOR and Reed‑Solomon RS(6,3) striping—to replace three‑replica storage for cold data, building back‑ported clients, automated conversion tools, and cross‑datacenter backup pipelines, while addressing operational bugs and noting performance trade‑offs.
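
The RS(6,3) scheme mentioned above illustrates the core of the cost argument running through these articles: with k data blocks and m parity blocks, raw storage per logical byte is (k+m)/k, versus n for n‑way replication. A minimal sketch of that arithmetic (generic parameters, not tied to any one system described here):

```python
# Back-of-envelope storage overhead: n-way replication vs. RS(k, m)
# erasure coding. Illustrative only; k=6, m=3 matches the RS(6,3)
# scheme mentioned in the summary above.

def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per logical byte under n-way replication."""
    return float(replicas)

def ec_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Raw bytes stored per logical byte under RS(data, parity)."""
    return (data_blocks + parity_blocks) / data_blocks

print(replication_overhead(3))  # 3.0 — three full copies
print(ec_overhead(6, 3))        # 1.5 — half the raw footprint of 3x replication
```

Both schemes here tolerate the loss of any 3 block-holders; EC trades the saved capacity for reconstruction cost on reads of degraded data, which is why these deployments reserve it for cold data.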

Big Data · Didi · Erasure Coding
11 min read

Architects' Tech Alliance
Nov 9, 2020 · Cloud Computing

Ceph Storage Architecture: Overview, Cluster Design, Client Interfaces, and Encryption

This article provides a comprehensive technical overview of Red Hat Ceph, covering its distributed storage architecture, cluster components, storage pool types, authentication, placement algorithms, I/O paths, replication and erasure‑coding strategies, internal management operations, high‑availability mechanisms, client libraries, data striping, and encryption details.

CRUSH · Ceph · Data Striping
39 min read

Didi Tech
Jan 5, 2020 · Big Data

Rolling Upgrade of HDFS from 2.7 to 3.2: Experience, Issues and Solutions

The team performed a rolling upgrade of HDFS from 2.7 to 3.2 on large clusters. They resolved EditLog, FsImage, StringTable, and authentication incompatibilities by omitting EC data, using fallback images, rolling back commits, and first upgrading to the latest 2.x release; followed a staged JournalNode → NameNode → DataNode procedure; and validated the process with rehearsals and a custom trash‑management tool, achieving uninterrupted service along with improved stability, performance, and cost efficiency.

Big Data · Cluster Migration · Erasure Coding
11 min read

Architects' Tech Alliance
Dec 2, 2019 · Cloud Computing

An Overview of EMC Elastic Cloud Storage (ECS): Architecture, Features, and Performance

This article provides a detailed technical overview of EMC's Elastic Cloud Storage (ECS), covering its historical evolution, layered architecture, supported protocols, data protection mechanisms, performance characteristics, limitations, and future roadmap within the context of cloud object storage.

Distributed Storage · ECS · Elastic Cloud Storage
10 min read