Tagged articles
7 articles
DataFunSummit
Jan 1, 2023 · Big Data

Shopee Data Infra Presentation: Storage Status, Acceleration, Serviceization, and Future Plans

The Shopee Data Infra talk describes the current storage architecture, Presto acceleration with Alluxio caching, and storage offered as a service through Alluxio FUSE and S3 APIs, then outlines planned work on Spark/Hive integration and CSI/FUSE optimizations.

Alluxio · Cache Manager · Kubernetes
16 min read
DataFunTalk
Jul 4, 2022 · Big Data

Apache Ozone: Architecture, Advantages, and New Features Overcoming HDFS Limitations

This article explains the shortcomings of HDFS at large scale, describes the Federation and Scaling approaches, and details how Apache Ozone redesigns metadata storage and introduces a container abstraction, object semantics, and new features such as an optimized OM, streaming writes, erasure coding, and RocksDB consolidation to improve scalability and performance.

Apache Ozone · HDFS · RocksDB
11 min read
Java Baker
Jun 7, 2022 · Databases

Mastering HBase RowKey Design: Principles, Use Cases, and Architecture

Learn why HBase outperforms MySQL for massive, historical data, explore key rowkey design principles such as composite keys, field ordering, length alignment, and hotspot mitigation, and see practical examples like cold‑hot data separation and transaction logs, plus a concise overview of HBase’s core architecture.

Database Architecture · HBase · NoSQL
5 min read
DataFunTalk
Jul 8, 2021 · Big Data

Design and Evolution of ByteDance's Multi‑Datacenter HDFS Architecture

This article explains how ByteDance extended the Apache HDFS architecture with a multi‑datacenter design, introducing components such as DanceNN, NNProxy, and BookKeeper to achieve scalable storage, cross‑datacenter data placement, and rack‑level disaster recovery for petabyte‑scale workloads.

ByteDance · HDFS · big data storage
13 min read
Tencent Tech
Mar 31, 2020 · Big Data

How Tencent Cloud Keeps Big Data Disks Reliable: Inside Their Health Assurance Plan

This article examines the challenges of hard‑disk reliability in large‑scale big‑data services, explains how Tencent Cloud reduces failure rates through hardware and software optimizations, custom collaborations, pre‑deployment health checks, scoring systems, and usage‑pattern improvements, and reveals the comprehensive strategies that keep data storage stable and performant.

big data storage · cloud infrastructure · disk health scoring
11 min read
Huawei Cloud Developer Alliance
Jul 14, 2016 · Big Data

What Makes Huawei’s CarbonData a Game-Changer for Big Data Analytics?

Huawei’s CarbonData, now an Apache Incubator project, is a lightweight, low‑latency columnar storage format that separates storage from compute, offers multi‑dimensional analytics and high compression, integrates with Spark and Hadoop, and addresses the limitations of traditional NoSQL stores, search engines, and SQL‑on‑Hadoop solutions.

Apache Incubator · CarbonData · OLAP
14 min read