Tagged articles

Data Lake

356 articles · Page 4 of 4

Jun 16, 2021 · Big Data

Practical Experience and Optimizations of Apache Iceberg in Tencent’s Big Data Ecosystem

This article reviews the advantages of Apache Iceberg for data lake storage, details Tencent’s custom optimizations and integration with Flink and Spark, and shares multiple real‑world implementations that demonstrate how Iceberg improves data consistency, reduces small‑file overhead, and enables near‑real‑time analytics in large‑scale big‑data environments.

Apache Iceberg · Data Lake · Flink

0 likes · 18 min read

dbaplus Community

Jun 5, 2021 · Big Data

How Flink + Iceberg Transform Data Lakes for Real‑Time Streaming

This article explains the concept of data lakes, outlines a four‑layer open‑source architecture, presents several classic Flink‑Iceberg use cases, details why Iceberg was chosen, and describes the design of Flink’s streaming sink and upcoming community roadmap.

Apache Flink · Apache Iceberg · Big Data

0 likes · 14 min read

Tencent Cloud Developer

May 26, 2021 · Big Data

Big Data Trends and Future Directions – Insights from the Techo TVP Developer Summit Roundtable

At the Techo TVP Developer Summit, leaders discussed how big‑data tools are evolving beyond perceived bottlenecks toward cloud‑native, specialized platforms and data lakes, emphasized open‑source collaboration, highlighted China’s capacity to spawn a Snowflake‑like service, and offered guidance on emerging real‑time, GPU‑accelerated analytics and multidisciplinary data‑career paths.

Career Advice · Data Lake · Industry Trends

0 likes · 24 min read

Tencent Cloud Developer

May 25, 2021 · Cloud Native

Next‑Generation Cloud‑Native Data Lake Architecture: Value, Principles, Challenges, and Tencent Solutions

The talk outlines a next‑generation cloud‑native data lake that leverages elastic Kubernetes compute, object storage, and Apache Iceberg to cut costs 3‑10× while boosting performance, and presents Tencent’s Data Lake Compute and Data Lake Fabric solutions, which address scalability, reliability, and operational challenges through a serverless, unified, multi‑engine architecture.

Data Lake · Iceberg · Tencent Cloud

0 likes · 13 min read

Programmer DD

May 22, 2021 · Big Data

What Is a Data Lake? Origins, Architecture, and How It Powers Modern Big Data

This article explains the concept of a data lake—its origin in 2011, how it differs from traditional databases and data warehouses, its core characteristics such as raw data storage, on‑demand computing, and schema‑on‑read, as well as its advantages, challenges, architectural components, and future outlook within the big‑data ecosystem.

Big Data · Data Architecture · Data Governance

0 likes · 20 min read

DataFunTalk

Apr 27, 2021 · Big Data

Implementing CDC‑to‑Hudi for Real‑Time Mutable Data in a Big Data System

This article describes how Linkflow migrated mutable customer data from MySQL to an Apache Hudi data lake using Debezium‑in‑Flink CDC, addressing challenges such as snapshot resumability, partial updates, row‑key merging, schema evolution, indexing, and concurrent writes to achieve minute‑level data freshness and improved offline processing performance.

Apache Hudi · Big Data · CDC

0 likes · 21 min read

DataFunTalk

Apr 26, 2021 · Big Data

Detailed Design and Practical Application of Apache Iceberg at NetEase Cloud Music

This article explains the motivations behind Apache Iceberg, its design principles such as snapshot and MVCC, compares it with Hive, and describes how NetEase Cloud Music adopted Iceberg to improve metadata handling, query performance, and operational stability for massive daily log data.

Apache Iceberg · Big Data · Data Lake

0 likes · 13 min read

Big Data Technology & Architecture

Apr 25, 2021 · Big Data

Data Lake Storage Architecture Selection and JindoFS on Alibaba Cloud

The article explains the concept and advantages of data lakes, outlines the major storage and acceleration challenges they face, provides a checklist for ideal data‑lake solutions, and details how Alibaba Cloud's JindoFS addresses those challenges with object‑storage‑based, high‑performance, scalable features.

Alibaba Cloud · Data Lake · Hadoop

0 likes · 9 min read

Big Data Technology & Architecture

Apr 24, 2021 · Big Data

Integrating Apache Flink 1.12.2 with Apache Hudi: Batch and Streaming Modes

This article walks through downloading the required Flink and Hudi components, building Hudi for Scala 2.12, and demonstrates step‑by‑step how to create, populate, query, and update Hudi tables in both batch and streaming modes using Flink SQL, complete with code snippets and result screenshots.

Batch · Data Lake · Flink

0 likes · 8 min read

Big Data Technology & Architecture

Mar 30, 2021 · Big Data

Implementing Real-Time Data Ingestion with Delta Lake on EMR: Architecture, Challenges, and Solutions

This article describes how Soul's data engineering team replaced nightly batch ETL with real-time Delta Lake ingestion on EMR, detailing the motivations, a comparative analysis of Delta, Hudi, and Iceberg, the implementation architecture, the issues encountered (such as data skew and schema evolution), and the solutions adopted to improve performance and reliability.

Data Lake · Data Skew · Delta Lake

0 likes · 13 min read

Tencent Cloud Developer

Mar 29, 2021 · Cloud Native

How Tencent Cloud’s Native Data Lake Redefines Big Data Analytics

This article examines the evolution of data lakes, outlines the challenges enterprises face with massive, heterogeneous data, and details Tencent Cloud’s native data lake architecture and its serverless Data Lake Compute service, highlighting performance, cost‑efficiency, and future development directions.

Analytics · Cloud Native · Data Lake

0 likes · 10 min read

Big Data Technology & Architecture

Mar 23, 2021 · Big Data

Practical Implementations of Data Lakes: Huawei Production Scenario, Real-Time Financial Data Lake, and Soul's Delta Lake

This article presents a comprehensive overview of data lake implementations, detailing Huawei's production‑scene platform, a real‑time financial data lake architecture using Kafka, Flink and Iceberg, and Soul's Delta Lake practice with Spark, Hive, and custom ETL tools, highlighting design choices, processing flows, and operational considerations.

Data Lake · Delta Lake · Flink

0 likes · 8 min read

Big Data Technology & Architecture

Mar 20, 2021 · Big Data

Understanding Data Lakes and a Comparative Overview of Iceberg, Hudi, and Delta Lake

This article explains what a data lake is, outlines its key characteristics, and compares three major data lake frameworks—Iceberg, Hudi, and Delta Lake—highlighting their architectures, features, and trade‑offs for large‑scale data storage and processing.

Data Architecture · Data Lake · Delta Lake

0 likes · 13 min read

Tencent Cloud Developer

Mar 10, 2021 · Cloud Native

How Cloud‑Native Data Lakes Slash Costs and Boost Performance on Public Cloud

The article analyzes the challenges of moving traditional on‑premise big‑data platforms to the cloud, outlines the cost‑saving opportunities of cloud‑native data lakes, presents three core architectural principles, and reviews Tencent Cloud's data lake product suite and its key use cases.

Big Data · Cloud Native · Data Lake

0 likes · 11 min read

Architects' Tech Alliance

Feb 21, 2021 · Big Data

Data Warehouse and Data Lake: Concepts, Architecture, and Comparison

This article provides an extensive overview of data warehouse and data lake concepts, their architectures, differences, components, and implementation considerations, covering topics such as OLTP/OLAP, ETL processes, data quality, cloud solutions, and the role of data platforms in modern enterprises.

Cloud Computing · Data Architecture · Data Lake

0 likes · 92 min read

DataFunTalk

Feb 17, 2021 · Big Data

Apache Iceberg 0.11.0: New Partition Support, SortOrder, Flink Streaming Reader, and Ecosystem Integrations

The article details Apache Iceberg 0.11.0's core enhancements—including partition changes, SortOrder, extensive Flink and Spark integrations, CDC/Upsert support, hash‑based write distribution to reduce small files, and upcoming 0.12.0 roadmap—while providing practical SQL and API examples for data‑lake practitioners.

Apache Iceberg · Big Data · CDC

0 likes · 13 min read

Big Data Technology & Architecture

Feb 2, 2021 · Big Data

An Introduction to Apache Iceberg: Features, Spark & Flink Integration, and Real‑World Use Cases

This article provides a comprehensive overview of Apache Iceberg, covering its origins, key features, practical Spark and Flink code examples, notable deployments at Alibaba and Tencent, and its future role as a universal table format for big‑data analytics.

Apache Iceberg · Data Lake · Flink

0 likes · 9 min read

DataFunTalk

Feb 1, 2021 · Big Data

Building a Real-Time Data Warehouse with Apache Flink and Apache Iceberg: Architecture, Challenges, and Best Practices

This article presents Tencent's experience of constructing a real‑time data warehouse by integrating Apache Flink with Apache Iceberg, covering background pain points, Iceberg's table format and capabilities, Flink‑Iceberg streaming and batch processing, practical implementations, and future roadmap for data‑lake acceleration.

Apache Flink · Apache Iceberg · Big Data

0 likes · 21 min read

21CTO

Jan 25, 2021 · Big Data

Understanding Data Lakes vs. Data Warehouses: A Complete Guide

This article provides a comprehensive overview of data lakes and data warehouses, explaining their definitions, architectures, differences, and practical use cases, while also covering related concepts such as OLTP/OLAP, ETL processes, data governance, and modern lakehouse solutions.

Data Governance · Data Lake · Data Warehouse

0 likes · 95 min read

Big Data Technology & Architecture

Jan 20, 2021 · Big Data

Understanding Data Warehouse, Data Lake, and Data Middle Platform: Concepts, Differences, and Applications

This article provides a comprehensive overview of data warehouses, data lakes, and data middle platforms, explaining their definitions, architectures, functions, differences, and the value they bring to enterprises, while also addressing common misconceptions and related concepts such as data marts and data swamps.

Data Architecture · Data Lake · Data Warehouse

0 likes · 37 min read

Big Data Technology & Architecture

Dec 31, 2020 · Big Data

Data Lake vs Data Warehouse: Evolution, Comparison, and Alibaba Cloud Lakehouse Integration

This article examines the 20‑year evolution of big data architectures, contrasts data lakes and data warehouses, explores their respective strengths and challenges, and details Alibaba Cloud’s lake‑warehouse (lakehouse) solution that unifies storage, metadata, and compute for enterprise‑grade analytics and AI workloads.

Cloud Computing · Data Architecture · Data Lake

0 likes · 30 min read

DataFunTalk

Dec 15, 2020 · Big Data

Exploring JD's Real‑Time Data Lake with Delta Lake: Architecture, Challenges, and Practical Insights

This article introduces JD's real‑time data warehouse evolution, outlines the limitations of traditional Lambda‑based warehouses, compares open‑source lake formats (Delta, Hudi, Iceberg), explains Delta Lake's transaction‑log architecture and read flow, and demonstrates how a unified batch‑stream development model simplifies data processing and improves reliability.

ACID · Data Lake · Delta Lake

0 likes · 12 min read

DataFunTalk

Dec 3, 2020 · Big Data

Streaming Data Lake Ingestion with Apache Flink and Apache Iceberg

This article explains how Apache Flink integrates with data lake architectures, especially using Apache Iceberg as a table format, to enable real‑time streaming ingestion, CDC processing, near‑real‑time lambda architectures, and future enhancements like automatic file merging and row‑level deletes.

Apache Iceberg · Data Lake · Flink

0 likes · 13 min read

Big Data Technology Architecture

Nov 25, 2020 · Big Data

Data Lake Storage Architecture Selection and JindoFS on Alibaba Cloud

This article explains the concept and benefits of data lakes, outlines the storage and acceleration challenges they pose, presents an ideal checklist for selecting a data lake solution, and evaluates Alibaba Cloud's JindoFS against that checklist, highlighting its capabilities for big‑data and AI workloads.

Alibaba Cloud · Big Data · Data Lake

0 likes · 9 min read

Big Data Technology & Architecture

Nov 14, 2020 · Big Data

Comparative Analysis of Apache Hudi, Apache CarbonData, and Delta Lake for Data Lake Solutions

This article examines the core requirements of data lakes and provides an in‑depth comparison of three major open‑source solutions—Apache Hudi, Apache CarbonData, and Delta Lake—highlighting their architectures, ACID support, query capabilities, and suitability for various real‑time and batch use cases.

ACID · Apache CarbonData · Apache Hudi

0 likes · 9 min read

DataFunTalk

Oct 29, 2020 · Big Data

Building a Large-Scale Near Real-Time Data Analytics Platform at Lyft Using Apache Flink

Lyft transformed its legacy data pipeline by designing a cloud‑native, Flink‑based near real‑time analytics platform that ingests billions of events, writes Parquet files to S3, leverages Presto for interactive queries, and implements multi‑stage non‑blocking ETL, fault‑tolerant back‑fill, and extensive performance optimizations.

AWS · Data Lake · ETL

0 likes · 12 min read

Alibaba Cloud Developer

Oct 25, 2020 · Big Data

How Alibaba’s Cloud‑Native Data Lake Solves Big Data Challenges

Alibaba Cloud’s Data Lake Analytics (DLA) tackles the growing complexity of data scenarios by offering cloud‑native, serverless solutions for data lake management, massive metadata construction, and high‑performance Spark and Presto engines, while addressing challenges such as high entry barriers, stability, and multi‑tenant isolation.

Cloud Native · Data Lake · Serverless Spark

0 likes · 22 min read

Big Data Technology & Architecture

Oct 21, 2020 · Big Data

An Introduction to Apache Hudi: Concepts, Design Principles, and Architecture

This article introduces Apache Hudi, explaining its core concepts, design principles, table architecture, write and compaction mechanisms, and the three query modes that enable efficient batch and incremental processing on modern data lakes.

Apache Hudi · Big Data · Data Lake

0 likes · 21 min read

Big Data Technology & Architecture

Oct 19, 2020 · Big Data

Delta Lake: ACID Transactions, Schema Management, and Unified Batch‑Streaming for Data Lakes

Delta Lake adds ACID transaction support, schema enforcement, data versioning, and unified batch‑and‑stream processing to Apache Spark‑based data lakes, addressing reliability, quality, performance, and update challenges of traditional data lake architectures.

ACID Transactions · Apache Spark · Big Data

0 likes · 13 min read

Architects' Tech Alliance

Oct 15, 2020 · Big Data

Why Data Lakes and Data Warehouses Are Merging: The Rise of the Lakehouse Era

This article traces the 20‑year evolution of big‑data technologies, compares data lakes and data warehouses, explains their complementary strengths, and presents Alibaba Cloud’s lakehouse solution that unifies storage and compute to deliver flexible, performant, and cost‑effective analytics for enterprises.

Big Data · Cloud Computing · Data Lake

0 likes · 30 min read

DataFunTalk

Oct 9, 2020 · Big Data

NetEase’s Data Lake Iceberg: Challenges, Core Principles, and Practical Implementation

This article examines the pain points of traditional data warehouse platforms, explains the core concepts and advantages of the Iceberg data lake table format, compares it with Metastore, reviews the current Iceberg community ecosystem, and details NetEase’s practical integration with Hive, Impala, and Flink to improve ETL efficiency and support unified batch‑stream processing.

Data Lake · ETL · Flink

0 likes · 13 min read

Big Data Technology & Architecture

Sep 2, 2020 · Big Data

An Overview of Apache Hudi: Architecture, Features, and Query Types

Apache Hudi is an open‑source data‑lake framework that leverages Spark to ingest, manage, and incrementally query large analytical datasets on HDFS‑compatible storage, offering features such as timeline management, copy‑on‑write and merge‑on‑read tables, and support for snapshot, incremental, and read‑optimized queries across engines like Hive, Spark SQL and Presto.

Apache Hudi · Big Data · Data Lake

0 likes · 12 min read

Big Data Technology & Architecture

Aug 23, 2020 · Big Data

Apache Hudi Overview, Core Concepts, and Quick‑Start Guide

This article introduces Apache Hudi, explaining its storage types, query views, timeline feature, typical use cases such as near‑real‑time ingestion and incremental pipelines, and provides a step‑by‑step Scala/Spark quick‑start guide with code examples for compiling, inserting, updating, querying, and syncing data to Hive.

Apache Hudi · Big Data · Data Lake

0 likes · 18 min read

Big Data Technology & Architecture

Aug 15, 2020 · Big Data

Understanding Data Lakes: Concepts, Architecture, Vendor Solutions, and Practical Use Cases

This comprehensive article explains what a data lake is, outlines its core characteristics and reference architecture, compares major cloud providers' data‑lake offerings, presents typical advertising and gaming use cases, and proposes a practical, agile process for building and operating a data lake.

Big Data · Cloud Native · Data Architecture

0 likes · 50 min read

Big Data and Microservices

Jun 28, 2020 · Big Data

Data Warehouse vs Data Lake vs Data Platform vs Data Middle Platform: Which Fits Your Business?

This article compares data warehouse, data lake, data platform, and data middle platform, explaining their definitions, architectures, strengths, limitations, and use‑case differences, and provides tables that highlight how each solution handles structured and unstructured data, governance, flexibility, and business value.

Big Data · Data Architecture · Data Lake

0 likes · 12 min read

Architect

May 12, 2020 · Big Data

An Overview of Apache Hudi: Architecture, Concepts, and Query Types

Apache Hudi is an open‑source data‑lake framework that leverages Spark and Hadoop‑compatible storage to provide efficient ingestion, incremental processing, and multiple query modes such as snapshot, incremental, and read‑optimized for large analytical datasets.

Apache Hudi · Big Data · Data Lake

0 likes · 11 min read

ITPUB

Apr 6, 2020 · Big Data

How to Build a Data Lake Quickly: Strategies, Tools, and Real‑World Cases

This article explains the origins and market growth of data lakes, compares them with traditional data warehouses, showcases major implementations like Amazon Galaxy and Club Factory, and provides practical guidance on choosing open‑source or commercial cloud solutions to construct a data lake efficiently while minimizing risk.

AWS · Big Data · Cloud Computing

0 likes · 10 min read

dbaplus Community

Mar 17, 2020 · Big Data

Choosing the Right Open‑Source Data Lake: Delta vs Iceberg vs Hudi

An in‑depth comparison of the three leading open‑source data lake platforms—Delta Lake, Apache Iceberg, and Apache Hudi—examines their origins, core challenges they address, key features, and performance across seven evaluation dimensions to guide practitioners in selecting the optimal solution for their workloads.

Apache Hudi · Apache Iceberg · Data Lake

0 likes · 15 min read

Architects Research Society

Feb 23, 2020 · Big Data

Understanding Data Lakes: Benefits, Challenges, and Comparison with Data Warehouses

The article explains what a Data Lake is, its origins, key characteristics, cost advantages, potential pitfalls such as becoming a data swamp, and compares it with traditional data warehouses, highlighting when each approach is most appropriate.

Analytics · Data Architecture · Data Lake

0 likes · 9 min read

Big Data Technology & Architecture

Feb 6, 2020 · Big Data

Comparison of Hudi, Iceberg, and Delta Lake Table Formats

This article compares the design goals of three data‑lake table formats (Hudi, Iceberg, and Delta), highlighting their common reliance on meta files and their distinct strengths for upserts, analytics, and unified streaming‑batch processing in modern big‑data environments.

Big Data · Data Lake · Delta Lake

0 likes · 10 min read

Architects' Tech Alliance

Oct 17, 2019 · Big Data

Understanding Alibaba's Data Middle Platform: Concepts, Architecture, and Differences from Data Warehouses and Data Lakes

The article explains Alibaba's data middle platform—its definition, methodology, organizational structure, key tools, and how it differs from traditional data warehouses and data lakes—while highlighting its role in supporting scalable, business‑centric data services and digital transformation.

Alibaba · Big Data · Data Architecture

0 likes · 16 min read

Big Data Technology & Architecture

Oct 17, 2019 · Big Data

Delta Lake: Architecture, Features, and Hands‑On Tutorial

This article explains the origins and motivations of Delta Lake, details its ACID transaction support, schema enforcement, metadata handling, versioning, and unified batch‑and‑stream processing, and provides a step‑by‑step Maven and Spark code tutorial for creating, updating, and querying Delta tables.

ACID · Apache Spark · Big Data

0 likes · 10 min read

Architects' Tech Alliance

Aug 20, 2019 · Big Data

Current State and Future Trends of Hadoop in the Big Data Landscape

Despite recent market turbulence and negative headlines, Hadoop's revenue continues to grow, driven by cloud migration, evolving storage solutions, and increasing adoption of related projects like Spark and Kafka, positioning it as a leading data‑lake technology.

Apache Spark · Big Data · Data Lake

0 likes · 8 min read

Architects' Tech Alliance

Jul 28, 2019 · Big Data

Alluxio: A Virtual Distributed File System for Unified Big Data Access and Cost‑Effective Storage

The article explains how Alluxio, a memory‑speed virtual distributed file system, acts as a virtual data lake to unify access to structured and unstructured big‑data across heterogeneous storage systems, offering on‑demand fast local access, intelligent caching, reduced storage costs, and enterprise‑grade security and fault tolerance.

Alluxio · Big Data · Caching

0 likes · 15 min read

DataFunTalk

Jun 17, 2019 · Big Data

Understanding Hadoop’s Core Competitiveness in the Trillion‑Scale Data Era

This article explores Hadoop’s role in the big‑data era, detailing its architecture, core components such as HDFS, YARN, MapReduce, Ozone and Submarine, the challenges of trillion‑scale data, and why its scalability, cost efficiency, and mature ecosystem give it a competitive edge.

Data Lake · Hadoop · MapReduce

0 likes · 11 min read

21CTO

Jan 26, 2019 · Big Data

Data Lake vs Data Warehouse: Which One Powers Your Business?

This article explains the core differences between data lakes and data warehouses, their respective strengths, and how they complement each other to support both exploratory analytics and routine business reporting.

Analytics · Big Data · Data Lake

0 likes · 5 min read

Architects' Tech Alliance

Nov 5, 2018 · Big Data

Alluxio as a Virtual Distributed File System for Data Lake Solutions

The article explains how Alluxio provides a virtual distributed file system that acts as a "virtual data lake," enabling unified, high‑performance access to structured and unstructured data across heterogeneous storage back‑ends while reducing storage costs through intelligent caching and eliminating the need for permanent data copies.

Alluxio · Big Data · Caching

0 likes · 16 min read

Tencent Cloud Developer

Oct 30, 2018 · Big Data

Big Data Technology Trends and Cloud Data Warehouse Architecture Practices

The article reviews recent big-data trends—from Hadoop’s evolution and Spark’s in-memory advances to emerging storage like Ozone—while detailing data-warehouse models, query-optimizer techniques, and cloud-native architectures that integrate diverse data sources, enabling scalable, AI-ready analytics and modern data-lake capabilities.

Big Data · Cloud Data Warehouse · Data Lake

0 likes · 30 min read

Big Data and Microservices

Aug 21, 2018 · Big Data

How to Build a Scalable Hadoop‑Spark Big Data Analytics Platform

This article explains why BI is essential for big data platforms, outlines the value hierarchy of data, details the Hadoop‑based analysis workflow, and provides step‑by‑step guidance for constructing both pure Hadoop and hybrid Hadoop‑Spark analytics architectures.

BI · Big Data Architecture · Data Lake

0 likes · 12 min read

UCloud Tech

Jul 9, 2018 · Big Data

How Distributed Unified Storage Solves Modern Big Data Challenges

This article explores the evolution of storage technology, the rise of software‑defined distributed unified storage like UMStor, and the Hadapter solution that enables high‑performance, compute‑storage separation for big‑data and cloud environments, highlighting real‑world deployments and performance insights.

Data Lake · Distributed storage · Hadapter

0 likes · 14 min read

UCloud Tech

Jul 7, 2018 · Big Data

How UMStor and HAdapter Power Big Data Cloud Migration with Superior Performance

The article reports on UCloud's subsidiary presenting at ArchSummit 2018 in Shenzhen, detailing the evolution to the digital era, challenges of PB‑scale data storage, and their solution using NFS‑Ganesha, Hadapter, and UMStor to achieve efficient big‑data‑on‑cloud performance and a data‑lake model.

Data Lake · Distributed storage · Hadoop

0 likes · 10 min read

dbaplus Community

Jun 14, 2018 · Big Data

Designing Scalable Hadoop‑Based Data Analytics Platforms: Architecture & Best Practices

This article explains how enterprises can build a scalable data analytics platform on Hadoop by outlining the multi‑layer architecture, storage options, data synchronization methods, and ETL/offline computation techniques, while highlighting practical component choices such as Hive, HBase, Spark, and Oozie.

Big Data · Data Architecture · Data Lake

0 likes · 10 min read

Full-Stack Internet Architecture

Jun 14, 2018 · Big Data

What Is Big Data? Definitions, Technologies, Skills, and Use Cases

This article explains the definition of big data, its characteristic 3Vs, common data sources, supporting IT infrastructure, key technologies such as Hadoop and Spark, specialized databases, required skills, and several practical business use cases.

Apache Spark · Data Lake · Hadoop

0 likes · 8 min read

UCloud Tech

May 22, 2018 · Big Data

Can Data Lakes Combine Compute and Storage? Exploring HDFS, S3A, and UMStor Hadapter

This article examines the evolution of data lake architectures, comparing the compute‑storage fusion model of HDFS, the compute‑storage separation approach of S3A on Ceph, and a new UMStor Hadapter plugin that aims to unite their strengths while addressing performance bottlenecks.

Ceph · Data Lake · HDFS

0 likes · 14 min read

Ctrip Technology

Feb 28, 2018 · Big Data

Using Alluxio to Mitigate HDFS Maintenance Impact on Real-Time Jobs in Ctrip's Big Data Platform

The article explains how Ctrip's big‑data platform introduced Alluxio to isolate real‑time Spark Streaming jobs from HDFS NameNode maintenance, reduce NameNode pressure, improve Spark SQL performance, and provide a unified storage layer across multiple HDFS clusters.

Alluxio · Big Data · Data Lake

0 likes · 9 min read

dbaplus Community

Dec 26, 2016 · Big Data

Why Data Lakes Are Redefining Enterprise Data Architecture

This article explains the origins, core features, logical architecture, and advantages of data lakes, contrasts them with traditional data warehouses, outlines a modern data architecture that combines lakes and warehouses, and introduces the DCE intelligent data lake platform with practical Q&A.

Big Data · Cloud Computing · Data Lake

0 likes · 14 min read
