Tagged articles
343 articles
Architects' Tech Alliance
Feb 21, 2021 · Big Data

Data Warehouse and Data Lake: Concepts, Architecture, and Comparison

This article provides an extensive overview of data warehouse and data lake concepts, their architectures, differences, components, and implementation considerations, covering topics such as OLTP/OLAP, ETL processes, data quality, cloud solutions, and the role of data platforms in modern enterprises.

Data Architecture · Data Lake · ETL
0 likes · 92 min read
DataFunTalk
Feb 17, 2021 · Big Data

Apache Iceberg 0.11.0: New Partition Support, SortOrder, Flink Streaming Reader, and Ecosystem Integrations

The article details Apache Iceberg 0.11.0's core enhancements, including partition changes, SortOrder, extensive Flink and Spark integrations, CDC/Upsert support, and hash‑based write distribution to reduce small files. It also outlines the upcoming 0.12.0 roadmap and provides practical SQL and API examples for data‑lake practitioners.

Apache Iceberg · Big Data · CDC
0 likes · 13 min read
DataFunTalk
Feb 1, 2021 · Big Data

Building a Real-Time Data Warehouse with Apache Flink and Apache Iceberg: Architecture, Challenges, and Best Practices

This article presents Tencent's experience of constructing a real‑time data warehouse by integrating Apache Flink with Apache Iceberg, covering background pain points, Iceberg's table format and capabilities, Flink‑Iceberg streaming and batch processing, practical implementations, and the future roadmap for data‑lake acceleration.

Apache Flink · Apache Iceberg · Big Data
0 likes · 21 min read
21CTO
Jan 25, 2021 · Big Data

Understanding Data Lakes vs. Data Warehouses: A Complete Guide

This article provides a comprehensive overview of data lakes and data warehouses, explaining their definitions, architectures, differences, and practical use cases, while also covering related concepts such as OLTP/OLAP, ETL processes, data governance, and modern lakehouse solutions.

Data Governance · Data Lake · Data Warehouse
0 likes · 95 min read
Big Data Technology & Architecture
Jan 20, 2021 · Big Data

Understanding Data Warehouse, Data Lake, and Data Middle Platform: Concepts, Differences, and Applications

This article provides a comprehensive overview of data warehouses, data lakes, and data middle platforms, explaining their definitions, architectures, functions, differences, and the value they bring to enterprises, while also addressing common misconceptions and related concepts such as data marts and data swamps.

Data Architecture · Data Lake · Data Warehouse
0 likes · 37 min read
Big Data Technology & Architecture
Dec 31, 2020 · Big Data

Data Lake vs Data Warehouse: Evolution, Comparison, and Alibaba Cloud Lakehouse Integration

This article examines the 20‑year evolution of big data architectures, contrasts data lakes and data warehouses, explores their respective strengths and challenges, and details Alibaba Cloud’s lake‑warehouse (lakehouse) solution that unifies storage, metadata, and compute for enterprise‑grade analytics and AI workloads.

Data Architecture · Data Lake · Data Warehouse
0 likes · 30 min read
DataFunTalk
Dec 15, 2020 · Big Data

Exploring JD's Real‑Time Data Lake with Delta Lake: Architecture, Challenges, and Practical Insights

This article introduces JD's real‑time data warehouse evolution, outlines the limitations of traditional Lambda‑based warehouses, compares open‑source lake formats (Delta, Hudi, Iceberg), explains Delta Lake's transaction‑log architecture and read flow, and demonstrates how a unified batch‑stream development model simplifies data processing and improves reliability.

ACID · Data Lake · Delta Lake
0 likes · 12 min read
DataFunTalk
Dec 3, 2020 · Big Data

Streaming Data Lake Ingestion with Apache Flink and Apache Iceberg

This article explains how Apache Flink integrates with data lake architectures, especially using Apache Iceberg as a table format, to enable real‑time streaming ingestion, CDC processing, near‑real‑time lambda architectures, and future enhancements like automatic file merging and row‑level deletes.

Apache Iceberg · Data Lake · Flink
0 likes · 13 min read
Big Data Technology & Architecture
Nov 14, 2020 · Big Data

Comparative Analysis of Apache Hudi, Apache CarbonData, and Delta Lake for Data Lake Solutions

This article examines the core requirements of data lakes and provides an in‑depth comparison of three major open‑source solutions—Apache Hudi, Apache CarbonData, and Delta Lake—highlighting their architectures, ACID support, query capabilities, and suitability for various real‑time and batch use cases.

ACID · Apache CarbonData · Apache Hudi
0 likes · 9 min read
DataFunTalk
Oct 29, 2020 · Big Data

Building a Large-Scale Near Real-Time Data Analytics Platform at Lyft Using Apache Flink

Lyft transformed its legacy data pipeline by designing a cloud‑native, Flink‑based near real‑time analytics platform that ingests billions of events, writes Parquet files to S3, leverages Presto for interactive queries, and implements multi‑stage non‑blocking ETL, fault‑tolerant back‑fill, and extensive performance optimizations.

AWS · Data Lake · ETL
0 likes · 12 min read
Alibaba Cloud Developer
Oct 25, 2020 · Big Data

How Alibaba’s Cloud‑Native Data Lake Solves Big Data Challenges

Alibaba Cloud’s Data Lake Analytics (DLA) tackles the growing complexity of data scenarios by offering cloud‑native, serverless solutions for data lake management, massive metadata construction, and high‑performance Spark and Presto engines, while addressing challenges such as high entry barriers, stability, and multi‑tenant isolation.

Cloud Native · Data Lake · Presto
0 likes · 22 min read
DataFunTalk
Oct 9, 2020 · Big Data

NetEase’s Data Lake Iceberg: Challenges, Core Principles, and Practical Implementation

This article examines the pain points of traditional data warehouse platforms, explains the core concepts and advantages of the Iceberg data lake table format, compares it with Metastore, reviews the current Iceberg community ecosystem, and details NetEase’s practical integration with Hive, Impala, and Flink to improve ETL efficiency and support unified batch‑stream processing.

Data Lake · ETL · Flink
0 likes · 13 min read
Big Data Technology & Architecture
Sep 2, 2020 · Big Data

An Overview of Apache Hudi: Architecture, Features, and Query Types

Apache Hudi is an open‑source data‑lake framework that leverages Spark to ingest, manage, and incrementally query large analytical datasets on HDFS‑compatible storage, offering features such as timeline management, copy‑on‑write and merge‑on‑read tables, and support for snapshot, incremental, and read‑optimized queries across engines like Hive, Spark SQL and Presto.

Apache Hudi · Big Data · Data Lake
0 likes · 12 min read
Big Data Technology & Architecture
Aug 23, 2020 · Big Data

Apache Hudi Overview, Core Concepts, and Quick‑Start Guide

This article introduces Apache Hudi, explaining its storage types, query views, timeline feature, typical use cases such as near‑real‑time ingestion and incremental pipelines, and provides a step‑by‑step Scala/Spark quick‑start guide with code examples for compiling, inserting, updating, querying, and syncing data to Hive.

Apache Hudi · Big Data · Data Lake
0 likes · 18 min read
Big Data Technology & Architecture
Aug 15, 2020 · Big Data

Understanding Data Lakes: Concepts, Architecture, Vendor Solutions, and Practical Use Cases

This comprehensive article explains what a data lake is, outlines its core characteristics and reference architecture, compares major cloud providers' data‑lake offerings, presents typical advertising and gaming use cases, and proposes a practical, agile process for building and operating a data lake.

Big Data · Cloud Native · Data Architecture
0 likes · 50 min read
Big Data and Microservices
Jun 28, 2020 · Big Data

Data Warehouse vs Data Lake vs Data Platform vs Data Middle Platform: Which Fits Your Business?

This article compares data warehouse, data lake, data platform, and data middle platform, explaining their definitions, architectures, strengths, limitations, and use‑case differences, and provides tables that highlight how each solution handles structured and unstructured data, governance, flexibility, and business value.

Big Data · Data Architecture · Data Lake
0 likes · 12 min read
Architect
May 12, 2020 · Big Data

An Overview of Apache Hudi: Architecture, Concepts, and Query Types

Apache Hudi is an open‑source data‑lake framework that leverages Spark and Hadoop‑compatible storage to provide efficient ingestion, incremental processing, and multiple query modes such as snapshot, incremental, and read‑optimized for large analytical datasets.

Apache Hudi · Big Data · Data Lake
0 likes · 11 min read
ITPUB
Apr 6, 2020 · Big Data

How to Build a Data Lake Quickly: Strategies, Tools, and Real‑World Cases

This article explains the origins and market growth of data lakes, compares them with traditional data warehouses, showcases major implementations like Amazon Galaxy and Club Factory, and provides practical guidance on choosing open‑source or commercial cloud solutions to construct a data lake efficiently while minimizing risk.

AWS · Big Data · Data Architecture
0 likes · 10 min read
dbaplus Community
Mar 17, 2020 · Big Data

Choosing the Right Open‑Source Data Lake: Delta vs Iceberg vs Hudi

An in‑depth comparison of the three leading open‑source data lake platforms—Delta Lake, Apache Iceberg, and Apache Hudi—examines their origins, core challenges they address, key features, and performance across seven evaluation dimensions to guide practitioners in selecting the optimal solution for their workloads.

Apache Hudi · Apache Iceberg · Data Lake
0 likes · 15 min read
Architects' Tech Alliance
Oct 17, 2019 · Big Data

Understanding Alibaba's Data Middle Platform: Concepts, Architecture, and Differences from Data Warehouses and Data Lakes

The article explains Alibaba's data middle platform—its definition, methodology, organizational structure, key tools, and how it differs from traditional data warehouses and data lakes—while highlighting its role in supporting scalable, business‑centric data services and digital transformation.

Alibaba · Big Data · Data Architecture
0 likes · 16 min read
Big Data Technology & Architecture
Oct 17, 2019 · Big Data

Delta Lake: Architecture, Features, and Hands‑On Tutorial

This article explains the origins and motivations of Delta Lake, details its ACID transaction support, schema enforcement, metadata handling, versioning, and unified batch‑and‑stream processing, and provides a step‑by‑step Maven and Spark code tutorial for creating, updating, and querying Delta tables.

ACID · Apache Spark · Big Data
0 likes · 10 min read
Architects' Tech Alliance
Jul 28, 2019 · Big Data

Alluxio: A Virtual Distributed File System for Unified Big Data Access and Cost‑Effective Storage

The article explains how Alluxio, a memory‑speed virtual distributed file system, acts as a virtual data lake to unify access to structured and unstructured big‑data across heterogeneous storage systems, offering on‑demand fast local access, intelligent caching, reduced storage costs, and enterprise‑grade security and fault tolerance.

Alluxio · Big Data · Data Lake
0 likes · 15 min read
DataFunTalk
Jun 17, 2019 · Big Data

Understanding Hadoop’s Core Competitiveness in the Trillion‑Scale Data Era

This article explores Hadoop’s role in the big‑data era, detailing its architecture, core components such as HDFS, YARN, MapReduce, Ozone and Submarine, the challenges of trillion‑scale data, and why its scalability, cost efficiency, and mature ecosystem give it a competitive edge.

Data Lake · Distributed Systems · Hadoop
0 likes · 11 min read
21CTO
Jan 26, 2019 · Big Data

Data Lake vs Data Warehouse: Which One Powers Your Business?

This article explains the core differences between data lakes and data warehouses, their respective strengths, and how they complement each other to support both exploratory analytics and routine business reporting.

Analytics · Big Data · Data Lake
0 likes · 5 min read
Architects' Tech Alliance
Nov 5, 2018 · Big Data

Alluxio as a Virtual Distributed File System for Data Lake Solutions

The article explains how Alluxio provides a virtual distributed file system that acts as a "virtual data lake," enabling unified, high‑performance access to structured and unstructured data across heterogeneous storage back‑ends while reducing storage costs through intelligent caching and eliminating the need for permanent data copies.

Alluxio · Big Data · Data Lake
0 likes · 16 min read
Tencent Cloud Developer
Oct 30, 2018 · Big Data

Big Data Technology Trends and Cloud Data Warehouse Architecture Practices

The article reviews recent big-data trends—from Hadoop’s evolution and Spark’s in-memory advances to emerging storage like Ozone—while detailing data-warehouse models, query-optimizer techniques, and cloud-native architectures that integrate diverse data sources, enabling scalable, AI-ready analytics and modern data-lake capabilities.

Big Data · Data Lake · Data Warehouse
0 likes · 30 min read
UCloud Tech
Jul 9, 2018 · Big Data

How Distributed Unified Storage Solves Modern Big Data Challenges

This article explores the evolution of storage technology, the rise of software‑defined distributed unified storage like UMStor, and the Hadapter solution that enables high‑performance, compute‑storage separation for big‑data and cloud environments, highlighting real‑world deployments and performance insights.

Data Lake · Hadapter · Software-Defined Storage
0 likes · 14 min read
UCloud Tech
Jul 7, 2018 · Big Data

How UMStor and HAdapter Power Big Data Cloud Migration with Superior Performance

The article reports on UCloud's subsidiary presenting at ArchSummit 2018 in Shenzhen, detailing the evolution to the digital era, challenges of PB‑scale data storage, and their solution using NFS‑Ganesha, Hadapter, and UMStor to achieve efficient big‑data‑on‑cloud performance and a data‑lake model.

Data Lake · Hadoop · UMStor
0 likes · 10 min read
dbaplus Community
Dec 26, 2016 · Big Data

Why Data Lakes Are Redefining Enterprise Data Architecture

This article explains the origins, core features, logical architecture, and advantages of data lakes, contrasts them with traditional data warehouses, outlines a modern data architecture that combines lakes and warehouses, and introduces the DCE intelligent data lake platform with practical Q&A.

Big Data · Data Lake · cloud computing
0 likes · 14 min read