Tag

EMR


DataFunTalk
Feb 14, 2024 · Databases

Open‑Source OLAP Overview, Scenario Analysis, and StarRocks Architecture & Roadmap

This article provides a comprehensive overview of open‑source OLAP technologies, examines various business scenarios and data‑lake architectures, and details StarRocks' core features, performance optimizations, and future development plans within the EMR ecosystem.

Big Data · EMR · OLAP
0 likes · 16 min read

DataFunSummit
Jul 20, 2023 · Big Data

Cloud‑Native OLAP on Volcano EMR: Architecture, Capabilities, and Customer Cases

This article introduces Volcano EMR's cloud‑native OLAP solution, covering the product overview, storage‑compute separation, elastic scaling, cost control and hot‑cold data management, intelligent query analysis, several customer case studies, and the roadmap for real‑time and offline data warehousing.

Big Data · EMR · OLAP
0 likes · 11 min read

ByteDance Data Platform
Jan 11, 2023 · Big Data

How EMR Stateless Transforms Big Data with Transient, Stateless Clusters

This article explains the concept of transient clusters and the Stateless architecture in Volcano Engine's EMR platform, compares Stateless with traditional Stateful approaches, and outlines its evolution, core components, elastic scaling features, and the business value of cost‑effective, on‑demand big‑data processing.

Big Data · EMR · Stateless
0 likes · 17 min read

Big Data Technology Architecture
May 22, 2022 · Big Data

Delta Lake Overview, File Structure, Metadata, and Its Integration with Alibaba Cloud EMR, DLF, G‑SCD and CDC Solutions

This article introduces Delta Lake as an open‑source storage layer for lake‑house architectures, explains its key features and its file and metadata structures, and details how Alibaba Cloud EMR and Data Lake Formation integrate and extend Delta Lake with advanced capabilities such as G‑SCD, CDC, and performance optimizations, before closing with the future roadmap.

Big Data · CDC · DLF
0 likes · 10 min read

ByteDance Data Platform
Feb 25, 2022 · Big Data

Optimizing SparkSQL: ByteDance EMR’s Data Lake Integration and Multi‑Tenant Server

ByteDance’s EMR team details how they integrated data‑lake engines such as Hudi and Iceberg into SparkSQL, streamlined jar management, built a custom Spark SQL Server with Hive compatibility, multi‑tenant support, engine pre‑warming, and transaction capabilities, dramatically improving performance and resource efficiency for enterprise workloads.

EMR · Hudi · SparkSQL
0 likes · 11 min read

Tencent Tech
Sep 10, 2021 · Big Data

How Sohu Changyou Migrated 1 PB of Game Data to the Cloud Without Downtime

This article details how Sohu Changyou’s data team, together with Tencent Cloud engineers, planned and executed a seamless migration of over one petabyte of game data to Elastic MapReduce, Elasticsearch Service and Oceanus, achieving zero service impact and dramatically improving analytics performance.

Big Data · EMR · Impala
0 likes · 9 min read

Big Data Technology Architecture
Nov 21, 2020 · Big Data

Multi-Engine Support and Future Directions of Alibaba Cloud Data Lake Building Service

The article explains how Alibaba Cloud's Data Lake Building Service enables fine‑grained lake management by integrating multiple compute engines—including EMR, MaxCompute, Blink, Hologres, PAI, and open‑source Hive, Spark, and Presto—through unified metadata and OSS storage, while outlining current features, special format support, and planned future enhancements.

Alibaba Cloud · Big Data · EMR
0 likes · 9 min read

Tencent Cloud Developer
Oct 19, 2020 · Big Data

Improving Spark Write Performance for Massive Files on Object Storage with Tencent Cloud EMR

This Tencent Cloud EMR case parallelizes Spark’s driver‑side commit, trash, and move phases, which previously ran single‑threaded and incurred costly copy‑on‑rename operations when writing massive files to object storage. The result is a speedup of more than tenfold (1,100%), making object storage a viable alternative to HDFS.

Big Data · EMR · Object Storage
0 likes · 8 min read

Tencent Cloud Developer
May 9, 2020 · Big Data

Huya Live Streaming Big Data Practice: Cloud‑Based Data Platform and EMR Solution

Huya migrated its petabyte‑scale Hadoop workloads to Tencent Cloud EMR, using a dedicated line and COS for warm/cold storage, enabling minute‑level cluster provisioning, rapid analysis, and up to 60% cost savings while improving flexibility, efficiency, and data‑driven innovation.

Big Data · EMR · Tencent Cloud
0 likes · 9 min read

Tencent Cloud Developer
May 21, 2019 · Information Security

Design and Implementation of a Cloud Audit Solution for Tencent Cloud Accounts

The article details a scalable, extensible cloud‑audit architecture for Tencent Cloud accounts that stores API logs in a Shanghai‑region COS bucket, processes them with EMR‑based Hive tables and hourly partition scripts, aggregates results into a hot MySQL store, and enables administrators to monitor all sub‑accounts with a real‑time “god view.”

Big Data · COS · EMR
0 likes · 13 min read