Tagged articles
4 articles
Big Data Technology Architecture
Nov 28, 2021 · Big Data

EMR Studio: Architecture and Features for Simplifying Big Data Development

EMR Studio is a one‑stop, open‑source‑compatible big data development platform that integrates Zeppelin, Jupyter, Airflow and a custom Cluster Manager to streamline job creation, scheduling, monitoring, and cluster switching, thereby addressing common usability challenges in Spark, Flink, Hive, and Presto workflows.

Airflow · Apache Spark · EMR Studio
9 min read
Big Data Technology & Architecture
Feb 7, 2021 · Big Data

Building a Flink SQL Platform on Zeppelin: Installation, Configuration, and Advanced Use Cases

This guide walks through setting up Apache Zeppelin as a low‑cost, SQL‑centric development platform for Flink, covering environment preparation, installation, interpreter configuration, execution modes, verification, common pitfalls, dimension‑table joins, custom UDFs, Redis integration, and dual‑stream join techniques.

Flink · SQL · Streaming
24 min read
Architecture Digest
Feb 15, 2018 · Databases

Design and Architecture of Zeppelin Distributed Block Storage System

This article presents an in‑depth overview of Zeppelin, a high‑availability, high‑performance block storage service, covering its motivation, the distinction between online and offline storage, data distribution strategies, centralized meta‑server design, replication policies, RocksDB‑based storage engine, Raft‑based consistency protocol, threading model, client request flow, and fault‑handling mechanisms.

Hash Partitioning · Raft · Replication
19 min read