Deploying Apache Zeppelin on Kubernetes: Architecture, Scaling, and Best Practices
This article describes Apache Zeppelin's features and architecture, and explains how to deploy and customize it on Kubernetes to achieve scalability, resource isolation, multi‑tenant support, and persistent storage for interactive big‑data analytics.
Background
Many internet companies provide notebook‑style products for interactive data analysis, modeling, and visualization, typically built on Jupyter or Zeppelin and integrated with big‑data computation, storage, and resource management frameworks to support common machine‑learning and deep‑learning workloads.
Our department’s current product, BigQuery, runs Hive SQL queries on the Tez engine in a YARN cluster, but it lacks interactive, programmable analysis tools for iterative ML/DL tasks and for visualizing results. We therefore needed to add notebook capabilities and Spark support. Apache Zeppelin, which is similar to Jupyter but better suited to enterprise big‑data use cases, meets these needs.
Zeppelin Introduction
Apache Zeppelin is a web‑based notebook that enables interactive analysis and visualization across multiple languages and data sources. It supports the full data workflow, from ingestion to modeling, offering a rich front‑end visualization library and back‑end interpreters for Spark, HBase, Flink, Python, JDBC, Markdown, Shell, Elasticsearch, and more, so developers can use SQL and other languages within Zeppelin.
Key Features and Characteristics
Interactive visual data analysis via a web UI.
Notebook management for creating, modifying, running, and deleting notebooks, with import/export support.
Built‑in data visualization that renders structured results as charts.
Configurable interpreters (Spark, JDBC, Elasticsearch, etc.) with group‑based management.
Task management for submitting and stopping notebook jobs.
User authentication mechanisms.
One‑click notebook sharing via HTTP URLs.
Integration with the IntelliJ Big Data Tools plugin for local development and debugging.
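Notebook management and job control are also exposed through Zeppelin's REST API. The sketch below only builds the requests (it sends nothing) for listing, exporting, importing, running, and stopping notes. The server address is a hypothetical placeholder, and authentication, which a secured deployment would need, is not handled.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical server address; point this at your own Zeppelin server.
BASE_URL = "http://zeppelin.example.com:8080"


def _request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request against Zeppelin's notebook REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def list_notes() -> urllib.request.Request:
    return _request("GET", "/api/notebook")


def export_note(note_id: str) -> urllib.request.Request:
    return _request("GET", f"/api/notebook/export/{note_id}")


def import_note(note: dict) -> urllib.request.Request:
    return _request("POST", "/api/notebook/import", note)


def run_note(note_id: str) -> urllib.request.Request:
    # Submits every paragraph of the note as a job.
    return _request("POST", f"/api/notebook/job/{note_id}")


def stop_note(note_id: str) -> urllib.request.Request:
    # Stops all running paragraphs of the note.
    return _request("DELETE", f"/api/notebook/job/{note_id}")
```

For example, `urllib.request.urlopen(run_note(note_id))` would start all paragraphs of a note once `note_id` holds a real note ID.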
Figure: Overview of officially supported interpreters.
Zeppelin Architecture
Zeppelin consists of three components: the AngularJS‑based front‑end, a Jetty‑based lightweight server, and interpreters. Communication between front‑end and server uses REST API and WebSocket; the server talks to interpreters via Thrift RPC, enabling bidirectional interaction.
Each interpreter runs in its own JVM process, which can be on the same machine as the server or on a separate node or Kubernetes pod. Interpreters support dynamic Maven dependency loading, multi‑JVM isolation, and a Thrift‑based cross‑language IPC mechanism.
Running interpreters in separate JVMs avoids dependency conflicts and improves horizontal scalability, which is critical for production‑grade big‑data analysis.
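The routing side of this design can be illustrated with a small, simplified model: the server reads a paragraph's `%group.name` prefix (for example `%spark.sql` or `%md`) to decide which interpreter handles it, and each interpreter group gets its own process. In this sketch the "processes" are plain dictionaries and the Thrift port numbering is hypothetical; Zeppelin's binding modes and paragraph-level local properties are deliberately left out.

```python
import re
from dataclasses import dataclass

# Leading "%group" or "%group.name" directive at the start of a paragraph.
# Simplification: Zeppelin's optional "(key=value)" local properties are not parsed.
_DIRECTIVE = re.compile(r"%([A-Za-z0-9_-]+)(?:\.([A-Za-z0-9_-]+))?\s*")


@dataclass(frozen=True)
class Binding:
    group: str  # interpreter group, e.g. "spark"
    name: str   # interpreter inside the group, e.g. "sql"
    code: str   # text handed to the interpreter


def resolve(paragraph: str, default_group: str = "spark") -> Binding:
    """Pick the interpreter for a paragraph from its % directive."""
    m = _DIRECTIVE.match(paragraph)
    if m is None:
        # No directive: use the note's default interpreter group.
        return Binding(default_group, default_group, paragraph)
    group = m.group(1)
    name = m.group(2) or group  # "%md" means the group's default interpreter
    return Binding(group, name, paragraph[m.end():])


class ProcessRegistry:
    """Simulated interpreter processes: one per group, started lazily."""

    def __init__(self, first_port: int = 30000):
        self._processes = {}
        self._next_port = first_port  # hypothetical Thrift port range

    def process_for(self, group: str) -> dict:
        if group not in self._processes:
            # A real server would launch a JVM (or a pod) here and connect over Thrift.
            self._processes[group] = {"group": group, "thrift_port": self._next_port}
            self._next_port += 1
        return self._processes[group]
```

Two `%spark` paragraphs share one process, while a `%python` paragraph gets a separate one, which is the dependency isolation the paragraph above describes.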
Production Practices
Running Zeppelin in a single‑node, multi‑process mode limits scalability and isolation. Deploying Zeppelin on Kubernetes as containerized pods addresses extensibility, security, and multi‑tenant requirements. Key production challenges include lifecycle management, multi‑tenant isolation, and persistent storage for notebooks.
Kubernetes Deployment
Deploying on K8s provides:
Scalable multi‑pod execution for resource‑intensive tasks (e.g., Spark, TensorFlow).
Containerized interpreters that avoid host‑level dependency conflicts and improve security.
The overall K8s architecture includes:
Custom Java code calls the Kubernetes API to create namespaces, ConfigMaps, Services, RBAC rules, PVs/PVCs, and Zeppelin Server Deployments, with NFS or S3 mounts for notebook persistence and init containers that copy demo notes.
Multi‑tenant access through per‑user namespaces, a dedicated Zeppelin server per user, and an Nginx service exposed via NodePort that proxies each user's URL to the matching namespace.
Launching interpreter pods and handling communication between the server and interpreters.
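As an illustration of the per-user provisioning and routing described above, the following sketch builds plain manifest dictionaries (which could be serialized to YAML or passed to a Kubernetes client) instead of calling the API. The namespace prefix, the Service name `zeppelin-server` on port 80, the `/demo-notes` directory in the image, and the notebook path are all assumptions. The Nginx block also assumes each Zeppelin server is configured to serve under the same path prefix it is proxied on.

```python
import re

NOTEBOOK_DIR = "/opt/zeppelin/notebook"  # assumed notebook directory in the server image


def user_namespace(user: str, prefix: str = "zeppelin-") -> str:
    """Map a user name to a DNS-1123 label so it can serve as a namespace name."""
    slug = re.sub(r"[^a-z0-9-]+", "-", user.lower()).strip("-")
    return (prefix + slug)[:63].rstrip("-")


def server_deployment(user: str, image: str, notebook_claim: str) -> dict:
    """Per-user Zeppelin Server Deployment with an init container seeding demo notes."""
    ns = user_namespace(user)
    labels = {"app": "zeppelin-server", "user": ns}
    notebook_mount = {"name": "notebook", "mountPath": NOTEBOOK_DIR}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "zeppelin-server", "namespace": ns, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "initContainers": [{
                        "name": "copy-demo-notes",
                        "image": image,
                        # -n: never overwrite notes the user already has on the volume.
                        # /demo-notes is a hypothetical directory baked into the image.
                        "command": ["sh", "-c", f"cp -rn /demo-notes/. {NOTEBOOK_DIR}/"],
                        "volumeMounts": [notebook_mount],
                    }],
                    "containers": [{
                        "name": "zeppelin-server",
                        "image": image,
                        "ports": [{"containerPort": 8080}],
                        "volumeMounts": [notebook_mount],
                    }],
                    "volumes": [{
                        "name": "notebook",
                        "persistentVolumeClaim": {"claimName": notebook_claim},
                    }],
                },
            },
        },
    }


def nginx_location(user: str, service_port: int = 80) -> str:
    """Nginx block proxying /<namespace>/ to that user's Zeppelin Service."""
    ns = user_namespace(user)
    upstream = f"http://zeppelin-server.{ns}.svc.cluster.local:{service_port}"
    return (
        f"location /{ns}/ {{\n"
        f"    proxy_pass {upstream};\n"
        # The front-end talks to the server over WebSocket, so upgrade headers are needed.
        "    proxy_http_version 1.1;\n"
        "    proxy_set_header Upgrade $http_upgrade;\n"
        '    proxy_set_header Connection "upgrade";\n'
        "}\n"
    )
```

Because `proxy_pass` has no URI part here, Nginx forwards the full path, including the `/<namespace>/` prefix, to the upstream server.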
Connecting Compute and Storage
Zeppelin’s interpreters are pre‑configured to access Spark, Hive, and HDFS without user‑side setup, using built‑in client binaries and configuration files, along with DNS and authentication handling for seamless operation across K8s and big‑data clusters.
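One way to wire an interpreter pod to an external big-data cluster is sketched below, under these assumptions: the cluster's client configuration files (for example `core-site.xml` and `hdfs-site.xml`) live in a ConfigMap, hosts that cluster DNS cannot resolve are added through the pod's `hostAliases`, and a Kerberos keytab, if used, comes from a Secret. The function name, mount paths, and Secret layout are hypothetical.

```python
HADOOP_CONF_DIR = "/etc/hadoop/conf"  # assumed mount point for cluster config files


def bigdata_client_fragment(host_ips: dict, conf_configmap: str, keytab_secret: str = "") -> dict:
    """Pod-spec fragment that lets an interpreter reach an external Hadoop cluster."""
    by_ip = {}
    for host, ip in host_ips.items():
        by_ip.setdefault(ip, []).append(host)
    container = {
        "name": "interpreter",
        # Hadoop clients, and Spark's Hadoop integration, look for *-site.xml files here.
        "env": [{"name": "HADOOP_CONF_DIR", "value": HADOOP_CONF_DIR}],
        "volumeMounts": [{"name": "hadoop-conf", "mountPath": HADOOP_CONF_DIR, "readOnly": True}],
    }
    volumes = [{"name": "hadoop-conf", "configMap": {"name": conf_configmap}}]
    if keytab_secret:
        # Optional Kerberos keytab, mounted read-only from a Secret.
        container["volumeMounts"].append(
            {"name": "keytab", "mountPath": "/etc/security/keytabs", "readOnly": True})
        volumes.append({"name": "keytab", "secret": {"secretName": keytab_secret}})
    return {
        # hostAliases adds /etc/hosts entries for hosts that cluster DNS cannot resolve.
        "hostAliases": [{"ip": ip, "hostnames": sorted(names)} for ip, names in sorted(by_ip.items())],
        "containers": [container],
        "volumes": volumes,
    }
```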
Resource Isolation and Recycling
Problem Background
Users may request as many as 100 Spark executors and 400 GB of memory, exhausting the K8s cluster's resources.
Cause Analysis
The cluster has limited CPU and memory, and without scheduling or isolation policies, users contend for them; resources that are never reclaimed eventually block other jobs.
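To see why such requests exhaust a cluster, it helps to count what each executor pod actually asks for. With Spark on Kubernetes, an executor pod's memory request is the executor heap plus an overhead that, by default for JVM executors, is 10% of the heap with a 384 MiB floor. The arithmetic below uses illustrative numbers (4 GiB executors, 256 GiB of allocatable memory), not figures from the production cluster, and ignores the driver pod and per-node bin-packing.

```python
MEMORY_OVERHEAD_MIN_MIB = 384
JVM_OVERHEAD_FACTOR = 0.10  # Spark's default overhead factor for JVM executors


def executor_pod_memory_mib(executor_memory_mib: int,
                            overhead_factor: float = JVM_OVERHEAD_FACTOR) -> int:
    """Memory request of one executor pod: heap plus non-heap overhead."""
    overhead = max(int(executor_memory_mib * overhead_factor), MEMORY_OVERHEAD_MIN_MIB)
    return executor_memory_mib + overhead


def request_fits(executors: int, executor_memory_mib: int, allocatable_mib: int):
    """Total executor memory a job asks for, and whether the cluster could hold it at all."""
    needed = executors * executor_pod_memory_mib(executor_memory_mib)
    return needed, needed <= allocatable_mib


# Illustrative numbers only: 100 executors of 4 GiB against 256 GiB allocatable.
needed, ok = request_fits(100, 4096, 256 * 1024)
print(needed, ok)  # 450500 False
```

A single such job would need roughly 440 GiB for executors alone, so without limits it can starve every other tenant.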
Solution
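The solution combines K8s node labels, pod resource requests/limits, and idle‑time termination. Nodes reserved for notebook workloads are labeled, and notebook pods carry matching node selectors so they are scheduled only onto those nodes; each pod also declares resource requests and limits, and a controller recycles pods that have been idle longer than a configurable timeout.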
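A minimal sketch of the three mechanisms, assuming a hypothetical node label `workload=notebook`: the interpreter pod spec carries a matching `nodeSelector` and explicit requests/limits, Spark's `spark.kubernetes.node.selector.[labelKey]` setting gives executor pods the same placement, and `select_idle` shows the decision an idle-pod controller would make. How last-activity timestamps are tracked, and the deletion itself, are left out.

```python
import time
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical label on nodes reserved for notebooks,
# e.g. applied with `kubectl label nodes <node> workload=notebook`.
NODE_LABEL_KEY, NODE_LABEL_VALUE = "workload", "notebook"


def interpreter_pod_spec(image: str, cpu: str, memory: str, cpu_limit: str) -> dict:
    """Interpreter pod pinned to notebook nodes, with explicit requests and limits."""
    return {
        "nodeSelector": {NODE_LABEL_KEY: NODE_LABEL_VALUE},
        "containers": [{
            "name": "interpreter",
            "image": image,
            "resources": {
                "requests": {"cpu": cpu, "memory": memory},
                # Memory limit equal to the request avoids overcommitting node memory.
                "limits": {"cpu": cpu_limit, "memory": memory},
            },
        }],
    }


def spark_executor_placement() -> dict:
    """Spark conf entry that sends executor pods to the same labeled nodes."""
    return {f"spark.kubernetes.node.selector.{NODE_LABEL_KEY}": NODE_LABEL_VALUE}


@dataclass
class InterpreterPod:
    name: str
    namespace: str
    last_activity: float  # epoch seconds of the last paragraph run (tracked elsewhere)


def select_idle(pods: List[InterpreterPod], idle_timeout_s: float,
                now: Optional[float] = None) -> List[InterpreterPod]:
    """Pods idle for at least idle_timeout_s; a controller would delete these."""
    now = time.time() if now is None else now
    return [p for p in pods if now - p.last_activity >= idle_timeout_s]
```

Zeppelin also ships a timeout-based interpreter lifecycle manager (`org.apache.zeppelin.interpreter.lifecycle.TimeoutLifecycleManager`) that may cover part of the recycling; the sketch only shows the selection logic.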
File Upload and Data Persistence
Problem Background
Notebooks are lost when a container restarts.
Zeppelin has no native file upload, and files cannot be shared across interpreter pods.
Solution
Persistent storage is achieved by mounting NFS or S3 volumes into each Zeppelin Server pod, using subPath to isolate each user's directory. Interpreter pods mount the same storage, so uploaded files are accessible from every interpreter.
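The subPath layout can be sketched as a function that patches any pod spec, server or interpreter, with the same per-user mounts. The PVC name, the directory layout under `users/<name>/`, and the mount paths are assumptions; backing the PVC with S3 rather than NFS would require a CSI driver, which is outside this sketch.

```python
import copy
import re

# Hypothetical PVC bound to the shared NFS export.
SHARED_CLAIM = "zeppelin-shared-pvc"
MOUNTS = {  # container path -> per-user sub-directory template (assumed layout)
    "/opt/zeppelin/notebook": "users/{user}/notebook",
    "/data/upload": "users/{user}/upload",
}


def attach_user_storage(pod_spec: dict, user: str) -> dict:
    """Return a copy of pod_spec whose containers see only this user's directories."""
    # Reject names such as "../bob" so a subPath cannot escape the user's directory.
    if not re.fullmatch(r"[a-z0-9][a-z0-9-]*", user):
        raise ValueError(f"invalid user name: {user!r}")
    spec = copy.deepcopy(pod_spec)
    spec.setdefault("volumes", []).append(
        {"name": "shared", "persistentVolumeClaim": {"claimName": SHARED_CLAIM}})
    for container in spec.get("containers", []):
        mounts = container.setdefault("volumeMounts", [])
        for path, sub in MOUNTS.items():
            mounts.append({"name": "shared", "mountPath": path, "subPath": sub.format(user=user)})
    return spec
```

Because every pod belonging to a user mounts the same subdirectories, a file uploaded into `/data/upload` is visible to all of that user's interpreter pods, while other users' directories stay out of reach.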
Summary and Outlook
Zeppelin notebooks support data development, analysis, reporting, and interactive big‑data modeling. Deploying on Kubernetes resolves scalability, security, and multi‑tenant challenges, and custom extensions have been added in production. Future work includes running Flink on Zeppelin in K8s, exposing the Spark UI and web shells, notebook scheduling, and further interpreter integrations.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.