Getting Started with Apache Zeppelin: Installation, Core Features, and Integration with JDBC, Spark, and Flink

This tutorial introduces Apache Zeppelin, explains REPL and Jupyter concepts, outlines its core features and project structure, and provides step‑by‑step instructions for installing Zeppelin, creating notebooks, and connecting to databases, Spark, and Flink with practical code examples.

Big Data Technology & Architecture

Introduction

Before learning Apache Zeppelin, you should understand two concepts: REPL (Read‑Eval‑Print Loop) and Jupyter Notebook. A REPL is an interactive interpreter that reads an expression, evaluates it, prints the result, and then waits for the next input; Java’s JShell is an example. Jupyter Notebook is a web‑based interactive notebook that supports many languages and is commonly used for data cleaning, modeling, and machine learning.
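As a rough illustration of the read‑eval‑print cycle, here is a toy sketch in Scala. The names eval and repl, the ":quit" command, and the tiny "language" (integer addition only) are invented for this example; real REPLs such as JShell evaluate full language expressions.

```scala
// Toy evaluator (hypothetical): understands only sums such as "1 + 2 + 3".
def eval(input: String): String = {
  try {
    input.split("\\+").map(_.trim.toInt).sum.toString
  } catch {
    case _: NumberFormatException => s"error: cannot evaluate '$input'"
  }
}

// Read each line, evaluate it, print the result, and loop until ":quit".
def repl(lines: Iterator[String]): Unit = {
  lines.takeWhile(_ != ":quit").foreach(line => println(s"=> ${eval(line)}"))
}

// A fixed iterator stands in for keyboard input so the sketch runs unattended.
repl(Iterator("1 + 2", "40 + 2", ":quit"))
// prints "=> 3" and then "=> 42"
```

A notebook paragraph follows the same cycle, except that the "read" step takes a whole block of code and the "print" step can render tables or charts.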

Zeppelin Overview

Apache Zeppelin is a big‑data analysis and visualization tool that lets analysts write code in multiple languages within a web‑based notebook, execute it against various data sources, and visualize results.

Main Features

Interactive visual data analysis via a graphical interface.

Notebook management (create, edit, run, delete, import/export).

Built‑in data visualizations for structured results.

Configurable interpreters (Spark, JDBC, Elasticsearch, etc.).

Task execution management.

User authentication.

One‑click notebook sharing via HTTP.

Project Structure

Zeppelin is a Maven‑based Java project composed of several modules. Core modules include:

zeppelin‑server – entry point with embedded Jetty, WebSocket, REST, and authentication.

zeppelin‑zengine – notebook persistence and retrieval.

zeppelin‑interpreter – abstract interpreter interface and Thrift communication.

zeppelin‑web – front‑end built with AngularJS.

zeppelin‑display – binds Angular UI to backend data.

zeppelin‑spark‑dependencies – provides Spark integration (may be removed in future).

zeppelin‑distribution – packaging module.

helium‑dev – runtime plugin system (experimental).

Apache Zeppelin Installation

Download the binary package from the Apache archive, upload it to the server, and extract it:

tar -zxvf zeppelin-0.8.2-bin-all.tgz

Zeppelin starts without additional configuration. From the bin directory of the extracted package, run:

./zeppelin-daemon.sh start

If the output shows Zeppelin start [ OK ], the service is running on its default port, 8080. Open http://<host>:8080 in a browser, replacing <host> with the server's address, to access the UI.

Creating a Notebook

In the Zeppelin UI, click “Create new note”, name it (e.g., hellozep), and select an interpreter such as Python. Ensure the corresponding backend (Python) is installed.

Connecting Zeppelin to JDBC

Zeppelin supports many databases via JDBC (PostgreSQL, MySQL, MariaDB, Apache Drill, Redshift, Tajo, Hive, Phoenix, etc.). To add a JDBC interpreter, click the “+ Create” button on the interpreter settings page, give it a name (e.g., mysql), choose the jdbc interpreter group, and fill in the connection properties (default.driver, default.url, default.user, and default.password). Add the database's JDBC driver under Dependencies, then save the configuration.

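Example usage in a paragraph, where jdbc_interpreter_name is the name you gave the interpreter (e.g., mysql):

%jdbc_interpreter_name
show databases

If the paragraph finishes without errors, the connection works and you can run other SQL statements, including CRUD operations.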

Connecting Zeppelin to Spark

Configure Spark in conf/zeppelin-env.sh:

export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/usr/lib/hadoop
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0"
export ZEPPELIN_INTP_CLASSPATH_OVERRIDES=/etc/hive/conf

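Load a sample dataset and register it as a table for Spark SQL:

// Read the raw CSV file as an RDD of lines.
val bankText = sc.textFile("bank.csv")
// Schema for the columns used below (column 4 of the file is skipped).
case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)
// Split each line on ";", drop the header row, strip quotes, and convert to a DataFrame.
val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
  s => Bank(s(0).toInt, s(1).replaceAll("\"", ""), s(2).replaceAll("\"", ""), s(3).replaceAll("\"", ""), s(5).replaceAll("\"", "").toInt)
).toDF()
// On Spark 2.x and later, createOrReplaceTempView("bank") replaces the deprecated registerTempTable.
bank.registerTempTable("bank")

Once registered, the bank table can be queried from a paragraph that starts with %sql.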
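To see what the split, filter, and map steps do to each line without a Spark cluster, the same logic can be tried on an in‑memory list. This is only a sketch: the sample lines, the unquote helper, and the intermediate names are made up for illustration, and a plain Scala List stands in for the RDD.

```scala
// Same shape as the Bank class above, using Int for simplicity.
case class Bank(age: Int, job: String, marital: String, education: String, balance: Int)

// Illustrative lines in the bank.csv layout: semicolon-separated, strings quoted.
val sampleLines = List(
  "\"age\";\"job\";\"marital\";\"education\";\"default\";\"balance\"",
  "30;\"unemployed\";\"married\";\"primary\";\"no\";1787",
  "33;\"services\";\"married\";\"secondary\";\"no\";4789"
)

// Hypothetical helper: remove the double quotes around a field.
def unquote(s: String): String = s.replaceAll("\"", "")

// Step 1: split every line into its fields.
val rows = sampleLines.map(_.split(";"))
// Step 2: drop the header row, whose first field is the quoted word "age".
val dataRows = rows.filter(fields => fields(0) != "\"age\"")
// Step 3: build typed records; index 5 is the balance column, index 4 is skipped.
val banks = dataRows.map(f => Bank(f(0).toInt, unquote(f(1)), unquote(f(2)), unquote(f(3)), unquote(f(5)).toInt))

banks.foreach(println)
```

In the Spark version the same three steps run lazily across the cluster, and toDF() then turns the records into a DataFrame that Spark SQL can query.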

Connecting Zeppelin to Flink

Set Flink‑related environment variables in conf/zeppelin-env.sh, then edit the Flink interpreter configuration via the UI. A simple Flink demo:

%flink
case class WordCount(word: String, frequency: Int)
val bible: DataSet[String] = benv.readTextFile("10.txt.utf-8")
val partialCounts: DataSet[WordCount] = bible.flatMap { line =>
  """\b\w+\b""".r.findAllIn(line).map(word => WordCount(word, 1))
}
val wordCounts = partialCounts.groupBy("word").reduce { (l, r) => WordCount(l.word, l.frequency + r.frequency) }
val result10 = wordCounts.first(10).collect()

After running, click the FLINK JOB label to open the job’s Web UI. Zeppelin also supports Flink batch, streaming, and SQL.
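The tokenize, group, and reduce logic of the demo can also be checked on a few lines of text with plain Scala collections, with no Flink runtime involved. This is a sketch under stated assumptions: the input lines and the final sort are invented for illustration, and a List stands in for Flink's DataSet.

```scala
case class WordCount(word: String, frequency: Int)

// Made-up input; the Flink demo reads its lines from a text file instead.
val lines = List("in the beginning", "the end")

// Emit one WordCount(word, 1) per word, using the same regex as the demo.
val partialCounts = lines.flatMap { line =>
  """\b\w+\b""".r.findAllIn(line).map(word => WordCount(word, 1))
}

// Group by word, then merge each group by adding the frequencies.
val grouped = partialCounts.groupBy(_.word)
val merged = grouped.values.map(_.reduce((l, r) => WordCount(l.word, l.frequency + r.frequency)))

// Sort by descending frequency, then by word, so the output order is stable.
val wordCounts = merged.toList.sortBy(wc => (-wc.frequency, wc.word))

wordCounts.foreach(println)
```

Unlike this local version, the Flink job distributes the grouping and reduction across task slots, and first(10) returns ten records without any guaranteed order.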

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
