
How to Build a Real‑Time Recommendation System with Flink, HBase, and Docker

This article walks through a complete real‑time recommendation system built on Apache Flink, detailing its v2.0 architecture, modules for user behavior, interest, and product profiling, the recommendation algorithms (hot‑list, collaborative filtering, item similarity), and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka.

IT Architects Alliance

Project Overview

GitHub repository: https://github.com/CheckChe0803/flink-recommandSystem-demo. The project implements a real‑time recommendation system using Apache Flink, Kafka, HBase, Redis and a web UI.

System Architecture (v2.0)

The system consists of several Flink jobs that ingest user interaction logs from Kafka, store raw and intermediate data in HBase, cache hot items in Redis, and expose recommendation results through a Spring Boot web module.

Core Flink Modules

User‑Product Browsing History : records each product a user views; data are written to HBase table p_history for later item‑based collaborative filtering.

User‑Interest : registers an interest event when two user actions follow each other within a short interval (e.g., a purchase and a click less than 100 s apart). The job keeps the pending action in Flink ValueState and clears that state when the user favorites a product (action=3) or when the interval exceeds the threshold. Results are stored in HBase table u_interest.

User Profile : generates tag vectors for color, country and style preferences. Persisted in HBase table user.

Product Profile : stores age‑group and gender preference dimensions for each product in HBase table prod.

Hot‑list : a time‑windowed Flink job computes real‑time popularity scores, holds the current ranking in Flink ListState, and caches the ranked list in Redis.

Log Import : consumes Kafka streams, writes raw logs to HBase table con, and aggregates data needed for the front‑end dashboard.

Web Module : front‑end displays three recommendation columns (hot‑list, collaborative‑filtering, product‑profile); back‑end provides monitoring metrics for administrators.
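The User‑Interest rule described above can be illustrated without a Flink cluster. The following is a minimal plain‑Java sketch, not the project's code: a HashMap stands in for Flink's keyed ValueState, and the class name, the method names, the keying by user and product, and the exact reading of the rule are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the User-Interest job.
// A Map plays the role that keyed ValueState plays inside Flink.
class InterestTracker {
    static final int ACTION_FAVORITE = 3;       // action=3, as stated in the article
    static final long THRESHOLD_MS = 100_000L;  // the 100 s interval from the article

    // Timestamp of the pending action per "userId:productId" key (assumed keying).
    private final Map<String, Long> pendingAction = new HashMap<>();

    /** Returns true when this action completes an interest event. */
    boolean onAction(String userId, String productId, int action, long timestampMs) {
        String key = userId + ":" + productId;

        // A favorite clears the state and produces no event.
        if (action == ACTION_FAVORITE) {
            pendingAction.remove(key);
            return false;
        }

        Long previous = pendingAction.get(key);
        // Stale or missing state: the current action becomes the new pending one.
        boolean withinThreshold = previous != null && timestampMs - previous < THRESHOLD_MS;
        pendingAction.put(key, timestampMs);
        return withinThreshold;
    }

    public static void main(String[] args) {
        InterestTracker tracker = new InterestTracker();
        System.out.println(tracker.onAction("u1", "p1", 1, 0L));      // first action, no event
        System.out.println(tracker.onAction("u1", "p1", 2, 50_000L)); // 50 s later, event
    }
}
```

In a real Flink job the same decision would typically live in a keyed process function, with a timer clearing the state once the threshold passes; the sketch only shows the decision itself.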

Recommendation Engine Logic

Hot‑list based recommendation : re‑ranks the hot list according to user feature vectors, then combines it with similarity scores to suggest related items.

Product‑profile similarity : calculates cosine similarity between items using three profile dimensions (color, country, style) and filters by user rating.

Collaborative filtering : derives item‑item similarity from the user‑product matrix stored in HBase; the similarity formula appears as a diagram in the original article and is not reproduced in this text.
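Because the text does not spell out the similarity formula, the sketch below assumes one common choice for item‑based collaborative filtering: co‑occurrence cosine similarity, i.e. the number of users who interacted with both items divided by the square root of the product of each item's user count. The project may use a different formula. An in‑memory map stands in for the HBase history table, and all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; the map stands in for browsing history read from HBase.
class ItemSimilarity {
    /**
     * Co-occurrence cosine similarity between two items:
     * |users(a) intersect users(b)| / sqrt(|users(a)| * |users(b)|).
     * Returns 0.0 when either item has no recorded users.
     */
    static double similarity(Map<String, Set<String>> usersByItem, String itemA, String itemB) {
        Set<String> usersA = usersByItem.getOrDefault(itemA, Set.of());
        Set<String> usersB = usersByItem.getOrDefault(itemB, Set.of());
        if (usersA.isEmpty() || usersB.isEmpty()) {
            return 0.0;
        }
        // Copy before intersecting so the input sets are left untouched.
        Set<String> common = new HashSet<>(usersA);
        common.retainAll(usersB);
        return common.size() / Math.sqrt((double) usersA.size() * usersB.size());
    }

    public static void main(String[] args) {
        Map<String, Set<String>> history = Map.of(
                "p1", Set.of("u1", "u2", "u3"),
                "p2", Set.of("u2", "u3"),
                "p3", Set.of("u4"));
        System.out.println(similarity(history, "p1", "p2")); // two shared users
        System.out.println(similarity(history, "p1", "p3")); // no shared users
    }
}
```

Dividing by the square root of both user counts keeps very popular items from dominating every similarity list, which a raw co‑occurrence count would allow.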

Frontend and Backend UI

The recommendation page shows three columns: hot‑list, collaborative‑filtering and product‑profile recommendations.

The backend dashboard displays real‑time metrics such as hot‑list scores and hourly log ingestion volume. The SQL script used to build the dashboard resides at resource/database.sql.

Docker Deployment

MySQL

docker pull mysql:5.7
docker run --name local-mysql -p 3308:3306 -e MYSQL_ROOT_PASSWORD=123456 -d mysql:5.7

Redis

docker run --name local-redis -p 6379:6379 -d redis

HBase

docker pull harisekhon/hbase
docker run -d -h base-server \
  -p 2181:2181 -p 8080:8080 -p 8085:8085 -p 9090:9090 \
  -p 9000:9000 -p 9095:9095 -p 16000:16000 -p 16010:16010 \
  -p 16201:16201 -p 16301:16301 -p 16020:16020 \
  --name hbase harisekhon/hbase

Kafka, Zookeeper & Kafka‑Manager

# Pull images
docker pull wurstmeister/zookeeper
docker pull wurstmeister/kafka
docker pull hlebalbau/kafka-manager:stable

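# ZooKeeper (the HBase container above also publishes host port 2181;
# if both run on the same host, change one of the host-side ports)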
docker run -d --name zookeeper -p 2181:2181 \
  --volume /etc/localtime:/etc/localtime \
  --restart=always wurstmeister/zookeeper

# Kafka broker (replace 192.168.1.8 with the IP address of your Docker host)
docker run -d --name kafka -p 9092:9092 \
  --link zookeeper:zookeeper \
  -e KAFKA_ADVERTISED_HOST_NAME=192.168.1.8 \
  -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
  wurstmeister/kafka

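# Kafka Manager (the HBase container above also publishes host port 9000;
# if both run on the same host, change one of the host-side ports)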
docker run -d --link zookeeper:zookeeper -p 9000:9000 \
  -e ZK_HOSTS="zookeeper:2181" \
  hlebalbau/kafka-manager:stable -Dpidfile.path=/dev/null
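Once the containers are up, it can help to confirm that the published ports accept connections before configuring the project. The following is a minimal sketch that uses only the Java standard library. The class name is hypothetical, the default host localhost is an assumption, and the port list is taken from the commands above (16010 is the HBase master web UI port). A successful TCP connection only shows that the port is open, not that the service behind it is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for checking the ports published by the containers above.
class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolved host: treat all as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the containers run on this machine unless a host is passed in.
        String host = args.length > 0 ? args[0] : "localhost";

        Map<String, Integer> services = new LinkedHashMap<>();
        services.put("MySQL", 3308);
        services.put("Redis", 6379);
        services.put("ZooKeeper", 2181);
        services.put("Kafka", 9092);
        services.put("HBase master UI", 16010);

        for (Map.Entry<String, Integer> entry : services.entrySet()) {
            boolean up = isReachable(host, entry.getValue(), 1000);
            System.out.printf("%-16s %5d  %s%n",
                    entry.getKey(), entry.getValue(), up ? "reachable" : "NOT reachable");
        }
    }
}
```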

Running the Project

Configure the IP addresses and ports of MySQL, Redis, HBase and Kafka in the flink-2-hbase module and the web module.

In the root directory of flink-2-hbase execute mvn clean install to build the JAR and install it into the local Maven repository.

Start each Flink task (e.g., via IDE run configuration).

Launch the SchedulerJob to periodically compute scores for collaborative filtering and user profiling.

Start the web project; it will automatically load the generated JAR and serve the recommendation UI.

Note: When the services start there is no click data, so the system returns random products. Generate user interactions on the recommendation page to see true real‑time recommendations.

Future Work

Add monitoring for Flink tasks.

Enrich the data dashboard with more detailed metrics.

Calculate business metrics such as recall and precision.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
