Building a Real‑Time Flink Recommendation System: Architecture, Code & Deployment
This article walks through a complete Flink‑based recommendation system, detailing its v2.0 architecture, recommendation algorithms, front‑end and back‑end components, and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka services.
Preface
Previously I shared introductions to Flink; now I present a practical project with source code from GitHub: https://github.com/CheckChe0803/flink-recommandSystem-demo.
1. System Architecture v2.0
1.1 System Architecture Diagram
1.2 Module Description
a. Log Data Module (flink-2-hbase) – six Flink jobs:
User‑Product Browsing History: records browsing for collaborative‑filtering, stores scores in HBase table p_history.
User‑Interest: computes interest based on action intervals, uses ValueState, stores in u_interest.
User Profile: tags (color, country, style) stored in user.
Product Profile: age and gender tags stored in prod.
Hot‑Score List: real‑time hotness via windowing, cached in Redis.
Log Import: consumes Kafka, writes raw logs to HBase table con.
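To make the windowing idea behind the Hot‑Score List job concrete, here is a minimal plain-Python sketch rather than the project's actual Flink code. It assumes each log event is a (timestamp_seconds, product_id) pair, uses a hypothetical 60-second tumbling window, and keeps the top N products per window. The function name, event shape, and window size are illustrative assumptions.

```python
from collections import Counter, defaultdict


def hot_list_per_window(events, window_seconds=60, top_n=3):
    """Group (timestamp, product_id) events into tumbling windows.

    Returns {window_start: [(product_id, count), ...]} holding the top_n
    most-viewed products of each window. Hypothetical helper, not a Flink API.
    """
    counts = defaultdict(Counter)
    for timestamp, product_id in events:
        # Align each event to the start of the window it falls into.
        window_start = timestamp - (timestamp % window_seconds)
        counts[window_start][product_id] += 1
    return {
        start: counter.most_common(top_n)
        for start, counter in sorted(counts.items())
    }


# Assumed sample events: (seconds since start, product id).
events = [(0, "p1"), (5, "p2"), (10, "p1"), (61, "p3"), (70, "p3"), (90, "p1")]
hot = hot_list_per_window(events, window_seconds=60, top_n=2)
```

In the real job, each window's result would be cached in Redis, as described in the list above.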
b. Web Module
Front‑end UI: returns recommended product list.
Back‑end Monitoring: shows metrics for administrators.
2. Recommendation Engine Logic
2.1 Hot‑Score Based Recommendation
Re‑ranks the hot list by user features, then combines the result with similarity scores to recommend related items.
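One way such a re-ranking could work is sketched below. The article does not spell out the scoring rule, so everything here is an assumption: the hot list is a list of (product_id, hot_score) pairs, the user profile maps each tag dimension to per-value weights, and the final score is the hot score boosted by the user's affinity for the product's tags.

```python
def rerank_hot_list(hot_list, product_tags, user_profile):
    """Re-rank a hot list by user tag affinity (illustrative rule only).

    hot_list:     [(product_id, hot_score), ...]
    product_tags: {product_id: {"color": ..., "country": ..., "style": ...}}
    user_profile: {dimension: {tag_value: weight}}
    """
    def affinity(product_id):
        tags = product_tags.get(product_id, {})
        # Sum the user's weight for every tag value the product carries.
        return sum(
            user_profile.get(dimension, {}).get(value, 0.0)
            for dimension, value in tags.items()
        )

    # Assumed scoring rule: hot score scaled by (1 + affinity).
    scored = [(pid, hot * (1.0 + affinity(pid))) for pid, hot in hot_list]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical data: the user prefers blue products.
hot_list = [("p1", 10.0), ("p2", 8.0)]
product_tags = {"p1": {"color": "red"}, "p2": {"color": "blue"}}
user_profile = {"color": {"blue": 0.5}}
ranked = rerank_hot_list(hot_list, product_tags, user_profile)
```

With this sample data, p2 overtakes p1 because its boosted score (8.0 × 1.5) exceeds p1's unboosted 10.0.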
2.2 Product‑Profile Similarity (Cosine)
Uses three product tags (color, country, style) and cosine similarity to compute item‑item scores.
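As a rough illustration of cosine similarity over tag features, the sketch below one-hot encodes the three tags and compares two products. The encoding, the sample tag values, and the function names are assumptions made for this example; the project may represent tags differently.

```python
import math


def tag_vector(tags, vocabulary):
    """One-hot encode a product's tags against a list of (dimension, value) pairs."""
    return [1.0 if tags.get(dim) == value else 0.0 for dim, value in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical products that share color and style but differ in country.
product_a = {"color": "red", "country": "china", "style": "casual"}
product_b = {"color": "red", "country": "japan", "style": "casual"}
vocabulary = [
    ("color", "red"), ("color", "blue"),
    ("country", "china"), ("country", "japan"),
    ("style", "casual"), ("style", "formal"),
]
similarity = cosine_similarity(
    tag_vector(product_a, vocabulary), tag_vector(product_b, vocabulary)
)
```

Two of three tags match here, so the similarity comes out at about 2/3; identical tag sets would give 1.0 and fully disjoint ones 0.0.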
2.3 Collaborative Filtering Similarity
Calculates similarity scores from the user‑product table in HBase.
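A common formulation of item-based collaborative filtering scores two items by the overlap of the users who browsed them. The sketch below uses that co-occurrence cosine formula on an in-memory dictionary standing in for the HBase table. Both the formula and the data layout are assumptions, not confirmed details of the project.

```python
import math
from collections import defaultdict
from itertools import combinations


def item_similarities(history):
    """Compute item-item similarity from browsing history.

    history: {user_id: set of product_ids the user browsed}
    Returns {(item_a, item_b): score} with item_a < item_b, where
    score = |users(a) & users(b)| / sqrt(|users(a)| * |users(b)|).
    """
    users_per_item = defaultdict(set)
    for user, items in history.items():
        for item in items:
            users_per_item[item].add(user)

    similarities = {}
    for a, b in combinations(sorted(users_per_item), 2):
        common = len(users_per_item[a] & users_per_item[b])
        if common:  # Skip pairs with no shared users.
            denominator = math.sqrt(len(users_per_item[a]) * len(users_per_item[b]))
            similarities[(a, b)] = common / denominator
    return similarities


# Hypothetical browsing history for three users.
history = {"u1": {"p1", "p2"}, "u2": {"p1", "p2", "p3"}, "u3": {"p3"}}
sims = item_similarities(history)
```

Here p1 and p2 were browsed by exactly the same users and score 1.0, while p1 and p3 share one of two users each and score 0.5.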
3. Front‑End Recommendation Page
The page shows three columns: hot‑score recommendations, collaborative‑filtering recommendations, and product‑profile recommendations.
4. Back‑End Data Dashboard
Displays real‑time metrics such as hot‑score list and one‑hour log ingestion volume; data originates from other Flink modules. The SQL script is located at resource/database.sql.
5. Deployment Instructions
All services are deployed with Docker.
MySQL
docker pull mysql:5.7
docker run --name local-mysql -p 3308:3306 -e MYSQL_ROOT_PASSWORD=123456 -d mysql:5.7
Redis
docker run --name local-redis -p 6379:6379 -d redis
HBase
docker pull harisekhon/hbase
docker run -d -h base-server \
-p 2181:2181 -p 8080:8080 -p 8085:8085 -p 9090:9090 \
-p 9000:9000 -p 9095:9095 -p 16000:16000 \
-p 16010:16010 -p 16201:16201 -p 16301:16301 \
-p 16020:16020 \
--name hbase harisekhon/hbase
Access the HBase web UI at http://localhost:16010/master-status.
Kafka
# Pull images
docker pull wurstmeister/zookeeper
docker pull wurstmeister/kafka
docker pull hlebalbau/kafka-manager:stable
# Run Zookeeper
docker run -d --name zookeeper --publish 2181:2181 \
--volume /etc/localtime:/etc/localtime \
--restart=always wurstmeister/zookeeper
# Run Kafka
docker run --name kafka \
-p 9092:9092 \
--link zookeeper:zookeeper \
-e KAFKA_ADVERTISED_HOST_NAME=192.168.1.8 \
-e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
-d wurstmeister/kafka
# Run Kafka Manager
docker run -d \
--link zookeeper:zookeeper \
-p 9000:9000 \
-e ZK_HOSTS="zookeeper:2181" \
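hlebalbau/kafka-manager:stable -Dpidfile.path=/dev/null
After starting, access the manager at localhost:9000. Note that the HBase container above also publishes host ports 2181 and 9000; when all services run on one host, remove or remap those two mappings on one side, otherwise Docker will refuse to start the second container that binds them.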
6. Starting Services
Operations are performed in IntelliJ IDEA.
Configure IPs and ports of MySQL, Redis, HBase, and Kafka in flink-2-hbase and the web service.
Run mvn clean install in the flink-2-hbase root to package the project.
Start each task in the task directory (right‑click run in IDEA).
Launch SchedulerJob to compute collaborative‑filtering and user‑profile scores periodically.
Open the web project in IDEA; after the generated JAR is loaded, start the service.
Note: Initially the system will recommend random products because no click data exists; generate some clicks to enable real‑time recommendations.
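Before starting the Flink tasks, it can help to confirm that the configured ports are reachable. The following is a small standard-library sketch, not part of the project: the port numbers come from the Docker commands above, while the helper names and the choice of localhost are assumptions to adjust for your environment.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Host ports published by the Docker commands in the deployment section.
SERVICES = {"MySQL": 3308, "Redis": 6379, "HBase master UI": 16010, "Kafka": 9092}


def check_services(host="localhost", services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, reachable in check_services().items():
        print(f"{name}: {'reachable' if reachable else 'NOT reachable'}")
```

A reachable port only shows that something is listening; it does not verify credentials or that tables and topics exist.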
7. Next Steps
Add Flink task monitoring.
Enhance the data dashboard with more detailed metrics.
Calculate business metrics such as recall and precision.
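For the recall and precision item, a standard offline definition is precision@k (the share of the top k recommendations the user actually interacted with) and recall@k (the share of the user's relevant items that appear in the top k). A minimal sketch follows; the function name and the sample data are assumptions.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Compute (precision@k, recall@k).

    recommended: ordered list of recommended product ids
    relevant:    set of product ids the user actually clicked or bought
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Precision is taken over the items actually returned, at most k.
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: one of the top three recommendations was relevant.
precision, recall = precision_recall_at_k(
    ["p1", "p2", "p3", "p4"], {"p2", "p4", "p9"}, k=3
)
```

In this example both values are 1/3: one hit among three recommendations, and one of three relevant items retrieved.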
