Real-Time Machine Learning with Redis-ML: From Concepts to Movie Recommendations
This article explains how Redis-ML enables real‑time machine learning by extending Redis with modules, covering its architecture, random‑forest inference, performance benchmarks against Spark, and a complete movie‑recommendation use case with Docker, Scala, and Python examples.
Background
RedisConf 2017 showcased new Redis modules such as RediSearch and Redis‑ML, along with Amazon ElastiCache for Redis. Shay Nativ (Redis Labs) presented a case study on real‑time machine learning using the Redis‑ML module.
Redis Core Advantages
Redis provides high performance, a rich set of native data structures, and extensibility through Redis Modules, which allow C/C++ programs to run inside Redis, add new data structures, and retain Redis's speed, scalability, and high availability.
Redis Modules
Any C/C++ program can run inside Redis.
Modules extend Redis with native functionality and new data structures.
They preserve Redis's speed, scalability, and high availability.
Modules can be created by anyone.
Adapt your database to your data, not the other way around.
Redis‑ML Overview
Redis‑ML is an in‑memory model server for inference. Trained model parameters are stored in Redis (hot mode) and evaluated directly inside the database. It can integrate existing C/C++ ML libraries, supports runtime adjustments, and inherits Redis's performance and HA features.
Machine‑Learning Foundations
The traditional Spark‑ML pipeline runs into several problems when serving models: models can be large, deployment is slow, and scalability, consistency, reliability, and cost all become issues. Redis‑ML offers a simplified lifecycle, illustrated here with a random‑forest example.
Random Forest and Gini Impurity
A random forest is a collection of decision trees used for classification and regression. Splits are evaluated using criteria such as Gini impurity, information entropy, or log‑value measures.
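In the standard CART formulation, the Gini impurity of a node with class proportions p_i is G = 1 − Σ p_i², and the improvement in Gini impurity for a split is the parent's impurity minus the size‑weighted impurities of the two child nodes:
ΔG = G(parent) − (n_left / n) · G(left) − (n_right / n) · G(right)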
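As a minimal sketch, assuming the standard CART definitions (impurity is one minus the sum of squared class proportions, and the gain is the parent impurity minus the size-weighted impurity of the children), the calculation could be written in Python as follows. The function names `gini` and `gini_gain` and the sample labels are illustrative and are not part of Redis-ML or Spark.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_i ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(parent, left, right):
    """Impurity reduction obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Hypothetical labels: 1 = survived, 0 = died.
parent = [1, 1, 1, 0, 0, 0]
left, right = [1, 1, 1], [0, 0, 0]
print(gini(parent))                    # 0.5
print(gini_gain(parent, left, right))  # 0.5
```

A split that separates the classes perfectly, as in this toy case, removes all of the parent's impurity; a split that leaves both children as mixed as the parent yields a gain of zero.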
Example: Titanic Survival Prediction
A sample feature vector for a passenger (John) is evaluated by three trees (Survived, Died, Survived), resulting in a final prediction of "Survived".
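The voting step can be sketched in a few lines of Python. The three trees below are hypothetical stand-ins (their split rules and John's feature values are invented for illustration, not taken from the talk); the point is only that the forest returns the class chosen by the majority of its trees.

```python
from collections import Counter


def forest_predict(trees, features):
    """Return the class predicted by the largest number of trees."""
    votes = [tree(features) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical single-split trees; real trees are deeper and learned from data.
def tree_a(p):
    return "Survived" if p["pclass"] == 1 else "Died"


def tree_b(p):
    return "Survived" if p["sex"] == "female" else "Died"


def tree_c(p):
    return "Survived" if p["fare"] > 50 else "Died"


# Illustrative feature vector for John.
john = {"sex": "male", "age": 35, "pclass": 1, "fare": 80.0}

print([t(john) for t in (tree_a, tree_b, tree_c)])  # ['Survived', 'Died', 'Survived']
print(forest_predict([tree_a, tree_b, tree_c], john))  # Survived
```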
Real‑World Challenge: Ad‑Service Scoring
Ad‑serving systems must respond within 50 ms while handling 20,000 requests per second. Benchmarks comparing a home‑grown solution with Redis‑ML show that Redis‑ML achieves sub‑millisecond inference, dramatically outperforming Spark‑based inference.
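To make the requirement concrete, a back-of-the-envelope estimate using Little's law (average requests in flight = arrival rate × time in system) shows how much concurrency the service must sustain. This is an illustrative calculation, not a figure from the talk, and the 1 ms case is a hypothetical round number standing in for "sub-millisecond".

```python
def in_flight_requests(requests_per_second, latency_ms):
    """Little's law: average concurrency = arrival rate * time in system."""
    return requests_per_second * latency_ms / 1000


# If every request used the full 50 ms budget:
print(in_flight_requests(20_000, 50))  # 1000.0
# If scoring took about 1 ms instead (hypothetical):
print(in_flight_requests(20_000, 1))   # 20.0
```

The shorter each scoring call is, the fewer requests are outstanding at any moment, which is why per-request inference latency matters as much as raw throughput.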
Case Study: Movie Recommendation System
The workflow consists of four steps:
Data acquisition – download the MovieLens 100K dataset.
Format conversion – transform each movie’s training data into a single line per user containing rating, user info, other‑movie ratings, and genre averages.
Model training and loading – train a Spark RandomForestClassifier (500 trees) and load the forest into Redis using the Redis‑ML API.
Inference – query Redis to obtain a rating prediction for a given user‑movie pair.
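For the format-conversion step, the sketch below builds a sparse feature string of comma-separated `index:value` pairs. The format is inferred from the sample profile string shown in the Python inference example (`12:1.0,13:1.0,14:3.0,...`); the helper names `encode_profile` and `decode_profile` and the meaning of each feature index are assumptions made for illustration.

```python
def encode_profile(features):
    """Serialize {feature_index: value} as 'index:value' pairs, sorted by index."""
    return ",".join(f"{index}:{value}" for index, value in sorted(features.items()))


def decode_profile(text):
    """Parse an 'index:value,index:value' string back into a dict."""
    pairs = (item.split(":") for item in text.split(","))
    return {int(index): float(value) for index, value in pairs}


# Hypothetical sparse features for one user (indices are illustrative).
features = {14: 3.0, 12: 1.0, 13: 1.0, 1817: 0.06}
profile = encode_profile(features)
print(profile)  # 12:1.0,13:1.0,14:3.0,1817:0.06
```

Only features that are present need to be listed, which keeps each user's line short even when the full feature space has well over a thousand indices.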
Docker commands to obtain the required images:
# docker pull shaynativ/redis-ml
# docker run --net=host shaynativ/redis-ml
# docker pull shaynativ/spark-redis-ml
# docker run --net=host shaynativ/spark-redis-ml
Scala snippet for training and loading the model:
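import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}
// Forest is provided by the spark-redis-ml package
// Create a new forest instance
val rf = new RandomForestClassifier()
  .setFeatureSubsetStrategy("auto")
  .setLabelCol("indexedLabel")
  .setFeaturesCol("indexedFeatures")
  .setNumTrees(500)
// Train the model; `pipeline` is assumed to be a Spark ML Pipeline whose third stage (index 2) is `rf`
val model = pipeline.fit(trainingData)
val rfModel = model.stages(2).asInstanceOf[RandomForestClassificationModel]
// Load the trained forest into Redis under the key "movie-10"
val f = new Forest(rfModel.trees)
f.loadToRedis("movie-10", "127.0.0.1")
Python example for inference: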
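import redis
# Connect to the local Redis server; decode_responses=True returns str instead of bytes
config = {"host": "localhost", "port": 6379, "decode_responses": True}
r = redis.StrictRedis(**config)
# Fetch the stored feature vector for the user
user_profile = r.get("user_shay_profile")
print(user_profile)  # e.g. 12:1.0,13:1.0,14:3.0,...,1817:0.06
# Run the forest for movie 10 against the user's features
result = r.execute_command("ML.FOREST.RUN", "movie-10", user_profile)
print(result)  # e.g. 3
Redis CLI example: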
> KEYS *
1) "movie-5"
2) "movie-1"
...
10) "movie-10"
11) "user_1_profile"
> ML.FOREST.RUN movie-10
... (feature vector) ...
'3'
Performance Comparison
Redis‑ML inference latency: 0.64 ms (result = 3)
Spark ML inference latency: 46–49 ms (result ≈ 3.0)
Average speed‑up: ~61× faster for Redis‑ML.
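A latency figure like the ones above can be obtained by timing repeated calls and averaging. The helper below is a minimal sketch of such a measurement, not the benchmark harness used in the talk; `mean_latency_ms` is a hypothetical name, and the stand-in workload only exists so the snippet runs without a Redis server.

```python
import time


def mean_latency_ms(predict, runs=100):
    """Call `predict` `runs` times and return the mean wall-clock latency in ms."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    start = time.perf_counter()
    for _ in range(runs):
        predict()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


# Stand-in workload so the sketch runs anywhere.
def fake_predict():
    return sum(range(1000))


latency = mean_latency_ms(fake_predict, runs=50)
print(f"mean latency: {latency:.3f} ms")
```

With a live server, the same helper could time the call from the Python inference example by passing `lambda: r.execute_command("ML.FOREST.RUN", "movie-10", user_profile)`. Measured this way, the number includes the client round trip as well as the forest evaluation itself.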
References
Redis‑ML repository: https://github.com/RedisLabsModules/redis-ml
Spark‑Redis‑ML repository: https://github.com/RedisLabs/spark-redis-ml
Databricks notebook: http://bit.ly/sparkredisml
Docker images: https://hub.docker.com/r/shaynativ/redis-ml/ and https://hub.docker.com/r/shaynativ/spark-redis-ml/
Decision‑Tree theory (Gini impurity): http://wiki.swarma.net/index.php/%E5%86%B3%E7%AD%96%E6%A0%91#.E5.9F.BA.E5.B0.BC.E4.B8.8D.E7.BA.AF.E5.BA.A6.28Gini_impurity.29.E5.87.86.E5.88.99
dbaplus Community
