Real-Time Machine Learning with Redis-ML: From Concepts to Movie Recommendations

This article explains how Redis-ML enables real‑time machine learning by extending Redis with modules, covering its architecture, random‑forest inference, performance benchmarks against Spark, and a complete movie‑recommendation use case with Docker, Scala, and Python examples.

dbaplus Community

Background

RedisConf 2017 featured new Redis modules such as RediSearch and Redis‑ML, along with offerings such as Amazon ElastiCache for Redis. Shay Nativ (Redis Labs) presented a case study on real‑time machine learning with the Redis‑ML module.

Redis Core Advantages

Redis provides high performance, a rich set of native data structures, and extensibility through Redis Modules, which allow C/C++ programs to run inside Redis, add new data structures, and retain Redis's speed, scalability, and high availability.

Redis Modules

Custom C/C++ code can run inside Redis as a loadable module.

Modules extend Redis with native functionality and new data structures.

They preserve Redis's speed, scalability, and high availability.

Modules can be created by anyone.

Adapt your database to your data, not the other way around.

Redis‑ML Overview

Redis‑ML is an in‑memory model server for inference. Trained models are stored in Redis and kept "hot" (in memory, ready to serve), and predictions are computed directly inside the database. It can integrate existing C/C++ ML libraries, allows models to be adjusted at runtime, and inherits Redis's performance and high‑availability features.

Machine‑Learning Foundations

Serving models from a traditional Spark ML pipeline runs into problems with model size, deployment speed, scalability, consistency, reliability, and cost. Redis‑ML simplifies this lifecycle, as the following random‑forest example illustrates.

Random Forest and Gini Impurity

A random forest is a collection of decision trees used for classification and regression. Splits are evaluated using criteria such as Gini impurity, information entropy, or log‑value measures.

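For a node whose samples fall into classes with proportions $p_k$, the Gini impurity, and the improvement (impurity decrease) obtained by splitting the node into a left child $L$ and a right child $R$, are:

$$G = 1 - \sum_{k} p_k^2, \qquad \Delta G = G(\text{parent}) - \frac{n_L}{n}\,G(L) - \frac{n_R}{n}\,G(R)$$

Here $n$, $n_L$, and $n_R$ are the numbers of samples in the parent and in the two children. The split with the largest $\Delta G$ is chosen.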
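Gini impurity is $G = 1 - \sum_k p_k^2$, and the split gain is the parent's impurity minus the size‑weighted impurities of the two children. Both translate directly into code. The sketch below is a minimal Python version. The helper names gini and gini_gain and the toy labels are illustrative and do not come from the source.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity G = 1 - sum_k p_k^2 of a node's class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(parent: Sequence[Hashable],
              left: Sequence[Hashable],
              right: Sequence[Hashable]) -> float:
    """Impurity decrease from splitting parent into left and right."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


if __name__ == "__main__":
    # Toy node: 4 survivors (S) and 4 non-survivors (D), split on some feature
    parent = ["S"] * 4 + ["D"] * 4
    left = ["S", "S", "S", "D"]
    right = ["S", "D", "D", "D"]
    print(gini(parent))                    # 0.5
    print(gini_gain(parent, left, right))  # 0.125
```

Each child is 3:1, so its impurity is 0.375. The split therefore lowers the weighted impurity from 0.5 to 0.375, a gain of 0.125. A tree builder would compute this for every candidate split and keep the best one.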

Example: Titanic Survival Prediction

A sample feature vector for a passenger (John) is evaluated by three trees, which predict Survived, Died, and Survived respectively. The forest takes the majority vote, so the final prediction is "Survived".
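The voting step can be made concrete in a few lines. The sketch below is minimal Python, and its three trees are hypothetical hand‑written rules chosen to reproduce the votes above. Real trees are learned from data, and the source does not give John's actual feature values, so those values are made up too.

```python
from collections import Counter
from typing import Callable, Dict, List

Passenger = Dict[str, float]
Tree = Callable[[Passenger], str]


# Hypothetical depth-1 trees; a real forest learns each tree
# from a bootstrap sample of the training data.
def tree_1(p: Passenger) -> str:
    return "Survived" if p["pclass"] <= 2 else "Died"


def tree_2(p: Passenger) -> str:
    return "Survived" if p["is_female"] == 1 else "Died"


def tree_3(p: Passenger) -> str:
    return "Survived" if p["age"] < 40 else "Died"


def forest_predict(trees: List[Tree], p: Passenger) -> str:
    """Classify by majority vote over the individual tree predictions."""
    votes = Counter(tree(p) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical feature values for John
john: Passenger = {"pclass": 1, "is_female": 0, "age": 30}

if __name__ == "__main__":
    trees = [tree_1, tree_2, tree_3]
    print([tree(john) for tree in trees])  # ['Survived', 'Died', 'Survived']
    print(forest_predict(trees, john))     # Survived
```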

Real‑World Challenge: Ad‑Service Scoring

Advertising services must score 20,000 requests per second within a 50 ms latency budget. Benchmarks compared a home‑grown solution with Redis‑ML. Redis‑ML achieved sub‑millisecond inference, dramatically outperforming Spark‑based inference.
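Little's law gives a feel for what this requirement means. It states that the average number of requests in flight equals the arrival rate times the time each request spends in the system. The sketch below is a back‑of‑the‑envelope Python version built around a hypothetical helper, in_flight_requests. It covers inference alone and ignores network and queueing overheads.

```python
def in_flight_requests(arrival_rate_per_s: float, time_in_system_s: float) -> float:
    """Little's law: average number of requests in the system, L = lambda * W."""
    return arrival_rate_per_s * time_in_system_s


if __name__ == "__main__":
    # Requirement from the text: 20,000 requests/s, each answered within 50 ms
    print(in_flight_requests(20_000, 0.050))    # ~1000 requests in flight at the limit
    # Same load at the 0.64 ms Redis-ML latency quoted under Performance Comparison
    print(in_flight_requests(20_000, 0.00064))  # ~12.8
```

If requests use the full 50 ms budget, about 1,000 are in flight at once. At sub‑millisecond latency the number drops to roughly a dozen, which leaves most of the budget for the rest of the ad‑serving path.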

Case Study: Movie Recommendation System

The workflow consists of four steps:

Data acquisition – download the MovieLens 100K dataset.

Format conversion – for each movie, convert the training data into one line per user. Each line contains the user's rating of that movie, user information, the user's ratings of other movies, and per‑genre average ratings.

Model training and loading – train a Spark RandomForestClassifier (500 trees) and load the forest into Redis using the Redis‑ML API.

Inference – query Redis to obtain a rating prediction for a given user‑movie pair.
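The format‑conversion step can be sketched as follows. This is a hypothetical illustration, not the code used in the talk, and it makes three assumptions. First, ratings of other movies are indexed by movie ID, and per‑genre averages start at a made‑up index offset (genre_offset) that must exceed the largest movie ID. Second, the user‑information features are omitted. Third, the target movie is excluded from the genre averages so that the label does not leak into the features. The output follows the comma‑separated index:value form of the user profiles in the Python inference example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def user_feature_vector(
    ratings: List[Tuple[int, float]],    # (movie_id, rating) pairs for one user
    movie_genres: Dict[int, List[int]],  # movie_id -> genre ids
    target_movie: int,                   # the movie this training set is for
    genre_offset: int,                   # hypothetical first index for genre features
) -> Tuple[Optional[float], Dict[int, float]]:
    """Return (label, sparse features) for one user and one target movie."""
    label: Optional[float] = None
    features: Dict[int, float] = {}
    genre_sum: Dict[int, float] = defaultdict(float)
    genre_cnt: Dict[int, int] = defaultdict(int)
    for movie_id, rating in ratings:
        if movie_id == target_movie:
            label = rating  # becomes the label, kept out of the features
            continue
        features[movie_id] = rating
        for genre in movie_genres.get(movie_id, []):
            genre_sum[genre] += rating
            genre_cnt[genre] += 1
    # Average rating per genre, stored after the movie-rating indices
    for genre, total in genre_sum.items():
        features[genre_offset + genre] = total / genre_cnt[genre]
    return label, features


def to_feature_string(features: Dict[int, float]) -> str:
    """Serialize sparse features as comma-separated index:value pairs."""
    return ",".join(f"{i}:{v}" for i, v in sorted(features.items()))


if __name__ == "__main__":
    ratings = [(10, 3.0), (12, 1.0), (14, 3.0), (15, 5.0)]
    genres = {12: [1], 14: [1, 2], 15: [2]}
    label, feats = user_feature_vector(ratings, genres, target_movie=10, genre_offset=1800)
    print(label)                     # 3.0
    print(to_feature_string(feats))  # 12:1.0,14:3.0,15:5.0,1801:2.0,1802:4.0
```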

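Docker commands to pull and run the required images. Each docker run command stays in the foreground, so start the two containers in separate terminals: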

# docker pull shaynativ/redis-ml
# docker run --net=host shaynativ/redis-ml
# docker pull shaynativ/spark-redis-ml
# docker run --net=host shaynativ/spark-redis-ml

Scala snippet for training and loading the model:

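import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}
import org.apache.spark.ml.feature.{StringIndexer, VectorIndexer}
import com.redislabs.provider.redis.ml.Forest // from spark-redis-ml; verify the package for your version

// Assumes trainingData is a DataFrame with "label" and "features" columns
val labelIndexer = new StringIndexer()
  .setInputCol("label").setOutputCol("indexedLabel").fit(trainingData)
val featureIndexer = new VectorIndexer()
  .setInputCol("features").setOutputCol("indexedFeatures").fit(trainingData)

// Create a new forest instance
val rf = new RandomForestClassifier()
  .setFeatureSubsetStrategy("auto")
  .setLabelCol("indexedLabel")
  .setFeaturesCol("indexedFeatures")
  .setNumTrees(500)

// Train model; the forest is stage 2 of the pipeline
val pipeline = new Pipeline().setStages(Array(labelIndexer, featureIndexer, rf))
val model = pipeline.fit(trainingData)
val rfModel = model.stages(2).asInstanceOf[RandomForestClassificationModel]

// Load the model to Redis under the key "movie-10"
val f = new Forest(rfModel.trees)
f.loadToRedis("movie-10", "127.0.0.1")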
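Loading the model means writing every tree of the forest into Redis. The sketch below is a minimal Python illustration of that idea. It assumes the ML.FOREST.ADD syntax described in the Redis‑ML README: the key and tree number, followed by NUMERIC attr val or LEAF val for each node, with "." for the root and l/r appended for child nodes. Check the README of the version you run. Leaf, Split, and forest_add_args are hypothetical names, categorical splits are left out, and this is not necessarily how spark‑redis‑ml serializes trees.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    value: str  # predicted class, e.g. a rating such as "3"


@dataclass
class Split:
    attr: str          # feature index as a string, e.g. "1"
    threshold: float   # numeric split value
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def forest_add_args(key: str, tree_id: int, root: Node) -> List[str]:
    """Flatten one tree into ML.FOREST.ADD arguments.

    Nodes are emitted in pre-order; the root path is "." and each step
    appends "l" (left child) or "r" (right child), e.g. ".l", ".lr".
    """
    args = ["ML.FOREST.ADD", key, str(tree_id)]

    def visit(node: Node, path: str) -> None:
        if isinstance(node, Leaf):
            args.extend([path, "LEAF", node.value])
        else:
            args.extend([path, "NUMERIC", node.attr, str(node.threshold)])
            visit(node.left, path + "l")
            visit(node.right, path + "r")

    visit(root, ".")
    return args


if __name__ == "__main__":
    stump = Split(attr="1", threshold=0.1, left=Leaf("1"), right=Leaf("0"))
    print(" ".join(forest_add_args("myforest", 0, stump)))
    # ML.FOREST.ADD myforest 0 . NUMERIC 1 0.1 .l LEAF 1 .r LEAF 0
```

With redis‑py, each tree's argument list could then be sent as r.execute_command(*args).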

Python example for inference:

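import redis

# Connect to the Redis-ML server started from the shaynativ/redis-ml image
r = redis.StrictRedis(host="localhost", port=6379, decode_responses=True)

# The user's feature vector is stored as comma-separated index:value pairs
user_profile = r.get("user_shay_profile")
if user_profile is None:
    raise KeyError("user_shay_profile is not set")
print(user_profile)  # e.g. 12:1.0,13:1.0,14:3.0,...,1817:0.06

# Run the forest stored under "movie-10" (some Redis-ML versions may also expect CLASSIFICATION)
result = r.execute_command("ML.FOREST.RUN", "movie-10", user_profile)
print(result)  # predicted rating, e.g. 3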

Redis CLI example:

> KEYS *
 1) "movie-5"
 2) "movie-1"
 ...
10) "movie-10"
11) "user_1_profile"

> ML.FOREST.RUN movie-10 ... (feature vector) ...
'3'
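The KEYS listing shows one forest per movie (movie-1 through movie-10), each predicting how a user would rate that movie. The sketch below is a hypothetical Python illustration of turning those per‑movie predictions into recommendations. It scores every forest on one user profile and keeps the highest predicted ratings. It calls ML.FOREST.RUN in the same two‑argument form as the inference example, and it uses a stand‑in client so that it runs without a server. rank_movies and FakeClient are illustrative names.

```python
from typing import Dict, Iterable, List, Tuple


def rank_movies(client, movie_keys: Iterable[str], profile: str,
                top_n: int = 3) -> List[Tuple[str, float]]:
    """Score every per-movie forest on one user profile and return the
    top_n (key, predicted rating) pairs, highest first."""
    scored = []
    for key in movie_keys:
        pred = client.execute_command("ML.FOREST.RUN", key, profile)
        if isinstance(pred, bytes):  # redis-py returns bytes unless decode_responses=True
            pred = pred.decode()
        scored.append((key, float(pred)))
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:top_n]


class FakeClient:
    """Stand-in for a redis-py client so the sketch runs without a server."""

    def __init__(self, predictions: Dict[str, bytes]):
        self.predictions = predictions

    def execute_command(self, *args):
        _command, key, _profile = args
        return self.predictions[key]


if __name__ == "__main__":
    client = FakeClient({"movie-1": b"4", "movie-5": b"2", "movie-10": b"3"})
    print(rank_movies(client, ["movie-1", "movie-5", "movie-10"], "12:1.0,13:1.0", top_n=2))
    # [('movie-1', 4.0), ('movie-10', 3.0)]
```

Against a real server, you could pass a redis.Redis instance and collect the forest keys with r.scan_iter(match="movie-*") instead of KEYS. KEYS blocks the server while it walks the whole keyspace.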

Performance Comparison

Redis‑ML inference latency: 0.64 ms (result = 3)

Spark ML inference latency: 46–49 ms (result ≈ 3.0)

Average speed‑up: ~61× faster for Redis‑ML.
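A latency comparison is only meaningful if both systems are timed the same way. The sketch below uses two hypothetical helpers. measure_ms takes the median over repeated calls so that a few slow outliers do not dominate, and speedup divides the two latencies. Applied to the figures above, 46 to 49 ms against 0.64 ms gives a ratio of roughly 72× to 77×. The ~61× average is the figure reported in the source, not the ratio of these two numbers.

```python
import statistics
import time
from typing import Callable


def measure_ms(fn: Callable[[], object], runs: int = 100) -> float:
    """Median wall-clock latency of fn() in milliseconds over `runs` calls."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


if __name__ == "__main__":
    # Ratios implied by the latency figures listed above
    print(round(speedup(46, 0.64), 1))  # 71.9
    print(round(speedup(49, 0.64), 1))  # 76.6
    # A trivial stand-in workload; replace with the real inference call
    print(measure_ms(lambda: sum(range(1000))) >= 0.0)  # True
```

With the client from the Python inference example, the Redis‑ML side could be timed as measure_ms(lambda: r.execute_command("ML.FOREST.RUN", "movie-10", user_profile)).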

References

Redis‑ML repository: https://github.com/RedisLabsModules/redis-ml

Spark‑Redis‑ML repository: https://github.com/RedisLabs/spark-redis-ml

Databricks notebook: http://bit.ly/sparkredisml

Docker images: https://hub.docker.com/r/shaynativ/redis-ml/ and https://hub.docker.com/r/shaynativ/spark-redis-ml/

Decision‑Tree theory (Gini impurity): http://wiki.swarma.net/index.php/%E5%86%B3%E7%AD%96%E6%A0%91#.E5.9F.BA.E5.B0.BC.E4.B8.8D.E7.BA.AF.E5.BA.A6.28Gini_impurity.29.E5.87.86.E5.88.99

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
