Building Real‑Time Recommendations with Kiji: A Hands‑On Guide

This article explains how to use the open‑source Kiji framework together with HBase, Avro, and MapReduce to build a scalable, entity‑centric real‑time recommendation system that can instantly refresh suggestions based on user context and recent interactions.

ITPUB
Why Real‑Time Recommendations Are Needed

Batch‑generated recommendations are refreshed only daily, weekly or monthly, which is too slow for use‑cases that require immediate context, such as location‑based suggestions or a user who suddenly changes genre preference. Real‑time recommendation engines must be able to read the latest user actions and produce a ranked list within milliseconds.

Kiji Framework Overview

Kiji is an open‑source, modular framework for building big‑data applications that need low‑latency, entity‑centric storage and real‑time scoring. It stores a 360° view of any entity (user, device, account, POS, etc.) in a single HBase row, using HBase’s four‑dimensional data model: row, column family, column qualifier, and timestamp. Multiple versions of a cell allow both slowly changing attributes (e.g., name, email) and high‑frequency event streams (e.g., clicks) to coexist.
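
The four coordinates can be pictured as nested maps. The following is a minimal in-memory sketch of that idea, not Kiji or HBase code; the names `Column`, `Row`, `put`, `latest`, and `recent` are assumptions made for illustration, and values are plain strings rather than Avro records.

```scala
import scala.collection.immutable.TreeMap

object EntityRowSketch {
  // Hypothetical cell coordinates within one entity row: (family, qualifier).
  final case class Column(family: String, qualifier: String)

  // One row = one entity. Each column maps timestamps to values,
  // ordered newest first so the most recent version is the head.
  type Versions = TreeMap[Long, String]
  type Row = Map[Column, Versions]

  private val newestFirst: Ordering[Long] = Ordering.Long.reverse

  // Adds one timestamped version to a column, keeping earlier versions.
  def put(row: Row, col: Column, timestamp: Long, value: String): Row = {
    val versions = row.getOrElse(col, TreeMap.empty[Long, String](newestFirst))
    row.updated(col, versions.updated(timestamp, value))
  }

  // Latest version of a slowly changing attribute such as an email address.
  def latest(row: Row, col: Column): Option[String] =
    row.get(col).flatMap(_.headOption).map(_._2)

  // The n most recent versions of a high-frequency column, newest first.
  def recent(row: Row, col: Column, n: Int): List[(Long, String)] =
    row.get(col).map(_.take(n).toList).getOrElse(Nil)
}
```

In this sketch a profile field and a click stream live in the same row: `latest` answers "what is the current email?", while `recent` answers "what were the last n clicks?".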

Kiji relies on Apache Avro for schema‑driven serialization, so application code works with native Java/Scala types while Avro handles the byte‑array conversion required by HBase.

Kiji architecture diagram

Model Development: Batch Training and Real‑Time Scoring

Kiji provides two parallel APIs:

Java API (KijiMR) – builds classic MapReduce jobs.

Scala API (KijiExpress) – wraps Scalding to write concise data‑flow pipelines.

Typical workflow:

Run a batch job over the full HBase data set to train a model. The output can be linear classifier weights, cluster centroids for k-means, or a product-product similarity matrix for collaborative filtering.

Publish the trained artifacts to a managed Maven repository and record metadata (model name, version, required columns, freshness policy) in a dedicated Kiji system table (an HBase table).

Deploy the model without stopping the service; KijiScoring servers poll the model library and load/unload models on demand.
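
To make the hand-off between batch training and real-time scoring concrete, here is a small sketch of how one kind of trained artifact, linear classifier weights, might be applied at request time. `LinearModel`, its fields, and the logistic link are assumptions chosen for illustration; they are not part of the Kiji API.

```scala
object LinearModelSketch {
  // Hypothetical trained artifact: one weight per feature name plus a bias term.
  final case class LinearModel(
      name: String,
      version: String,
      weights: Map[String, Double],
      bias: Double
  )

  // Logistic score in (0, 1). Features the model has no weight for contribute nothing.
  def score(model: LinearModel, features: Map[String, Double]): Double = {
    val z = features.foldLeft(model.bias) { case (acc, (feature, value)) =>
      acc + model.weights.getOrElse(feature, 0.0) * value
    }
    1.0 / (1.0 + math.exp(-z))
  }
}
```

The batch job only has to produce the `weights` map; the scoring side stays a cheap dot product over whatever features are read from the entity row.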

KijiScoring and the Freshener Mechanism

KijiScoring implements lazy, on‑demand computation. When a client requests a recommendation for an entity, the server checks the FreshnessPolicy attached to the target column:

If the data is still fresh (e.g., it is less than one hour old, or fewer than N events have occurred since it was computed), the stored score is returned immediately.

If the data is stale, the server runs the associated ScoringFunction, which combines the trained model with the entity's latest attributes, writes the new score back to HBase, and returns it to the client.

The Freshener is the composition of a ScoringFunction and a FreshnessPolicy. Fresheners are bound to specific columns, enabling fine‑grained control over which pieces of derived data are recomputed.
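
The check-then-recompute flow described above can be sketched as follows. This is an in-memory illustration only: the class and method names (`Stored`, `FreshnessPolicy`, `Freshener`, `recordEvent`, `get`) are assumptions rather than the KijiScoring API, a mutable map stands in for the HBase column, and the policy requires both the age and the event-count condition to hold, which is just one possible choice.

```scala
object FreshenerSketch {
  // A stored score plus the bookkeeping a policy needs to judge it.
  final case class Stored(score: Double, computedAtMs: Long, eventsSince: Int)

  // Hypothetical policy: fresh only if younger than maxAgeMs
  // and fewer than maxEvents events have arrived since the last computation.
  final case class FreshnessPolicy(maxAgeMs: Long, maxEvents: Int) {
    def isFresh(s: Stored, nowMs: Long): Boolean =
      nowMs - s.computedAtMs < maxAgeMs && s.eventsSince < maxEvents
  }

  // Freshener = policy + scoring function, bound to one derived column.
  final class Freshener(policy: FreshnessPolicy, scoringFunction: String => Double) {
    // Stands in for the derived column in HBase, keyed by entity id.
    private val column = scala.collection.mutable.Map.empty[String, Stored]

    // Called whenever a new event is written for the entity.
    def recordEvent(entityId: String): Unit =
      column.get(entityId).foreach { s =>
        column.update(entityId, s.copy(eventsSince = s.eventsSince + 1))
      }

    def get(entityId: String, nowMs: Long): Double =
      column.get(entityId) match {
        case Some(s) if policy.isFresh(s, nowMs) =>
          s.score // fresh: serve the stored value without recomputing
        case _ =>
          val fresh = scoringFunction(entityId) // stale or missing: recompute lazily
          column.update(entityId, Stored(fresh, nowMs, eventsSince = 0)) // write back
          fresh
      }
  }
}
```

Because the scoring function runs only on a read that finds stale data, entities that nobody asks about never cost any compute.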

Freshener architecture

Deploying Models in Production

The model library stores:

Metadata describing the model (name, version, required columns, freshness policy).

References to the compiled JARs stored in a Maven repository.

Optional external resources (e.g., pre‑computed similarity files).

KijiScoring servers periodically query the library, download new JARs if a newer version is registered, and instantiate the ScoringFunction classes. Because the servers are stateless Java processes, they can be horizontally scaled behind a load balancer.
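
The polling behaviour can be illustrated with a small in-memory sketch. Everything here is hypothetical: `ModelMetadata`, `ModelLibrary`, and `ScoringServer` are stand-ins for the system table and the scoring servers, versions are simplified to integers, and "loading" a model means only recording its metadata rather than downloading a JAR. Unloading is left out.

```scala
object ModelLibrarySketch {
  // Hypothetical metadata record mirroring the fields listed above.
  final case class ModelMetadata(
      name: String,
      version: Int,
      artifact: String,
      requiredColumns: Seq[String]
  )

  // Stand-in for the system table: the newest registered version per model name.
  final class ModelLibrary {
    private var registered = Map.empty[String, ModelMetadata]

    // Registers a model unless an equal or newer version is already present.
    def register(m: ModelMetadata): Unit = synchronized {
      if (registered.get(m.name).forall(_.version < m.version)) registered += m.name -> m
    }

    def latest: Map[String, ModelMetadata] = synchronized(registered)
  }

  // One scoring server's view; poll() would run on a timer in a real deployment.
  final class ScoringServer(library: ModelLibrary) {
    @volatile private var loaded = Map.empty[String, ModelMetadata]

    // Returns the names of the models that this poll loaded or upgraded.
    def poll(): Set[String] = {
      val changed = library.latest.filter { case (name, m) =>
        loaded.get(name).forall(_.version < m.version)
      }
      loaded = loaded ++ changed // swap in newer versions without restarting
      changed.keySet
    }

    def loadedVersion(name: String): Option[Int] = loaded.get(name).map(_.version)
  }
}
```

Each server keeps only its own `loaded` map, so adding another server behind the load balancer requires no coordination beyond access to the library.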

Collaborative Filtering with a Similarity Matrix

For item‑based collaborative filtering, a batch job computes a similarity matrix S where S[i][j] is the similarity between product i and product j. The matrix is stored either as flat files or in a Kiji table, with each product’s row containing a column that holds the list of its most similar items (often truncated to the top‑K entries to limit size).
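
As a rough illustration of what such a batch job computes, the sketch below builds top-K neighbour lists in memory using cosine similarity over user ratings. The choice of cosine similarity, the `Ratings` type, and the function names are assumptions; the article does not say which similarity measure is used, and a real job would run as a distributed MapReduce or Scalding pipeline rather than comparing all pairs on one machine.

```scala
object SimilaritySketch {
  type Ratings = Map[String, Double] // userId -> rating, for one product

  // Cosine similarity between two products, treating a missing rating as 0.
  def cosine(a: Ratings, b: Ratings): Double = {
    val dot = a.iterator.collect { case (user, r) if b.contains(user) => r * b(user) }.sum
    val norm = (v: Ratings) => math.sqrt(v.values.map(r => r * r).sum)
    val denom = norm(a) * norm(b)
    if (denom == 0.0) 0.0 else dot / denom
  }

  // For each product, keep only its k most similar other products (similarity > 0).
  def topK(byProduct: Map[String, Ratings], k: Int): Map[String, List[(String, Double)]] =
    byProduct.map { case (i, ratingsI) =>
      val neighbours = byProduct.iterator
        .collect { case (j, ratingsJ) if j != i => (j, cosine(ratingsI, ratingsJ)) }
        .filter(_._2 > 0.0)
        .toList
        // Highest similarity first; ties broken by product id for determinism.
        .sortWith((x, y) => x._2 > y._2 || (x._2 == y._2 && x._1 < y._1))
        .take(k)
      i -> neighbours
    }
}
```

Truncating to the top K turns a quadratic matrix into one short list per product, which is what makes per-row lookups at scoring time cheap.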

During real-time scoring, the KeyValueStore API reads only the rows for the products the current user has recently interacted with, so the full matrix never has to be loaded into memory.

Similarity matrix example

Personalized Scoring Example

A typical personalized ScoringFunction performs the following steps:

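// Sketch of a personalized scoring function in Scala. Recommendation, SimilarityStore,
// and the getUserRatings parameter are illustrative stand-ins, not the KijiScoring API.
case class Recommendation(item: String, score: Double)

trait SimilarityStore {
  // Returns up to k (similarProduct, similarity) pairs for the given product
  def getSimilar(productId: String, k: Int): Seq[(String, Double)]
}

def score(
    userId: String,
    getUserRatings: String => Seq[(String, Double)],
    kvStore: SimilarityStore
): List[Recommendation] = {
  // 1. Retrieve the user's most recent ratings from the entity row
  val recentRatings = getUserRatings(userId)

  // 2. For each rated product, fetch its top-K similar items from the similarity matrix
  val candidateItems = recentRatings.flatMap { case (productId, rating) =>
    val similar = kvStore.getSimilar(productId, k = 50)
    similar.map { case (simProduct, simScore) =>
      // 3. Weight similarity by the user's rating
      (simProduct, rating * simScore)
    }
  }

  // 4. Aggregate scores per candidate item and keep the ten highest-scoring ones
  candidateItems
    .groupBy(_._1)
    .map { case (item, scored) => Recommendation(item, scored.map(_._2).sum) }
    .toList
    .sortBy(-_.score)
    .take(10)
}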

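Because the function fetches at most K similar items per rated product, the work per request is bounded by the number of recent ratings times K and does not grow with catalog size. This keeps latency sub-second even for catalogs with millions of items.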

Key Takeaways

The Kiji stack enables a complete real‑time recommendation pipeline:

Entity‑centric storage in HBase provides low‑latency reads/writes of both static attributes and event streams.

Avro handles schema evolution and efficient binary serialization.

Batch jobs (MapReduce/Scalding) generate models and large auxiliary structures such as similarity matrices.

KijiScoring with Fresheners delivers lazy, on‑demand scoring while keeping derived data in HBase for future reuse.

Model metadata and Maven‑hosted artifacts allow seamless, zero‑downtime model upgrades.
