Designing System and Personalized Recommendation Engines with Mahout and Spark

This article explains the architecture of both system-wide and personalized recommendation modules, compares three recommendation strategies, details the use of Apache Mahout for collaborative filtering with Java code examples, and discusses cold‑start solutions within a Spark‑Hadoop stack.

21CTO

1. Introduction

Existing e‑commerce platforms such as Taobao and JD.com use two main recommendation modules: system recommendation, which provides the same popular items to all users, and personalized recommendation, which tailors results to individual tastes.

2. System Recommendation

2.1 Purpose

Recommend currently popular, promotional, or newly launched products to all users to boost sales.

2.2 Implementation

Two approaches are used: automated recommendation based on item attributes (release time, category, inventory, purchase history, etc.) and manual configuration by operators (ranking position, start/end time, description).
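The two approaches can share one ranking step. The sketch below is a minimal illustration, not the platform's actual logic: the `Item` fields, the 30-day freshness bonus, and the rule that operator-pinned items come first are all assumptions made for the example.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SystemRecommendation {

    /** One candidate product; every field here is an illustrative assumption. */
    static final class Item {
        final long id;
        final int recentSales;      // units sold in the scoring window
        final int daysSinceRelease; // age of the listing in days
        final int stock;            // units in inventory
        final Integer pinnedRank;   // operator-assigned position, or null
        final LocalDate pinStart;   // first day the pin is active (inclusive)
        final LocalDate pinEnd;     // last day the pin is active (inclusive)

        Item(long id, int recentSales, int daysSinceRelease, int stock,
             Integer pinnedRank, LocalDate pinStart, LocalDate pinEnd) {
            this.id = id;
            this.recentSales = recentSales;
            this.daysSinceRelease = daysSinceRelease;
            this.stock = stock;
            this.pinnedRank = pinnedRank;
            this.pinStart = pinStart;
            this.pinEnd = pinEnd;
        }

        /** True when an operator pinned this item and the pin covers the given day. */
        boolean isPinnedOn(LocalDate day) {
            return pinnedRank != null && !day.isBefore(pinStart) && !day.isAfter(pinEnd);
        }

        /** Automated score: sales plus a bonus for items released in the last 30 days. */
        double autoScore() {
            return recentSales + Math.max(0, 30 - daysSinceRelease);
        }
    }

    /** Pinned items first (by operator rank), then the rest by automated score. */
    static List<Long> rank(List<Item> items, LocalDate today, int limit) {
        List<Item> pinned = new ArrayList<>();
        List<Item> automatic = new ArrayList<>();
        for (Item item : items) {
            if (item.stock <= 0) {
                continue; // skip sold-out items
            }
            if (item.isPinnedOn(today)) {
                pinned.add(item);
            } else {
                automatic.add(item);
            }
        }
        pinned.sort(Comparator.comparingInt((Item i) -> i.pinnedRank));
        automatic.sort((Item a, Item b) -> Double.compare(b.autoScore(), a.autoScore()));

        List<Long> result = new ArrayList<>();
        for (Item i : pinned) {
            if (result.size() < limit) result.add(i.id);
        }
        for (Item i : automatic) {
            if (result.size() < limit) result.add(i.id);
        }
        return result;
    }
}
```

Because the list is the same for every user, it can be computed once and cached; only the pin windows make it depend on the date.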

3. Personalized Recommendation

3.1 Purpose

Provide precise recommendations by analyzing user preferences, item characteristics, or social network similarity to increase conversion.

3.2 Recommendation Modes

The engine can operate in three modes:

Demographic‑based: uses gender, age range, income, education, profession, etc.

Content‑based: recommends items similar in content (e.g., movies with similar attributes).

Collaborative filtering: recommends items based on similar users' behavior.

These modes can be used alone or combined; collaborative filtering is preferred for our scenario, though it introduces a cold‑start problem.
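As a point of comparison with collaborative filtering, content-based similarity needs no user behavior at all. A minimal sketch, assuming each item is described by a set of attribute tags and that Jaccard overlap is the similarity measure (both are choices made here for illustration, not something the article prescribes):

```java
import java.util.HashSet;
import java.util.Set;

class ContentSimilarity {

    /** Jaccard similarity: size of the intersection divided by size of the union, in [0, 1]. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // no attributes on either side: treat as not similar
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }
}
```

Two movies tagged {sci-fi, action, 2010s} and {sci-fi, drama, 2010s} share two of four distinct tags, so their similarity is 0.5.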

3.3 User Preference Design

Behavioral signals such as purchases, cart additions, searches, and page views are each assigned a weight (e.g., purchase = 10, cart = 8, search = 5, view = 6) and combined into a numeric preference score for each user-item pair.
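The article gives the weights but not the formula that combines them. One plausible reading, sketched here as an assumption, is a weighted sum: each behavior's weight multiplied by how many times the user showed that behavior toward the item. The `Behavior` enum and `score` method are hypothetical names.

```java
import java.util.EnumMap;
import java.util.Map;

class PreferenceScore {

    /** Behavior types with the example weights from the text. */
    enum Behavior {
        PURCHASE(10), CART(8), SEARCH(5), VIEW(6);

        final int weight;

        Behavior(int weight) {
            this.weight = weight;
        }
    }

    /** Sums weight x count over the behaviors one user showed toward one item. */
    static double score(Map<Behavior, Integer> counts) {
        double total = 0;
        for (Map.Entry<Behavior, Integer> e : counts.entrySet()) {
            total += e.getKey().weight * e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Behavior, Integer> counts = new EnumMap<>(Behavior.class);
        counts.put(Behavior.VIEW, 3); // viewed the item three times
        counts.put(Behavior.CART, 1); // added it to the cart once
        System.out.println(score(counts)); // 3*6 + 1*8 = 26.0
    }
}
```

A raw sum grows without bound for heavy browsers, so a production version would likely cap or normalize the score before using it as a rating.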

3.4 Mahout Overview

Apache Mahout is an open-source machine-learning library whose distributed algorithms run on Hadoop. Its Taste engine (the org.apache.mahout.cf.taste packages) provides collaborative filtering, and the library also includes classification and clustering algorithms. This project uses only Mahout's recommendation APIs.

3.5 Mahout Collaborative‑Filtering Example

Dependencies

<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-core</artifactId>
  <version>0.9</version>
</dependency>
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-math</artifactId>
  <version>0.9</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>1.2.1</version>
</dependency>

Implementation Code

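import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

// Class name is illustrative; the original snippet showed only main().
public class UserBasedRecommenderExample {
    public static void main(String[] args) {
        try {
            // Load userId,itemId,rating rows from the CSV file
            DataModel model = new FileDataModel(new File("D:\\mahout\\data.csv"));
            // User-to-user similarity (Pearson correlation)
            UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
            // Neighborhood: the 2 users most similar to the target user
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
            // User-based recommender
            Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
            // Get up to 2 recommendations for user 1
            List<RecommendedItem> recommendations = recommender.recommend(1, 2);
            for (RecommendedItem recommendation : recommendations) {
                System.out.println(recommendation);
            }
        } catch (Exception e) {
            // Covers IOException (file loading) and TasteException (recommendation)
            e.printStackTrace();
        }
    }
}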

data.csv (userId,itemId,rating)

1,101,5
1,102,3
1,103,2.5
2,101,2
2,102,2.5
2,103,5
2,104,2
3,101,2.5
3,104,4
3,105,4.5
3,107,5
4,101,5
4,103,3
4,104,4.5
4,106,4
5,101,4
5,102,3
5,103,2
5,104,4
5,105,3.5
5,106,4
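
To see what the recommender does with these rows, the plain-Java sketch below recomputes the same steps by hand: Pearson correlation over co-rated items, the two nearest neighbors, and a similarity-weighted average of the neighbors' ratings. It is a simplified re-implementation for illustration only. The class and method names are hypothetical, and Mahout's own code applies additional rules, so treat the numbers as approximate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class UserBasedSketch {
    // Same rows as data.csv: {userId, itemId, rating}
    static final double[][] ROWS = {
        {1, 101, 5}, {1, 102, 3}, {1, 103, 2.5},
        {2, 101, 2}, {2, 102, 2.5}, {2, 103, 5}, {2, 104, 2},
        {3, 101, 2.5}, {3, 104, 4}, {3, 105, 4.5}, {3, 107, 5},
        {4, 101, 5}, {4, 103, 3}, {4, 104, 4.5}, {4, 106, 4},
        {5, 101, 4}, {5, 102, 3}, {5, 103, 2}, {5, 104, 4}, {5, 105, 3.5}, {5, 106, 4}
    };

    // userId -> (itemId -> rating)
    static final Map<Long, Map<Long, Double>> DATA = new TreeMap<>();
    static {
        for (double[] r : ROWS) {
            DATA.computeIfAbsent((long) r[0], u -> new TreeMap<>()).put((long) r[1], r[2]);
        }
    }

    /** Pearson correlation over items both users rated; NaN when it is undefined. */
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<Long> common = new ArrayList<>();
        for (Long item : a.keySet()) {
            if (b.containsKey(item)) common.add(item);
        }
        int n = common.size();
        if (n == 0) return Double.NaN;
        double meanA = 0, meanB = 0;
        for (Long item : common) {
            meanA += a.get(item);
            meanB += b.get(item);
        }
        meanA /= n;
        meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (Long item : common) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        double denom = Math.sqrt(varA * varB);
        return denom == 0 ? Double.NaN : cov / denom;
    }

    /** Similarity-weighted average rating of `item` among the n users most similar to `user`. */
    static double estimate(long user, long item, int n) {
        Map<Long, Double> mine = DATA.get(user);
        List<double[]> sims = new ArrayList<>(); // each entry: {otherUserId, similarity}
        for (Map.Entry<Long, Map<Long, Double>> e : DATA.entrySet()) {
            if (e.getKey() == user) continue;
            double s = pearson(mine, e.getValue());
            if (!Double.isNaN(s)) sims.add(new double[] {e.getKey().doubleValue(), s});
        }
        sims.sort((x, y) -> Double.compare(y[1], x[1])); // most similar first
        double weighted = 0, total = 0;
        for (double[] s : sims.subList(0, Math.min(n, sims.size()))) {
            Double rating = DATA.get((long) s[0]).get(item);
            if (rating != null) {
                weighted += s[1] * rating;
                total += s[1];
            }
        }
        return total == 0 ? Double.NaN : weighted / total;
    }

    public static void main(String[] args) {
        // Items user 1 has not rated but at least one neighbor has
        for (long item : new long[] {104, 105, 106}) {
            System.out.printf("item %d -> %.3f%n", item, estimate(1, item, 2));
        }
    }
}
```

In this sketch, user 1's two nearest neighbors are users 4 and 5 (similarities of 1.0 and roughly 0.94). User 3 shares only one item with user 1, so the correlation is undefined and that user is skipped. The estimates come out at about 4.26 for item 104 and 4.0 for item 106, which makes those two the top candidates for user 1.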

3.6 Mahout Recommendation Algorithms

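Mahout's Taste package has shipped several recommenders, including GenericUserBasedRecommender, GenericItemBasedRecommender, SlopeOneRecommender, SVDRecommender, KnnItemBasedRecommender, and TreeClusteringRecommender. Availability varies by release, since some of the less-used recommenders were deprecated over time, so check that the version you depend on still includes the one you need. The first two are the focus of this article.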
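
Where the user-based recommender compares users, the item-based one compares items: an unseen item is scored from the user's own ratings of similar items. The sketch below illustrates that idea in plain Java. It is not Mahout's implementation; cosine similarity, the `byItem` layout (itemId to userId to rating), and all names are assumptions made for the example.

```java
import java.util.Map;

class ItemBasedSketch {

    /** Cosine similarity between two items over the users who rated both. */
    static double cosine(Map<Long, Double> ratingsA, Map<Long, Double> ratingsB) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<Long, Double> e : ratingsA.entrySet()) {
            Double other = ratingsB.get(e.getKey());
            if (other == null) continue; // this user did not rate the second item
            dot += e.getValue() * other;
            normA += e.getValue() * e.getValue();
            normB += other * other;
        }
        return (normA == 0 || normB == 0) ? Double.NaN : dot / Math.sqrt(normA * normB);
    }

    /**
     * Estimates the user's rating of `target` as the similarity-weighted average
     * of the user's ratings for the other items. Assumes `target` is a key of byItem.
     */
    static double estimate(Map<Long, Map<Long, Double>> byItem, long user, long target) {
        double weighted = 0, total = 0;
        for (Map.Entry<Long, Map<Long, Double>> e : byItem.entrySet()) {
            Double rating = e.getValue().get(user);
            if (e.getKey() == target || rating == null) continue;
            double s = cosine(byItem.get(target), e.getValue());
            if (Double.isNaN(s)) continue;
            weighted += s * rating;
            total += s;
        }
        return total == 0 ? Double.NaN : weighted / total;
    }
}
```

Item-to-item similarities tend to change more slowly than user-to-user ones, which is why item-based recommenders are often precomputed offline when the catalog is smaller than the user base.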

3.7 Data Source Options

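Mahout's DataModel can be backed by flat files, by relational databases over JDBC (for example MySQL and PostgreSQL), or by NoSQL stores such as HBase, Cassandra, and MongoDB. There is not yet a DataModel that reads directly from HDFS.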

3.8 Technical Stack

The implementation combines Mahout (recommendation algorithms) with Spark (parallel computation), Hadoop (storage/processing) and Elasticsearch (search).

3.9 Cold‑Start Problem

When a new user or item appears, insufficient interaction data prevents model building. Solutions include using registration demographics, soliciting initial ratings, expert labeling, random recommendation, or average‑value imputation. For our case, demographic‑based initialization is preferred.
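
Demographic-based initialization can be wired in as a fallback in front of the collaborative-filtering results. The following is a minimal sketch under stated assumptions: the segment key (gender plus age band), the interaction threshold, and every name in it are hypothetical, and the per-segment popularity lists are assumed to be precomputed elsewhere.

```java
import java.util.List;
import java.util.Map;

class ColdStartFallback {

    /** Builds a coarse demographic segment key from registration data. */
    static String segment(String gender, int age) {
        String band = age < 25 ? "under25" : age < 40 ? "25to39" : "40plus";
        return gender + ":" + band;
    }

    /**
     * Returns personalized results when the user has enough history;
     * otherwise items popular in the user's segment, and failing that
     * the system-wide popular list.
     */
    static List<Long> recommend(int interactionCount, int minInteractions,
                                List<Long> personalized,
                                Map<String, List<Long>> popularBySegment,
                                String segment,
                                List<Long> globalPopular) {
        if (interactionCount >= minInteractions && !personalized.isEmpty()) {
            return personalized;
        }
        List<Long> bySegment = popularBySegment.get(segment);
        return (bySegment != null && !bySegment.isEmpty()) ? bySegment : globalPopular;
    }
}
```

The last fallback is simply the system recommendation list from section 2, so a brand-new user with no registration data still sees something reasonable.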
