Designing System and Personalized Recommendation Engines with Mahout and Spark

This article explains the architecture of both system-wide and personalized recommendation modules, compares three recommendation strategies, details the use of Apache Mahout for collaborative filtering with Java code examples, and discusses cold‑start solutions within a Spark‑Hadoop stack.

21CTO

Apr 12, 2016

1. Introduction

Existing e‑commerce platforms such as Taobao and JD.com use two main recommendation modules: system recommendation, which provides the same popular items to all users, and personalized recommendation, which tailors results to individual tastes.

2. System Recommendation

2.1 Purpose

Recommend currently popular, promotional, or newly launched products to all users to boost sales.

2.2 Implementation

Two approaches are used: automated recommendation based on item attributes (release time, category, inventory, purchase history, etc.) and manual configuration by operators (ranking position, start/end time, description).
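The two approaches can feed a single result list. The following is a minimal sketch under stated assumptions, not the article's implementation: the class and field names (`SystemRecommendation`, `Item`, `ManualSlot`) are hypothetical, and the scoring heuristic (recent sales divided by item age) is an illustrative stand-in for whatever attribute-based ranking a real system would use. Automated ranking produces the base order, and any operator slot that is active at the current time overrides it at its configured position.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SystemRecommendation {

    /** Item attributes used by the automated ranking (hypothetical fields). */
    static final class Item {
        final String id;
        final int recentSales;
        final int stock;
        final int daysSinceRelease;

        Item(String id, int recentSales, int stock, int daysSinceRelease) {
            this.id = id;
            this.recentSales = recentSales;
            this.stock = stock;
            this.daysSinceRelease = daysSinceRelease;
        }

        /** Assumed heuristic: favor items that sell well and were released recently. */
        double score() {
            return recentSales / (1.0 + daysSinceRelease);
        }
    }

    /** Operator-configured slot: a fixed 0-based position, valid within a time window. */
    static final class ManualSlot {
        final String itemId;
        final int position;
        final long startMillis;
        final long endMillis;

        ManualSlot(String itemId, int position, long startMillis, long endMillis) {
            this.itemId = itemId;
            this.position = position;
            this.startMillis = startMillis;
            this.endMillis = endMillis;
        }

        boolean activeAt(long now) {
            return now >= startMillis && now < endMillis;
        }
    }

    static List<String> recommend(List<Item> items, List<ManualSlot> slots, long now, int limit) {
        // Automated part: in-stock items ordered by descending score.
        List<String> ranked = new ArrayList<>();
        items.stream()
                .filter(item -> item.stock > 0)
                .sorted(Comparator.comparingDouble(Item::score).reversed())
                .forEach(item -> ranked.add(item.id));
        // Manual part: each active slot pins its item at the configured position.
        slots.stream()
                .filter(slot -> slot.activeAt(now))
                .sorted(Comparator.comparingInt((ManualSlot slot) -> slot.position))
                .forEach(slot -> {
                    ranked.remove(slot.itemId); // avoid listing the item twice
                    ranked.add(Math.min(slot.position, ranked.size()), slot.itemId);
                });
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }
}
```

Because the result is the same for every user, a list like this could be computed once per time window and cached rather than rebuilt per request.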

3. Personalized Recommendation

3.1 Purpose

Provide precise recommendations by analyzing user preferences, item characteristics, or social network similarity to increase conversion.

3.2 Recommendation Modes

The engine can operate in three modes:

Demographic‑based: uses gender, age range, income, education, profession, etc.

Content‑based: recommends items similar in content (e.g., movies with similar attributes).

Collaborative filtering: recommends items based on similar users' behavior.

These modes can be used alone or combined. Collaborative filtering is preferred for our scenario, though it introduces a cold-start problem (see Section 3.9).

3.3 User Preference Design

Each behavior signal is given a weight (e.g., purchase = 10, cart addition = 8, search = 5, page view = 6), and the weighted signals are combined into a single numeric preference score for each user-item pair.
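The article does not say how the weighted factors are combined. As one possible reading, the sketch below assumes the score for a user-item pair is the sum of weight times event count. The class name `PreferenceScore` and the `Behavior` enum are hypothetical; only the four weights come from the text.

```java
import java.util.EnumMap;
import java.util.Map;

final class PreferenceScore {

    /** Behavior types with the example weights from the text. */
    enum Behavior {
        PURCHASE(10), CART(8), SEARCH(5), VIEW(6);

        final int weight;

        Behavior(int weight) {
            this.weight = weight;
        }
    }

    /** Assumption: score = sum over behaviors of (weight x number of events). */
    static double score(Map<Behavior, Integer> eventCounts) {
        double total = 0;
        for (Map.Entry<Behavior, Integer> entry : eventCounts.entrySet()) {
            total += entry.getKey().weight * entry.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // One user, one item: bought once, added to cart once, viewed three times.
        Map<Behavior, Integer> counts = new EnumMap<>(Behavior.class);
        counts.put(Behavior.PURCHASE, 1);
        counts.put(Behavior.CART, 1);
        counts.put(Behavior.VIEW, 3);
        System.out.println(score(counts)); // 10 + 8 + 3 * 6 = 36.0
    }
}
```

A raw sum grows without bound for very active users, so a real pipeline would probably cap or normalize the value before writing it out as the rating column of the recommender's input.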

3.4 Mahout Overview

Apache Mahout is a distributed machine-learning library originally built on Hadoop. It provides the Taste engine for collaborative filtering, along with classification and clustering algorithms. This project uses only Mahout's recommendation APIs.

3.5 Mahout Collaborative‑Filtering Example

Dependencies

<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-core</artifactId>
  <version>0.9</version>
</dependency>
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-math</artifactId>
  <version>0.9</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>1.2.1</version>
</dependency>

Implementation Code

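import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

// The class name is illustrative; the original listing showed only main().
public class UserBasedExample {
    public static void main(String[] args) throws Exception {
        // Load (userId,itemId,rating) rows from the CSV file
        DataModel model = new FileDataModel(new File("D:\\mahout\\data.csv"));
        // User-to-user similarity based on Pearson correlation
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Neighborhood made of the 2 most similar users
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        // User-based recommender built from the model, neighborhood, and similarity
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        // Request up to 2 recommendations for user 1
        List<RecommendedItem> recommendations = recommender.recommend(1, 2);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}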

data.csv (userId,itemId,rating)

1,101,5
1,102,3
1,103,2.5
2,101,2
2,102,2.5
2,103,5
2,104,2
3,101,2.5
3,104,4
3,105,4.5
3,107,5
4,101,5
4,103,3
4,104,4.5
4,106,4
5,101,4
5,102,3
5,103,2
5,104,4
5,105,3.5
5,106,4

Result
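The arithmetic behind a user-based recommendation on this data set can be retraced by hand. The sketch below is plain Java and does not call Mahout. It assumes Pearson correlation over co-rated items, a neighborhood of the two most similar users, and a similarity-weighted average as the estimate. The class and method names (`UserBasedSketch`, `pearson`, `nearestNeighbors`, `estimate`) are hypothetical, and Mahout's own implementation may differ in details such as how it treats negative similarities.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class UserBasedSketch {

    /** The 21 (userId, itemId, rating) rows of data.csv. */
    static final double[][] ROWS = {
        {1, 101, 5}, {1, 102, 3}, {1, 103, 2.5},
        {2, 101, 2}, {2, 102, 2.5}, {2, 103, 5}, {2, 104, 2},
        {3, 101, 2.5}, {3, 104, 4}, {3, 105, 4.5}, {3, 107, 5},
        {4, 101, 5}, {4, 103, 3}, {4, 104, 4.5}, {4, 106, 4},
        {5, 101, 4}, {5, 102, 3}, {5, 103, 2}, {5, 104, 4}, {5, 105, 3.5}, {5, 106, 4}
    };

    /** Groups the rows as userId -> (itemId -> rating). */
    static Map<Integer, Map<Integer, Double>> load() {
        Map<Integer, Map<Integer, Double>> data = new TreeMap<>();
        for (double[] row : ROWS) {
            data.computeIfAbsent((int) row[0], k -> new TreeMap<>()).put((int) row[1], row[2]);
        }
        return data;
    }

    /** Pearson correlation over co-rated items; NaN if fewer than two overlap or variance is zero. */
    static double pearson(Map<Integer, Double> a, Map<Integer, Double> b) {
        List<Integer> common = new ArrayList<>(a.keySet());
        common.retainAll(b.keySet());
        int n = common.size();
        if (n < 2) {
            return Double.NaN;
        }
        double meanA = 0, meanB = 0;
        for (int item : common) {
            meanA += a.get(item);
            meanB += b.get(item);
        }
        meanA /= n;
        meanB /= n;
        double sumAB = 0, sumAA = 0, sumBB = 0;
        for (int item : common) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            sumAB += da * db;
            sumAA += da * da;
            sumBB += db * db;
        }
        return sumAB / Math.sqrt(sumAA * sumBB);
    }

    /** The n users most similar to the given user, most similar first. */
    static List<Integer> nearestNeighbors(Map<Integer, Map<Integer, Double>> data, int user, int n) {
        Map<Integer, Double> sims = new HashMap<>();
        for (int other : data.keySet()) {
            if (other == user) {
                continue;
            }
            double s = pearson(data.get(user), data.get(other));
            if (!Double.isNaN(s)) {
                sims.put(other, s);
            }
        }
        List<Integer> ids = new ArrayList<>(sims.keySet());
        ids.sort((x, y) -> Double.compare(sims.get(y), sims.get(x)));
        return ids.subList(0, Math.min(n, ids.size()));
    }

    /** Similarity-weighted average of the neighbors' ratings for one item. */
    static double estimate(Map<Integer, Map<Integer, Double>> data, int user, int item,
                           List<Integer> neighbors) {
        double weighted = 0, totalSim = 0;
        for (int neighbor : neighbors) {
            Double rating = data.get(neighbor).get(item);
            if (rating == null) {
                continue; // this neighbor has not rated the item
            }
            double s = pearson(data.get(user), data.get(neighbor));
            weighted += s * rating;
            totalSim += s;
        }
        return totalSim == 0 ? Double.NaN : weighted / totalSim;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> data = load();
        List<Integer> neighbors = nearestNeighbors(data, 1, 2);
        System.out.println("neighbors of user 1: " + neighbors);
        System.out.println("estimate for item 104: " + estimate(data, 1, 104, neighbors));
    }
}
```

Under these assumptions, user 1's two nearest neighbors are users 4 and 5 (similarities 1.0 and about 0.945). User 3 shares only one rated item with user 1, so no correlation can be computed for that pair. The estimate for item 104 is (1.0 × 4.5 + 0.945 × 4) / 1.945, roughly 4.26, and item 106 comes out at 4.0 because both neighbors rated it 4.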

3.6 Mahout Recommendation Algorithms

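Key recommenders in Taste include GenericUserBasedRecommender, GenericItemBasedRecommender, SlopeOneRecommender, SVDRecommender, KnnItemBasedRecommender, and TreeClusteringRecommender. Some of these (for example SlopeOneRecommender and TreeClusteringRecommender) were deprecated in Mahout 0.8 and are absent from later releases, so check the list against the version you depend on. The first two are the focus of this article.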
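The item-based approach differs from the user-based one in what it compares: it measures similarity between items and predicts a user's preference from that user's own ratings. The sketch below illustrates the idea in plain Java. It is not Mahout's GenericItemBasedRecommender, and it assumes cosine similarity over the users who rated both items; the names `ItemBasedSketch`, `itemSimilarity`, and `estimate` are hypothetical.

```java
import java.util.Map;

final class ItemBasedSketch {

    /** Cosine similarity between two items over the users who rated both; NaN if none did. */
    static double itemSimilarity(Map<Integer, Map<Integer, Double>> ratings, int itemA, int itemB) {
        double dot = 0, normA = 0, normB = 0;
        for (Map<Integer, Double> userRatings : ratings.values()) {
            Double a = userRatings.get(itemA);
            Double b = userRatings.get(itemB);
            if (a == null || b == null) {
                continue; // this user did not rate both items
            }
            dot += a * b;
            normA += a * a;
            normB += b * b;
        }
        return dot == 0 ? Double.NaN : dot / Math.sqrt(normA * normB);
    }

    /** Estimate for (user, item): the user's own ratings, weighted by similarity to the item. */
    static double estimate(Map<Integer, Map<Integer, Double>> ratings, int user, int item) {
        double weighted = 0, totalSim = 0;
        for (Map.Entry<Integer, Double> rated : ratings.get(user).entrySet()) {
            double s = itemSimilarity(ratings, item, rated.getKey());
            if (Double.isNaN(s)) {
                continue; // no overlap between the two items
            }
            weighted += s * rated.getValue();
            totalSim += s;
        }
        return totalSim == 0 ? Double.NaN : weighted / totalSim;
    }
}
```

Item-to-item similarities depend only on the rating matrix, not on the requesting user, so they can be computed offline and reused across requests. That is a common reason to choose an item-based recommender when users far outnumber items.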

3.7 Data Source Options

Mahout's DataModel can be backed by flat files (FileDataModel), by relational databases through JDBC (MySQL, PostgreSQL), or by NoSQL stores such as HBase, Cassandra, and MongoDB. No native HDFS-backed DataModel exists yet.

3.8 Technical Stack

The implementation combines Mahout (recommendation algorithms) with Spark (parallel computation), Hadoop (storage/processing) and Elasticsearch (search).

3.9 Cold‑Start Problem

When a new user or item appears, insufficient interaction data prevents model building. Solutions include using registration demographics, soliciting initial ratings, expert labeling, random recommendation, or average‑value imputation. For our case, demographic‑based initialization is preferred.
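Demographic-based initialization can be pictured as a fallback chain: personalized results when enough history exists, otherwise the popular items of the user's demographic segment, otherwise a global list. The sketch below is a minimal illustration under assumptions. The class `ColdStartFallback`, the segment key format, the age bands, and the interaction threshold are all hypothetical and do not come from the article.

```java
import java.util.List;
import java.util.Map;

final class ColdStartFallback {

    /** Hypothetical segment key built from registration data, e.g. "F|25-34". */
    static String segment(String gender, int age) {
        String band = age < 25 ? "<25" : age < 35 ? "25-34" : "35+";
        return gender + "|" + band;
    }

    /**
     * Picks the recommendation source for one user.
     *
     * @param interactionCount number of recorded behaviors for the user
     * @param minInteractions  assumed threshold below which the user counts as "cold"
     * @param personalized     output of the collaborative-filtering recommender (may be empty)
     * @param segmentTopItems  popular items per demographic segment, computed offline
     * @param segment          the user's segment key
     * @param globalTopItems   system-wide popular items, used as the last resort
     */
    static List<Long> recommend(int interactionCount, int minInteractions,
                                List<Long> personalized,
                                Map<String, List<Long>> segmentTopItems,
                                String segment, List<Long> globalTopItems) {
        if (interactionCount >= minInteractions && !personalized.isEmpty()) {
            return personalized;
        }
        List<Long> bySegment = segmentTopItems.get(segment);
        if (bySegment != null && !bySegment.isEmpty()) {
            return bySegment;
        }
        return globalTopItems;
    }
}
```

The last fallback is the same list that the system recommendation module of Section 2 produces, so the two modules can share it.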
