Designing System & Personalized Recommendations Using Mahout

This article outlines the design of system-wide and personalized recommendation modules for e-commerce platforms. It explains three recommendation approaches (demographic, content-based, and collaborative filtering), walks through a collaborative-filtering example in Java using Mahout, and discusses data sources, the technology stack, algorithm choice, and ways to handle the cold-start problem.

21CTO

Introduction

Current e‑commerce platforms such as Taobao and JD.com use two recommendation modules: system recommendation and personalized recommendation.

System Recommendation

System recommendation provides the same items to all users, either static items set by administrators or popular items derived from aggregate user feedback. Its purpose is to promote sales of currently popular, promotional, or newly launched products.

Implementation includes automated recommendation and manual configuration. Automated factors may include release time, category, stock, purchase count, cart count, view count, discount amount, etc. Manual configuration is done via an operation page where operators set ranking position, start/end time, description, and other parameters.
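
The combination of manual configuration and automated ranking described above could be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical, only three of the listed factors are used, and the weights (3, 2, 1) are assumptions chosen for the example rather than values from this project.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SystemRecommendation {
    /** Aggregate counters for one item (a subset of the factors listed above). */
    static final class ItemStats {
        final long itemId;
        final int purchases;
        final int cartAdds;
        final int views;

        ItemStats(long itemId, int purchases, int cartAdds, int views) {
            this.itemId = itemId;
            this.purchases = purchases;
            this.cartAdds = cartAdds;
            this.views = views;
        }
    }

    // Hypothetical weights: a purchase counts more than a cart add, which counts more than a view.
    static double popularity(ItemStats s) {
        return 3.0 * s.purchases + 2.0 * s.cartAdds + 1.0 * s.views;
    }

    /** Manually pinned items come first, in operator order; the rest is filled by popularity. */
    static List<Long> recommend(List<Long> pinned, List<ItemStats> stats, int n) {
        List<ItemStats> sorted = new ArrayList<>(stats);
        Comparator<ItemStats> byPopularity =
                Comparator.comparingDouble(SystemRecommendation::popularity);
        sorted.sort(byPopularity.reversed());

        // LinkedHashSet keeps insertion order and drops items that are already pinned.
        Set<Long> ordered = new LinkedHashSet<>(pinned);
        for (ItemStats s : sorted) {
            ordered.add(s.itemId);
        }
        List<Long> all = new ArrayList<>(ordered);
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```

A real operation page would also filter pinned items by their start/end time before passing them in; that step is left out here to keep the sketch short.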

Personalized Recommendation

Personalized recommendation gives each user more precise suggestions based on their tastes, requiring knowledge of item attributes and user characteristics, or leveraging social networks to find similar users.

Three recommendation modes are described:

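Demographic-based recommendation: groups users by attributes such as gender, age, income, education, and profession, and recommends what similar groups like.

Content-based recommendation: recommends items whose content resembles items the user already liked, e.g., movies of the same type.

Collaborative filtering: recommends items based on the recorded preferences of users with similar tastes.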

Images illustrate content‑based and collaborative‑filtering examples.

User Preference Design

User preference factors include purchase history, cart history, search history, and browsing history, each assigned a weight (e.g., purchase = 10, cart = 8, search = 5, browse = 6). The final preference score for an item is a weighted sum.
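
As a minimal sketch of the weighted sum, the example weights from the paragraph above can be applied to per-item event counts. The class name, the method name, and the idea of counting events per (user, item) pair are assumptions made for illustration.

```java
final class PreferenceScore {
    // Example weights from the text: purchase = 10, cart = 8, search = 5, browse = 6.
    static final int PURCHASE_WEIGHT = 10;
    static final int CART_WEIGHT = 8;
    static final int SEARCH_WEIGHT = 5;
    static final int BROWSE_WEIGHT = 6;

    /** Weighted sum of behavior counts for one (user, item) pair. */
    static int score(int purchases, int cartAdds, int searches, int browses) {
        return PURCHASE_WEIGHT * purchases
                + CART_WEIGHT * cartAdds
                + SEARCH_WEIGHT * searches
                + BROWSE_WEIGHT * browses;
    }
}
```

For example, a user who bought an item once, added it to the cart twice, and browsed it three times would get `PreferenceScore.score(1, 2, 0, 3)`, that is 10 + 16 + 0 + 18 = 44.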

Mahout Overview

Apache Mahout is a machine-learning library that provides collaborative filtering (the Taste component), classification, clustering, and more. Its distributed algorithms are built on Hadoop's MapReduce, while the Taste recommenders used below can also run on a single machine.

Mahout Collaborative‑Filtering Example

Dependencies:

<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-core</artifactId>
  <version>0.9</version>
</dependency>
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-math</artifactId>
  <version>0.9</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>1.2.1</version>
</dependency>

Java implementation (single‑node mode):

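import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class UserBasedRecommenderDemo {
    public static void main(String[] args) throws Exception {
        // Load userId,itemId,rating rows from a local CSV file.
        DataModel model = new FileDataModel(new File("D:\\mahout\\data.csv"));
        // Compare users by the Pearson correlation of their ratings.
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Use the 2 most similar users as the neighborhood.
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
        // Ask for up to 2 recommendations for user 1.
        List<RecommendedItem> recommendations = recommender.recommend(1, 2);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}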

Sample data file (userId,itemId,rating) and a screenshot of the output are shown.
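
To make the similarity step less of a black box, here is a plain-Java sketch of a Pearson correlation between two users, computed over the items both have rated. It illustrates the general formula only and is not Mahout's internal code; the class and method names are hypothetical.

```java
import java.util.Map;

final class PearsonSketch {
    /**
     * Pearson correlation of two users' ratings over their co-rated items.
     * Returns NaN when there is no overlap or when either user's ratings do not vary.
     */
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) {
                continue; // item rated by only one of the two users
            }
            double x = e.getValue();
            double y = other;
            n++;
            sumA += x;
            sumB += y;
            sumAA += x * x;
            sumBB += y * y;
            sumAB += x * y;
        }
        if (n == 0) {
            return Double.NaN;
        }
        double covariance = sumAB - sumA * sumB / n;
        double varianceA = sumAA - sumA * sumA / n;
        double varianceB = sumBB - sumB * sumB / n;
        double denominator = Math.sqrt(varianceA * varianceB);
        return denominator == 0.0 ? Double.NaN : covariance / denominator;
    }
}
```

Two users whose ratings rise and fall together on the shared items score close to 1, while users with opposite tastes score close to -1. The nearest-neighbor step then keeps the users with the highest scores.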

Algorithm Choice

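Mahout provides several recommenders, including GenericUserBasedRecommender, GenericItemBasedRecommender, SlopeOneRecommender, SVDRecommender, KNNRecommender, and TreeClusteringRecommender. Availability depends on the Mahout version: some of the older ones (such as the slope-one and tree-clustering recommenders) were deprecated around the 0.8 release, so check that a class exists in the version you depend on. This project selects GenericUserBasedRecommender for its simplicity.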
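
The idea behind a user-based recommender can be sketched in a few lines of plain Java: the predicted preference for an item is the similarity-weighted average of the neighbors' ratings for it. This is a simplified illustration of the technique, not Mahout's implementation; the names are hypothetical, and skipping neighbors with non-positive similarity is an assumption made to keep the sketch simple.

```java
import java.util.Map;

final class UserBasedEstimate {
    /**
     * Estimates a user's preference for itemId.
     *
     * @param neighborSimilarity neighbor user ID -> similarity to the target user
     * @param ratings            user ID -> (item ID -> rating)
     * @return the similarity-weighted average rating, or NaN if no neighbor rated the item
     */
    static double estimate(long itemId,
                           Map<Long, Double> neighborSimilarity,
                           Map<Long, Map<Long, Double>> ratings) {
        double weightedSum = 0.0;
        double totalSimilarity = 0.0;
        for (Map.Entry<Long, Double> e : neighborSimilarity.entrySet()) {
            double similarity = e.getValue();
            Map<Long, Double> neighborRatings = ratings.get(e.getKey());
            Double rating = neighborRatings == null ? null : neighborRatings.get(itemId);
            if (rating == null || similarity <= 0.0) {
                continue; // neighbor has no opinion on this item, or is not similar
            }
            weightedSum += similarity * rating;
            totalSimilarity += similarity;
        }
        return totalSimilarity == 0.0 ? Double.NaN : weightedSum / totalSimilarity;
    }
}
```

Candidate items are then ranked by this estimate and the top few are returned, which corresponds to the `recommend(userId, howMany)` call in the example above.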

Data Sources

DataModel implementations include JDBC, File, HBase, Cassandra, MongoDB, and others. No HDFS model is provided out‑of‑the‑box.
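
For the file-based model, the input is a plain text file with one `userId,itemId,rating` row per line, as in the sample data mentioned earlier. A minimal sketch of exporting computed preference scores into that layout might look like this; the class, the `Preference` holder, and the target path are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class PreferenceCsvWriter {
    /** One preference row: which user, which item, and how strong the preference is. */
    static final class Preference {
        final long userId;
        final long itemId;
        final double score;

        Preference(long userId, long itemId, double score) {
            this.userId = userId;
            this.itemId = itemId;
            this.score = score;
        }
    }

    /** Formats one row as userId,itemId,rating. */
    static String toCsvLine(Preference p) {
        return p.userId + "," + p.itemId + "," + p.score;
    }

    /** Writes all rows to the target file, one per line, replacing existing content. */
    static void write(Path target, List<Preference> preferences) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Preference p : preferences) {
            lines.add(toCsvLine(p));
        }
        Files.write(target, lines, StandardCharsets.UTF_8);
    }
}
```

A file produced this way can be passed to `new FileDataModel(new File(...))` as in the example above. For larger or frequently changing data sets, one of the database-backed models is probably a better fit.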

Technology Stack

The solution combines Mahout (recommendation algorithms), Spark (parallel computation), Hadoop, and Elasticsearch.

Cold‑Start Problem

When a new user or item appears, insufficient data prevents model computation. Possible solutions: use demographic registration info, solicit initial ratings, expert classification, random recommendation, or average‑value imputation. The first approach is recommended for this project.
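
The demographic approach could be sketched as a simple fallback: a new user is mapped to a segment based on registration data, and receives the items that are popular within that segment. Everything here is an assumption made for illustration, including the class name, the use of gender and age only, and the age bands.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ColdStartFallback {
    /** Maps registration data to a coarse segment key; the age bands are arbitrary examples. */
    static String segment(String gender, int age) {
        String band = age < 25 ? "under25" : age < 40 ? "25to39" : "40plus";
        return gender + ":" + band;
    }

    /**
     * Returns up to n items for a user without behavior data: the popular items of the
     * user's segment if known, otherwise the globally popular items.
     */
    static List<Long> recommend(String gender, int age,
                                Map<String, List<Long>> popularBySegment,
                                List<Long> globalPopular, int n) {
        List<Long> items = popularBySegment.getOrDefault(segment(gender, age), globalPopular);
        return new ArrayList<>(items.subList(0, Math.min(n, items.size())));
    }
}
```

Once the user has accumulated enough purchase, cart, search, and browsing events, the collaborative-filtering recommender can take over from this fallback.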
