
Boosting Product Recommendations with Serverless Spark and Milvus: A Real‑World Case Study

Chán Māmā (蝉妈妈) migrated its recommendation platform to Alibaba Cloud Serverless Spark and Milvus, replacing traditional search-based vector retrieval and its existing Spark clusters. The migration cut offline task duration by 40% and the task failure rate by 80%, lowered costs significantly, and delivered scalable, low‑latency similar‑product retrieval for personalized marketing.

Alibaba Cloud Big Data AI Platform

Background

Chán Māmā (蝉妈妈) operates Chán Xuǎn, a product‑selection service platform for creators. It aims to help creators earn money through high‑commission, stable, and fast‑response product recommendations.

Business Features

Personalized recommendation: Uses big data and AI algorithms to provide customized product suggestions based on user interests and behavior.

Data‑driven: Analyzes users and market trends to optimize recommendation strategies and improve satisfaction.

Precise marketing: Enables merchants to conduct effective product promotion through accurate user profiling.

Efficient search: Offers powerful search capabilities for quickly finding desired products.

Pain Points of the Original Architecture

Product‑level issues

Reliance on traditional search for vector similarity caused performance bottlenecks, low storage efficiency, high complexity, costly updates, and heavy resource consumption.

Spark cluster issues

Cluster stability required manual monitoring and maintenance.

Lack of acceleration technologies like Fusion led to slower task execution.

High operational burden for configuration, monitoring, and troubleshooting.

Inflexible resource allocation caused waste.

Costs persisted even during idle periods.

Complexity increased due to underlying infrastructure management.

Why Choose Alibaba Cloud Serverless Spark & Milvus

Comprehensive services: Full monitoring and alerting for real‑time task status and performance.

Managed elastic scaling: Resources adjust automatically with workload.

Cluster stability: High reliability managed by the cloud provider.

Pay‑per‑use: Pay only for actual resource consumption, which avoids waste and reduces costs.

Rapid startup: No pre‑configuration needed for quick task launch.

Performance optimization: Serverless Spark uses Fusion acceleration; Milvus delivers high‑performance, large‑scale vector retrieval.

Technical Solution Design

Architecture Diagram

[Figure: architecture diagram]

Business Scenario

In Serverless Spark, periodic offline jobs extract product data from StarRocks, convert product titles to vector representations via a machine‑learning model, and write the vectors together with other product info into Alibaba Cloud Milvus. Milvus stores and manages the vectors, supporting fast similarity search. An external data interface allows users to query similar products by providing a product or its features, enabling large‑scale, low‑latency recommendation and personalized marketing.
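The flow described above (embed product titles, store the vectors with product fields, then look up similar products) can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the platform's actual code: `embed_title` is a hash-based stand-in for the machine‑learning model, `InMemoryVectorStore` is a stand-in for a Milvus collection, and all names, fields, and the vector size are hypothetical. In the real pipeline the product rows would come from StarRocks and the job would run on Serverless Spark.

```python
import hashlib
import math
from dataclasses import dataclass

DIM = 64  # assumed toy vector size; a real embedding model defines its own


def embed_title(title):
    """Stand-in for the ML model: hash tokens into an L2-normalized vector."""
    vec = [0.0] * DIM
    for token in title.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


@dataclass
class Product:
    product_id: int  # hypothetical fields; the real schema is not given
    title: str
    category: str


class InMemoryVectorStore:
    """Stand-in for a Milvus collection: vectors stored next to product info."""

    def __init__(self):
        self._rows = {}

    def upsert(self, product, vector):
        self._rows[product.product_id] = (product, vector)

    def search(self, query_vector, top_k=3):
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(query_vector, vec)), product)
            for product, vec in self._rows.values()
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


def run_offline_job(products, store):
    """Periodic batch step: embed every title and write it to the store."""
    for product in products:
        store.upsert(product, embed_title(product.title))


def similar_products(store, title, top_k=3):
    """Query step behind the data interface: embed the input, then search."""
    return store.search(embed_title(title), top_k)


if __name__ == "__main__":
    catalog = [
        Product(1, "wireless bluetooth earbuds", "audio"),
        Product(2, "bluetooth wireless headphones", "audio"),
        Product(3, "stainless steel water bottle", "kitchen"),
    ]
    store = InMemoryVectorStore()
    run_offline_job(catalog, store)
    for score, product in similar_products(store, "wireless bluetooth earbuds", 2):
        print(round(score, 3), product.product_id, product.title)
```

The brute-force scan in `search` is only workable for a handful of rows. A vector database such as Milvus replaces it with indexed search, which is what makes low‑latency retrieval feasible at large scale.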

Key Service Components

Serverless Spark: EMR Serverless Spark is a high‑performance Lakehouse product for Data+AI. It offers end‑to‑end data platform services (development, debugging, scheduling, operations) and is fully compatible with open‑source Spark.

Milvus Vector Search Service: Alibaba Cloud Milvus is a fully managed, cloud‑native vector search engine compatible with open‑source Milvus, providing low‑cost, high‑availability similarity search for massive vector data, supporting multimodal retrieval, RAG, and large‑model AI scenarios.

Benefits After Migration

Performance: Offline task duration reduced by 40%, enabling earlier report generation.

Stability: Task failure rate dropped by 80%.

Operational flexibility: Resources automatically scale with business demand.

Cost‑effectiveness: Pay‑per‑use model eliminates idle resource costs and offers various resource packages for further savings.

Milvus advantages: 75% cost reduction compared to traditional search, significantly improved retrieval speed for high‑dimensional vectors, and support for larger data volumes and faster query responses.

Future Expectations

We hope Serverless Spark will fully support Spark Launcher for seamless task migration to a fully managed environment.
