Exploring Personalized Recommendation at Kuaikan Comics: Business, Algorithms, and System Architecture

This article details Kuaikan Comics' personalized recommendation pipeline, covering business context, diverse content formats, technical challenges, content‑based and collaborative‑filtering methods, ranking models, system architecture, A/B testing, and future directions for improving recommendation quality.

DataFunTalk

Sep 12, 2019

Business Overview

Kuaikan Comics, founded in 2014, hosts over 200 million users (100 million registered) with 40 million daily active users, primarily Gen‑Z. The platform offers both long‑form comics and short‑form UGC (posts, videos), requiring a recommendation system that can handle both content types.

Recommendation Scenarios

The main recommendation touchpoints include the home personalized tab, discovery tab, world tab, and bottom‑of‑page recommendations, serving a mix of long and short content.

Technical Challenges

Capturing continuity, periodicity, and multiple interest points in long‑form comics.

Fusing long and short content in a unified pipeline.

Understanding diverse visual styles (e.g., distinguishing campus vs. urban scenes) and community‑specific language.

Content Types

Short content: brief, fragmented consumption, single interest point.

Long content: multi‑chapter, extended reading time, multiple interest points.

Tagging System

Kuaikan maintains a three‑dimensional tag hierarchy: basic work tags (e.g., comedy, youth), distribution tags (e.g., male, female, teen), and creation tags (e.g., sibling, student). Building a consistent tag set is labor‑intensive.

User Interest Model

The model incorporates user actions (follow, like, comment, share), fine‑grained behavior (down to specific chapters), interest decay, and item popularity.
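As a rough illustration of how action weights, interest decay, and item popularity could be combined into one profile, the sketch below aggregates weighted events into normalized tag scores. The action weights, the 7-day half-life, and the logarithmic popularity damping are all assumptions made for illustration; the article does not publish the actual values or formula.

```python
import math
from collections import defaultdict

# Hypothetical action weights; the real values are not given in the article.
ACTION_WEIGHTS = {"read": 1.0, "like": 2.0, "comment": 3.0, "share": 4.0, "follow": 5.0}
HALF_LIFE_DAYS = 7.0  # assumed half-life for interest decay


def decay(age_days, half_life=HALF_LIFE_DAYS):
    """Exponential decay: an event loses half its weight every half_life days."""
    return 0.5 ** (age_days / half_life)


def build_interest_profile(events, item_tags, item_popularity, now_day):
    """Aggregate (item_id, action, day) events into a normalized tag -> weight map."""
    scores = defaultdict(float)
    for item_id, action, day in events:
        weight = ACTION_WEIGHTS.get(action, 0.0) * decay(now_day - day)
        # Dampen very popular items so they do not dominate the profile.
        weight /= math.log(math.e + item_popularity.get(item_id, 0))
        for tag in item_tags.get(item_id, ()):
            scores[tag] += weight
    total = sum(scores.values())
    return {tag: s / total for tag, s in scores.items()} if total else {}
```

Chapter-level behavior could be captured the same way by emitting one event per chapter read rather than one per comic.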

Content‑Based Recommendation

Early efforts relied on manually curated tags to build item and user profiles, which made the results easy to interpret. The approach depends heavily on tag quality, is coarse-grained, and rarely surfaces novel content. Even so, the first content-based rollout raised reading frequency among daily active users (DAU) by 35%.
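A minimal sketch of tag-based matching, assuming both user and item profiles are sparse tag-to-weight maps and that cosine similarity is the matching function (the article does not say which similarity measure was used). The function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse tag -> weight dicts."""
    dot = sum(w * b.get(tag, 0.0) for tag, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_by_tags(user_profile, item_profiles, seen, k=3):
    """Return up to k unseen items whose tag profiles best match the user profile."""
    scored = [
        (cosine(user_profile, tags), item_id)
        for item_id, tags in item_profiles.items()
        if item_id not in seen
    ]
    # Highest score first; ties are broken by item id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for score, item_id in scored[:k] if score > 0]
```

Because every score traces back to shared tags, each recommendation can be explained by the overlapping tags. This is the interpretability benefit noted above, and also why quality is bounded by the tag set.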

Collaborative Filtering

Three CF variants were deployed: item-based, user-based, and model-based. All three retrieve neighbors with K-Nearest-Neighbor (KNN) search, using Faiss as the approximate nearest neighbor (ANN) engine (chosen over Nmslib for its GPU support). User-based CF requires a very large number of similarity computations, which Faiss's IVF and HNSW indexes keep tractable.
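The item-based variant can be sketched with exact, brute-force cosine similarity over user sets. This is a small-scale stand-in only: in production the neighbor search would go through an ANN index such as Faiss rather than the pairwise loop below, and the function names and the implicit-feedback (set-based) representation are assumptions.

```python
import math
from collections import defaultdict


def item_similarities(interactions):
    """interactions: user -> set of item ids. Returns item -> {other item: cosine}."""
    users_of = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            users_of[item].add(user)
    sims = defaultdict(dict)
    item_ids = sorted(users_of)
    for i, a in enumerate(item_ids):
        for b in item_ids[i + 1:]:
            common = len(users_of[a] & users_of[b])
            if common:
                # Cosine similarity for binary vectors: |A & B| / sqrt(|A| * |B|).
                s = common / math.sqrt(len(users_of[a]) * len(users_of[b]))
                sims[a][b] = sims[b][a] = s
    return sims


def recommend_item_cf(user, interactions, sims, k=2, top_n=5):
    """Score unseen items by summed similarity to the k nearest neighbors of each seen item."""
    seen = interactions[user]
    scores = defaultdict(float)
    for item in seen:
        neighbors = sorted(sims.get(item, {}).items(), key=lambda p: -p[1])[:k]
        for other, s in neighbors:
            if other not in seen:
                scores[other] += s
    ranked = sorted(scores.items(), key=lambda p: (-p[1], p[0]))
    return [item for item, _ in ranked[:top_n]]
```

The pairwise loop is quadratic in the number of items, which is the cost an ANN index is meant to avoid at catalogue scale.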

Ranking Models

Several CTR prediction models were evaluated: Logistic Regression (LR), Factorization Machines (FM/FFM), Gradient Boosted Decision Trees (XGBoost), and Deep Neural Networks (DNN). LR is simple and handles discrete features well; FM/FFM automate feature crossing but are computationally heavy; XGBoost learns feature interactions automatically; DNNs achieve high accuracy but lack interpretability. XGBoost was ultimately selected for its balance of performance and engineering effort.
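To make the CTR-prediction setup concrete, here is a minimal sketch of the simplest model in the comparison: logistic regression over discrete (one-hot) features, trained with SGD. It illustrates the LR baseline only, not the XGBoost model that was selected. The feature strings, learning rate, and regularization strength are assumptions.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SparseLR:
    """Logistic regression over discrete features such as 'tag=campus'."""

    def __init__(self, lr=0.1, l2=1e-4):
        self.w = defaultdict(float)
        self.b = 0.0
        self.lr = lr
        self.l2 = l2

    def predict(self, feats):
        """Predicted click probability for one sample (a list of active features)."""
        return sigmoid(self.b + sum(self.w.get(f, 0.0) for f in feats))

    def update(self, feats, label):
        """One SGD step on the log loss with L2 regularization on the weights."""
        g = self.predict(feats) - label  # gradient of the log loss w.r.t. the logit
        self.b -= self.lr * g
        for f in feats:
            self.w[f] -= self.lr * (g + self.l2 * self.w[f])

    def fit(self, samples, epochs=50):
        for _ in range(epochs):
            for feats, label in samples:
                self.update(feats, label)
        return self


def rank(model, candidates):
    """candidates: item id -> feature list. Returns ids sorted by predicted CTR, highest first."""
    return sorted(candidates, key=lambda item: -model.predict(candidates[item]))
```

A tree-based model such as XGBoost would replace `SparseLR` while the surrounding `rank` step stays the same, since ranking only needs a per-candidate score.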

System Architecture

The architecture follows the common three-layer design (near-line, offline, online), supplemented by a tooling layer:

Near-line layer: real-time logs flow through Kafka into Flink, which produces dynamic user profiles and documents.

Offline layer: data ingested with Sqoop to HDFS, processed with Spark for feature engineering, model training, and vector indexing.

Online layer: real‑time recall, ranking, and serving to iOS/Android clients.

Tooling layer: tag‑weight models, result tracking, and monitoring dashboards.
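The online layer's recall-then-rank flow can be sketched as follows. The channel names, per-channel quotas, and the `score_fn` callback are hypothetical; in the system described above the score would come from the ranking model and the candidate lists from the offline and near-line layers.

```python
def merge_recall(channels, quota):
    """Merge recall channels in order, taking up to quota[name] new items from each."""
    merged, seen = [], set()
    for name, items in channels.items():
        taken = 0
        for item in items:
            if taken >= quota.get(name, 0):
                break
            if item not in seen:  # deduplicate across channels
                seen.add(item)
                merged.append((item, name))
                taken += 1
    return merged


def serve(user_id, channels, quota, score_fn, page_size=3):
    """Recall candidates, rank them by score_fn(user_id, item), and return one page."""
    candidates = merge_recall(channels, quota)
    ranked = sorted(candidates, key=lambda c: -score_fn(user_id, c[0]))
    # Keep the recall source with each result so that it can be traced later.
    return [{"item": item, "recall": source} for item, source in ranked[:page_size]]
```

For example, with channels `{"content": [...], "item_cf": [...], "hot": [...]}` each response item records which channel produced it, which is useful both for monitoring and for debugging individual recommendations.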

A/B Testing Platform

The A/B testing platform supports randomization by device, by user, or by traffic; orthogonal experiments; and mutually exclusive groups. Metrics are configurable, and results come with significance analysis.
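One common way to implement such a platform is hash-based bucketing with a per-layer salt, plus a two-proportion z-test for significance. The sketch below assumes that design; the article does not describe the platform's internals, and the layer names and group ranges are hypothetical.

```python
import hashlib
import math


def bucket(unit_id, layer, n_buckets=100):
    """Deterministic bucket for a device or user id; the layer name acts as a salt,
    so assignments in different layers are effectively independent (orthogonal)."""
    digest = hashlib.md5(f"{layer}:{unit_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def assign(unit_id, layer, groups):
    """groups: list of (name, start, end) half-open bucket ranges. Non-overlapping
    ranges make the groups mutually exclusive within one layer."""
    b = bucket(unit_id, layer)
    for name, start, end in groups:
        if start <= b < end:
            return name
    return None  # unit is not part of any experiment in this layer


def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference in click-through rate between two groups."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

Randomizing by device or by user only changes which id is passed as `unit_id`; the bucketing logic itself is the same.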

Recommendation Result Tracking

A full‑stack trace system records snapshot user profiles and context in HBase, enabling root‑cause analysis of good and bad recommendation cases.
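A sketch of what one trace record might look like. The field names and the row-key layout are assumptions, and the in-memory `TraceStore` is only a stand-in for the HBase table so that the example runs without a cluster.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RecTrace:
    request_id: str
    user_id: str
    ts_ms: int
    profile_snapshot: dict  # user profile exactly as it was at request time
    context: dict           # e.g. tab, device, experiment group
    results: list = field(default_factory=list)  # served items with recall source and score

    def row_key(self):
        # Hypothetical layout: user id first so one user's traces sort together,
        # then a reversed timestamp so the newest trace comes first in a scan.
        return f"{self.user_id}#{10**13 - self.ts_ms:013d}#{self.request_id}"


class TraceStore:
    """In-memory stand-in for the HBase table (sorted row keys, prefix scans)."""

    def __init__(self):
        self.rows = {}

    def put(self, trace):
        self.rows[trace.row_key()] = json.dumps(asdict(trace), sort_keys=True)

    def scan_user(self, user_id, limit=10):
        """Return the user's most recent traces, newest first."""
        prefix = f"{user_id}#"
        keys = sorted(k for k in self.rows if k.startswith(prefix))[:limit]
        return [json.loads(self.rows[k]) for k in keys]
```

Storing the profile snapshot rather than a reference matters because profiles keep changing: diagnosing a bad case requires the profile as it was when the request was served.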

Summary & Future Plans

The current system combines content-based and collaborative-filtering recall with XGBoost ranking, delivering a 31-36% lift in reading frequency among daily active users (DAU). Future work includes deeper visual and textual understanding of comic content and a move to deep-learning recommendation models to further improve relevance.
