Exploring Personalized Recommendation at Kuaikan Comics: Business, Algorithms, and System Architecture
This article details Kuaikan Comics' personalized recommendation pipeline, covering business context, diverse content formats, technical challenges, content‑based and collaborative‑filtering methods, ranking models, system architecture, A/B testing, and future directions for improving recommendation quality.
Business Overview
Kuaikan Comics, founded in 2014, has over 200 million users (100 million registered) and 40 million daily active users, most of them Gen-Z. The platform offers both long-form comics and short-form UGC (posts, videos), so its recommendation system must handle both content types.
Recommendation Scenarios
The main recommendation touchpoints include the home personalized tab, discovery tab, world tab, and bottom‑of‑page recommendations, serving a mix of long and short content.
Technical Challenges
Capturing continuity, periodicity, and multiple interest points in long‑form comics.
Fusing long and short content in a unified pipeline.
Understanding diverse visual styles (e.g., distinguishing campus vs. urban scenes) and community‑specific language.
Content Types
Short content: brief, fragmented consumption, single interest point.
Long content: multi‑chapter, extended reading time, multiple interest points.
Tagging System
Kuaikan maintains a three‑dimensional tag hierarchy: basic work tags (e.g., comedy, youth), distribution tags (e.g., male, female, teen), and creation tags (e.g., sibling, student). Building a consistent tag set is labor‑intensive.
User Interest Model
The model incorporates user actions (follow, like, comment, share), fine‑grained behavior (down to specific chapters), interest decay, and item popularity.
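As a rough illustration of how weighted actions, interest decay, and item popularity might combine into one score, here is a minimal sketch. The action weights, the 7-day half-life, and the logarithmic popularity damping are all assumptions made for the example; the article does not give the actual formula or values. Events are keyed by tag here, but the same aggregation could be keyed by comic or chapter.

```python
import math
from collections import defaultdict

# Hypothetical action weights; the article does not publish real values.
ACTION_WEIGHTS = {"follow": 3.0, "like": 1.0, "comment": 2.0, "share": 2.5}
HALF_LIFE_DAYS = 7.0  # assumed half-life for interest decay


def interest_scores(events, now_day, popularity=None):
    """Aggregate (tag, action, day) events into decayed per-tag scores.

    popularity maps tag -> raw interaction count; tags that are popular
    with everyone are damped so they do not dominate every profile.
    """
    popularity = popularity or {}
    scores = defaultdict(float)
    for tag, action, day in events:
        age = max(0.0, now_day - day)
        decay = 0.5 ** (age / HALF_LIFE_DAYS)  # halves every HALF_LIFE_DAYS
        damp = 1.0 / math.log(math.e + popularity.get(tag, 0.0))
        scores[tag] += ACTION_WEIGHTS.get(action, 0.0) * decay * damp
    return dict(scores)


events = [("campus", "like", 10), ("campus", "follow", 3), ("urban", "share", 10)]
profile = interest_scores(events, now_day=10)
```

In this toy run, the week-old follow contributes half of its original weight, so an old strong signal and a fresh weak one can end up with comparable influence.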
Content‑Based Recommendation
Early efforts relied on manually curated tags to construct item and user profiles, yielding interpretable results. However, this approach suffers from heavy tag dependence, coarse granularity, and limited novelty. The first content‑based rollout increased DAU reading frequency by 35 %.
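The tag-matching idea can be sketched as a cosine similarity between a user's tag profile and each item's tag profile. This is only an illustrative sketch: the function names, tag weights, and sample comics are hypothetical, and the article does not say which similarity measure was actually used.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse tag -> weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend_by_tags(user_profile, item_profiles, k=2):
    """Return up to k items whose tags overlap the user's profile, best first."""
    scored = [(cosine(user_profile, tags), item)
              for item, tags in item_profiles.items()]
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [item for score, item in scored[:k] if score > 0]


user = {"comedy": 1.0, "student": 0.5}
items = {
    "A": {"comedy": 1.0, "student": 1.0},
    "B": {"youth": 1.0},
    "C": {"comedy": 1.0},
}
top = recommend_by_tags(user, items, k=3)
```

Item B never appears because it shares no tag with the user, which illustrates both the interpretability and the limited novelty noted above: only items with overlapping tags can be recommended.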
Collaborative Filtering
Three CF variants were deployed: item-based, user-based, and model-based. All of them retrieve candidates through K-Nearest-Neighbor (KNN) search, with Faiss as the approximate-nearest-neighbor (ANN) engine (chosen over Nmslib for its GPU support). User-based CF requires a very large number of similarity calculations, a cost that Faiss's IVF and HNSW indexes help to contain.
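To make the KNN step concrete, the sketch below runs item-based CF with an exact, brute-force neighbor search in plain Python. It is a toy stand-in and not the production setup: at Kuaikan's scale the brute-force loop would be replaced by an ANN index such as those Faiss provides. The interaction data and function names are made up for illustration.

```python
import math
from collections import defaultdict


def item_vectors(interactions):
    """Turn (user, item) pairs into item -> {user: 1.0} sparse vectors."""
    vecs = defaultdict(dict)
    for user, item in interactions:
        vecs[item][user] = 1.0
    return vecs


def cosine(a, b):
    """Cosine similarity between two sparse user -> weight dicts."""
    dot = sum(w * b.get(u, 0.0) for u, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def nearest_items(vecs, target, k):
    """Exact KNN over all other items; an ANN index would approximate this."""
    sims = [(cosine(vecs[target], v), item)
            for item, v in vecs.items() if item != target]
    sims.sort(key=lambda p: (-p[0], p[1]))
    return [(item, round(sim, 3)) for sim, item in sims[:k]]


interactions = [("u1", "A"), ("u1", "B"), ("u2", "A"),
                ("u2", "B"), ("u2", "C"), ("u3", "C")]
vecs = item_vectors(interactions)
neighbors = nearest_items(vecs, "A", k=2)
```

Items A and B are read by exactly the same users, so B is A's closest neighbor; C shares only one reader with A and ranks second.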
Ranking Models
Various CTR prediction models were evaluated: Logistic Regression (LR), Factorization Machines (FM/FFM), Gradient Boosted Decision Trees (XGBoost), and Deep Neural Networks (DNN). LR offers simplicity and good handling of discrete features; XGBoost provides automatic feature interactions; FM/FFM automate feature crossing but are computationally heavy; DNNs achieve high accuracy but lack interpretability. Ultimately, XGBoost was selected for its balance of performance and engineering effort.
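To show what "automated feature crossing" means in the FM case, here is a small sketch of second-order FM scoring. It uses the standard reformulation of the pairwise term, and the weights are arbitrary illustrative values rather than anything from the article. Note that XGBoost, not FM, is the model the article reports selecting.

```python
import math


def fm_logit(x, w0, w, v):
    """Second-order FM score: bias + linear terms + all pairwise crosses.

    v[i] is the latent vector of feature i. The pairwise sum over i < j of
    <v_i, v_j> * x_i * x_j is computed per latent dimension f as
    0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2).
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(v[0])):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s2 = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s2)
    return linear + pairwise


def fm_ctr(x, w0, w, v):
    """Map the FM score to a click probability with a sigmoid."""
    return 1.0 / (1.0 + math.exp(-fm_logit(x, w0, w, v)))


# Three one-hot style features, two latent dimensions (illustrative values).
x = [1.0, 0.0, 1.0]
w0, w = 0.1, [0.2, 0.3, -0.1]
v = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
score = fm_logit(x, w0, w, v)
```

Only features 0 and 2 are active, so the single surviving cross term is the dot product of their latent vectors; no one had to hand-craft that interaction feature.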
System Architecture
The architecture follows a three-layer design (near-line, offline, and online), supported by a tooling layer:
Near‑line layer: real‑time logs via Kafka → Flink produce dynamic user profiles and documents.
Offline layer: data ingested with Sqoop to HDFS, processed with Spark for feature engineering, model training, and vector indexing.
Online layer: real‑time recall, ranking, and serving to iOS/Android clients.
Tooling layer: tag‑weight models, result tracking, and monitoring dashboards.
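The online layer's recall, ranking, and serving flow might look roughly like the sketch below. It is a simplified, in-memory stand-in: the channel names, the `per_channel` cap, and the score lookup are hypothetical, and in the described system the ranking score would come from the trained model rather than a dictionary.

```python
def recall(user_id, channels, per_channel=3):
    """Merge candidates from several recall channels, dropping duplicates."""
    seen, merged = set(), []
    for name, fetch in channels.items():
        for item in fetch(user_id)[:per_channel]:
            if item not in seen:
                seen.add(item)
                merged.append((item, name))  # keep the source channel for tracing
    return merged


def rank(candidates, score_fn, top_n):
    """Order candidates by predicted score, highest first."""
    ordered = sorted(candidates, key=lambda c: (-score_fn(c[0]), c[0]))
    return [item for item, _ in ordered[:top_n]]


def serve(user_id, channels, score_fn, top_n=3):
    """Recall from every channel, then rank the merged candidate pool."""
    return rank(recall(user_id, channels), score_fn, top_n)


# Hypothetical channels and scores standing in for real services.
channels = {
    "content_based": lambda user_id: ["A", "B"],
    "item_cf": lambda user_id: ["B", "C"],
}
scores = {"A": 0.2, "B": 0.9, "C": 0.5}
feed = serve("u1", channels, scores.get)
```

Keeping the source channel next to each candidate is a design choice that makes later result tracking easier, since a bad recommendation can be traced back to the channel that produced it.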
A/B Testing Platform
A comprehensive A/B testing platform supports randomization by device, user, or traffic, orthogonal experiments, and mutually exclusive groups, with configurable metrics and significance analysis.
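One common way to obtain orthogonal layers and mutually exclusive groups is hash-based bucketing, sketched below. This is an assumption about how such a platform could work, not a description of Kuaikan's implementation; the layer names, experiment names, and bucket ranges are invented for the example.

```python
import hashlib


def bucket(unit_id, layer, n_buckets=100):
    """Deterministically map a user or device id to a bucket within a layer.

    Salting the hash with the layer name makes assignments in different
    layers effectively independent, which is what keeps layers orthogonal.
    """
    digest = hashlib.md5(f"{layer}:{unit_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def assign(unit_id, layer, experiments):
    """Pick the experiment whose bucket range contains this unit.

    experiments is a list of (name, start, end) with non-overlapping
    [start, end) ranges, so groups inside one layer are mutually exclusive.
    """
    b = bucket(unit_id, layer)
    for name, start, end in experiments:
        if start <= b < end:
            return name
    return None


recall_layer = [("cf_recall", 0, 50), ("control", 50, 100)]
rank_layer = [("xgb_rank", 0, 50), ("control", 50, 100)]
arm_recall = assign("user-42", "recall", recall_layer)
arm_rank = assign("user-42", "rank", rank_layer)
```

The same user always lands in the same arm of a given layer, while the arm in one layer carries essentially no information about the arm in another.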
Recommendation Result Tracking
A full‑stack trace system records snapshot user profiles and context in HBase, enabling root‑cause analysis of good and bad recommendation cases.
Summary & Future Plans
The current system combines content‑based and collaborative filtering with XGBoost ranking, delivering a 31‑36 % lift in DAU reading frequency. Future work includes deeper visual and textual content understanding for comics and transitioning to deep‑learning recommendation models to further improve relevance.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
