
How Hologres Shared Cluster Powers Fine‑Grained Taobao Subscription Operations

This article explains how Alibaba's Hologres shared cluster enables Taobao's subscription system to perform precise content selection, improve recommendation quality, reduce data movement, and achieve sub‑second query performance for large‑scale, real‑time business scenarios.

Alibaba Cloud Big Data AI Platform

1 Taobao Subscription Fine‑Grained Content Operations

Taobao Subscription is a private-domain product that serves both users and merchants. On the user side, it complements the recommendation-based "Guess You Like" feed and builds a "My Likes" mindset. On the merchant side, it enables structured, automated, high-quality content supply that helps merchants operate fan memberships more effectively.

The initial workflow covered content publishing from merchant back‑ends, algorithmic distribution, front‑end consumption, and data feedback. To achieve finer‑grained operation and improve recommendation experience, a content‑feature selection system was built.

Feature Requirements for Recommendation

High‑quality content selection: Multi‑dimensional feature filtering for content distribution on the subscription front‑end.

Low‑quality content filtering: Features filter out pornographic, political, or meaningless content.

Feature Requirements for Content Operations

Core content display: Operators select a batch of core content for front‑end aggregation using the selection system.

Promotion atmosphere strengthening: Selected promotional content receives enhanced display during major sales.

Merchant traffic tilt: Core partner merchants' content is identified and given traffic priority on the front‑end.

2 Choosing an Engine for Subscription Content Feature Selection

Content selection involves filtering massive multi‑dimensional metrics, requiring a precise and extensible design.

2.1 Current Architecture Design

The selection process is abstracted as content-id + related-id + multi-dimensional filter, producing a set of target content IDs. Each selection task creates an activity instance that contains a batch of content, configures the filter schemas, and supplies the filter values.

Thus the problem becomes translating filter schemas and values into executable query statements.
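As a rough illustration of that translation step, the Python sketch below turns a list of filter items into a parameterized WHERE clause. The article does not show the actual translation code, so everything here is an assumption: the `FilterItem` type, the operator whitelist, the `build_selection_sql` helper, and the table and column names are hypothetical, and the `%s` placeholders follow the style of Python DB-API drivers such as psycopg2.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

# Operators this sketch accepts (hypothetical whitelist). Anything else is
# rejected so operator text never reaches the SQL string unchecked.
ALLOWED_OPS = {"=", ">", ">=", "<", "<=", "IN"}


@dataclass(frozen=True)
class FilterItem:
    """One filter: the schema part (field, op) plus the value an operator supplies."""
    field: str
    op: str
    value: Any


def build_selection_sql(
    table: str, id_column: str, filters: List[FilterItem]
) -> Tuple[str, List[Any]]:
    """Return (sql, params) selecting the content IDs that satisfy every filter."""
    for name in (table, id_column):
        if not name.isidentifier():
            raise ValueError(f"invalid identifier: {name}")

    clauses: List[str] = []
    params: List[Any] = []
    for f in filters:
        if f.op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {f.op}")
        if not f.field.isidentifier():
            raise ValueError(f"invalid field name: {f.field}")
        if f.op == "IN":
            values = list(f.value)
            if not values:
                raise ValueError(f"empty IN list for field: {f.field}")
            placeholders = ", ".join(["%s"] * len(values))
            clauses.append(f"{f.field} IN ({placeholders})")
            params.extend(values)
        else:
            clauses.append(f"{f.field} {f.op} %s")
            params.append(f.value)

    # With no filters, every row in the table qualifies.
    where = " AND ".join(clauses) if clauses else "TRUE"
    return f"SELECT {id_column} FROM {table} WHERE {where}", params


# Example: two filters on a hypothetical feature table.
sql, params = build_selection_sql(
    "feed_features",
    "feed_id",
    [FilterItem("pv", ">", 30000), FilterItem("biz_code", "=", "111")],
)
```

Keeping values in a separate parameter list and validating field names against a whitelist-style check means a new feature field only needs a new `FilterItem`, not a change to the translation logic.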

2.2 Engine Selection Core Requirements: Flexibility & High Performance

The selection engine must be easy to integrate, translate filter items into executable SQL, guarantee performance and stability for complex queries, and support flexible addition of new feature fields.

Simple integration to reduce translation complexity.

Performance and stability for fast response to evolving operational strategies.

Flexibility to accommodate changing feature dimensions.

After extensive research and comparison, the following table summarizes the evaluation of MaxCompute versus Hologres shared clusters.

| Evaluation | MaxCompute | Hologres Shared Cluster |
| --- | --- | --- |
| Flexibility | General: multi-table joins require specifying table space. | High: can aggregate multi-table joins within a single space. |
| Cost | Low | Medium: no data import/export needed; SSD cache acceleration. |
| Query Speed | Typical single query >15 s; stage-file design with high fault tolerance. | Billions of rows, sub-second query; in-memory response. |

3 Building the Subscription System with Hologres

3.1 Hologres Cluster: Less Data Movement + Faster Queries

Low usage cost: Quick instance creation, easy onboarding, and the ability to migrate to an independent cluster later.

Seamless development: SQL syntax aligns with standard SQL; one‑click table‑structure sync supports frequently changing schemas.

Reduced data movement: External tables read data from multiple MaxCompute projects, enabling cross‑project aggregation without import/export.

3.2 Efficiency Gains

Compared with MaxCompute, performance improves dramatically: complex multi-table JOINs over billions of rows execute in 8-9 seconds, single-table queries on external tables take about 2 seconds, and queries on internal tables take about 60 ms.

UDF/expression push‑down reduces unnecessary data transfer and further boosts performance.
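To make the idea of push-down concrete, here is a toy Python sketch. It is not Hologres code and does not model its internals: it only counts how many rows would leave a storage layer when a filter runs after the transfer versus at the source. The `scan` function, the `is_popular` predicate, and the sample rows are invented for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, int]


def scan(
    rows: Iterable[Row],
    pushed_filter: Optional[Callable[[Row], bool]] = None,
) -> Tuple[List[Row], int]:
    """Simulate a storage scan; return (rows sent to the engine, rows transferred)."""
    sent = [r for r in rows if pushed_filter is None or pushed_filter(r)]
    return sent, len(sent)


def is_popular(row: Row) -> bool:
    """Hypothetical predicate: keep content with more than 30,000 page views."""
    return row["pv"] > 30000


# Hypothetical sample data: 1,000 rows with pv from 0 to 99,900 in steps of 100.
storage = [{"feed_id": i, "pv": i * 100} for i in range(1000)]

# Without push-down: every row is transferred, then filtered by the engine.
all_rows, transferred_without = scan(storage)
result_without = [r for r in all_rows if is_popular(r)]

# With push-down: the filter runs at the source, so only matches are transferred.
result_with, transferred_with = scan(storage, pushed_filter=is_popular)

print(transferred_without, transferred_with)
```

Both paths return the same rows; the difference is only in how much data crosses the boundary between storage and the query engine.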

3.3 Best Practices for a Hologres‑Based Subscription System

The workflow is illustrated below:

Operators select filter items and values on the back‑end; the system automatically generates Hologres SQL (example below), executes it, returns results to the front‑end for display, and iteratively refines the selection based on performance.

-- Select the IDs of content that passes every configured filter.
SELECT feed_id
FROM qn_xxx_provider AS a
WHERE a.xxx_pv > 30000
  AND a.xxx_pctr > '0.1'
  AND a.last_publish_time >= '2022-06-17 08:00:00'
  AND a.biz_xxx_code = '111'
  -- MAX_PT returns the latest partition of the named table.
  AND a.ds = MAX_PT('xxxxxx_table')
  -- Keep only content whose owner appears in the configured dataset.
  AND CAST(a.owner_xxx_id AS VARCHAR) IN (
        SELECT b.domain_xxx_id
        FROM xxxxxxx_table AS b
        WHERE b.rule_type = 12
          AND b.channel_xxx_id = 137
          AND b.dataset_xx_id = xxxxx
          AND b.ds = MAX_PT('xxxxx_odps_channel')
      )
  -- Keep only content submitted to this activity and approved.
  AND a.feed_id IN (
        SELECT feed_id
        FROM xxxxx_submission_feed_hh
        WHERE activity_id = 222
          AND approval_status = 1
          AND ds = MAX_PT('xxxxx_submission_hh')
          AND hh = '13'
      );

4 Business Value

Using the Hologres shared cluster, the Taobao subscription system has supported over 1,000 operational selection tasks and powered major promotions such as Double 11 and 618. It has also enabled subscription play scenarios, simplified the configuration of multiple secondary pages, and eliminated data import/export, allowing teams to focus on growth.

Future plans include incorporating real‑time features via Hologres internal tables and reducing reliance on GUC parameter tuning to improve productization.
