
Architecture and Data Flow of the Chinese Almanac Headline Recommendation System

The article describes the design, storage, update mechanisms, and optimization strategies of a headline recommendation platform that aggregates various data types using algorithms, MySQL, Redis, and a modular data‑fetching framework to achieve scalable and efficient content delivery.

Architecture Digest

Sep 11, 2017

The Chinese Almanac headline feed is aggregated by recommendation algorithms from several kinds of data: ALS (alternating least squares) algorithm data, user profile data, time-sensitive data, non-time-sensitive data, fixed-investment data, surprise data, channel data, hot-list data, and related-reading recommendation data. There are two startup modes: cold start and user-profile start.

Cold start: no user profile or profile score < 8.

User profile: tags generated from user browsing, e.g., entertainment285L, travel1127L.

Time‑sensitive data: disappears over time, such as news and entertainment.

Non‑time‑sensitive data: persists long‑term, e.g., health.

Fixed‑investment data: manually placed via admin backend, e.g., ads, posts.

Surprise data: items that fall outside the user's profile.

Channel data: a channel groups several tags, so channel data is the combined content of those tags.

Hot‑list data: high‑scoring items calculated from real‑time click logs.

Related‑reading recommendation data: correlated items derived from real‑time click logs.
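The cold-start rule above can be sketched as a small selector. This is only an illustration: the threshold of 8 comes from the article, but how the profile score is computed is not described, so the score is passed in as a plain number, and the class and method names are hypothetical.

```java
import java.util.Set;

// Hypothetical helper: chooses between the two startup modes named in the article.
final class StartupModeSelector {
    enum StartupMode { COLD_START, USER_PROFILE }

    // From the article: a profile score below 8 means cold start.
    // How the score is computed is not described, so callers supply it.
    static final int MIN_PROFILE_SCORE = 8;

    // profileTags is an assumed representation of the user's profile (tag names).
    static StartupMode select(Set<String> profileTags, int profileScore) {
        boolean noProfile = profileTags == null || profileTags.isEmpty();
        if (noProfile || profileScore < MIN_PROFILE_SCORE) {
            return StartupMode.COLD_START;
        }
        return StartupMode.USER_PROFILE;
    }
}
```

A user with tags but a score of 7 still gets the cold-start path; a score of 8 or more switches to the profile-driven path.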

Data is fetched from partners via scheduled third‑party API calls, classified by channel tags, and stored in a MySQL database. The headline service periodically reloads data from MySQL into Redis and then into local memory, where aggregation assembles the final recommendation set.

Two reload steps are used to reduce database connection pressure and ensure consistency across horizontally scaled service nodes, leveraging Redis’s higher concurrency and faster access compared to MySQL.
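A minimal sketch of the two-step reload, using in-memory stand-ins rather than real MySQL or Redis clients. The class name, the counter, and the String row type are assumptions made for illustration. The point it shows is that the database is read once per refresh no matter how many service nodes copy from the shared cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// In-memory stand-ins only: "database" plays MySQL, "sharedCache" plays Redis.
final class TwoStageReload {
    private final Supplier<Map<Long, String>> database;
    private final Map<Long, String> sharedCache = new ConcurrentHashMap<>();
    int databaseReads = 0; // counts how often the source of truth is queried

    TwoStageReload(Supplier<Map<Long, String>> database) {
        this.database = database;
    }

    // Step 1: a single background job copies database rows into the shared cache.
    void reloadSharedCache() {
        Map<Long, String> rows = database.get();
        databaseReads++;
        sharedCache.putAll(rows);
        sharedCache.keySet().retainAll(rows.keySet()); // drop rows deleted upstream
    }

    // Step 2: each service node copies the shared cache into its own memory,
    // so adding nodes adds cache reads, not database connections.
    Map<Long, String> reloadLocalMemory() {
        return new HashMap<>(sharedCache);
    }
}
```

Because every node reads the same shared copy, all nodes converge on the same data after their next local reload.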

In local memory, data is placed into distinct pools, each with a specific structure:

New pool (Set<Long>): stores newly fetched non-time-sensitive data.

Old pool (List<Long>): stores items with click and PV metrics.

Video pool (List<WnlLifeCardItemBean>): stores all video items.

Non-time-sensitive tag pool (Multimap<Long, Long>): stores IDs for non-time-sensitive entries.

Time-sensitive tag pool (Multimap<Long, Long>): stores IDs for time-sensitive entries.

Almanac pool, Zodiac pool, Future-reminder pool, etc.: all use List<WnlLifeCardItemBean>.

TotalMap: a map of all IDs to bean objects.

Additional recommendation data from the big‑data platform resides in Redis as Set<Long>. The bean WnlLifeCardItemBean represents a headline object; Long values are either bean IDs or tag IDs.
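The relationship between an ID-only pool and TotalMap might look like the sketch below. It is deliberately simplified: HeadlineItem is a stand-in for WnlLifeCardItemBean with assumed fields, and a plain Map<Long, Set<Long>> replaces the Multimap<Long, Long> so the example needs only the standard library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for the article's WnlLifeCardItemBean; fields are assumed.
final class HeadlineItem {
    final long id;
    final String title;

    HeadlineItem(long id, String title) {
        this.id = id;
        this.title = title;
    }
}

final class LocalPools {
    // TotalMap: every item ID mapped to its bean.
    final Map<Long, HeadlineItem> totalMap = new HashMap<>();
    // Tag pool: tag ID mapped to item IDs (substitute for Multimap<Long, Long>).
    final Map<Long, Set<Long>> tagPool = new HashMap<>();

    void add(HeadlineItem item, long... tagIds) {
        totalMap.put(item.id, item);
        for (long tagId : tagIds) {
            tagPool.computeIfAbsent(tagId, k -> new LinkedHashSet<>()).add(item.id);
        }
    }

    // Pools hold IDs only; beans are resolved through totalMap at read time.
    List<HeadlineItem> itemsForTag(long tagId) {
        List<HeadlineItem> result = new ArrayList<>();
        for (Long id : tagPool.getOrDefault(tagId, new LinkedHashSet<>())) {
            HeadlineItem item = totalMap.get(id);
            if (item != null) {
                result.add(item); // skip IDs whose bean has been removed
            }
        }
        return result;
    }
}
```

Keeping only IDs in the pools means an item that carries several tags is stored once in TotalMap and merely referenced from each tag's set.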

Early on, updates touched two places: Redis and local memory. Spring Quartz jobs periodically read from Redis to sync local memory, while a separate background module read from MySQL to sync Redis. PV and click counts were also updated by scheduled tasks that ran every second.

This design left several issues to address:

Ensuring data consistency across many API nodes.

Targeted updates without full reloads.

Separating API scheduled tasks into background modules.

Real‑time response to data changes.

Problems encountered:

Data loss during concurrent Redis updates and local reloads.

Long reload times caused by asynchronous loading and cache timing mismatches.

High memory and CPU consumption due to growing non‑time‑sensitive data and repeated deserialization.

To mitigate these, business data is separated by type, allowing fine‑grained reloads that only affect the relevant pool, reducing memory spikes and CPU load. SQL statements are split per business segment, and Redis pub/sub synchronizes changes to local memory.
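The fine-grained reload can be sketched as a handler that reacts to a change notification by reloading a single pool. In the article the notification travels over Redis pub/sub; here onMessage is called directly so the example stays self-contained. The pool type names and the message format (the enum name as a string) are assumptions.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Node-side handler for change notifications; names and payload are hypothetical.
final class PoolReloader {
    enum PoolType { TIME_SENSITIVE_TAG, NON_TIME_SENSITIVE_TAG, VIDEO }

    private final Map<PoolType, Supplier<List<Long>>> loaders = new EnumMap<>(PoolType.class);
    private final Map<PoolType, List<Long>> pools = new EnumMap<>(PoolType.class);

    // Each pool type gets its own loader (for example, its own SQL statement).
    void register(PoolType type, Supplier<List<Long>> loader) {
        loaders.put(type, loader);
        pools.put(type, loader.get()); // initial load
    }

    // Assumed payload: the pool type name, e.g. "VIDEO".
    // PoolType.valueOf throws IllegalArgumentException for unknown names.
    void onMessage(String message) {
        PoolType type = PoolType.valueOf(message);
        Supplier<List<Long>> loader = loaders.get(type);
        if (loader != null) {
            pools.put(type, loader.get()); // swap one pool; the others are untouched
        }
    }

    List<Long> pool(PoolType type) {
        return pools.get(type);
    }
}
```

Only the pool named in the message is rebuilt, which is what keeps memory and CPU spikes proportional to the size of the changed segment rather than the whole dataset.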

Further optimization moves the entire recommendation dataset into Redis, using ordered ID sets and score-based indexes that the big-data platform updates. Incremental updates reduce the need for full reloads, and a user-reading cache holds enough recommendation items per user to minimize Redis round-trips.
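One way to picture the user-reading cache is as a small per-user buffer that is refilled in batches from a score-ordered index. The sketch below is an assumption about how such a cache could work, not the system's actual code: a List stands in for the ordered ID set in Redis, and each fetchBatch call models one round-trip.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Per-user reading cache sketch; "scoreOrderedIds" stands in for an ordered ID set in Redis.
final class UserReadingCache {
    private final List<Long> scoreOrderedIds;
    private final int batchSize;
    private final Deque<Long> cached = new ArrayDeque<>();
    private int offset = 0;
    int roundTrips = 0; // how many batch fetches were needed

    UserReadingCache(List<Long> scoreOrderedIds, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.scoreOrderedIds = scoreOrderedIds;
        this.batchSize = batchSize;
    }

    // Models one round-trip that pulls the next batch of IDs from the index.
    private void fetchBatch() {
        roundTrips++;
        int end = Math.min(offset + batchSize, scoreOrderedIds.size());
        cached.addAll(scoreOrderedIds.subList(offset, end));
        offset = end;
    }

    // Returns up to pageSize IDs, refilling only when the cache runs dry.
    List<Long> nextPage(int pageSize) {
        List<Long> page = new ArrayList<>();
        while (page.size() < pageSize) {
            if (cached.isEmpty()) {
                if (offset >= scoreOrderedIds.size()) {
                    break; // index exhausted
                }
                fetchBatch();
            }
            page.add(cached.poll());
        }
        return page;
    }
}
```

With a batch size of 6 and pages of 2, three consecutive pages are served from a single fetch; the fourth page triggers the second one.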

Data fetching was refactored into a dedicated project (ulike) with a management UI, MySQL‑stored configurations, a scheduler, Redis‑based command distribution, and processing/engine components for parsing source data.

Recommendation query performance was improved by:

Using Redis pipeline for batch commands.

Caching multiple pages of recommendation results.

Iterating tag index data with cursor control and resetting after prolonged access.

Employing multithreaded asynchronous computation.
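As an illustration of the last item, per-tag lookups can run concurrently and be merged afterwards. This sketch uses the standard CompletableFuture API; the lookup function, the class name, and the choice to create a thread pool per call are assumptions made to keep the example self-contained (a real service would more likely share one long-lived executor).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class ParallelCandidates {
    // Looks up candidates for each tag on a worker thread, then merges the
    // results in the original tag order. "lookup" is a hypothetical per-tag query.
    static List<Long> collect(List<Long> tagIds,
                              Function<Long, List<Long>> lookup,
                              int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<List<Long>>> futures = new ArrayList<>();
            for (Long tagId : tagIds) {
                futures.add(CompletableFuture.supplyAsync(() -> lookup.apply(tagId), executor));
            }
            List<Long> merged = new ArrayList<>();
            for (CompletableFuture<List<Long>> future : futures) {
                merged.addAll(future.join()); // join rethrows a failed lookup
            }
            return merged;
        } finally {
            executor.shutdown();
        }
    }
}
```

Joining the futures in submission order keeps the merged list deterministic even though the lookups themselves may finish in any order.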

In conclusion, the recommendation algorithm is continuously refined based on big‑data analysis to maximize user experience.

Source: http://weibo.com/ttarticle/p/show?id=2309404141400319987014&retcode=6102
