Tag

large-scale systems


Alibaba Cloud Infrastructure
Jun 11, 2025 · Cloud Computing

How Alibaba’s Qi Tian Platform Secures Large-Scale Cloud Networks

This article examines Alibaba Cloud’s Qi Tian integrated operation‑management platform, detailing the challenges of massive cloud network management and the innovative data‑fusion, automated change, intent‑aware monitoring, and multi‑plane self‑healing technologies that enable secure, high‑performance operation at million‑device scale.

AI · Cloud Computing · data management
0 likes · 11 min read
macrozheng
Dec 28, 2024 · Operations

What Makes China’s 12306 Railway Ticketing System So Resilient?

The article examines China’s 12306 railway ticketing platform, tracing its evolution from early Unix‑based reservation software to a massive, real‑time, three‑tier distributed system that handles billions of requests during peak travel periods, highlighting its architectural challenges, high‑concurrency solutions, and unique national centralization.

China · Distributed Systems · high concurrency
0 likes · 9 min read
Xiaohongshu Tech REDtech
Sep 23, 2024 · Artificial Intelligence

AlignRec: A Joint Training Framework for Aligning Multimodal Representations with Personalized Recommendation

AlignRec is a joint‑training framework that synchronizes multimodal encoders with personalized recommendation models through a staged alignment strategy and three specialized loss functions, preserving both content and ID signals, and achieving state‑of‑the‑art performance on multiple datasets while releasing superior Amazon multimodal features.

AI · evaluation metrics · joint training
0 likes · 11 min read
DataFunSummit
Sep 18, 2024 · Artificial Intelligence

Multi‑Scenario Modeling for NetEase Cloud Music Recommendation: Architecture, Challenges, and Results

This article presents NetEase Cloud Music's multi‑scenario recommendation modeling work, covering background, overall system architecture, key modules such as unified and private domain networks, modeling objectives and difficulties, experimental results, future outlook, and a detailed Q&A session.

AI · NetEase Cloud Music · Recommendation systems
0 likes · 13 min read
DaTaobao Tech
Jun 12, 2024 · Backend Development

Refactoring Large-Scale Video Streaming Engineering: Theory and Practice

The article presents a comprehensive guide to large‑scale video‑streaming system refactoring, combining theory on continuous improvement, architectural evolution, code‑quality criteria, and challenges with a practical roadmap that leverages automation, systematic analysis, engineering safeguards, static‑analysis tools, and design patterns to safely transform legacy monoliths into modular, containerized platforms.

Engineering Practices · code quality · component architecture
0 likes · 16 min read
DataFunSummit
Feb 11, 2024 · Artificial Intelligence

GPU-Accelerated Model Service and Optimization Practices at Xiaohongshu

This article details Xiaohongshu's end‑to‑end GPU‑based transformation of its recommendation and search models, covering background, model characteristics, training and inference frameworks, system‑level and GPU‑level optimizations, compilation tricks, hardware upgrades, and future directions for large‑scale machine‑learning infrastructure.

GPU · Inference · large-scale systems
0 likes · 18 min read
Continuous Delivery 2.0
Sep 13, 2023 · Fundamentals

Overview of Google’s Software Engineering Practices

Google’s software engineering practices—including a unified source repository, Blaze build system, rigorous code review, automated testing, continuous integration, and structured project and personnel management—are detailed, offering insights and comparisons for other organizations seeking to adopt similar high‑scale development methodologies.

Code Review · Continuous Integration · Google
0 likes · 46 min read
JD Retail Technology
Aug 5, 2023 · Operations

JDV Visual Big‑Screen Platform: Architecture, Challenges, and Technical Innovations for JD.com’s 618 Promotion

The article details JDV, JD.com’s internal visual‑big‑screen data platform, describing its architecture, the demanding real‑time, cross‑midnight, and high‑stability requirements during the 618 promotion, the technical challenges faced, and the innovative solutions—including request state control, heartbeat monitoring, video recording, orchestration tools, precise stop handling, and proxy data sources—that ensured reliable large‑scale screen deployment.

Operations · backend architecture · data visualization
0 likes · 17 min read
Xiaohongshu Tech REDtech
Mar 21, 2023 · Artificial Intelligence

From Daily to Minute-Level Updates: Real-Time Recommendation System Enhancements at Xiaohongshu

Xiaohongshu transformed its recommendation pipeline from daily to minute‑level updates by redesigning recall, ranking and feature‑joining components, deploying a base‑plus‑incremental training scheme, migrating Spark to Flink, rewriting services in C++, and optimizing RocksDB, which yielded over 10% longer dwell time, 15% more interactions and roughly 50% higher new‑note efficiency.

Vector Search · large-scale systems · machine learning
0 likes · 20 min read
Alimama Tech
Aug 10, 2022 · Artificial Intelligence

Overview of Alimama’s Recent Papers on Online Advertising and Recommendation Systems

Alimama’s technical team presented ten CIKM‑2022 papers that introduce novel advertising and recommendation methods, including adaptive domain networks, neural‑metric ANN search, control‑based livestream bidding, graph‑based relevance learning, hierarchical ad exposure, knowledge‑extraction pretraining, traffic forecasting, overfitting analysis, adaptive sparsity, and visual debiasing, each deployed to boost revenue and performance on Alibaba’s platforms.

AI · CTR prediction · advertising
0 likes · 15 min read
DaTaobao Tech
Jul 18, 2022 · Artificial Intelligence

Walle: An End-to-End, General-Purpose, Large-Scale Device-Cloud Collaborative Machine Learning System

Walle is Alibaba’s first end‑to‑end, general‑purpose, large‑scale device‑cloud collaborative machine‑learning platform that manages billions of mobile devices, provides a full‑stack data and compute pipeline, cuts cloud load by 87%, reduces latency to ~100 ms, and already powers over a trillion daily ML invocations across dozens of Alibaba apps.

MNN · OSDI · benchmark
0 likes · 11 min read
Alimama Tech
Jun 1, 2022 · Artificial Intelligence

Advances in Alibaba's Advertising Engine: Serverless Architecture, Recall, Strategy, and Creative Technologies

Alimama’s advertising engine has been transformed into a serverless, cloud‑native platform that unifies runtime, data, and business abstractions, adopts vector‑ and model‑based recall with offline pre‑computed pipelines, implements multi‑stage AI‑driven bidding and auction mechanisms, and leverages large‑scale generative AI for creative assets, thereby accelerating feature rollout, cutting latency, and boosting merchant value.

AI · Strategy · advertising
0 likes · 18 min read
JD Retail Technology
Dec 20, 2021 · Artificial Intelligence

Large-Scale Graph Technology in JD.com E‑commerce: Practice and AI Computing Directions

The article summarizes JD.com Vice President Bao Yongjun's presentation on applying ultra‑large‑scale graph technology to e‑commerce, covering data foundations, recommendation and fraud detection use cases, technical challenges, the Galileo graph engine, and future AI computing development directions such as chips, auto‑learning, application layers, and privacy protection.

Artificial Intelligence · e‑commerce · fraud detection
0 likes · 7 min read
Top Architect
Dec 11, 2021 · Databases

Scaling Zhihu’s Moneta Service with TiDB: Architecture, Performance, and Lessons Learned

Zhihu’s Moneta service, handling over a trillion rows and billions of daily writes, migrated from MySQL to TiDB, achieving millisecond query latency, high availability, and horizontal scalability, and the article details the architecture, performance metrics, migration challenges, and lessons learned from this large‑scale deployment.

Database Scalability · TiDB · data migration
0 likes · 13 min read
Alimama Tech
Aug 18, 2021 · Artificial Intelligence

Overview of Recent Alimama Research Papers on AI and Large‑Scale Advertising Systems

The article surveys six Alimama papers accepted at CIKM 2021, presenting novel AI methods: a heterogeneous graph neural network for keyword matching, a star‑topology multi‑domain CTR model, a compact hash embedding technique, adaptive masked twins layers, automated hierarchical conversion prediction, and a scalable multi‑view ad retrieval system. Each demonstrates substantial online performance improvements and large‑scale deployment.

AI · CTR prediction · Graph Neural Networks
0 likes · 11 min read
Baidu Intelligent Testing
Aug 3, 2021 · Operations

Stability Governance and Observability in Baidu Search: From Kepler 1.0 to Kepler 2.0

This article examines how Baidu Search achieves better than five‑nines availability by analyzing stability challenges, introducing the Kepler 1.0 observability stack, and evolving to Kepler 2.0 with full‑trace collection, custom compression, and practical use cases that dramatically improve fault diagnosis and capacity management in a massive microservice environment.

backend · large-scale systems · metrics
0 likes · 18 min read
58 Tech
Apr 12, 2021 · Artificial Intelligence

Deep Interest Modeling and Multi‑Channel Recommendation for 58.com Home Page

This article presents the challenges of large‑scale home‑page recommendation at 58.com, describes how behavior‑sequence models such as DIN, DIEN and Transformer are applied and evolved into double‑channel and multi‑channel deep interest architectures, and details offline and online performance optimizations that yielded significant gains in click‑through and conversion rates.

AI · Transformer · deep learning
0 likes · 19 min read
Efficient Ops
Feb 1, 2021 · Operations

How to Detect Anomalous Nodes in Massive Compute Clusters Using Intelligent Ops

This article explains how internet companies can reduce soaring manual operations costs by applying intelligent monitoring techniques—such as pattern recognition and statistical anomaly detection—to automatically identify abnormal nodes among thousands of servers, streamline fault diagnosis, and improve service quality.

Anomaly Detection · Operations · large-scale systems
0 likes · 4 min read
Continuous Delivery 2.0
Apr 13, 2020 · Operations

Facebook Configuration Management: Practices, Statistics, and Cultural Insights

This article summarizes Facebook's holistic configuration management practices, presenting cultural influences, storage growth, size distribution, update frequency, change magnitude, and author collaboration statistics, while linking to a series of translated articles that explore tools such as Configerator, GateKeeper, and MobileConfig.

Configuration Management · Facebook · Operations
0 likes · 10 min read
Efficient Ops
Mar 25, 2020 · Operations

How JD Logistics Built a 300‑Million‑Metric Real‑Time Monitoring System for 99.999% Uptime

This article details JD Logistics' journey to design and implement a massive, AI‑enhanced monitoring platform that handles over three million metrics across hundreds of warehouses, addressing challenges of scale, network complexity, frequent asset changes, and integrating AIOps for proactive fault detection and resolution.

AIOps · Anomaly Detection · CMDB
0 likes · 23 min read