Tag

capacity-management


Efficient Ops
Oct 18, 2024 · Operations

Guotai Junan’s Level‑3 FinOps Success: Inside Their Capacity Management Journey

This article explores how Guotai Junan Securities leveraged FinOps and a new IT resource maturity model to achieve Level‑3 capacity management, detailing their cultural shift, automation tools, transparency gains, challenges overcome, and future plans for finer‑grained cost control in a rapidly digitizing industry.

Digital Transformation · FinOps · IT Operations
0 likes · 12 min read
Architect
Aug 10, 2023 · Operations

Capacity Management: Goals, Stages, Optimization Techniques, and Scaling Practices

The article explains how capacity management balances cost control and service quality through defined goals, three development stages, detailed resource optimization methods, stress‑testing metrics and standards, and automated scaling to achieve significant cost reductions while maintaining system stability.

Operations · capacity-management · performance testing
0 likes · 10 min read
AntTech
Jul 14, 2023 · Cloud Native

KapacityStack: Open‑Source Cloud‑Native Intelligent Capacity Management and IHPA

KapacityStack is an open‑source, cloud‑native capacity platform from Ant Group that introduces the Intelligent Horizontal Pod Autoscaler (IHPA) to provide predictive, multi‑level, and stable autoscaling, reducing resource waste, carbon emissions, and operational costs while supporting extensible, modular integration with Kubernetes workloads.

Kubernetes · autoscaling · capacity-management
0 likes · 11 min read
Efficient Ops
Apr 2, 2023 · Operations

Turning CMDB Data into Actionable Capacity Management for IT Operations

This article explores how CMDB data can be leveraged for proactive capacity assessment, outlining mechanisms, goals, metrics, evaluation types, baselines, and a tool design that integrates metric, policy, evaluation, and reporting functions to enhance IT asset efficiency and risk mitigation.

CMDB · IT asset · Operations
0 likes · 11 min read
Bilibili Tech
Mar 28, 2023 · Operations

Bilibili's Capacity Management Platform: Design, Implementation, and S12 Event Support

Bilibili's capacity management platform integrates foundational data, VPA/HPA scaling, quota control, and visual dashboards to streamline resource usage, cut costs, and improve stability. Its event‑specific support, such as for S12, reduced release issues by 80% and online failures by 90%; predictive scaling and risk control are planned next.

Bilibili · SRE · capacity visualization
0 likes · 13 min read
Zhuanzhuan Tech
Feb 8, 2023 · Operations

Capacity Management: Goals, Practices, and Optimization at Zhuanzhuan

This article outlines Zhuanzhuan’s capacity management approach, covering its objectives, development stages, water‑level metrics, resource optimization techniques, cluster capacity assessment, stress‑test indicators and standards, and scaling strategies. It demonstrates how systematic capacity management reduces costs while ensuring service stability.

Operations · capacity-management · cost-optimization
0 likes · 12 min read
Efficient Ops
Dec 12, 2022 · Operations

How Bilibili Built a 5‑Year SRE Journey: High‑Availability, Multi‑Active, and Capacity Management

This article chronicles Bilibili's five‑year evolution of Site Reliability Engineering, detailing the introduction of SRE culture, the construction of high‑availability and multi‑active architectures, capacity management with Kubernetes, VPA/HPA, incident case studies, and the ongoing transformation of SRE practices across the organization.

High Availability · Kubernetes · Operations
0 likes · 24 min read
Efficient Ops
Nov 16, 2022 · Operations

Building a 99.95% Uptime Cloud‑Native Platform: Guoxin Securities’ Ops Journey

Guoxin Securities’ QianKun centralized operation platform showcases a cloud‑native, micro‑service architecture that achieved 99.95% availability through containerization, multi‑region deployment, AI‑driven capacity forecasting, and comprehensive DevOps practices, offering a 24/7 seamless account‑opening experience and setting industry benchmarks.

AIOps · DevOps · Microservices
0 likes · 14 min read
Bilibili Tech
Oct 29, 2022 · Operations

Capacity Management Practice at Bilibili

The article details Bilibili’s capacity‑management approach, showcasing system architecture, monitoring metrics, scaling tactics, and performance results through diagrams and screenshots, and explains how its operational processes and tools maintain reliable service delivery during high‑traffic periods, offering practical capacity‑planning insights.

Bilibili · Case Study · Operations
0 likes · 1 min read
Bilibili Tech
Sep 9, 2022 · Operations

Bilibili SRE's Stability Practices and Reflections

At the 2022 GOPS Global Operations Conference in Shenzhen, Bilibili’s infrastructure SRE lead Wu Anchuang unveiled the company’s comprehensive stability framework—detailing its SRE transformation, high‑availability architecture, active‑active disaster‑recovery, capacity planning, and event‑support strategies—marking the first public disclosure of these practices.

Bilibili · High Availability · SRE
0 likes · 1 min read
AntTech
Jun 22, 2022 · Cloud Computing

Meta Reinforcement Learning Framework for Predictive Autoscaling in Cloud Environments

This article presents a cloud-native, end‑to‑end autoscaling solution that integrates traffic forecasting, CPU utilization meta‑prediction, and a reinforcement‑learning‑based scaling decision module into a fully differentiable system, achieving higher resource utilization and cost efficiency as demonstrated by ACM SIGKDD 2022 research.

Cloud Computing · autoscaling · capacity-management
0 likes · 10 min read
High Availability Architecture
Dec 28, 2021 · Backend Development

Design and Practice of the Nimbus Low‑Code Platform for Search Middleware

This article examines the challenges faced by Baidu's search middleware in high‑frequency iteration and complex backend development, and presents the design, implementation, and benefits of the Nimbus low‑code platform—including a graph engine, unified development environment, visual operator composition, automated testing, and intelligent capacity management—to accelerate product innovation while reducing development effort.

Backend Development · DevOps · capacity-management
0 likes · 16 min read
Efficient Ops
Apr 20, 2021 · Operations

How Dada’s Intelligent Elastic Scaling Cuts Costs and Boosts Delivery Performance

This article details Dada Group’s implementation of an intelligent elastic scaling architecture that automatically adjusts capacity during peak promotions and low‑traffic periods, improving delivery reliability, reducing costs, and supporting multi‑cloud and multi‑runtime environments through sophisticated monitoring and auto‑scaler mechanisms.

Operations · auto scaling · capacity-management
0 likes · 17 min read
Dada Group Technology
Apr 19, 2021 · Operations

Exploring Elastic Capacity and Automated Scaling Architecture at Dada Group

This article presents Dada Group's comprehensive approach to elastic capacity management and automated scaling, detailing the challenges faced during traffic spikes, the design of a cloud‑native auto‑scaler, multi‑metric observability, decision‑making logic, execution mechanisms, extreme scaling practices, and future optimization directions.

Multi-Cloud · SRE · auto scaling
0 likes · 15 min read
AntTech
Jul 8, 2020 · Cloud Native

From Double Eleven to Cloud‑Native Capacity: Zheng Yangfei’s Journey and Ant Group’s Autoscaling Innovation

The article chronicles Zheng Yangfei’s rise from a Double Eleven intern to leader of Ant Group’s cloud‑native capacity team, detailing the evolution of large‑scale load‑testing, the challenges of autoscaling in financial‑grade systems, and the team’s shift toward platform‑driven, risk‑aware engineering.

Big Data · SRE · autoscaling
0 likes · 11 min read
Didi Tech
Feb 18, 2020 · Operations

Didi's National Carpool Day: Technical Insights into Stability Assurance

Didi's National Carpool Day on Dec 3, 2019 attracted 3.1 million passengers. Stability was ensured through six pillars: an organized task force, capacity forecasting and rapid container scaling, comprehensive monitoring with a fire‑fighting map, a robust contingency platform, strict process standards, and coordinated third‑party preparation.

Carpool Day · Didi · Operations
0 likes · 13 min read
Architects' Tech Alliance
Jan 12, 2020 · Cloud Computing

Mitigating Hash Polarization and Elephant Flow in UCloud Physical Cloud Gateway Clusters: Multi‑Tunnel and Capacity Management Solutions

This article presents a detailed case study of how UCloud resolved hash polarization and elephant‑flow overload in physical cloud gateway clusters by deploying a multi‑tunnel traffic‑splitting strategy, expanding gateway capacity, implementing lossless isolation‑zone migration, and enhancing automation and high‑availability mechanisms, enabling the clusters to handle hundreds of gigabits of traffic during peak events.

Cloud Computing · High Availability · capacity-management
0 likes · 10 min read
Efficient Ops
Jun 20, 2019 · Operations

How Baidu’s Noah TSDB Handles Capacity Management at Scale

This article explains how Baidu’s Noah time‑series database measures, plans, and protects capacity, detailing throughput metrics, estimation and load‑testing methods, and a water‑level model that drives reliable scaling and overload mitigation for massive monitoring workloads.

Operations · TSDB · capacity-management
0 likes · 11 min read
Ctrip Technology
Mar 7, 2019 · Operations

Ctrip Container Cloud Operations: Practices, Challenges, and Future Outlook

This article presents Ctrip's experience in building and operating a private container cloud platform, detailing its architectural evolution, operational challenges, tooling, monitoring, capacity management, and future directions toward hybrid and cloud‑native environments.

ChatOps · Kubernetes · capacity-management
0 likes · 12 min read
Efficient Ops
Feb 14, 2019 · Operations

Scaling a 10,000‑Node Container Cloud: Ctrip’s Ops Practices and Lessons

This article details Ctrip's journey of building and operating a massive container cloud platform, covering its architectural evolution, operational challenges, tooling, capacity management, and future directions, offering practical insights for large‑scale cloud‑native environments.

DevOps · Kubernetes · Operations
0 likes · 17 min read