Tag

resource utilization


JD Tech Talk
Aug 3, 2024 · Operations

Evolution of Load Balancing Strategies in JD Advertising Online Model System

This article examines the progression of load‑balancing techniques used in JD's advertising online model system, analyzing current challenges, outlining requirements, reviewing static and dynamic strategies, and presenting a multi‑objective, hierarchical approach that improves service availability, resource utilization, and overall system stability.

Distributed Systems · Dynamic Scheduling · load balancing
14 min read
JD Retail Technology
Jul 24, 2024 · Operations

Load Balancing Strategies for Heterogeneous Hardware Clusters in JD Advertising Online Model System

This article examines the evolution, theory, and practical implementation of load balancing strategies for JD Advertising's online model system, focusing on heterogeneous hardware clusters, dual‑objective optimization of service availability and resource utilization, and the resulting performance improvements in large‑scale production environments.

Distributed Systems · heterogeneous clusters · load balancing
15 min read
360 Smart Cloud
Jan 24, 2024 · Cloud Native

Idle Compute Sharing in Dedicated Kubernetes Clusters Using Karmada

The article describes how a company implements idle compute sharing for dedicated Kubernetes clusters, using Karmada to allocate spare CPU and memory to offline workloads, which improves resource utilization and reduces costs. It also outlines usage scenarios, configuration steps, technical architecture, and future plans.

Cloud Native · Idle Compute Sharing · Karmada
9 min read
NetEase Cloud Music Tech Team
Nov 17, 2023 · Cloud Native

Cloud Music FinOps Practice: Building Enterprise Cloud Cost Management Platform

NetEase Cloud Music’s self‑built FinOps platform tackles rising cloud spend by unifying cost data, visualizing and allocating expenses, rating resource utilization, and empowering platform providers, business units, and developers with data‑driven governance to curb the Andy‑Bill effect and enable scalable, long‑term cost control.

Cloud Cost Management · Cloud Native · Container Governance
8 min read
Tencent Architect
Sep 27, 2023 · Cloud Native

OpenCloudOS Cloud‑Native Practices, Resource Utilization Enhancements, and Testing Framework Overview

The article introduces OpenCloudOS’s cloud‑native initiatives—including a mixed‑workload CPU QoS scheduler, the RUE resource‑utilization enhancement, the eBPF‑based nettrace network‑diagnosis tool, and the TCase/TSuite testing platform—highlighting how these innovations improve CPU utilization, cut costs, and ensure high‑quality releases.

Cloud Native · Linux · Testing
14 min read
Cloud Native Technology Community
Aug 29, 2022 · Cloud Native

Cloud‑Native and Edge Computing: How Containers Empower Edge Applications

The article explains how the deep integration of cloud‑native technologies and edge computing, driven by digital transformation, improves resource utilization, unifies infrastructure management, reduces AI workload costs, simplifies device access, accelerates deployment, and enhances autonomy and ROI for enterprises.

AI · Cloud Native · Containers
10 min read
Shopee Tech Team
May 26, 2022 · Cloud Computing

Shopee's Green Computing Practices: Optimizing Resource Utilization in Data Centers

Shopee reduces data‑center carbon emissions by over 40,000 tons annually through three 2021 green‑computing technologies—Overcommit resource oversubscription, mixed‑model Colocation of latency‑sensitive and batch workloads, and enhanced Auto Scaling that leverages global metrics to cut machine usage and improve resource efficiency.

Green computing · Kubernetes · auto scaling
15 min read
Tencent Cloud Developer
Dec 8, 2021 · Cloud Native

Using Tencent Cloud EKS Virtual Nodes to Solve CronJob Isolation and Scheduling Challenges

By offloading thousands of short‑lived CronJob pods to Tencent Cloud EKS serverless virtual nodes, Zuoyebang isolated them from online services, eliminated IP waste, achieved millisecond‑level parallel scheduling and sub‑3‑second startup, freed 10 % of cluster resources and cut scheduling costs by roughly 70 % while markedly improving cluster stability.

Cloud Native · CronJob · Kubernetes
10 min read
Tencent Architect
Sep 10, 2021 · Cloud Native

BT Scheduler for Absolute Preemption: Boosting CPU Utilization and QoS in Cloud‑Native Environments

This article analyzes the limitations of the Linux Completely Fair Scheduler (CFS) for high‑priority workloads, introduces Tencent's custom offline BT scheduler that provides absolute preemption, and presents experimental results showing significant improvements in latency, CPU utilization, and carbon‑reduction for cloud‑native services.

BT scheduler · CFS · CPU scheduling
10 min read
Full-Stack Internet Architecture
Aug 31, 2021 · Operations

Facebook’s Shard Manager: Strategies for Large‑Scale System Sharding, Fault Tolerance, and Resource Utilization

The article explains how Facebook’s Shard Manager tackles large‑scale system sharding by combining stateful and stateless service deployment, consistent hashing versus sharding, fault‑as‑normal principles, replication, automated failover, load‑balancing, and elastic scaling to achieve high availability and efficient resource use.

Distributed Systems · Facebook · Sharding
9 min read
Tencent Architect
Aug 31, 2021 · Cloud Native

Boost Server Utilization: TencentOS ‘Ruyi’ Mixed‑Deployment Solution Explained

This article explores how TencentOS Server’s mixed‑deployment product “Ruyi” combines cluster scheduling optimization with per‑node QoS to dramatically increase CPU utilization, cut energy costs, and improve resource isolation in large‑scale data‑center environments.

Cloud Native · QoS · TencentOS
10 min read
Aikesheng Open Source Community
Aug 17, 2021 · Databases

Design and Implementation of a Cloud‑Native MySQL Container Platform for High Availability and Resource Efficiency

The article describes how a bank built a Kubernetes‑based, containerized MySQL service platform (CDD) to improve database high availability, resource utilization, automated operations, and agile delivery by addressing network, storage, scheduling, and management challenges through custom networking, hybrid storage, scheduler extensions, and multi‑AZ deployment.

Cloud Native · Containerization · High Availability
16 min read
Alibaba Cloud Infrastructure
Mar 24, 2021 · Cloud Computing

LIBRA and CARE: Memory Bandwidth Management and Fault‑Tolerance Innovations Presented at HPCA 2021

The article reviews two HPCA 2021 papers from Alibaba Cloud—LIBRA, a dynamic memory‑bandwidth management framework that boosts data‑center utilization, and CARE, a cache‑based fault‑tolerance architecture that delivers near‑Chipkill reliability with minimal overhead—while also highlighting future research directions in ML systems, quantum computing, and cache computing.

HPCA2021 · cloud computing · data center reliability
4 min read
Liulishuo Tech Team
Feb 4, 2021 · Cloud Computing

Improving Cloud Cost Allocation and Resource Utilization through Catalog, Tags, and Automated Monitoring

This article describes how a tech team built a catalog‑based cost‑allocation system, leveraged cloud tags and Kubernetes labels, used Prometheus data for scaling decisions, and combined reserved, spot, and on‑demand instances to boost cloud resource utilization while keeping services stable.

Cost Optimization · Kubernetes · Prometheus
8 min read
DataFunTalk
Jun 20, 2020 · Cloud Native

Automated Elastic Scaling for Million‑Scale Core Services and Mixed Workloads on ByteDance's Private Cloud Platform

This article presents ByteDance's private cloud platform TCE architecture and explains how automated elastic scaling, dynamic over‑commit, and mixed‑workload deployment are used to improve resource utilization for millions of services, balancing online peak demand with offline batch tasks.

Cloud Native · Kubernetes · elastic scaling
25 min read
Didi Tech
Dec 2, 2019 · Operations

Capacity Estimation Methodology for Growing Services

The article presents a systematic capacity‑estimation methodology that links service traffic to order volume, uses CPU‑Idle as a primary metric, predicts traffic growth and upper‑bound limits, validates predictions with load‑testing, and provides scaling recommendations while noting limitations of the CPU‑Idle baseline.

Capacity Planning · Scaling · performance monitoring
9 min read
Hujiang Technology
Oct 18, 2017 · Operations

The USE Method: A Systematic Approach to Performance Analysis and Bottleneck Identification

The USE Method provides a concise, structured framework for quickly locating resource bottlenecks and errors in complex systems by examining utilization, saturation, and error metrics across hardware, software, and cloud environments, enabling practitioners to prioritize and resolve performance issues efficiently.

Performance · USE method · bottleneck detection
18 min read
Ctrip Technology
Feb 16, 2017 · Operations

Application‑Based Automated Capacity Management and Utilization Evaluation

The article presents a comprehensive, application‑centric approach to automated capacity management that analyzes why server utilization is low, defines safe usage thresholds, describes a load‑balancer‑driven stress‑testing workflow with regression modeling, and explains how this practice improves resource efficiency, cost savings, and developer‑ops collaboration.

Automation · DevOps · capacity-management
14 min read
Qunar Tech Salon
Feb 14, 2017 · Operations

Application‑Based Automated Capacity Management and Utilization Evaluation

This article explains how to automate application‑centric capacity assessment, identify the safe utilization thresholds, use load‑balancer‑driven stress testing and regression modeling to pinpoint resource bottlenecks, and improve server usage while maintaining service reliability through close DevOps collaboration.

Automation · DevOps · capacity-management
15 min read
Efficient Ops
Feb 9, 2017 · Operations

Automating Application‑Based Capacity Management to Boost Resource Utilization

This article explains how to automate capacity management focused on application performance, identifies common causes of low resource utilization, proposes safe utilization thresholds, describes a testing framework that uses load‑balancer weighting and real‑time monitoring to pinpoint bottlenecks, and outlines how ops and developers can collaborate to improve efficiency.

Automation · capacity-management · operations
18 min read