Tagged articles
318 articles
Page 3 of 4
vivo Internet Technology
Apr 21, 2021 · Backend Development

Analysis of Apache Commons-Pool2 Object Pooling Implementation

The article examines Apache Commons‑Pool2’s object‑pool architecture, detailing its core interfaces (ObjectPool, PooledObjectFactory, PooledObject), the GenericObjectPool construction, FIFO/LIFO idle‑object deque management with lock‑based concurrency, borrowing and returning workflows, self‑protection features like abandonment detection, and the performance tuning needed for high‑concurrency environments.

Java · Object Pooling · Resource Management
0 likes · 17 min read
High Availability Architecture
Apr 15, 2021 · Cloud Native

Meituan Elastic Scaling System: Architecture, Challenges, and Business Enablement

This article presents Meituan's elastic scaling platform, detailing its evolution from Hulk 1.0 to Hulk 2.0, the technical and operational challenges faced, the solutions implemented for resource management and multi‑tenant scaling, and real‑world business scenarios such as holiday, peak‑hour, and emergency capacity provisioning.

Meituan · Operations · Resource Management
0 likes · 22 min read
Meituan Technology Team
Apr 1, 2021 · Cloud Native

Meituan Elastic Scaling System: Architecture, Challenges, and Business Enablement

Meituan's elastic scaling system evolved from Hulk 1.0 on OpenStack to Hulk 2.0 on Kubernetes, adding micro‑services, quota management, hybrid‑cloud pools, and automated scheduling, thereby delivering cost savings, high‑availability handling of holiday peaks, delivery spikes, anti‑scraping needs, and SaaS releases, while future plans target stability, usability, and emerging technologies.

Cloud Native · Kubernetes · Meituan
0 likes · 21 min read
Python Programming Learning Circle
Mar 29, 2021 · Fundamentals

Understanding Python Context Managers: Basics, Custom Implementations, and Advanced Applications

This article explains Python's context manager mechanism, covering the basic with statement, custom __enter__/__exit__ classes, the contextlib.contextmanager decorator, nesting, combining multiple managers with ExitStack, and practical applications such as SQLAlchemy session handling, exception management, and persistent HTTP requests.

Decorator · Python · Resource Management
0 likes · 6 min read
Big Data Technology & Architecture
Mar 18, 2021 · Big Data

Flink Job Troubleshooting and Performance Optimization: Data Skew, Kafka Configuration, Resource Management, and Checkpoint Issues

This article details common Flink streaming problems such as data skew causing task back‑pressure, oversized Kafka messages, high‑throughput ack settings, slot removal errors, checkpoint timeouts, and resource constraints, and provides concrete configuration changes and architectural adjustments to resolve them.

Checkpoint · Data Skew · Flink
0 likes · 18 min read
dbaplus Community
Mar 16, 2021 · Big Data

How Kuaishou Scales YARN to Tens of Thousands of Nodes with the Kwai Scheduler

This article explains how Kuaishou’s massive offline compute clusters—tens of thousands of machines processing hundreds of petabytes daily—are managed by a heavily customized YARN stack and the home‑grown Kwai Scheduler, detailing architecture, scheduler evolution, multi‑scenario optimizations, and future scaling plans.

Big Data · Cluster Optimization · Kwai Scheduler
0 likes · 14 min read
DataFunTalk
Feb 28, 2021 · Big Data

Migrating Youzan Offline Spark Platform to Kubernetes: Architecture, Optimizations, and Lessons Learned

This article details how Youzan's offline Spark computing platform was transformed for the cloud‑native era by migrating from YARN to Kubernetes, introducing containerization, storage‑compute separation, dynamic allocation, deployment optimizations, and a collection of practical lessons to reduce cost and improve resource utilization.

Big Data · Kubernetes · Performance Optimization
0 likes · 27 min read
Selected Java Interview Questions
Feb 8, 2021 · Fundamentals

Understanding Memory Leaks and Memory Overflow: Causes, Types, and Solutions

Memory leaks, caused by unreleased dynamic allocations, can accumulate and lead to memory overflow, severely degrading performance or crashing applications; this article explains leak definitions, causes, classifications (persistent, intermittent, one‑time, implicit), overflow reasons, and practical mitigation steps such as proper allocation, deallocation, and JVM tuning.

JVM · Resource Management · dynamic allocation
0 likes · 9 min read
360 Smart Cloud
Jan 28, 2021 · Big Data

Overview of the Qirin Big Data Platform: Architecture, Modules, and Capabilities

The article provides a comprehensive overview of the Qirin big‑data platform, detailing its architecture, core modules such as resource management, metadata, data ingestion, task development, interactive query, and self‑service analysis, and outlines future development plans for the system.

Data Platform · Resource Management · data ingestion
0 likes · 12 min read
Top Architect
Jan 24, 2021 · Cloud Native

Common Docker Compose Mistakes and How to Fix Them

This article examines common pitfalls when using Docker Compose for containerized development, such as frequent rebuilds, slow host volumes, fragile configurations, and resource mismanagement, and provides practical solutions including caching strategies, host volume tuning, env files, overrides, and resource allocation tips.

Docker Compose · Dockerfile Optimization · Host Volumes
0 likes · 18 min read
360 Tech Engineering
Jan 7, 2021 · Big Data

Overview of the Qirin Big Data Platform Architecture and Core Modules

The article introduces the Qirin big data platform—a one‑stop solution covering resource management, metadata, data ingestion, task development, interactive querying, and self‑service analysis—detailing its modular architecture, typical processing workflow, and future development plans for enterprise‑wide data services.

Big Data · Data Platform · Resource Management
0 likes · 11 min read
DataFunTalk
Jan 6, 2021 · Big Data

Didi's Presto Engine: Architecture, Optimizations, and Operational Practices

This article presents Didi's three‑year experience with Presto, detailing its architecture, low‑latency design, large‑scale deployment, extensive Hive compatibility work, resource isolation, Druid connector integration, usability enhancements, stability engineering, performance tuning, and future directions for the ad‑hoc query engine.

Big Data · Distributed Systems · Druid Connector
0 likes · 17 min read
Suning Technology
Dec 25, 2020 · Blockchain

Understanding EOS: Architecture, Consensus, and Enterprise Applications

This article explores EOS as a leading Blockchain 3.0 platform, detailing its layered architecture, consensus mechanisms, resource management model, account system, smart contract framework, and how Suning leverages EOS for a scalable distributed data storage solution.

Blockchain · Consensus · Distributed Systems
0 likes · 28 min read
Top Architect
Dec 14, 2020 · Cloud Native

Lessons Learned from Two Years of Production Kubernetes at Grofers

This article recounts Grofers' two‑year journey migrating from Ansible‑managed EC2 instances to Kubernetes, detailing the motivations, migration strategy, operational challenges, observability choices, CI/CD tooling, resource management, security practices, cost considerations, and the overall impact on development velocity and platform stability.

Cloud Native · DevOps · Kubernetes
0 likes · 20 min read
Didi Tech
Nov 24, 2020 · Industry Insights

Standardizing Customer Service: Inside DiDi’s Solution Platform Architecture

This article explains how DiDi built a unified solution platform that standardizes customer‑service responses across multiple channels by integrating business information, service capabilities, dynamic workflows, static knowledge bases, and a matching layer, while detailing the underlying workflow and resource engines and their performance impact.

Resource Management · Workflow Engine · architecture
0 likes · 15 min read
Liangxu Linux
Oct 13, 2020 · Fundamentals

Why Understanding OS Fundamentals Boosts Your Programming Efficiency

This article explains the basic concepts of operating systems, covering their role as a hardware abstraction layer, the distinction between kernel and user modes, and how OS resource management (time‑sharing and space‑sharing) enables multiple programs to run safely and efficiently.

Resource Management · abstraction · fundamentals
0 likes · 10 min read
Architecture Digest
Oct 9, 2020 · Operations

Troubleshooting Alibaba Cloud Video On Demand Service: Billing Issues, Resource Deletion, and Recovery Steps

This article recounts a real‑world incident where Alibaba Cloud Video On Demand videos became inaccessible due to a tiny overseas charge causing a negative balance, explains how to diagnose the problem, outlines the steps taken to restore service, and shares practical lessons for cloud operations.

Alibaba Cloud · Cloud Services · Resource Management
0 likes · 6 min read
JD.com Experience Design Center
Sep 23, 2020 · Operations

Boost B2B Operations Efficiency with Template‑Based Design

B‑end operational activities often involve frequent, short‑term, high‑pressure tasks that drain design resources, so this article explains how generic design templates and collaborative online tools can streamline these demands, freeing up manpower and improving overall operational efficiency.

B2B · Operations · Resource Management
0 likes · 2 min read
Efficient Ops
Sep 15, 2020 · Cloud Native

Mastering Kubernetes YAML: How to Drive Resource Management Efficiently

This article explores how YAML files serve as the pivotal interface for managing Kubernetes resources, detailing their role in defining storage, compute, network, and application configurations, and demonstrating practical deployment, service, and ingress creation to streamline operations and enhance platform stability.

DevOps · Kubernetes · Resource Management
0 likes · 10 min read
Sohu Tech Products
Sep 2, 2020 · Fundamentals

LegoOS: A Distributed Operating System for Disaggregated Hardware – Architecture and Design Overview

This article reviews the award‑winning 2018 OSDI paper LegoOS, describing its split‑kernel architecture, component‑based resource management across processors, memory and storage, and how it enables hardware disaggregation in data‑center clusters while addressing network latency and failure handling.

Distributed Systems · OS Architecture · Operating Systems
0 likes · 17 min read
TAL Education Technology
Sep 1, 2020 · Cloud Computing

Cost Optimization and Resource Management in an Online Education Platform: From XEN Migration to Container‑Based Scaling

This article describes how an online education platform reduced infrastructure costs and improved service reliability by replacing XEN with KVM, building resource‑tracking platforms, adopting Kubernetes‑based containerization, implementing rapid auto‑scaling, and establishing systematic resource auditing and standardization processes.

Cost Optimization · Infrastructure · Kubernetes
0 likes · 25 min read
iQIYI Technical Product Team
Aug 28, 2020 · Operations

iQIYI Test Environment Management Platform: Design, Challenges, and Solutions

iQIYI’s Test Environment Management Platform centralizes topology and deployment scripts, automates on‑demand environment provisioning, and isolates QA resources, cutting deployment time from up to 60 minutes to an average of four minutes, boosting success rates above 95 % while supporting thousands of daily deployments across hundreds of applications.

DevOps · Resource Management · iQIYI
0 likes · 8 min read
NetEase Media Technology Team
Aug 13, 2020 · Cloud Native

How NetEase Media Scaled Its Infrastructure with Containerization and Service Mesh

NetEase Media transformed its infrastructure by containerizing services, establishing multiple resource pools, implementing a ServiceMesh with NSF, and isolating beta and production environments, resulting in higher CPU utilization, automated scaling, and improved stability, while sharing lessons learned and future plans.

Cloud Native · Infrastructure · Kubernetes
0 likes · 22 min read
Java Architect Essentials
Aug 12, 2020 · Operations

Common Kubernetes Pitfalls and How to Fix Them

This article outlines frequent Kubernetes operational mistakes—such as misconfigured resource requests, missing probes, improper load‑balancer exposure, naïve autoscaling, IAM/RBAC misuse, lack of anti‑affinity, absent PodDisruptionBudgets, multi‑tenant pitfalls, and suboptimal externalTrafficPolicy—providing concrete remediation steps and best‑practice code examples.

Kubernetes · Probes · Resource Management
0 likes · 15 min read
Architect
Jul 15, 2020 · Big Data

Understanding Flink Task Slots, Resource Allocation, and Slot Sharing Mechanisms

This article explains how Flink uses task slots to partition TaskManager resources, the benefits of slot sharing, the interaction between Scheduler, SlotPool, and ResourceManager, and the internal classes such as LogicalSlot, PhysicalSlot, and SlotSharingManager that enable resource isolation and sharing in stream processing jobs.

Big Data · Flink · Resource Management
0 likes · 6 min read
Programmer DD
Jul 11, 2020 · Backend Development

What Makes ElasticJob the Next‑Generation Distributed Scheduler?

ElasticJob is a Java‑based distributed scheduling framework that blends Quartz‑style job timing with ZooKeeper coordination, offering lightweight Lite and cloud‑native Cloud editions, elastic scaling, resource governance, and a roadmap toward Kubernetes support and plug‑in extensibility.

Distributed Scheduling · Kubernetes · Mesos
0 likes · 15 min read
DataFunTalk
Jul 5, 2020 · Big Data

ByteDance’s Optimizations to Hadoop YARN: Enhancing Utilization, Multi‑Load Scenarios, Stability, and Multi‑Region Active‑Active

This article describes ByteDance’s four‑year series of customizations to Hadoop YARN—covering utilization improvements, multi‑load scenario optimizations, stability enhancements, and multi‑region active‑active deployment—along with practical production experiences, architectural details, and future work directions.

ByteDance · Cluster Optimization · Hadoop
0 likes · 12 min read
Architect
Jul 4, 2020 · Big Data

Kuaishou Flink Real‑Time Architecture and Spring Festival Gala Assurance Practices

This article details Kuaishou's Flink‑based real‑time computing architecture, its massive cluster scale, and the comprehensive strategies—including overload protection, system stability, pressure testing, and resource guarantees—implemented to ensure reliable streaming for the 2020 Spring Festival Gala and its real‑time dashboard.

Big Data · Flink · Kuaishou
0 likes · 12 min read
Fulu Network R&D Team
Jun 8, 2020 · Frontend Development

Design and Implementation of a Micro‑Frontend Architecture for Internal Systems

This article presents a comprehensive technical study on adopting micro‑frontend architecture to solve version‑dependency, resource‑size, and navigation‑performance issues in a large internal system suite, detailing background analysis, solution research, layered architecture design, key technical challenges, custom webpack plugins, build workflow, and remaining open problems.

Frontend Architecture · Resource Management · micro-frontend
0 likes · 14 min read
Big Data Technology & Architecture
Apr 8, 2020 · Big Data

Common Apache Flink Exceptions and How to Resolve Them

This article enumerates typical Apache Flink deployment, job, and checkpoint errors—such as JDK version issues, resource shortages, task manager timeouts, and state migration problems—and provides practical troubleshooting steps and configuration tips to help engineers quickly diagnose and fix these failures.

Big Data · Checkpoint · Exception
0 likes · 8 min read
Big Data Technology & Architecture
Feb 13, 2020 · Big Data

Optimizing Hadoop MapReduce Jobs for eBay CAL System to Reduce Execution Time and Resource Usage

This article describes how eBay's Central Application Logging (CAL) system generates massive daily logs, the challenges of Hadoop MapReduce job performance and resource consumption, and the step‑by‑step optimizations—reducing GC time, mitigating data skew, and improving algorithms—that cut execution time by over 60%, lowered cluster resource usage, and raised job success rates to nearly 100%.

Big Data · Data Skew · Hadoop
0 likes · 11 min read
58UXD
Jan 15, 2020 · Product Management

How We Built the "Crystal Ball" Design Resource Platform from Scratch

This case study details how the design team created the Crystal Ball platform—a shared, visualized design resource hub—to solve low search efficiency, duplicate purchases, and budget waste, outlining research, planning, naming, branding, development phases, iterations, and measurable impact on productivity.

Design Platform · Resource Management · UX
0 likes · 8 min read
58 Tech
Jan 6, 2020 · Big Data

Design and Architecture of the 58DP Big Data Platform Task Scheduling System

The article presents a comprehensive overview of the 58DP big data platform's task scheduling system, detailing its background, architecture, high‑availability design, slot‑based resource management, scheduling models, task lifecycle, priority rules, dependency handling, failure recovery, and future enhancements.

Big Data · Resource Management · distributed system
0 likes · 14 min read
JD Retail Technology
Dec 17, 2019 · Mobile Development

Comprehensive Strategies for Reducing iOS App Package Size in a Large‑Scale E‑commerce Application

This article details a two‑phase, data‑driven approach to shrinking a rapidly growing iOS shopping app from over 300 MB to around 214 MB by focusing on install‑size metrics, resource‑file analysis, unused‑image removal, icon‑font adoption, dynamic‑library stripping, LTO, PNG handling, and automated monitoring.

Asset Catalog · Mobile Development · Resource Management
0 likes · 22 min read
Youku Technology
Nov 26, 2019 · Operations

Resource Assurance Strategies and Practices for Alibaba Youku Double‑11 Promotion

The article outlines Alibaba Youku’s end‑to‑end resource‑assurance platform for Double‑11 promotions, detailing automated demand collection, business‑to‑technical metric conversion, single‑machine capacity testing, rapid scaling and emergency borrowing, which together cut manual reviews by 80 % and boosted delivery efficiency tenfold.

Automation · Operations · Resource Management
0 likes · 13 min read
Tencent Cloud Developer
Oct 11, 2019 · Cloud Computing

Large-Scale Distributed Reinforcement Learning Solution Based on TKE

The project replaces cumbersome manual management of thousands of heterogeneous CPU and GPU nodes for large‑scale reinforcement learning with a TKE‑based, containerized actor‑learner architecture that automates batch start/stop, provides elastic autoscaling, fault‑tolerant processes, shared model storage, and CI‑driven image deployment, cutting costs by up to two‑thirds while dramatically speeding experiment cycles.

CI/CD · Cloud Native · Distributed Training
0 likes · 14 min read
Efficient Ops
Sep 23, 2019 · Operations

How to Build an Effective CMDB for Scalable Operations Management

This article explains the step‑by‑step process of constructing a configuration management database (CMDB) for operations, covering resource modeling, data integration, organizational structures, maintenance methods, and how a well‑designed CMDB supports higher‑level business operations such as automation, visualization, and capacity planning.

Automation · CMDB · ITIL
0 likes · 14 min read
Big Data Technology & Architecture
Sep 19, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

This article presents a comprehensive analysis of Meituan's Hadoop YARN fair scheduler, detailing its architecture, resource abstractions, scheduling workflow, performance bottlenecks, fine‑grained metrics, and a series of optimization techniques—including sorting improvements, job‑skip reduction, parallel queue sorting, and robust rollout strategies—to achieve high‑throughput, low‑latency scheduling for large‑scale offline, streaming, and machine‑learning workloads.

Big Data · Fair Scheduler · Performance Optimization
0 likes · 24 min read
FunTester
Aug 29, 2019 · Backend Development

Properly Releasing Resources with Apache HttpClient 4.5: Best Practices

This article explains why releaseConnection() is deprecated in HttpClient 4.5, introduces the PoolingHttpClientConnectionManager, and provides detailed code examples and step‑by‑step guidance for safely extracting response content, closing responses, consuming entities, releasing connections, and shutting down the HttpClient.

Apache · Backend · HttpClient
0 likes · 4 min read
Meituan Technology Team
Aug 1, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

Meituan improved its custom Hadoop YARN Fair Scheduler by pre‑computing resource usage, filtering zero‑demand jobs, and parallelizing queue sorting, which reduced sorting time from 30 s to 5 s per minute, boosted container‑per‑second throughput to 50 k, enabled live roll‑backs, and prepared the system for clusters up to 10 k nodes and future scaling to hundreds of thousands.

Big Data · Fair Scheduler · Hadoop
0 likes · 24 min read
Alibaba Cloud Native
Jul 17, 2019 · Cloud Native

Why Traditional Autoscaling Fails in Kubernetes and How Cloud‑Native Solutions Evolve

The article examines the limitations of traditional threshold‑based autoscaling in Kubernetes, explains three core challenges—percentage fragmentation, capacity‑planning pitfalls, and resource‑utilization dilemmas—then expands the autoscaling concept across four workload types and outlines the cloud‑native components that address them.

Cloud Native · HPA · Kubernetes
0 likes · 10 min read
NetEase Media Technology Team
Jun 20, 2019 · Cloud Native

Deep Dive into Docker and Kubernetes Resource Management Using Linux Namespaces and Cgroups

The article explains how Docker and Kubernetes use Linux namespaces to isolate processes, network, IPC, mounts, UTS and users, and employ cgroups to enforce CPU, memory and I/O limits, detailing Docker’s architecture, Kubernetes’s millicore‑based resource model, QoS classes, and the hierarchical pod‑level cgroup structure.

Cloud Native · Docker · Kubernetes
0 likes · 41 min read
Qunar Tech Salon
Jun 11, 2019 · Fundamentals

Understanding Python __init__ and __del__ Methods: Constructors, Destructors, and Resource Management

This article explains how Python uses the special __init__ and __del__ methods to emulate constructors and destructors, discusses the language's reference‑counting garbage collector, demonstrates common pitfalls with __del__, and presents safer alternatives such as context managers and weak references for resource cleanup.

Python · Resource Management · context manager
0 likes · 7 min read
21CTO
May 24, 2019 · Operations

How Meituan’s R&D Team Cut Tens of Millions in Resource Costs: A Practical Guide

This article details Meituan's R&D team's systematic PDCA‑based approach to resource cost optimization, covering methodology definition, planning, execution, checking, and iterative improvement across infrastructure, big‑data, and shared services, ultimately saving tens of millions of yuan.

Big Data · Cost Optimization · Operations
0 likes · 22 min read
Big Data Technology & Architecture
May 7, 2019 · Databases

Design and Multi‑Tenant Management of HBase at Didi

This article details Didi's use of HBase for various online and offline workloads, covering multi‑language support, data types, rowkey designs for order, trajectory and ETA scenarios, multi‑tenant resource management with DHS and RS Group, and operational best practices.

GeoHash · HBase · Resource Management
0 likes · 12 min read
Didi Tech
Apr 4, 2019 · Artificial Intelligence

DiDi Machine Learning Platform: From Workshop‑Style Production to Cloud‑Native Architecture

Since 2016 DiDi has evolved its machine‑learning platform from isolated, workshop‑style GPU servers to a cloud‑native, Kubernetes‑driven architecture that unifies resource management, introduces custom parameter‑server and serving frameworks, provides autotuning, external SaaS offerings such as Elastic Inference and JianShu, and aims for a 3.0 unified internal‑external AI marketplace.

AI Infrastructure · GPU computing · Kubernetes
0 likes · 19 min read
dbaplus Community
Mar 27, 2019 · Big Data

How eBay Cut Hadoop Job Runtime by 60%: Real‑World CAL Log Optimization

This article explains how eBay's CAL team reduced Hadoop MapReduce job execution time and resource consumption by over 60% through targeted GC tuning, data‑skew mitigation, and algorithmic improvements, boosting job success rates to nearly 100% while handling petabyte‑scale log data.

Big Data · Data Skew · GC tuning
0 likes · 12 min read
Cloud Native Technology Community
Mar 20, 2019 · Cloud Native

Kubernetes Architecture Analysis and Comparison of Scheduling Models with Mesos

This article explains the Kubernetes architecture, details each core component, demonstrates how a Deployment is created, and critically compares Kubernetes' two‑layer scheduling model with Mesos, evaluating resource utilization, scalability, flexibility, performance, and scheduling latency while discussing why cluster schedulers struggle with horizontal scaling.

Cloud Native · Kubernetes · Mesos
0 likes · 15 min read
MaGe Linux Operations
Mar 8, 2019 · Operations

Mastering High‑Availability Clusters: Resources, Constraints, and Failure Handling

This article explains the principles and components of high‑availability (HA) clusters, covering active/standby nodes, resource stickiness and constraints, heartbeat and quorum mechanisms, split‑brain avoidance, failure detection methods, and the minimal setup required for a reliable web‑service HA deployment.

Heartbeat · Operations · Resource Management
0 likes · 14 min read
Ctrip Technology
Dec 26, 2018 · Databases

CTrip’s Large‑Scale Redis Containerization: Architecture, Practices, and Lessons Learned

This article details CTrip’s experience of containerizing a 200 TB+ Redis deployment with millions of queries per second, covering the motivations, architecture, Kubernetes strategies, performance testing, operational challenges, and the practical solutions they devised to achieve high scalability and resource efficiency.

Kubernetes · Resource Management · containerization
0 likes · 15 min read
Efficient Ops
Oct 21, 2018 · Operations

How Alibaba Scales Resource Operations for Massive Events like Double 11

In this talk, Alibaba engineer Yang Yi explains the evolution of resource operation and DevOps at Alibaba, covering the shift from manual tasks to containerized, automated platforms, the challenges of large‑scale scheduling, cost reduction strategies for events such as Double 11, and the move toward intelligent, ops‑less infrastructure.

Alibaba · Resource Management · Scalable Operations
0 likes · 18 min read
Alibaba Cloud Developer
Oct 16, 2018 · Operations

Alibaba’s OS, Storage, and Resource Management Highlights from OSDI'18

The 13th OSDI conference in Carlsbad attracted over 650 attendees and featured 47 accepted papers and three Best Papers, two of them led by Chinese students. Alibaba showcased its latest OS kernel (AliKernel), its next‑generation distributed storage system Pangu 2.0, and its large‑scale resource manager Sigma, sparking lively discussions among global experts.

Alibaba · OSDI · Operating Systems
0 likes · 8 min read
iQIYI Technical Product Team
Sep 28, 2018 · Mobile Development

How iQIYI’s Neptune Enables Seamless Android Plugin Architecture

This article analyzes iQIYI’s Neptune plugin framework, explaining why pluginization is needed, the core technical principles of class and resource loading, lifecycle management, and how Neptune implements multi‑ClassLoader isolation, resource handling, context wrapping, and incremental updates for large‑scale Android apps.

Android · Dynamic Loading · Mobile Development
0 likes · 18 min read
360 Zhihui Cloud Developer
Aug 21, 2018 · Cloud Native

Kubernetes Namespace Resource Quotas: Set Defaults, Limits, and Enforce Policies

This guide explains how Kubernetes namespace-level resource management lets administrators set default CPU/memory requests, define minimum and maximum constraints, and enforce resource quotas, with step‑by‑step commands and YAML examples to create namespaces, ResourceQuota objects, and pods while handling quota limits.

Cloud Native · Kubernetes · Namespace
0 likes · 9 min read
Architecture Digest
Jul 19, 2018 · Operations

How to Prevent System Failures: Suspect Third‑Party Services, Guard Consumers, and Strengthen Your Own Service

The article presents practical strategies for avoiding service failures by treating third‑party dependencies as unreliable, designing robust APIs for consumers, and applying solid engineering principles such as degradation plans, timeout settings, traffic control, and resource‑limiting techniques.

Reliability · Resource Management · api-design
0 likes · 16 min read
Efficient Ops
Jul 9, 2018 · Databases

How YY Scaled Its Database Platform: From Manual Ops to Intelligent Automation

This article details YY's journey in transforming its database operations—from early quality and efficiency challenges to a multi‑stage platform that automates resource pooling, high‑availability proxy, cost control, quality monitoring, and security, outlining future intelligent extensions.

Cost Optimization · Database operations · Resource Management
0 likes · 16 min read
JD Tech
Jul 9, 2018 · Big Data

JD's Large‑Scale Hadoop Cluster Resource Management and Scheduling Architecture

This article describes how JD built a multi‑regional, ten‑thousand‑node Hadoop ecosystem, unified resource management with YARN, introduced a three‑level Router scheduling layer, optimized performance, and integrated deep‑learning frameworks to achieve high availability, cost efficiency, and scalable big‑data processing.

Distributed Scheduling · Hadoop · JD.com
0 likes · 12 min read
Ctrip Technology
Jul 3, 2018 · Big Data

Ctrip's Presto Engine: Challenges, Improvements, and Upgrade Roadmap

This article details Ctrip's experience with the Presto distributed SQL engine, outlining the initial performance and stability issues, the comprehensive enhancements made in security, resource control, compatibility, and monitoring, and the multi‑stage upgrade plan that guides its future evolution.

Big Data · Kerberos · Performance Optimization
0 likes · 11 min read
DataFunTalk
Jun 24, 2018 · Big Data

OPPO Big Data Platform Operations and R&D Practices: Architecture, Scaling, and Monitoring

This article summarizes OPPO's rapid growth of its big‑data platform, detailing the three‑layer architecture, the evolution from Flume‑Kafka to NiFi for data ingestion, the upgrade of the OFlow task scheduler, comprehensive monitoring of data, resources and task SLA, and the development of a self‑service analytics tool called InnerEye to ensure stability, efficiency, and security.

Airflow · Big Data · NiFi
0 likes · 10 min read
Alibaba Cloud Native
Jun 21, 2018 · Operations

How Scheduling Algorithms Power Efficient Data Center Resource Management

This article explains how modern data centers rely on cluster resource management systems and sophisticated scheduling algorithms to allocate containers across machines, improve application availability, reduce costs, and meet diverse constraints, while also introducing Alibaba’s global scheduling algorithm competition and its challenge details.

Data center · Resource Management · Scheduling
0 likes · 11 min read
Architects' Tech Alliance
May 14, 2018 · Big Data

Understanding Hadoop MapReduce Architecture and YARN: Components, Workflow, and Optimization

This article explains Hadoop's distributed storage and processing framework, details the MapReduce programming model, describes the classic JobTracker/TaskTracker architecture, outlines the shuffle and combine phases, and introduces YARN as a scalable replacement with its ResourceManager, ApplicationMaster, and NodeManager components.

Big Data · Hadoop · MapReduce
0 likes · 13 min read
Tencent Cloud Developer
May 3, 2018 · Operations

Tencent Cloud Kafka Automated Operations Practices

Tencent Cloud’s senior engineer Yang Yuan explains how their managed Kafka service tackles version diversity, resource allocation, dynamic scaling, broker addition/removal, and partition migration using versioned clusters, bin‑packing algorithms, penalty weighting, and predictive scheduling to sustain trillions of messages and billions of messages per minute.

Kafka · Operations Automation · Resource Management
0 likes · 14 min read
Senior Brother's Insights
Apr 8, 2018 · Blockchain

How EOS.IO Redefines Scalable Blockchain Architecture with DPOS and Parallel Execution

EOS.IO introduces a novel blockchain architecture that combines delegated proof‑of‑stake consensus, OS‑like account and permission models, deterministic parallel transaction execution, and flexible resource and governance mechanisms, enabling million‑user scale, low‑latency, fee‑free decentralized applications while addressing security, upgradeability, and cross‑chain communication.

DPOS · EOSIO · Parallel Execution
0 likes · 36 min read
Tencent Cloud Developer
Mar 20, 2018 · Cloud Native

A Comprehensive Guide to Terraform Provider Development

This guide walks readers through building a Terraform provider in Go, explaining its architecture, key files such as main.go and provider.go, implementing CRUD functions that call the Tencent Cloud SDK, handling Terraform’s lifecycle, and writing unit tests with the testAccProviders framework.

Go Programming · Provider Development · Resource Management
0 likes · 15 min read
Alibaba Cloud Developer
Mar 8, 2018 · Operations

How Cainiao Ark’s Elastic Scheduling Boosts Resource Efficiency and Cuts Costs

This article explains why Cainiao needed an elastic scheduling system, how its unique business and technical characteristics make it ideal for such a solution, and details the architecture, decision‑making layers, strategies, and real‑world results that together improve resource utilization, stability, and cost efficiency.

Auto Scaling · Cainiao Ark · Resource Management
0 likes · 27 min read
AntTech
Mar 1, 2018 · Operations

Intelligent Scheduling in Customer Service: Architecture, Challenges, and Future Directions

The article examines how intelligent scheduling combines AI-driven bots and human agents to dynamically allocate customer service resources, addressing market slowdown, complex business structures, and operational pain points through perception, decision‑making, and execution capabilities, while outlining current implementations and future plans at Ant Financial.

AI · Intelligent Scheduling · Operations
0 likes · 14 min read
Architecture Digest
Jan 2, 2018 · Information Security

Understanding the Essence of Permissions: Resources, Access, and Authorization Models

This article explains the fundamental nature of permissions as limited licensed access to protected resources, defines what constitutes a resource in software, outlines resource identification and limitation, and describes permission classifications, control models, and authorization mechanisms such as role‑based access.

Authorization · Permissions · Resource Management
0 likes · 7 min read
MaGe Linux Operations
Dec 21, 2017 · Operations

Mastering High Availability Clusters: Key Concepts, Resource Management, and Failure Handling

This article explains how high‑availability (HA) clusters provide redundancy for directors, real servers (RS), databases, and storage, covering active‑passive node roles, resource stickiness, constraints, quorum voting, split‑brain avoidance, failure detection methods, and essential configuration tips.

Cluster · Operations · Resource Management
0 likes · 12 min read
Qunar Tech Salon
Nov 8, 2017 · Operations

Evolution of Ele.me's Operations Infrastructure: From 1.0 to 2.0 – Standardization, Automation, and Data‑Driven Management

The article recounts Ele.me's rapid growth and the resulting operational challenges, describing how the company progressed from ad‑hoc 1.0 practices to a standardized, automated 2.0 infrastructure built on ZStack private cloud, fine‑grained operations, and data‑driven management to improve quality, efficiency, and cost.

Resource Management · monitoring · standardization
0 likes · 21 min read
UCloud Tech
Oct 20, 2017 · Cloud Computing

Inside UCloud’s Compute Factory: Scaling VMs and Containers with Mesos

UCloud’s Compute Factory enables rapid provisioning of massive VM resources for compute‑intensive services by leveraging a Mesos‑based resource management platform that unifies multi‑region data centers, supports both VMs and containers, and addresses challenges in scheduling, networking, storage, and operational reliability.

Mesos · Resource Management · UCloud
0 likes · 14 min read
Meituan Technology Team
Oct 12, 2017 · Mobile Development

Meituan Android Plugin Framework: Design, Compatibility, and Build System

Meituan’s Android plugin framework combines MultiDex‑style dex loading, global AssetManager replacement, component proxies, and a four‑phase Gradle integration to enable stable AAR‑based development while dynamically loading plugins across Android versions and OEM ROMs, handling resources, bytecode, and native libraries with minimal compatibility issues.

Android · Build System · Compatibility
0 likes · 15 min read
JD Retail Technology
Sep 15, 2017 · Mobile Development

Key Issues and Solutions in Implementing an Android Plugin Framework with Aura

This article explains the fundamental problems an Android plugin framework must solve—component representation, class loading, resource handling, inter‑module calls, resource sharing, and packaging—and describes Aura's design choices and implementation strategies for each of these challenges.

Android · Mobile Development · Plugin Framework
0 likes · 10 min read
JD Retail Technology
Sep 15, 2017 · Mobile Development

Plugin Compilation and Packaging in the Aura Framework: Resource Sharing, AAPT and Gradle Aura-Plugin Solutions

The article explains how the Aura framework handles plugin compilation and packaging by managing shared resources through public.xml, assigning unique package IDs, and offering two solutions—modifying AAPT and using the Gradle aura‑plugin—to achieve small, independent Android plugin bundles.

AAPT · Android · Gradle
0 likes · 8 min read
StarRing Big Data Open Lab
Sep 8, 2017 · Information Security

How Guardian 5.0 Revolutionizes Big Data Security with Multi‑Granular Permissions

Guardian 5.0 transforms big‑data security by introducing a standalone service, an enhanced ARBAC model, fine‑grained permission and quota management, visual configuration, unified LDAP/Kerberos authentication, and simplified operations, delivering higher flexibility, availability, and efficiency for enterprise data protection.

Guardian 5.0 · RBAC · Resource Management
0 likes · 7 min read
Tencent IMWeb Frontend Team
Aug 31, 2017 · Frontend Development

From Frameworks to Smart Resource Management: Evolving Frontend Engineering

This article examines the evolution of frontend development from basic library selection through simple build optimization, modularization, and finally to component‑based development and intelligent static resource management, highlighting engineering practices that improve efficiency, scalability, and performance for complex web applications.

Componentization · Engineering · Resource Management
0 likes · 21 min read
JD Retail Technology
Aug 19, 2017 · Mobile Development

Key Issues and Solutions in Implementing the Aura Android Plugin Framework

This article explains the major challenges tackled by the Aura Android plugin framework—including component representation, class loading, resource handling, host‑plugin interaction, and packaging—and outlines how solving these enables building a generic, scalable plugin architecture for mobile apps.

AURA · Android · Mobile Development
0 likes · 9 min read
360 Quality & Efficiency
May 8, 2017 · Backend Development

Why and How to Manually Close Java Resource Objects

This article explains why Java developers must manually close resource objects such as streams and database connections, discusses the limitations of garbage collection, and presents best‑practice techniques like finally blocks, try‑with‑resources, and utility libraries for proper resource management.

Backend Development · Garbage Collection · Java
0 likes · 5 min read
Tencent TDS Service
Apr 27, 2017 · Mobile Development

How to Shrink Your Android APK Size: Practical Tips and Tools

This guide explains how Android developers can understand APK structure and apply a range of techniques—such as removing unused resources, compressing images, using vector drawables, and optimizing native code—to significantly reduce the size of their application packages and improve download and install performance.

APK · Android · Mobile Development
0 likes · 15 min read
Qunar Tech Salon
Apr 11, 2017 · Big Data

Implementing Dynamic Scaling for Spark on Mesos Using Marathon and Docker

This article describes how a team migrated Spark 1.6.x running on Mesos to a Marathon‑Docker based architecture that provides dynamic executor scaling, resolves configuration and resource‑allocation issues, and improves monitoring, fault‑tolerance, and upgrade processes for large‑scale streaming workloads.

Docker · Dynamic Scaling · Marathon
0 likes · 17 min read
Baidu Waimai Technology Team
Mar 23, 2017 · Databases

Design and Implementation of the "Little Boy" Greenplum Optimization and Operations Platform

This article introduces the architecture, key modules, and implementation details of the Little Boy platform, a Greenplum optimization and operations system that parses SQL, applies index and distribution‑key tuning, manages resources, and outlines future enhancements for large‑scale data warehouses.

Big Data · Database Optimization · Greenplum
0 likes · 15 min read
ITPUB
Sep 5, 2016 · Backend Development

Why a Barrier‑Based Config Swap Still Crashed: Uncovering a Shallow Copy Bug

A seemingly safe configuration swap using barriers caused a crash because a shallow copy duplicated resource pointers, leading to premature freeing of the active configuration; the article explains the bug, shows the faulty code, and presents a corrected approach.

Barrier · C programming · Resource Management
0 likes · 5 min read
Tencent Music Tech Team
Aug 20, 2016 · Mobile Development

Android App Internationalization: Problems and Solutions

Internationalizing an Android app involves extracting hard‑coded strings into language‑specific strings.xml files, using Lint and regular expressions to locate text, annotating resources, formatting placeholders, and exporting strings for translation. At runtime, the app updates the Resources configuration, refreshes the UI via onConfigurationChanged, recreate, or an activity restart, and adjusts layouts for text length, dimensions, and capitalization to ensure a seamless multilingual experience.

Android · Resource Management · Strings.xml
0 likes · 13 min read
ITPUB
Jul 7, 2016 · Databases

How Software Performance Engineering Boosts Database Optimization

The talk explains how systematic software performance engineering, through six optimization patterns (Fast Path, Batching, Flex Path, First Things First, Coupling, and Alternate Routes), can identify and resolve database performance bottlenecks without merely adding more hardware resources.

Resource Management · optimization · performance engineering
0 likes · 14 min read
dbaplus Community
Jun 5, 2016 · Operations

Mastering Mesos, ZooKeeper, and Marathon: A Step‑by‑Step Guide to Building a Docker Cluster

This tutorial introduces Apache Mesos, ZooKeeper, and Marathon, explains their core components and coordination mechanisms, and provides detailed, illustrated step‑by‑step instructions for setting up a pseudo‑cluster, deploying Docker containers, and managing tasks through the Mesos and Marathon web interfaces.

Cluster Deployment · Marathon · Mesos
0 likes · 13 min read