Tagged articles
133 articles
Page 2 of 2
iQIYI Technical Product Team
May 7, 2021 · Mobile Development

Robustness Testing of iQIYI Mobile App Using Dirty Data Injection

iQIYI’s technology team built a non‑intrusive robustness‑testing platform that injects engineered “dirty data” into intercepted HTTP responses via an ASM‑hooked SDK. Users configure mutation rules through a web console and run UI, monkey, or manual tests; these have already uncovered numerous hidden crashes, with a defect‑closure rate above 50% and markedly improved app stability.

Automation · Robustness · SDK
0 likes · 9 min read
DevOps
Mar 18, 2021 · Operations

Understanding Site Reliability Engineering (SRE) and Its Role in Software Stability

Site Reliability Engineering (SRE) combines software engineering with operations to ensure scalable, highly reliable systems, outlining the collaboration between product development and SRE roles, the software lifecycle, stability value, and practical frameworks for observability, controllability, and best‑practice implementation.

SRE · Site Reliability Engineering · software lifecycle
0 likes · 12 min read
Xueersi Online School Tech Team
Mar 12, 2021 · Operations

Evolution of Live Streaming Load Testing and Stability Assurance for an Online Education Platform

The article details how an online education provider progressively enhanced its live‑streaming performance testing framework—from rudimentary "stone age" checks to automated, data‑driven "information age" practices—by restructuring services, refining test scenarios, introducing traffic replay, and automating script generation to achieve more reliable and efficient stability assurance.

Automation · Load Testing · online education
0 likes · 12 min read
DataFunTalk
Feb 12, 2021 · Big Data

Apache Flink at Kuaishou: Past, Present, and Future

Zhao Jianbo, head of Kuaishou's big data architecture team, presents an in‑depth overview of Apache Flink's adoption at Kuaishou, covering reasons for selection, development history, business data flows, technical innovations such as the Slimbase state engine, stability improvements, and future roadmap.

Apache Flink · Big Data · Kuaishou
0 likes · 16 min read
Alibaba Cloud Developer
Feb 8, 2021 · Operations

Why Offline Environments Are Unstable and How to Make Them More Reliable

The article explains why offline environments are inherently unstable, outlines the root causes, and provides a comprehensive set of practical strategies—including infrastructure standards, stable layer improvements, dev environment hygiene, IaC, and continuous integration—to make offline environments as stable as possible.

continuous integration · infrastructure-as-code · offline environment
0 likes · 29 min read
Didi Tech
Jan 27, 2021 · Artificial Intelligence

Addressing Uncertainty in Autonomous Driving: Data‑Driven Control Module Strategies

The article proposes a three‑layer, data‑driven framework—problem analysis using massive fleet data, iterative deep‑learning algorithm development with fallback and explainable‑AI safeguards, and systematic validation via simulation and real‑world tests—to mitigate perception, prediction, and control uncertainties and advance trustworthy autonomous‑driving control systems.

Data-driven · Scalability · control
0 likes · 12 min read
Alibaba Cloud Native
Jan 6, 2021 · Backend Development

How RocketMQ 4.8.0 Supercharges DLedger with Massive Performance and Stability Gains

Apache RocketMQ 4.8.0 introduces extensive DLedger enhancements, including asynchronous pipeline processing, batch log replication, extensive chaos‑testing validation, preferred‑leader selection, and batch‑message support, delivering several‑fold throughput improvements, faster recovery from failures, and new functional capabilities for production‑grade messaging.

Apache RocketMQ · DLedger · Message Queue
0 likes · 8 min read
Yanxuan Tech Team
Dec 14, 2020 · Operations

Mastering Stability Governance: Practical Strategies for Reliable Supply‑Chain Systems

This article examines the critical role of stability governance in evolving systems, outlines a three‑stage framework—usability, monitoring alerts, and online emergency—illustrated with a case study of an electronic waybill service, and shares concrete strategies for prevention, detection, response, and post‑mortem to achieve predictable, observable, and fast‑acting reliability.

Operations · governance · incident response
0 likes · 11 min read
NetEase Media Technology Team
Dec 8, 2020 · Operations

Comprehensive Online Load‑Testing and Stability Assurance Framework

The stability‑assurance squad built an online load‑testing framework that injects global TraceIds via a Java‑agent, records real‑traffic, routes test writes to shadow databases and caches, enforces automatic stop‑rules, and provides a UI platform, reducing cost, improving capacity insight, and enabling safe fault‑injection drills.

Distributed Tracing · Java Agent · Load Testing
0 likes · 12 min read
Alibaba Terminal Technology
Nov 19, 2020 · Frontend Development

How Alibaba’s EVA Framework Delivered Lightning‑Fast Double 11 Interactive Cats

Alibaba’s Double 11 “Super Star Show Cat” interactive experience achieved ultra‑fast page loads, seamless animation, and zero failures by leveraging a custom EVA front‑end solution, optimized page rendering infrastructure, modular loading, data slimming, accessibility support, and a global stability strategy with device‑level experience grading.

EVA · Web · accessibility
0 likes · 13 min read
Xianyu Technology
Nov 19, 2020 · Operations

Rapid and Safe Migration of a Centralized Microservice Platform to Department‑Built Infrastructure

The team migrated a large, multi‑service microservice publishing platform—including Xianyu, Taobao, Alipay, and Tmall—from a centralized environment to a department‑built infrastructure in ten working days by cloning the repo, updating configurations, separating databases, rigorously verifying functionality across dev, pre‑release, and production, and ensuring isolation and monitoring for stability.

Backend Development · Data Migration · Deployment
0 likes · 7 min read
New Oriental Technology
Sep 7, 2020 · Operations

Performance Optimization and Stability Enhancement of the Continuation Enrollment System

This article details the background, performance and stability requirements, strategic approach, and concrete initiatives (full‑chain load testing, chaos engineering, monitoring, and targeted optimization projects) undertaken to boost performance by over 300% and improve the availability of the continuation enrollment platform.

Load Testing · backend optimization · chaos testing
0 likes · 7 min read
Cloud Native Technology Community
Jul 7, 2020 · Cloud Native

Taming etcd Instability: Lessons from Managing Million‑Node Kubernetes Clusters

This article details how Tencent Cloud’s TKE team identified, analyzed, reproduced, and resolved multiple etcd stability and performance issues—including data inconsistency, memory leaks, mvcc deadlocks, and WAL crashes—while sharing the lessons learned and the optimizations applied to support million‑node Kubernetes deployments.

Kubernetes · cloud-native · distributed storage
0 likes · 29 min read
Tongcheng Travel Technology Center
Sep 3, 2019 · Big Data

Practical Experiences and Lessons Learned in Building a Flink‑Based Real‑Time Computing Platform at Tongcheng‑Elong

This article details the design, implementation, and optimization of a Flink‑based real‑time computing platform at Tongcheng‑Elong, covering the evolution from Storm to Flink, support for FlinkSQL and FlinkStream, metric collection, logging, data lineage, savepoint management, and numerous stability fixes contributed back to the open‑source community.

Big Data · Data Lineage · Flink
0 likes · 16 min read
iQIYI Technical Product Team
Jul 26, 2019 · Mobile Development

iQIYI Mobile App Performance Optimization: Package Size, Startup Speed, Stability, and Toolchain Evolution

iQIYI’s engineering team shrank app packages, accelerated cold‑start times, hardened stability, and built an evolving automated toolbox that analyzes resources, binaries, launch phases, and code safety, enabling a sub‑200 MB install, sub‑second launches, 55 fps transitions, and crash rates below 0.2 %.

Mobile Optimization · Package Size · Tooling
0 likes · 19 min read
AntTech
Jul 20, 2019 · Mobile Development

Totoro: A Scalable Mobile Automation Testing Framework for Android, iOS, and Hybrid Apps

Totoro is an Ant Financial‑developed mobile automation testing framework that supports Android, iOS, HTML5, mini‑programs, Weex and Cube, featuring a two‑layer C/S architecture, full‑link stability mechanisms, intelligent app installation, comprehensive popup governance, AI‑assisted image detection, and a roadmap toward standardization and extensibility.

AI · SmartHub · Totoro
0 likes · 14 min read
Alibaba Cloud Developer
Mar 18, 2019 · Operations

Alibaba Hema’s 7‑Layer Funnel & 23 Tactics for Ultra‑Fast Delivery Stability

The article outlines Alibaba’s Hema delivery platform’s end‑to‑end stability strategy, detailing a 7‑layer funnel review process, three core norms (development, architecture, stability), and 23 practical tactics—including core‑noncore isolation, proactive monitoring, fault prevention, rapid recovery, and service‑level controls—to ensure reliable 30‑minute deliveries despite complex logistics and external disruptions.

Operations · architecture · delivery
0 likes · 13 min read
Java Captain
Dec 7, 2018 · Fundamentals

Overview of Common Sorting Algorithms and Their Characteristics

This article introduces sorting algorithms, distinguishing internal and external sorting, listing major internal sorts, explaining their time‑complexity categories and stability properties, and providing visual illustrations for each algorithm along with a reference to an open‑source implementation.

Data Structures · Sorting Algorithms · algorithm fundamentals
0 likes · 3 min read
Java Backend Technology
Oct 19, 2018 · Operations

How to Ensure Stability for Billion-Request Websites: Proven Strategies

Ensuring stability for sites handling up to 100,000 requests per minute requires a combination of configuration management, feature toggles, phased deployment, robust error handling, comprehensive logging, real-time monitoring, traffic-aware throttling, service degradation, and disaster-recovery tactics, all of which are detailed in this guide.

Deployment · large-scale systems · rate limiting
0 likes · 9 min read
JD Tech Talk
Aug 9, 2018 · Operations

Ensuring Stability and Scalability in Large‑Scale Kubernetes Clusters: Three Key Questions and Operational Practices

The article explains why operating massive Kubernetes clusters is as challenging as building large systems, outlines three critical stability questions, shares real‑world data collection, visualization, and tooling practices, and provides concrete recommendations for high‑availability, monitoring, and performance optimization.

Automation · Kubernetes · Observability
0 likes · 12 min read
JD Retail Technology
Jul 24, 2018 · Operations

Stability and Operational Practices for Large‑Scale Kubernetes Clusters

This article shares practical experience and best‑practice guidelines for operating large‑scale Kubernetes clusters, covering stability checks, component failure impact, recovery strategies, alerting mechanisms, data collection, visualization, and the suite of operational tools that help ensure reliable, high‑performance cloud‑native infrastructure.

Kubernetes · Observability · cluster operations
0 likes · 10 min read
Alibaba Cloud Infrastructure
Dec 21, 2017 · Operations

Stability Monitoring Practices for Double 11 2017

The 2017 Double 11 stability monitoring project introduced a four‑layer monitoring architecture—including customer & sentiment, business, system water‑level, and infrastructure monitoring—along with data archiving and system‑level reliability measures to detect, respond to, and mitigate issues far faster than traditional manual processes.

Operations · big-data · incident response
0 likes · 14 min read
MaGe Linux Operations
Aug 11, 2017 · Operations

Why Operations Matters: Beyond Automation to Real Business Value

In this reflective piece, Zhao Cheng (aka Qianyi) shares his experience managing the operations team at Mogujie, argues that operations value extends beyond automation to efficiency, stability, security, cost, and user experience, and offers practical guidance for shifting mindsets and aligning ops with business goals.

Automation · Cost Management · DevOps
0 likes · 12 min read
Efficient Ops
Jun 22, 2017 · Cloud Computing

How to Choose the Right Cloud Host: Inside Trusted Cloud’s Rating System

This article explains the Trusted Cloud host rating framework, detailing its star‑based levels, evaluation criteria such as availability, security and disaster recovery, and how enterprises can use the standards to select the most suitable cloud host provider.

Availability · Cloud Host · cloud computing
0 likes · 6 min read
21CTO
Dec 18, 2016 · Backend Development

How to Design Clean, Stable, and User‑Friendly APIs: Best Practices

Effective API design requires shifting from a developer‑centric mindset to the user’s perspective, emphasizing clear documentation, stability, versioning, flexibility, security, and ease of use, while avoiding unnecessary complexity and ensuring consistent, well‑structured interfaces for diverse clients across web, mobile, and IoT platforms.

Documentation · Security · Versioning
0 likes · 12 min read
Baidu Intelligent Testing
May 24, 2016 · Operations

Pursuing Excellence in Continuous Integration: Strategies for Stable, Fast, and Generic Testing Services

This article outlines how a product line can achieve highly stable, rapid, and universally applicable testing services within a continuous integration pipeline by employing gray‑release, hierarchical builds, automated test case selection, subsystem decoupling, generic stubs, and a unified testing platform.

Automated Testing · CI · Testing Services
0 likes · 7 min read
MaGe Linux Operations
Mar 9, 2016 · Fundamentals

Explore 8 Essential Sorting Algorithms and Their Trade‑offs

This article introduces eight fundamental internal sorting algorithms—Insertion, Shell, Selection, Bubble, Merge, Quick, Heap, and Radix—explaining their principles, step‑by‑step procedures, time and space complexities, and stability characteristics to help readers choose the right method for different data sets.

Data Structures · Sorting Algorithms · algorithm analysis
0 likes · 14 min read
Architect
Mar 4, 2016 · Cloud Computing

Performance vs Stability: Comparative Evaluation of Major Public Cloud Providers

The article analyzes why stability, not raw performance, should be the primary criterion when choosing a cloud provider, describes a 7‑day benchmark across AWS, Azure, Alibaba Cloud, Tencent Cloud and UCloud, and presents detailed results for CPU, memory, disk, database and storage stability.

AWS · Alibaba Cloud · Azure
0 likes · 10 min read