Tagged articles

Latency Reduction

10 articles · Page 1 of 1

Dec 10, 2025 · Artificial Intelligence

How Offloading Latent Cache Boosts DeepSeek‑V3.2‑Exp Decoding Throughput

This report analyzes the memory bottleneck of DeepSeek‑V3.2‑Exp’s sparse‑attention decoder, proposes the Expanded Sparse Server (ESS) to offload the latent cache to CPU memory, and demonstrates through high‑fidelity simulation that the approach dramatically improves decode throughput while keeping latency within acceptable limits.

Cache offload · GPU memory · LLM Inference

0 likes · 20 min read

Liangxu Linux

Oct 19, 2025 · Operations

Boost Linux Network Performance: Proven Techniques to Increase Bandwidth and Reduce Latency

This article provides a comprehensive guide to Linux network performance tuning, covering key metrics, TCP window adjustments, Fast Open, congestion control algorithms, kernel parameter optimizations, zero‑copy transmission, NIC bonding, connection limits, and essential monitoring tools to achieve higher bandwidth and lower latency.

Latency Reduction · Linux · Network Performance

0 likes · 10 min read

Refining Core Development Skills

Sep 3, 2025 · Operations

When Should You Hire a Dedicated Performance Engineering Team?

This article explains why modern enterprises increasingly need specialized performance engineering teams, outlines their ROI through cost savings, latency reduction, scalability, and engineering efficiency, details the engineers' responsibilities, and provides practical hiring guidelines and real‑world case studies.

Latency Reduction · cost optimization · infrastructure ROI

0 likes · 29 min read

Ximalaya Technology Team

Dec 12, 2023 · Frontend Development

Performance Optimization of Cloud Editing Playback: Preloading and Latency Reduction

By analyzing latency sources and introducing a pre‑loading ‘prepare’ step with new player APIs, the cloud‑editing team reduced audio start‑up delays by roughly 200 ms on average, cutting half‑second waits to under 300 ms and markedly improving streamer workflow.

Latency Reduction · Performance Optimization · cloud editing

0 likes · 12 min read

Meituan Technology Team

Apr 13, 2023 · Artificial Intelligence

Peak-First Regularization for Low-Latency Streaming Speech Recognition

The paper presents a low‑latency streaming speech‑recognition solution that reframes latency reduction as a knowledge‑distillation task, using a simple peak‑first regularization term to shift CTC output probabilities leftward and reduce average latency by up to 200 ms without harming word error rate.

CTC · Knowledge Distillation · Latency Reduction

0 likes · 21 min read

Tencent Cloud Developer

Dec 12, 2022 · Artificial Intelligence

Performance Optimization of Tencent Cloud OCR Service: Reducing Latency and Improving Throughput

Tencent Cloud’s OCR team cut average response time from 1.8 seconds to under one second and boosted throughput by over 50 % by redesigning the model with self‑attention, accelerating inference with a Tensor‑Network accelerator, shrinking RPC payloads, enabling asynchronous logging, and optimizing multi‑region GPU memory utilization.

AI model · Latency Reduction · OCR

0 likes · 13 min read

Java Architect Essentials

Nov 11, 2022 · Big Data

Meituan Kafka at Scale: Challenges and Optimizations for Latency, Cluster Management, and Reliability

This article details Meituan's large‑scale Kafka deployment: its current state, performance challenges such as slow nodes and disk imbalance, and the optimizations applied to improve reliability and prepare for future growth. These include read/write latency reductions, migration pipelines, fetcher isolation, SSD caching, RAID acceleration, cgroup isolation, full‑link monitoring, service lifecycle management, and TOR (top‑of‑rack) disaster recovery.

Kafka · Latency Reduction · Meituan

0 likes · 21 min read

IT Architects Alliance

Sep 12, 2020 · Industry Insights

Why Microsoft Is Sinking Servers: The Promise of Underwater Data Centers

The article examines Microsoft's bold Natick project of placing servers on the ocean floor, highlighting how underwater data centers can cut cooling costs, reduce latency for coastal users, and address the growing energy and space challenges of traditional land‑based data centers.

Latency Reduction · Microsoft Natick · data center cooling

0 likes · 4 min read

iQIYI Technical Product Team

Sep 6, 2019 · Industry Insights

How iQIYI and Huawei Cut Edge Latency Below 10 ms with 5G MEC for VR & 8K Video

iQIYI and Huawei verified a 5G MEC + CDN edge‑acceleration solution that reduced end‑to‑end latency from 60 ms to under 10 ms, boosted 1080p/4K video download speeds by 400%, and paved the way for large‑scale VR, AR, and ultra‑HD streaming in the 5G era.

5G · Latency Reduction · MEC

0 likes · 4 min read

Efficient Ops

Jan 6, 2019 · Cloud Computing

Why Does Video Buffer on Fast Phones? Understanding CDN Technology

This article explains why video streams can still lag on high‑speed mobile networks, introduces the origin and principles of Content Delivery Networks (CDNs), and shows how CDNs reduce latency, improve user experience, and benefit both internet and telecom operators.

CDN · Content Delivery Network · Latency Reduction

0 likes · 10 min read
