Tagged articles
10 articles
Baidu Geek Talk
Dec 10, 2025 · Artificial Intelligence

How Offloading Latent Cache Boosts DeepSeek‑V3.2‑Exp Decoding Throughput

This report analyzes the memory bottleneck of DeepSeek‑V3.2‑Exp’s sparse‑attention decoder, proposes the Expanded Sparse Server (ESS) to offload the latent cache to CPU memory, and demonstrates through high‑fidelity simulation that the approach dramatically improves decode throughput while keeping latency within acceptable limits.

Cache offload · GPU Memory · LLM inference
20 min read
Liangxu Linux
Oct 19, 2025 · Operations

Boost Linux Network Performance: Proven Techniques to Increase Bandwidth and Reduce Latency

This article provides a comprehensive guide to Linux network performance tuning, covering key metrics, TCP window adjustments, Fast Open, congestion control algorithms, kernel parameter optimizations, zero‑copy transmission, NIC bonding, connection limits, and essential monitoring tools to achieve higher bandwidth and lower latency.

Latency Reduction · TCP Tuning · bandwidth optimization
10 min read
Refining Core Development Skills
Sep 3, 2025 · Operations

When Should You Hire a Dedicated Performance Engineering Team?

This article explains why modern enterprises increasingly need specialized performance engineering teams, outlines their ROI through cost savings, latency reduction, scalability, and engineering efficiency, details the engineers' responsibilities, and provides practical hiring guidelines and real‑world case studies.

Cost Optimization · Latency Reduction · Scalability
29 min read
Ximalaya Technology Team
Dec 12, 2023 · Frontend Development

Performance Optimization of Cloud Editing Playback: Preloading and Latency Reduction

By analyzing latency sources and introducing a pre‑loading ‘prepare’ step with new player APIs, the cloud‑editing team reduced audio start‑up delay by roughly 200 ms on average, cutting waits of about half a second to under 300 ms and markedly improving the streamer workflow.

Latency Reduction · Performance Optimization · cloud editing
12 min read
Meituan Technology Team
Apr 13, 2023 · Artificial Intelligence

Peak-First Regularization for Low-Latency Streaming Speech Recognition

The paper presents a low‑latency streaming speech‑recognition solution that reframes latency reduction as a knowledge‑distillation task, using a simple peak‑first regularization term to shift CTC output probabilities leftward and reduce average latency by up to 200 ms without harming word error rate.

CTC · Latency Reduction · Peak-First Regularization
21 min read
Tencent Cloud Developer
Dec 12, 2022 · Artificial Intelligence

Performance Optimization of Tencent Cloud OCR Service: Reducing Latency and Improving Throughput

Tencent Cloud’s OCR team cut average response time from 1.8 seconds to under one second and boosted throughput by over 50 % by redesigning the model with self‑attention, accelerating inference with a Tensor‑Network accelerator, shrinking RPC payloads, enabling asynchronous logging, and optimizing multi‑region GPU memory utilization.

AI model · Cloud Services · Inference Acceleration
13 min read
Java Architect Essentials
Nov 11, 2022 · Big Data

Meituan Kafka at Scale: Challenges and Optimizations for Latency, Cluster Management, and Reliability

This article details Meituan's large‑scale Kafka deployment: its current state, performance challenges such as slow nodes and disk imbalance, and the optimizations applied to improve reliability and prepare for future growth. These include read/write latency reductions, migration pipelines, fetcher isolation, SSD caching, RAID acceleration, cgroup isolation, full‑link monitoring, service lifecycle management, and TOR disaster recovery.

Cluster Management · Kafka · Latency Reduction
21 min read
IT Architects Alliance
Sep 12, 2020 · Industry Insights

Why Microsoft Is Sinking Servers: The Promise of Underwater Data Centers

The article examines Microsoft's bold Natick project of placing servers on the ocean floor, highlighting how underwater data centers can cut cooling costs, reduce latency for coastal users, and address the growing energy and space challenges of traditional land‑based data centers.

Latency Reduction · Microsoft Natick · data center cooling
4 min read
Efficient Ops
Jan 6, 2019 · Cloud Computing

Why Does Video Buffer on Fast Phones? Understanding CDN Technology

This article explains why video streams can still lag on high‑speed mobile networks, introduces the origin and principles of Content Delivery Networks (CDNs), and shows how CDNs reduce latency, improve user experience, and benefit both internet and telecom operators.

CDN · Cloud Services · Content Delivery Network
10 min read