Author

Baidu Geek Talk

520 Articles · 1.6k Views

Latest from Baidu Geek Talk


Baidu Geek Talk

Feb 17, 2025 · Operations

How Baidu Netdisk Prevents Service Avalanches: Dynamic Circuit Breaking & Queue Control

This article analyzes Baidu Netdisk's anti‑avalanche architecture, explaining how avalanche cascades occur in high‑concurrency services and detailing practical prevention, blocking, and mitigation techniques such as dynamic circuit breaking, traffic isolation, request‑validity checks, and socket‑level detection to maintain system reliability.

Circuit Breaking · Dynamic Throttling · Operations

0 likes · 18 min read


Baidu Geek Talk

Feb 12, 2025 · Artificial Intelligence

Deploy DeepSeek, Llama, Qwen Models Fast on Baidu Baige AI Heterogeneous Platform

This guide walks you through creating a lightweight compute instance, adding it to the Baidu Baige AI heterogeneous computing platform, deploying vLLM, and loading and serving small‑scale dense models such as DeepSeek, Llama and Qwen. It also provides recommended configuration lists for low‑cost, high‑performance inference.

AI model deployment · Baidu Baige · DeepSeek

0 likes · 3 min read


Baidu Geek Talk

Feb 10, 2025 · Artificial Intelligence

How Baidu Cloud Slashes Inference Costs: DeepSeek Model Optimizations Unveiled

Baidu Cloud's Qianfan platform launched DeepSeek‑R1 and DeepSeek‑V3 with ultra‑low inference pricing, leveraging advanced engine performance tweaks, a split Prefill/Decode architecture, and comprehensive security measures that together boost throughput, cut costs, and ensure enterprise‑grade reliability.

AI inference · Baidu Cloud · Performance Optimization

0 likes · 5 min read


Baidu Geek Talk

Feb 5, 2025 · Artificial Intelligence

How to Unlock Full GPU Efficiency for Enterprise AI Platforms

This article analyzes common GPU efficiency problems in enterprise AI compute platforms, such as low utilization, long fault‑resolution times, and limited performance gains. It presents three practical solutions, illustrated with real‑world case studies: dynamic resource allocation, systematic fault tolerance, and system‑level tuning.

AI Platform · GPU Utilization · Resource Scheduling

0 likes · 11 min read


Baidu Geek Talk

Jan 22, 2025 · Mobile Development

iOS Sandbox Disk Management and Cleaning Strategies

The article explains iOS sandbox storage by detailing the four main directories, their backup rules, naming conventions, and retrieval APIs, then outlines how to calculate physical file size and implements both automatic quota‑based and manual user‑driven cleaning methods, including system cache removal for tmp, WKWebView, and dyld caches.

Cache Cleaning · Objective‑C · Sandbox

0 likes · 22 min read


Baidu Geek Talk

Jan 20, 2025 · Industry Insights

How Baidu’s Qianfan AppBuilder Is Redefining AI‑Native App Development

The interview explores how Baidu Cloud's Qianfan AppBuilder platform evolves from traditional coding to AI‑native low‑code development, detailing the impact of large‑model agents, Retrieval‑Augmented Generation, security, multimodal support, and future roadmap on enterprise productivity and digital transformation.

AI agents · AI native apps · Enterprise AI

0 likes · 18 min read


Baidu Geek Talk

Jan 15, 2025 · Artificial Intelligence

Understanding Large Model Inference Engines and Reducing Token Interval (TPOT)

Large‑model inference engines convert prompts into responses in two stages, Prefill and autoregressive Decode, whose latency is measured by time to first token (TTFT) and time per output token (TPOT) respectively. Baidu’s AIAK suite improves TPOT by separating tokenization, using static slot scheduling, and executing asynchronously, which cuts token‑interval latency from ~35 ms to ~14 ms and raises GPU utilization to about 75 %. It also uses quantization and speculative execution for higher throughput.

AI acceleration · GPU Utilization · TPOT

0 likes · 10 min read


Baidu Geek Talk

Jan 13, 2025 · Industry Insights

Evolution of Video Search Ranking Architecture Towards an End‑to‑End Large‑Model Framework

The article outlines how video search ranking has shifted from a tightly‑coupled multi‑stage cascade to Rankflow, an extensible, end‑to‑end, model‑centric framework. Rankflow uses large‑model inference, decoupled recall, fine‑grained parallelism, and elastic compute allocation to improve performance, flexibility, and maintainability, and it paves the way for future integration of retrieval‑augmented generation.

AI · Large Models · elastic resources

0 likes · 11 min read


Baidu Geek Talk

Jan 6, 2025 · Information Security

MarkupLM-based Detection of Malicious Content Scraping

The article presents a MarkupLM‑based approach that enriches BERT with XPath embeddings to jointly model webpage text and structure. This enables site‑level detection of malicious content‑scraping pages that bypass traditional rule‑based filters, and it demonstrates the critical role of structural cues in improving spam classification accuracy.

MarkupLM · XPath embedding · content scraping detection

0 likes · 16 min read
