Baidu Geek Talk
Author

Recent Articles

Jan 6, 2025 · Information Security

MarkupLM-based Detection of Malicious Content Scraping

The article presents a MarkupLM‑based approach that enriches BERT with XPath embeddings to jointly model webpage text and structure, enabling site‑level detection of malicious content‑scraping pages that bypass traditional rule‑based filters and demonstrating the critical role of structural cues in improving spam classification accuracy.

Document Understanding · MarkupLM · XPath embedding
16 min read
Dec 30, 2024 · Industry Insights

How Baidu’s HTAP Table Storage Achieves Massive IO Gains and Faster Development

Baidu’s Search Content Storage team built an HTAP table storage system and a serverless compute‑scheduling architecture that separates OLTP and OLAP workloads, delivering up to 200 GB/s peak IO, reducing storage cost by 75%, and enabling SQL‑style task development with native FaaS functions.

Compute Scheduling · HTAP · IO optimization
20 min read
Dec 25, 2024 · Industry Insights

How to Build a Multimodal Web Page Model for the LLM Era

This article examines the unique multimodal and multi‑granular nature of web pages, compares fusion strategies, proposes a cross‑modal attention approach, outlines fine‑ and coarse‑grained pre‑training tasks, and explores low‑cost adaptor methods for adapting large multimodal models to web‑page modeling in the LLM era.

AI · HTML · LLM adaptation
10 min read
Dec 23, 2024 · Industry Insights

How Baidu’s One‑Stop Search Platform Cuts Development Costs by 80%

This article analyzes Baidu’s vertical‑search architecture team’s one‑stop development platform, detailing the background challenges, the FaaS and SaaS mechanisms introduced, design decisions, performance optimizations, dynamic form and DAG visualisation, and the resulting cost reductions and productivity gains.

FaaS · SaaS · Search Platform
17 min read
Dec 18, 2024 · Artificial Intelligence

GEE Graph Embedding Algorithm for Business Security Anomaly Detection

The article presents the GEE (Graph Encoder Embedding) algorithm for business security anomaly detection, explains its label‑propagation foundation, and evaluates it on real data with ten million edges. It identifies inefficiencies in the original implementation and shows that vectorized NumPy/Pandas optimizations reduce runtime from 55 seconds to about 4 seconds while preserving meaningful embeddings, as visualized with t‑SNE.

GEE algorithm · anomaly detection · anti-fraud
21 min read
Dec 16, 2024 · Artificial Intelligence

AIAPI: Baidu's AI-Native Retrieval System for Large Language Model Applications

AIAPI, Baidu’s AI‑native retrieval platform for large language models, tackles hallucination, slow domain updates, and output opacity by delivering authoritative, timely, full‑content data. Its dual‑channel architecture combines traditional search and RAG, and employs reusable ranking, graph‑enhanced data layers, dynamic caching that cuts storage by 70%, and QueryPlan‑based QoS. The article reports markedly higher retrieval quality and a 34% speed gain with Wenxin 4.0.

AI-Native Systems · AIAPI · Query Planning
12 min read
Dec 11, 2024 · Artificial Intelligence

How AI Cuts Essay Grading Time by 6×: Inside the Smart Writing Platform

This article examines how an AI‑powered essay‑grading platform combines PaddleOCR and Baidu's Wenxin large model to automate scoring and generate personalized feedback, cutting teachers' grading time more than sixfold while improving student learning outcomes across hundreds of Chinese schools.

AI · Education Technology · PaddleOCR
11 min read
Dec 4, 2024 · Artificial Intelligence

AI-Driven Microservice Governance Platform Based on Multi-Agent Architecture

The article introduces Jarvis, an AI-driven microservice governance platform that uses a multi-agent architecture and natural-language dialogue to automate full-process operations such as deployments, rate limiting, and circuit-breaker configuration, while leveraging large language model reasoning for root-cause diagnosis and a data-flywheel that continuously trains lightweight expert models.

AI DevOps · Data Flywheel · Intelligent Fault Diagnosis
10 min read
Nov 25, 2024 · Artificial Intelligence

PP-ShiTuV2: A General Image Recognition Pipeline in PaddleX

PP‑ShiTuV2, a PaddleX pipeline that integrates subject detection, deep feature encoding, and vector retrieval, delivers 91% recall@1 on AliProducts, surpasses earlier models by over 20 points, runs efficiently on GPU and CPU, and offers simple installation, quick‑start code, and full fine‑tuning support.

Image Recognition · PP-ShiTuV2 · PaddleX
8 min read
Nov 20, 2024 · Artificial Intelligence

Boosting ANN Search with GPU: Inside RAFT’s IVF_INT8 Implementation

This article examines how Baidu and NVIDIA leveraged the open‑source RAFT library to build a GPU‑accelerated approximate nearest neighbor (ANN) retrieval system, detailing algorithm choices, offline indexing, online batch processing, performance results, and practical guidelines for deploying ANN on GPUs.

ANN · GPU · IVF_INT8
20 min read