Baidu Geek Talk

Follow us to discover more Baidu tech insights.

Recent Articles

Baidu Geek Talk
Nov 18, 2024 · Big Data

Optimizing Multi-Dimensional User Count Statistics in Big Data Computing: A Data Tagging Approach

By replacing exponential row expansion with a data‑tagging strategy that encodes dimension combinations and aggregates at the user level, the authors cut Baidu Feed’s multi‑dimensional user‑count computation time from 49 to 14 minutes and shuffle size from 16 TB to 800 GB, enabling scalable analysis across dozens of dimensions for billions of daily users.

Big Data Optimization · Hive SQL · data tagging
0 likes · 12 min read
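The summary names two ideas: encoding each dimension combination as a tag, and aggregating at the user level instead of fanning out every raw row. The sketch below is a hypothetical illustration of those two ideas, not the authors' implementation. It assumes each record is a `(user_id, dimension values)` pair and that a combination is tagged with a k-bit mask, similar in spirit to Hive's `GROUPING__ID`. The helper names `combo_keys`, `count_users`, and `count_users_user_level` are invented for this example. `count_users` on raw logs is the exponential row expansion, which emits 2^k rows per log row. `count_users_user_level` first deduplicates to one row per distinct `(user, dims)` pair, so the fan-out scales with distinct users rather than events.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Sequence, Tuple

# Hypothetical types for this illustration (not from the article).
Row = Tuple[str, Tuple[str, ...]]      # (user_id, dimension values)
Combo = Tuple[int, Tuple[str, ...]]    # (bitmask tag, rolled-up key)

def combo_keys(dims: Sequence[str]) -> Iterator[Combo]:
    """Yield one (mask, key) pair per dimension combination, 2**k in total.

    Bit i of the mask is the tag meaning "dimension i is kept";
    a dimension whose bit is 0 is rolled up to '*'.
    """
    for mask in range(2 ** len(dims)):
        yield mask, tuple(d if (mask >> i) & 1 else "*" for i, d in enumerate(dims))

def count_users(rows: Iterable[Row]) -> Tuple[Dict[Combo, int], int]:
    """Expand every input row into all combinations and count distinct users.

    Returns (distinct-user count per combination, number of expanded rows emitted).
    """
    users: Dict[Combo, set] = defaultdict(set)
    emitted = 0
    for user, dims in rows:
        for combo in combo_keys(dims):
            users[combo].add(user)
            emitted += 1
    return {combo: len(u) for combo, u in users.items()}, emitted

def count_users_user_level(rows: Iterable[Row]) -> Tuple[Dict[Combo, int], int]:
    """Aggregate to one row per distinct (user, dims) before expanding."""
    distinct = {(user, tuple(dims)) for user, dims in rows}
    return count_users(distinct)

logs = [
    ("u1", ("ios", "beijing")),
    ("u1", ("ios", "beijing")),      # repeated event from the same user
    ("u2", ("android", "beijing")),
]
naive, naive_rows = count_users(logs)
tagged, tagged_rows = count_users_user_level(logs)
assert naive == tagged               # same answer either way
print(naive_rows, tagged_rows)       # 12 8
print(tagged[(0, ("*", "*"))])       # 2
```

Both paths give identical counts. Only the number of intermediate rows differs, and in a distributed job that intermediate row count is what drives shuffle size.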
Baidu Geek Talk
Nov 13, 2024 · Industry Insights

Why Cloud‑Native Data Lakes Are the New Standard for Storage Acceleration

This article analyzes the evolution of data‑lake storage acceleration, compares traditional parallel file systems, object‑storage‑based solutions and modern cache‑enabled architectures, and explains how cloud‑native data lakes address scalability, cost, and performance challenges for AI and big‑data workloads.

AI · Object Storage · big data
0 likes · 24 min read
Baidu Geek Talk
Nov 6, 2024 · Cloud Computing

Baidu Canghai Storage Unified Technology Base: Architecture and Evolution of Metadata, Namespace, and Data Layers

Baidu’s Canghai Storage unifies metadata, hierarchical namespace, and data layers into a Meta‑Aware, three‑generation architecture that scales to trillions of metadata items and zettabyte‑scale data, using a distributed transactional KV store, single‑machine‑distributed namespace, and online erasure‑coding micro‑services to deliver high performance, low cost, and seamless scalability.

NewSQL · big data · cloud storage
0 likes · 18 min read
Baidu Geek Talk
Nov 4, 2024 · Big Data

Why Object Storage Is Replacing HDFS for Modern Data Lakes: Baidu’s 2.0 Acceleration

Data lakes have evolved from HDFS to object storage, addressing resource inefficiency, scalability limits, and operational burdens; Baidu’s Data Lake Storage Acceleration 2.0 introduces hierarchical Namespace 2.0, a streaming storage engine, RapidFS caching, and a fully HDFS‑compatible BOS‑HDFS layer to boost performance and support massive AI workloads.

AI · Baidu · HDFS Compatibility
0 likes · 12 min read
Baidu Geek Talk
Oct 30, 2024 · Cloud Computing

Baidu Cloud Infrastructure for the AI-Native Era

Baidu Intelligent Cloud outlines how its evolving, high-performance infrastructure (3-minute instance provisioning, over 200 GB bandwidth, elastic computing, specialized storage, and AI-driven MLOps tools) enables AI-native model training and deployment in fast-growing sectors such as automotive and finance, supporting the industry’s shift to AI-centric cloud services.

Case Studies · MLOps · cloud computing
0 likes · 9 min read
Baidu Geek Talk
Oct 28, 2024 · Artificial Intelligence

Baidu Intelligent Cloud Qianfan AppBuilder: Enterprise-Level Large Model Application Development Platform

Baidu Intelligent Cloud’s Qianfan AppBuilder 3.0 offers an enterprise‑grade platform that simplifies large‑model application development by providing high‑accuracy RAG, robust agent scheduling, extensive integration, secure private‑or‑hybrid deployment, and a guided methodology, enabling industries to transform processes, add AI copilots, and create novel capabilities.

AI integration · Agent Development · Baidu Intelligent Cloud
0 likes · 12 min read
Baidu Geek Talk
Oct 23, 2024 · Artificial Intelligence

Integrating Yuan 2.0 Large Model with PaddleNLP: Overview, Usage Steps, and Interaction Examples

The open‑source Yuan 2.0 large model is fully integrated into Baidu’s PaddleNLP, offering quick inference for tasks like code generation, translation, and reasoning, along with efficient distributed training and fine‑tuning features such as Zero Padding optimization, enabling developers to easily deploy and customize the model via simple setup steps and example interactions.

AI · Java · LLM
0 likes · 10 min read
Baidu Geek Talk
Oct 22, 2024 · Big Data

How Baidu’s DATAPILOT Uses NVIDIA RAPIDS to Supercharge SQL Analytics

Baidu’s DATAPILOT platform combines natural‑language interaction with GPU‑accelerated Spark‑RAPIDS so that complex multi‑table SQL queries return results within seconds, improving ad‑revenue analysis efficiency by up to fivefold while reducing infrastructure costs.

Apache Spark · Baidu · Data Analytics
0 likes · 10 min read
Baidu Geek Talk
Oct 21, 2024 · Databases

TDE-ClickHouse Optimization Practice at Baidu MEG: Query Performance, Data Import, and Distributed Architecture

Baidu MEG’s TDE‑ClickHouse optimization in the Turing 3.0 ecosystem speeds up queries by up to 10×, halves latency, enables billion‑row bulk imports in under two hours, and migrates to a cloud‑native, ZooKeeper‑free architecture supporting 350K CPU cores, 10 PB of storage, and sub‑3‑second responses for 150K daily BI queries.

Baidu MEG · Bulkload · ClickHouse
0 likes · 19 min read
Baidu Geek Talk
Oct 16, 2024 · Operations

Design and Implementation of an Online‑Offline Task Scheduling System for Baidu’s Mobile Operations Promotion Platform

The authors redesign Baidu’s Mobile Operations Promotion Platform by separating online business logic from offline warehouse calculations and implementing a custom three‑step online‑offline scheduler that logs operations, orchestrates batch tasks, and dispatches them via TDS, delivering consistent, timely settlement data, reduced errors, and lower maintenance costs.

Architecture Refactoring · Baidu OPS · TDS
0 likes · 15 min read