Tagged articles
24 articles
Baidu Intelligent Cloud Tech Hub
Nov 7, 2025 · Artificial Intelligence

From Big Data to 30,000‑GPU Clusters: The Evolution of China’s AI Infrastructure

In an in‑depth interview, Baidu AI Computing chief scientist Wang Yanpeng and host Koji trace China's internet infrastructure from the early big‑data era through cloud computing to today's AI boom, highlighting the pivotal role of compute power, GPU acceleration, data scaling, and Baidu's Baige platform in shaping the AI arms race.

AI Infrastructure · Baidu Baige · GPU computing
0 likes · 26 min read
Tencent Technical Engineering
Jul 18, 2025 · Artificial Intelligence

From CPUs to GPUs: How Traditional Backend Skills Power Modern AI Infrastructure

This article traces the evolution of AI infrastructure, compares it with traditional backend systems, and shows how established backend methodologies can address the shift to GPU-centric hardware, software adaptations such as deep learning frameworks, and the engineering challenges of model training and inference.

AI Infrastructure · Deep Learning · GPU computing
0 likes · 19 min read
Tencent Cloud Developer
Jul 17, 2025 · Artificial Intelligence

Why GPUs Are the New CPUs: Unpacking AI Infrastructure Challenges

This article explores how AI infrastructure has shifted from CPU‑centric designs to GPU‑driven architectures, detailing hardware evolution, software changes, and the engineering challenges of large‑model training and inference, while offering practical insights for traditional backend engineers transitioning to AI systems.

AI Infrastructure · Deep Learning · GPU computing
0 likes · 16 min read
DataFunTalk
Jul 3, 2025 · Artificial Intelligence

Inside xAI’s Grok 4: Massive Funding, Extreme Iteration, and Power Challenges

Elon Musk’s xAI has quietly leaked its upcoming Grok 4 and Grok 4 Code models, skipped Grok 3.5, secured $10 billion in new financing, and is building massive GPU super‑computing facilities, while raising concerns about model bias, data integrity, and unprecedented power‑grid strain.

AI funding · GPU computing · Power Grid
0 likes · 6 min read
AntTech
May 20, 2025 · Information Security

FAST and Neo: New Hardware Accelerators for Scalable Fully Homomorphic Encryption

The article reviews two recent ISCA 2025 papers—FAST and Neo—that introduce hardware and GPU‑based accelerators employing hoisting, KLSS, and Tensor Core optimizations to significantly boost the performance of fully homomorphic encryption workloads.

Cryptographic Optimization · Fully Homomorphic Encryption · GPU computing
0 likes · 6 min read
Architects' Tech Alliance
Dec 11, 2024 · Fundamentals

Unlocking GPU Computing: PCIe, NVLink, NVSwitch, and HBM Explained

This article breaks down the core components of high‑performance GPU servers—including PCIe switch chips, the evolution of NVLink from version 1.0 to 4.0, NVSwitch architecture, HBM memory tiers, and the nuances of bandwidth units—providing a comprehensive technical foundation for large‑scale model training.

GPU computing · HBM · High‑performance computing
0 likes · 10 min read
IT Services Circle
Oct 23, 2024 · Fundamentals

World’s Largest Known Prime Discovered Using GPUs: 2^136279841−1

A former Nvidia engineer, working through the GIMPS distributed project and leveraging thousands of GPUs across dozens of data centers, confirmed that 2^136279841−1 is prime. At 41,024,320 digits, this Mersenne prime is the largest known prime, surpassing the previous record by more than 16 million digits.

GIMPS · GPU computing · Mersenne prime
0 likes · 7 min read
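
The digit count quoted in the summary above can be checked with a few lines of arithmetic: 2^p − 1 has floor(p · log10 2) + 1 decimal digits. The sketch below is illustrative only. The function name `mersenne_digits` is hypothetical, and the previous record exponent 82,589,933 is an assumption brought in from outside the summary, which does not state it.

```python
import math


def mersenne_digits(p: int) -> int:
    """Return the number of decimal digits of 2**p - 1 for p >= 1."""
    # 2**p is never a power of 10, so subtracting 1 does not change
    # the digit count; the count is floor(p * log10(2)) + 1.
    return math.floor(p * math.log10(2)) + 1


# Exponent taken from the article title.
new_digits = mersenne_digits(136_279_841)

# Assumption: 82,589,933 is the exponent of the previous record prime;
# the summary does not give it.
old_digits = mersenne_digits(82_589_933)

print(new_digits)               # 41024320
print(new_digits - old_digits)  # 16162272
```

Under that assumption, the gap between the two records is a little over 16 million digits.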
Baidu Geek Talk
Jul 31, 2024 · Artificial Intelligence

Quantitative Analysis of Transformer Architecture and Llama Model Performance

This engineering‑focused document reviews transformer fundamentals, derives precise FLOP and memory formulas for attention and feed‑forward layers, defines the MFU performance metric, analyzes memory components and parallelism strategies, examines recent architecture variants such as MQA, GQA, sliding‑window attention and MoE, and provides practice problems applying these calculations.

AI · GPU computing · Transformer
0 likes · 30 min read
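
For readers unfamiliar with MFU (model FLOPs utilization), the metric is the ratio of the FLOPs a training run actually achieves per second to the hardware's peak. The sketch below is not the article's derivation: it uses the widely cited approximation of about 6 FLOPs per parameter per token, which ignores the sequence-length-dependent attention term. The function names, the 7B parameter count, the throughput of 3,000 tokens per second per GPU, and the 312 TFLOPS peak are all hypothetical inputs chosen for illustration.

```python
def training_flops_per_token(n_params: float) -> float:
    """Approximate training FLOPs per token (forward plus backward)."""
    # Assumption: the common 6 * N rule of thumb; the attention
    # score computation, which grows with sequence length, is ignored.
    return 6.0 * n_params


def mfu(n_params: float, tokens_per_second: float,
        peak_flops_per_second: float) -> float:
    """Model FLOPs utilization: achieved FLOPs/s over peak FLOPs/s."""
    achieved = training_flops_per_token(n_params) * tokens_per_second
    return achieved / peak_flops_per_second


# Hypothetical example: a 7e9-parameter model processing 3,000 tokens/s
# on one accelerator with a 312e12 FLOPs/s peak.
example = mfu(n_params=7e9, tokens_per_second=3000,
              peak_flops_per_second=312e12)
print(f"{example:.1%}")  # 40.4%
```

A more precise estimate would add the attention FLOPs per layer, which is where formulas of the kind the article derives come in.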
Alibaba Cloud Developer
Jun 25, 2024 · Artificial Intelligence

Demystifying Large Language Models: From ChatGPT Basics to Future Impact

This article walks readers through the fundamentals of large language models—explaining ChatGPT's architecture, training pipelines, required GPU hardware, industry deployment models, societal implications, and future industry trends—offering a cohesive framework for both newcomers and professionals.

AI Impact · AI fundamentals · GPU computing
0 likes · 22 min read
Architects' Tech Alliance
May 14, 2024 · Fundamentals

Fundamentals of GPU Computing: PCIe, NVLink, NVSwitch, and HBM

This article provides a comprehensive overview of the core components and terminology of large‑scale GPU computing, covering GPU server architecture, PCIe interconnects, NVLink generations, NVSwitch, high‑bandwidth memory (HBM), and bandwidth unit considerations for AI and HPC workloads.

AI hardware · GPU computing · HBM
0 likes · 11 min read
Amap Tech
May 11, 2023 · Artificial Intelligence

A 20‑Year Review of AI Infrastructure Milestones

Over the past two decades, AI infrastructure has evolved from early distributed storage and MapReduce to GPU programming, modern package managers, in‑memory processing, deep‑learning frameworks, parameter servers, AI compilers, synthetic data pipelines, open‑source model hubs, and today’s large‑scale Kubernetes‑based clusters, forming the essential foundation for every breakthrough.

AI Compilers · AI Infrastructure · Big Data
0 likes · 29 min read
Alibaba Cloud Infrastructure
Mar 22, 2023 · Artificial Intelligence

CUTLASS Extreme Performance Optimization and Its Application in Alibaba's Recommendation System

At the GTC conference, the talk presents Alibaba Cloud’s heterogeneous computing platform and introduces the Open Deep Learning API (ODLA), then details how CUTLASS‑based operator fusion dramatically accelerates attention and MLP layers in large‑scale recommendation models, achieving multi‑fold performance gains in production.

CUTLASS · Deep Learning · GPU computing
0 likes · 5 min read
Tencent Cloud Developer
Sep 30, 2022 · Cloud Computing

Understanding GPU Computing and Cloud-Based GPU Solutions

The article explains why massively parallel workloads such as pixel calculations demand GPUs, and how Tencent Cloud’s elastic, virtualized GPU services (vGPU, qGPU, the TACO abstraction, and spot instances) address the high cost and inflexibility of dedicated GPU hardware, delivering up to 16 EFLOPS for AI, scientific, graphics, and video workloads.

GPU computing · Tencent Cloud · cloud GPU
0 likes · 5 min read
Baidu App Technology
Jan 24, 2022 · Mobile Development

Introduction to OpenCL Programming for Mobile GPU Computing

As mobile CPU performance plateaus, developers increasingly use OpenCL to harness Android GPUs such as Qualcomm Adreno and Arm Mali for heterogeneous computing. The article covers OpenCL's platform, execution, and memory models for writing portable kernels, illustrated by a simple array‑addition example that demonstrates device initialization, kernel creation, buffer management, and parallel execution.

Android · C programming · GPU computing
0 likes · 8 min read
Tencent Advertising Technology
May 19, 2021 · Artificial Intelligence

Experience Sharing on Using Tencent TI-ONE Platform for Advertising Algorithm Competition

This article shares personal experiences and insights from using Tencent's TI-ONE machine learning platform in the 2020 Tencent Advertising Algorithm Competition, covering platform features, development modes, resource management, and lessons learned for future participants.

Advertising Competition · GPU computing · Notebook Mode
0 likes · 6 min read
Didi Tech
Apr 4, 2019 · Artificial Intelligence

DiDi Machine Learning Platform: From Workshop‑Style Production to Cloud‑Native Architecture

Since 2016 DiDi has evolved its machine‑learning platform from isolated, workshop‑style GPU servers to a cloud‑native, Kubernetes‑driven architecture that unifies resource management, introduces custom parameter‑server and serving frameworks, provides autotuning, external SaaS offerings such as Elastic Inference and JianShu, and aims for a 3.0 unified internal‑external AI marketplace.

AI Infrastructure · GPU computing · Kubernetes
0 likes · 19 min read
iQIYI Technical Product Team
Jan 4, 2019 · Artificial Intelligence

Building a Deep Learning Training Platform on Cloud: Challenges, Runonce Service, and Storage Optimization

iQIYI built Jarvis, a cloud‑based deep‑learning training platform that replaced the initial Runonce service. The team containerized GPU tasks, adopted Ceph S3 storage with FUSE, optimized data pipelines, and addressed compute, storage, and networking challenges to improve scalability and reduce GPU idle time.

AI training · Deep Learning · GPU computing
0 likes · 9 min read
CoolHome R&D Department
Dec 30, 2017 · Backend Development

Scaling KuJiaLe's ExaCloud: Inside the Distributed Rendering Architecture

This article chronicles the evolution of KuJiaLe's ExaCloud rendering platform from its 2013 GPU‑based prototype to a multi‑IDC, 2000‑node distributed system, detailing architectural redesigns, load‑balancing strategies, hybrid CPU/GPU processing, and operational lessons learned to achieve high‑throughput cloud rendering.

CPU rendering · GPU computing · backend scaling
0 likes · 15 min read