Tagged articles

GPU computing

26 articles · Page 1 of 1

Apr 17, 2026 · Industry Insights

The 30‑Year Journey: From Parallel Computing to Modern GPU‑Powered AI

This article traces three decades of government‑funded research in parallel computing, graphics systems, and stream processing, showing how those advances migrated to companies like Nvidia, evolved into CUDA and other GPU technologies, and ultimately enabled today’s AI revolution.

AI · CUDA · GPU computing

0 likes · 18 min read

TonyBai

Jan 21, 2026 · Artificial Intelligence

When Go Meets GPU: A Hands‑On Guide to Unlocking Thousand‑Fold Compute with CUDA

This article walks Go developers through the fundamentals of GPU architecture and CUDA, demonstrates a complete CGO‑based matrix‑multiplication project, offers performance‑tuning tips such as minimizing PCIe transfers and leveraging shared memory, and presents a PureGo alternative for seamless Go‑GPU integration.

CGO · CUDA · GPU computing

0 likes · 17 min read

Baidu Intelligent Cloud Tech Hub

Nov 7, 2025 · Artificial Intelligence

From Big Data to 30,000‑GPU Clusters: The Evolution of China’s AI Infrastructure

In an in‑depth interview, Baidu AI Computing chief scientist Wang Yanpeng and host Koji trace China's internet infrastructure from the early big‑data era through cloud computing to today's AI boom, highlighting the pivotal role of compute power, GPU acceleration, data scaling, and Baidu's Baige platform in shaping the AI arms race.

AI Infrastructure · Baidu Baige · Cloud Computing

0 likes · 26 min read

Tencent Technical Engineering

Jul 18, 2025 · Artificial Intelligence

From CPUs to GPUs: How Traditional Backend Skills Power Modern AI Infrastructure

This article explores the evolution of AI infrastructure, comparing it with traditional backend systems, and details how hardware shifts to GPU-centric designs, software adaptations like deep learning frameworks, and engineering challenges in model training and inference can be addressed using established backend methodologies.

AI Infrastructure · GPU computing · Inference Optimization

0 likes · 19 min read

Tencent Cloud Developer

Jul 17, 2025 · Artificial Intelligence

Why GPUs Are the New CPUs: Unpacking AI Infrastructure Challenges

This article explores how AI infrastructure has shifted from CPU‑centric designs to GPU‑driven architectures, detailing hardware evolution, software changes, and the engineering challenges of large‑model training and inference, while offering practical insights for traditional backend engineers transitioning to AI systems.

AI Infrastructure · GPU computing · Model Training

0 likes · 16 min read

DataFunTalk

Jul 3, 2025 · Artificial Intelligence

Inside xAI’s Grok 4: Massive Funding, Extreme Iteration, and Power Challenges

Elon Musk’s xAI has quietly leaked its upcoming Grok 4 and Grok 4 Code models, skipped Grok 3.5, secured $10 billion in new financing, and is building massive GPU super‑computing facilities, while raising concerns about model bias, data integrity, and unprecedented power‑grid strain.

AI funding · GPU computing · Large Language Model

0 likes · 6 min read

AntTech

May 20, 2025 · Information Security

FAST and Neo: New Hardware Accelerators for Scalable Fully Homomorphic Encryption

The article reviews two recent ISCA 2025 papers—FAST and Neo—that introduce hardware and GPU‑based accelerators employing hoisting, KLSS, and Tensor Core optimizations to significantly boost the performance of fully homomorphic encryption workloads.

Cryptographic Optimization · Fully Homomorphic Encryption · GPU computing

0 likes · 6 min read

Architects' Tech Alliance

Dec 11, 2024 · Fundamentals

Unlocking GPU Computing: PCIe, NVLink, NVSwitch, and HBM Explained

This article breaks down the core components of high‑performance GPU servers—including PCIe switch chips, the evolution of NVLink from version 1.0 to 4.0, NVSwitch architecture, HBM memory tiers, and the nuances of bandwidth units—providing a comprehensive technical foundation for large‑scale model training.

GPU computing · HBM · High-performance computing

0 likes · 10 min read

IT Services Circle

Oct 23, 2024 · Fundamentals

World’s Largest Known Prime Discovered Using GPUs: 2^136279841−1

A former Nvidia engineer working through the GIMPS distributed project, using thousands of GPUs across dozens of data centers, confirmed that 2^136279841−1, a 41,024,320‑digit Mersenne prime, is the largest known prime, surpassing the previous record by more than 16 million digits.

Distributed Computing · GIMPS · GPU computing

0 likes · 7 min read

Baidu Geek Talk

Jul 31, 2024 · Artificial Intelligence

Quantitative Analysis of Transformer Architecture and Llama Model Performance

This engineering‑focused document reviews transformer fundamentals, derives precise FLOP and memory formulas for attention and feed‑forward layers, defines the MFU performance metric, analyzes memory components and parallelism strategies, examines recent architecture variants such as MQA, GQA, sliding‑window attention and MoE, and provides practice problems applying these calculations.

AI · GPU computing · Transformer

0 likes · 30 min read

Alibaba Cloud Developer

Jun 25, 2024 · Artificial Intelligence

Demystifying Large Language Models: From ChatGPT Basics to Future Impact

This article walks readers through the fundamentals of large language models—explaining ChatGPT's architecture, training pipelines, required GPU hardware, industry deployment models, societal implications, and future industry trends—offering a cohesive framework for both newcomers and professionals.

AI Fundamentals · AI impact · Cloud AI services

0 likes · 22 min read

Architects' Tech Alliance

May 14, 2024 · Fundamentals

Fundamentals of GPU Computing: PCIe, NVLink, NVSwitch, and HBM

This article provides a comprehensive overview of the core components and terminology of large‑scale GPU computing, covering GPU server architecture, PCIe interconnects, NVLink generations, NVSwitch, high‑bandwidth memory (HBM), and bandwidth unit considerations for AI and HPC workloads.

AI hardware · GPU computing · HBM

0 likes · 11 min read

Architects' Tech Alliance

Apr 13, 2024 · Industry Insights

Why AI Servers Are Poised for Explosive Growth: Trends, Architecture, and Demand Forecast

The article analyzes how the surge in AIGC and large language models is reshaping the AI server market, detailing hardware composition, the rise of heterogeneous computing, GPU advantages, demand calculations for models like GPT‑3, and the competitive landscape driving rapid industry growth.

AI servers · AIGC · GPU computing

0 likes · 16 min read

Amap Tech

May 11, 2023 · Artificial Intelligence

A 20‑Year Review of AI Infrastructure Milestones

Over the past two decades, AI infrastructure has evolved from early distributed storage and MapReduce to GPU programming, modern package managers, in‑memory processing, deep‑learning frameworks, parameter servers, AI compilers, synthetic data pipelines, open‑source model hubs, and today’s large‑scale Kubernetes‑based clusters, forming the essential foundation for every breakthrough.

AI Compilers · AI Infrastructure · Big Data

0 likes · 29 min read

Laravel Tech Community

Apr 11, 2023 · Frontend Development

Chrome Announces WebGPU Implementation in Chrome 113 Beta with Cross‑Platform Support

Chrome’s team has released the first implementation of the WebGPU API in Chrome 113 Beta, enabling high‑performance 3D graphics and data‑parallel compute on supported ChromeOS, macOS, and Windows platforms, with broader platform support and library integrations slated for later this year.

Chrome · GPU computing · Graphics API

0 likes · 4 min read

Alibaba Cloud Infrastructure

Mar 22, 2023 · Artificial Intelligence

CUTLASS Extreme Performance Optimization and Its Application in Alibaba's Recommendation System

At the GTC conference, the talk presents Alibaba Cloud’s heterogeneous computing platform and introduces the Open Deep Learning API (ODLA), then details how CUTLASS‑based operator fusion dramatically accelerates attention and MLP layers in large‑scale recommendation models, achieving multi‑fold performance gains in production.

Cutlass · GPU computing · Performance Optimization

0 likes · 5 min read

Tencent Cloud Developer

Sep 30, 2022 · Cloud Computing

Understanding GPU Computing and Cloud-Based GPU Solutions

The article explains why massively parallel workloads such as per‑pixel calculations call for GPUs, and how Tencent Cloud’s elastic, virtualized GPU services (vGPU, qGPU, the TACO abstraction, and spot instances) address the high cost and inflexibility of dedicated hardware, delivering up to 16 EFLOPS for AI, scientific, graphics, and video workloads.

GPU computing · Tencent Cloud · cloud GPU

0 likes · 5 min read

Baidu Geek Talk

May 18, 2022 · Mobile Development

Unlock Mobile GPU Power: A Hands‑On Guide to OpenCL Programming on Android

This article introduces the fundamentals of heterogeneous computing on mobile GPUs, explains OpenCL concepts and its programming model, and provides a step‑by‑step example of adding two arrays with complete OpenCL code for Android devices.

Android · C# · GPU computing

0 likes · 9 min read

Baidu App Technology

Jan 24, 2022 · Mobile Development

Introduction to OpenCL Programming for Mobile GPU Computing

As mobile CPUs plateau, developers increasingly use OpenCL to harness Android GPUs such as Qualcomm Adreno and Arm Mali for heterogeneous computing, leveraging its platform, execution, and memory models to write portable kernels. A simple array‑addition example demonstrates device initialization, kernel creation, buffer management, and parallel execution.

Android · C Programming · GPU computing

0 likes · 8 min read

Tencent Advertising Technology

May 19, 2021 · Artificial Intelligence

Experience Sharing on Using Tencent TI-ONE Platform for Advertising Algorithm Competition

This article shares personal experiences and insights from using Tencent's TI-ONE machine learning platform in the 2020 Tencent Advertising Algorithm Competition, covering platform features, development modes, resource management, and lessons learned for future participants.

Advertising Competition · GPU computing · Notebook Mode

0 likes · 6 min read

Didi Tech

Apr 4, 2019 · Artificial Intelligence

DiDi Machine Learning Platform: From Workshop‑Style Production to Cloud‑Native Architecture

Since 2016 DiDi has evolved its machine‑learning platform from isolated, workshop‑style GPU servers to a cloud‑native, Kubernetes‑driven architecture that unifies resource management, introduces custom parameter‑server and serving frameworks, provides autotuning, external SaaS offerings such as Elastic Inference and JianShu, and aims for a 3.0 unified internal‑external AI marketplace.

AI Infrastructure · GPU computing · Platform Engineering

0 likes · 19 min read

iQIYI Technical Product Team

Jan 4, 2019 · Artificial Intelligence

Building a Deep Learning Training Platform on Cloud: Challenges, Runonce Service, and Storage Optimization

iQIYI built a cloud‑based deep‑learning training platform called Jarvis, replacing the initial Runonce service, by containerizing GPU tasks, adopting Ceph S3 storage with FUSE, optimizing data pipelines, and addressing compute, storage, and networking challenges to improve scalability and reduce GPU idle time.

AI training · GPU computing · Storage Optimization

0 likes · 9 min read

Architects' Tech Alliance

Nov 1, 2018 · Fundamentals

Why GPUs Matter: From Basics to Virtualization in Modern Computing

This article explains what a GPU (Graphics Processing Unit) is, why it differs from a CPU, how it is used through graphics and compute APIs, and explores GPU virtualization techniques such as virtual GPUs, passthrough, and vGPU architectures.

CUDA · GPU · GPU computing

0 likes · 17 min read

JD Retail Technology

Jul 30, 2018 · Cloud Computing

Fibonacci: JD.com's Enterprise-Grade Serverless Function Platform

Fibonacci is JD.com's enterprise-grade serverless function platform combining container technology, serverless architecture, and event-driven mechanisms to provide high-availability, high-efficiency, and high-usability FaaS solutions.

Cloud Computing · Container Technology · FaaS

0 likes · 8 min read

Architects' Tech Alliance

Jan 6, 2018 · Fundamentals

Survey of HPC Applications Supporting GPU Computing and Their Adoption Across Domains

Drawing on NVIDIA's GPU ecosystem and an Intersect360 study, the article reports that 34 of the 50 most common high‑performance computing applications already support GPU acceleration, highlighting the growing importance of GPU computing across scientific, engineering, and business fields.

CUDA · GPU computing · HPC

0 likes · 7 min read

CoolHome R&D Department

Dec 30, 2017 · Backend Development

Scaling KuJiaLe's ExaCloud: Inside the Distributed Rendering Architecture

This article chronicles the evolution of KuJiaLe's ExaCloud rendering platform from its 2013 GPU‑based prototype to a multi‑IDC, 2000‑node distributed system, detailing architectural redesigns, load‑balancing strategies, hybrid CPU/GPU processing, and operational lessons learned to achieve high‑throughput cloud rendering.

CPU rendering · GPU computing · backend scaling

0 likes · 15 min read
