Tag

GPU computing


AntTech
May 20, 2025 · Information Security

FAST and Neo: New Hardware Accelerators for Scalable Fully Homomorphic Encryption

The article reviews two recent ISCA 2025 papers—FAST and Neo—that introduce hardware and GPU‑based accelerators employing hoisting, KLSS, and Tensor Core optimizations to significantly boost the performance of fully homomorphic encryption workloads.

Cryptographic Optimization · Fully Homomorphic Encryption · GPU computing
0 likes · 6 min read
IT Services Circle
Oct 23, 2024 · Fundamentals

World’s Largest Known Prime Discovered Using GPUs: 2^136279841−1

A former Nvidia engineer, working through the GIMPS distributed project and leveraging thousands of GPUs across dozens of data centers, confirmed that 2^136279841−1, a 41,024,320‑digit Mersenne prime, is the largest known prime, surpassing the previous record by more than 16 million digits.
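
The digit count quoted in this summary can be checked without ever constructing the number, since 2^p − 1 has floor(p · log10(2)) + 1 decimal digits. The short sketch below does that arithmetic; the previous record exponent 82589933 is not stated in the summary and is supplied here as an assumption for comparison.

```python
import math


def mersenne_digits(p: int) -> int:
    """Number of decimal digits of the Mersenne number 2**p - 1."""
    # 2**p is never a power of 10, so 2**p - 1 has as many digits as 2**p.
    return math.floor(p * math.log10(2)) + 1


new_record = mersenne_digits(136279841)  # exponent from the summary
old_record = mersenne_digits(82589933)   # assumed previous record exponent
gap = new_record - old_record

print(f"new record: {new_record:,} digits")
print(f"old record: {old_record:,} digits")
print(f"gap:        {gap:,} digits")
```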

GIMPS · GPU computing · Mersenne prime
0 likes · 7 min read
Baidu Geek Talk
Jul 31, 2024 · Artificial Intelligence

Quantitative Analysis of Transformer Architecture and Llama Model Performance

This engineering‑focused document reviews transformer fundamentals, derives precise FLOP and memory formulas for attention and feed‑forward layers, defines the MFU performance metric, analyzes memory components and parallelism strategies, examines recent architecture variants such as MQA, GQA, sliding‑window attention and MoE, and provides practice problems applying these calculations.
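
As a rough illustration of the MFU metric mentioned in this summary, the sketch below divides achieved training FLOPs per second by the hardware peak. It assumes the widely used approximation of about 6 FLOPs per parameter per token for a forward plus backward pass, which ignores attention-score FLOPs; this is not necessarily the formula the article derives, and the model size, throughput, and peak figures are hypothetical.

```python
def training_flops_per_token(n_params: float) -> float:
    """Approximate training FLOPs per token (assumption: 6 * parameters)."""
    # Roughly 2 FLOPs per parameter for the forward pass, 4 for the backward pass.
    return 6.0 * n_params


def mfu(n_params: float, tokens_per_second: float, peak_flops_per_second: float) -> float:
    """Model FLOPs utilization: achieved FLOPs/s divided by peak FLOPs/s."""
    achieved = training_flops_per_token(n_params) * tokens_per_second
    return achieved / peak_flops_per_second


# Hypothetical numbers: a 7B-parameter model processing 3,000 tokens/s
# on one accelerator with a 312 TFLOP/s peak.
utilization = mfu(7e9, 3000, 312e12)
print(f"MFU = {utilization:.1%}")
```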

AI · GPU computing · Large Language Models
0 likes · 30 min read
Architects' Tech Alliance
May 14, 2024 · Fundamentals

Fundamentals of GPU Computing: PCIe, NVLink, NVSwitch, and HBM

This article provides a comprehensive overview of the core components and terminology of large‑scale GPU computing, covering GPU server architecture, PCIe interconnects, NVLink generations, NVSwitch, high‑bandwidth memory (HBM), and bandwidth unit considerations for AI and HPC workloads.

AI hardware · GPU computing · HBM
0 likes · 11 min read
Amap Tech
May 11, 2023 · Artificial Intelligence

A 20‑Year Review of AI Infrastructure Milestones

Over the past two decades, AI infrastructure has evolved from early distributed storage and MapReduce to GPU programming, modern package managers, in‑memory processing, deep‑learning frameworks, parameter servers, AI compilers, synthetic data pipelines, open‑source model hubs, and today’s large‑scale Kubernetes‑based clusters, forming the essential foundation for every breakthrough.

AI Compilers · AI infrastructure · Big Data
0 likes · 29 min read
Laravel Tech Community
Apr 11, 2023 · Frontend Development

Chrome Announces WebGPU Implementation in Chrome 113 Beta with Cross‑Platform Support

Chrome’s team has released the first implementation of the WebGPU API in Chrome 113 Beta, enabling high‑performance 3D graphics and data‑parallel compute on supported ChromeOS, macOS, and Windows platforms, with broader platform support and library integrations slated for later this year.

Chrome · GPU computing · Graphics API
0 likes · 4 min read
Alibaba Cloud Infrastructure
Mar 22, 2023 · Artificial Intelligence

CUTLASS Extreme Performance Optimization and Its Application in Alibaba's Recommendation System

This GTC talk presents Alibaba Cloud’s heterogeneous computing platform and introduces the Open Deep Learning API (ODLA), then details how CUTLASS‑based operator fusion dramatically accelerates attention and MLP layers in large‑scale recommendation models, achieving multi‑fold performance gains in production.

CUTLASS · GPU computing · Recommendation systems
0 likes · 5 min read
Tencent Cloud Developer
Sep 30, 2022 · Cloud Computing

Understanding GPU Computing and Cloud-Based GPU Solutions

The article explains why massively parallel workloads such as per‑pixel calculations demand GPUs, and how Tencent Cloud’s elastic, virtualized GPU services (vGPU, qGPU, TACO abstraction, and spot instances) address GPUs’ high cost and inflexibility while delivering up to 16 EFLOPS for AI, scientific, graphics, and video workloads.

Cloud GPU · GPU computing · Parallel Computing
0 likes · 5 min read
Baidu App Technology
Jan 24, 2022 · Mobile Development

Introduction to OpenCL Programming for Mobile GPU Computing

As mobile CPU performance plateaus, developers increasingly use OpenCL to harness Android GPUs such as Qualcomm Adreno and Arm Mali for heterogeneous computing, relying on its platform, execution, and memory models to write portable kernels. A simple array‑addition example demonstrates device initialization, kernel creation, buffer management, and parallel execution.

Android · C Programming · GPU computing
0 likes · 8 min read
Tencent Advertising Technology
May 19, 2021 · Artificial Intelligence

Experience Sharing on Using Tencent TI-ONE Platform for Advertising Algorithm Competition

This article shares personal experiences and insights from using Tencent's TI-ONE machine learning platform in the 2020 Tencent Advertising Algorithm Competition, covering platform features, development modes, resource management, and lessons learned for future participants.

Advertising Competition · GPU computing · Notebook Mode
0 likes · 6 min read
Didi Tech
Apr 4, 2019 · Artificial Intelligence

DiDi Machine Learning Platform: From Workshop‑Style Production to Cloud‑Native Architecture

Since 2016 DiDi has evolved its machine‑learning platform from isolated, workshop‑style GPU servers to a cloud‑native, Kubernetes‑driven architecture that unifies resource management, introduces custom parameter‑server and serving frameworks, provides autotuning, external SaaS offerings such as Elastic Inference and JianShu, and aims for a 3.0 unified internal‑external AI marketplace.

AI infrastructure · GPU computing · Kubernetes
0 likes · 19 min read
iQIYI Technical Product Team
Jan 4, 2019 · Artificial Intelligence

Building a Deep Learning Training Platform on Cloud: Challenges, Runonce Service, and Storage Optimization

iQIYI built a cloud‑based deep‑learning training platform called Jarvis, replacing the initial Runonce service, by containerizing GPU tasks, adopting Ceph S3 storage with FUSE, optimizing data pipelines, and addressing compute, storage, and networking challenges to improve scalability and reduce GPU idle time.

AI training · Cloud Platform · Containerization
0 likes · 9 min read
JD Retail Technology
Jul 30, 2018 · Cloud Computing

Fibonacci: JD.com's Enterprise-Grade Serverless Function Platform

Fibonacci is JD.com's enterprise-grade serverless function platform combining container technology, serverless architecture, and event-driven mechanisms to provide high-availability, high-efficiency, and high-usability FaaS solutions.

Container Technology · FaaS · GPU computing
0 likes · 8 min read
Architects' Tech Alliance
Jan 6, 2018 · Fundamentals

Survey of HPC Applications Supporting GPU Computing and Their Adoption Across Domains

Drawing on NVIDIA's GPU ecosystem and the Intersect360 study, the article reports that 34 of the 50 most common high‑performance computing applications already support GPU acceleration, highlighting the growing importance of GPU computing across scientific, engineering, and business fields.

Artificial Intelligence · CUDA · GPU computing
0 likes · 7 min read