Tagged articles
48 articles
DataFunTalk
May 14, 2026 · Industry Insights

OpenAI’s Rapid Sprint: GPT‑5.6 Leaked and a $400 Subsidy to Oust Claude Code

Within three weeks of GPT‑5.5’s launch, internal code for GPT‑5.6 surfaced, prompting OpenAI to unveil a 2‑3× “ultrafast” mode and a two‑month free Codex offer, worth $400, to lure Claude Code users. The moves set off a speed race with Anthropic’s Opus 4.7 Fast and highlight a self‑reinforcing acceleration loop toward ASI.

AI acceleration · Claude Code · Codex
0 likes · 7 min read
HyperAI Super Neural
Apr 7, 2026 · Artificial Intelligence

MIT’s DRiffusion Achieves 1.4–3.7× Faster Diffusion Sampling via Draft‑and‑Refine Parallelism

MIT researchers introduce DRiffusion, a draft‑and‑refine parallel framework that uncovers intrinsic parallelism in diffusion models, delivering 1.4–3.7× speedup on three GPUs while preserving near‑lossless image quality across Stable Diffusion 2.1, SDXL and SD3 evaluated on MS‑COCO.

AI acceleration · DRiffusion · MS-COCO
0 likes · 14 min read
Baidu Intelligent Cloud Tech Hub
Feb 12, 2026 · Artificial Intelligence

Deploying GLM-5 on Baidu Kunlun P800 XPU with vLLM‑Kunlun Plugin

This article explains how the new GLM-5 large model is adapted to Baidu's Kunlun P800 XPU. It covers the asynchronous reinforcement learning framework Slime, optimization techniques such as INT8 quantization and tensor parallelism, and step‑by‑step deployment commands using the open‑source vLLM‑Kunlun plugin.

AI acceleration · GLM-5 · INT8 Quantization
0 likes · 6 min read
Baidu Intelligent Cloud Tech Hub
Nov 4, 2025 · Artificial Intelligence

How Baidu’s Baige Accelerates Multimodal Video Training with Context Parallelism

Baidu Baige’s enhanced veRL framework dramatically boosts video frame rates and resolution limits, cuts training time, reduces memory usage, and improves model accuracy by leveraging context parallelism and optimized attention on Ampere GPUs for multimodal mixed‑training scenarios.

AI acceleration · Context Parallelism · Multimodal Training
0 likes · 6 min read
Architects' Tech Alliance
Oct 11, 2025 · Artificial Intelligence

Why NVLink Beats PCIe for AI: Deep Dive into GPU Interconnect Technologies

This article examines the architectural differences between Scale‑Out and Scale‑Up networking, compares PCIe, NVLink, UALink, InfiniBand and RoCE, and explains why high‑bandwidth, low‑latency GPU interconnects like NVLink are essential for modern AI and HPC workloads.

AI acceleration · GPU interconnect · High‑performance computing
0 likes · 27 min read
Architects' Tech Alliance
Sep 14, 2025 · Artificial Intelligence

Why Nvidia’s Blackwell GPUs Are Redefining AI Performance

The article analyzes Nvidia's Blackwell GPU series (announced in March 2024) and the GB200 NVL72 architecture, detailing their advanced 3‑4nm manufacturing, redesigned CUDA cores, next‑gen ray‑tracing and DLSS upgrades, massive compute and memory bandwidth gains, NVLink Gen5 improvements, and the diverse GB200 product configurations for high‑performance AI workloads.

AI acceleration · Blackwell GPU · GPU architecture
0 likes · 7 min read
Xiaohongshu Tech REDtech
Aug 19, 2025 · Artificial Intelligence

How Single Trajectory Distillation Boosts Diffusion Model Speed and Style Quality

The paper introduces Single Trajectory Distillation (STD), a novel training framework that aligns full PF‑ODE trajectories from a fixed noisy state, uses a Trajectory Bank to cut training cost, and adds an Asymmetric Adversarial Loss to markedly improve style consistency and aesthetic quality while accelerating image and video style‑transfer diffusion models.

AI acceleration · Style Transfer · consistency models
0 likes · 14 min read
Architects' Tech Alliance
Jul 9, 2025 · Fundamentals

How HBM5’s 3D Near‑Memory Architecture Revolutionizes AI and HPC Performance

HBM5 introduces a 3D near‑memory computing architecture that vertically stacks DRAM dies and integrates compute units within the memory stack, dramatically boosting bandwidth, reducing data‑movement power, and delivering significant performance and energy‑efficiency gains for AI, high‑performance computing, and data‑center workloads.

AI acceleration · HBM5 · Near-Memory Computing
0 likes · 8 min read
Architects' Tech Alliance
Jul 8, 2025 · Fundamentals

Why Modern Data Center Switches Are the Backbone of AI Scaling

This article explains how data‑center switches are classified, the key components and performance metrics of Ethernet switch chips, market growth trends, the shift from OEO to full‑optical OCS designs, and how RDMA technologies like InfiniBand and RoCEv2 enable the low‑latency networking essential for large‑scale AI training.

AI acceleration · Data Center Networking · RDMA
0 likes · 12 min read
Architects' Tech Alliance
Jun 9, 2025 · Artificial Intelligence

What Makes Nvidia’s Blackwell GPUs a Game-Changer for AI Performance?

In March 2024 Nvidia unveiled the Blackwell GPU family and the GB200 NVL72 architecture, featuring 3‑4 nm processes, redesigned CUDA cores, next‑gen ray‑tracing, upgraded DLSS, massive FP16/FP8 compute gains, 8 TB/s memory bandwidth, and NVLink Gen5, while also presenting complex power, cooling, and packaging challenges for large‑scale AI deployments.

AI acceleration · Blackwell · GPU
0 likes · 6 min read
Architects' Tech Alliance
Jun 6, 2025 · Artificial Intelligence

B30 vs H20: Which NVIDIA GPU Wins for AI Workloads and Budgets?

This article compares NVIDIA’s China‑specific B30 and high‑end H20 GPUs, detailing their CPU/GPU architecture updates, memory technologies, architectural differences, performance metrics, power and cooling characteristics, and price positioning, to help enterprises and developers choose the most suitable accelerator for AI and deep‑learning tasks.

AI acceleration · B30 · GPU
0 likes · 13 min read
Architects' Tech Alliance
Apr 28, 2025 · Artificial Intelligence

NVLink High‑Speed Interconnect: Architecture, Evolution, and Performance

NVLink, NVIDIA's high‑bandwidth interconnect introduced with the P100 GPU, replaces PCIe by offering significantly higher data rates and lower latency for GPU‑GPU and GPU‑CPU communication, and has evolved through multiple generations to support modern AI and high‑performance computing workloads.

AI acceleration · GPU interconnect · NVLink
0 likes · 9 min read
AntTech
Mar 19, 2025 · Artificial Intelligence

Award-Winning HPCA 2025 Papers on Near‑DRAM Processing (UniNDP) and GPU‑Accelerated Fully Homomorphic Encryption (WarpDrive)

At HPCA 2025, two standout papers—UniNDP, a unified compilation and simulation tool for near‑DRAM processing architectures, and WarpDrive, a GPU‑based fully homomorphic encryption accelerator leveraging Tensor and CUDA cores—demonstrate significant performance gains for AI workloads and privacy‑preserving computation.

AI acceleration · Fully Homomorphic Encryption · GPU
0 likes · 5 min read
Architects' Tech Alliance
Mar 5, 2025 · Industry Insights

How DeepSeek’s Open‑Source Tools Are Supercharging AI Model Performance

DeepSeek’s Open‑Source Week unveiled five high‑performance projects—FlashMLA, DeepEP, DeepGEMM, DualPipe/EPLB, and 3FS—each delivering novel GPU optimizations, communication kernels, matrix‑multiplication libraries, parallelism strategies, and a distributed file system that together dramatically accelerate large‑scale AI training and inference workloads.

AI acceleration · DeepSeek · Distributed Training
0 likes · 9 min read
DataFunTalk
Feb 26, 2025 · Artificial Intelligence

DeepGEMM: An Open‑Source FP8 GEMM Library for Efficient AI Model Training and Inference

DeepGEMM is an open‑source FP8‑precision GEMM library that delivers up to 1350 TFLOPS on NVIDIA Hopper GPUs, offering JIT‑compiled, lightweight code (~300 lines) for dense and MoE matrix multiplication, with easy deployment, configurable environment variables, and performance advantages over CUTLASS for large AI models.

AI acceleration · DeepGEMM · FP8
0 likes · 7 min read
Baidu Geek Talk
Jan 15, 2025 · Artificial Intelligence

Understanding Large Model Inference Engines and Reducing Token Interval (TPOT)

Large‑model inference engines turn prompts into responses through a Prefill stage followed by an autoregressive Decode stage, with performance measured by time to first token (TTFT) and time per output token (TPOT). Baidu’s AIAK suite improves TPOT by separating tokenization, using static slot scheduling, and executing asynchronously, cutting token‑interval latency from ~35 ms to ~14 ms and raising GPU utilization to about 75 %. It also uses quantization and speculative execution for higher throughput.

AI acceleration · GPU utilization · TPOT
0 likes · 10 min read
DataFunSummit
Dec 4, 2024 · Artificial Intelligence

Accelerating Large Language Model Inference with the YiNian LLM Framework

This article presents the YiNian LLM framework, detailing how KVCache, prefill/decoding separation, continuous batching, PagedAttention, and multi‑hardware scheduling are used to speed up large language model inference while managing GPU memory and latency.

AI acceleration · Continuous Batching · GPU
0 likes · 20 min read
Xiaohongshu Tech REDtech
Sep 19, 2024 · Artificial Intelligence

Target-Driven Distillation (TDD): A Multi‑Goal Distillation Method for Accelerating Diffusion Models

Target‑Driven Distillation (TDD) is a multi‑goal distillation method that flexibly selects short‑range target steps and decouples guidance during training, enabling 4‑to‑8‑step diffusion generation that preserves high‑resolution detail, works with LoRA, ControlNet, InstantID, and outperforms existing consistency distillation techniques in speed and quality.

AI acceleration · Distillation · diffusion models
0 likes · 9 min read
ByteDance SYS Tech
Aug 12, 2024 · Cloud Native

How mGPU Enables Efficient GPU Sharing for AI Workloads

This article explains the mGPU solution that virtualizes NVIDIA GPUs for containers, detailing its driver architecture, compute and memory isolation mechanisms, performance benchmarks on ResNet‑50 inference, and how it boosts GPU utilization by over 50% for AI and high‑performance computing tasks.

AI acceleration · Cloud Native · GPU Sharing
0 likes · 10 min read
Open Source Linux
Jul 16, 2024 · Artificial Intelligence

Can Quantum Computing Break AI’s Compute Barrier? Insights & Market Outlook

Quantum computing promises exponential parallelism and lower energy consumption, offering a disruptive solution to AI’s compute limits; with mature hardware‑software infrastructure, diverse technology roadmaps, emerging cloud platforms, and a market projected to grow from $4.7 billion in 2023 to over $8 trillion by 2035, it is poised to transform sectors such as finance, chemicals, and life sciences.

AI acceleration · Quantum Computing · market forecast
0 likes · 4 min read
Rare Earth Juejin Tech Community
May 1, 2024 · Artificial Intelligence

Hyper‑SD: Trajectory‑Segmented Consistency Model for Accelerating Diffusion Image Generation

Hyper‑SD introduces a trajectory‑segmented consistency distillation framework that combines trajectory‑preserving and trajectory‑reconstruction strategies, integrates human‑feedback learning and score distillation, and achieves state‑of‑the‑art low‑step image generation performance on both SD1.5 and SDXL models.

AI acceleration · RLHF · diffusion models
0 likes · 10 min read
Sohu Tech Products
Mar 27, 2024 · Artificial Intelligence

NVIDIA NeMo Framework, TensorRT‑LLM, and RAG for Large Language Model Solutions

NVIDIA’s comprehensive LLM ecosystem combines the full‑stack NeMo Framework for data curation, distributed training, and fine‑tuning with inference acceleration through TensorRT‑LLM and Triton, plus Retrieval‑Augmented Generation and Guardrails, enabling efficient, low‑latency, knowledge‑grounded model deployment across clusters.

AI acceleration · Model Training · NeMo Framework
0 likes · 16 min read
Volcano Engine Developer Services
Mar 7, 2024 · Artificial Intelligence

How SDXL‑Lightning Generates High‑Quality Images in Just 2 Steps

SDXL‑Lightning, a new diffusion‑based text‑to‑image model from ByteDance, uses Progressive Adversarial Distillation to cut inference steps to as few as 2 while maintaining high resolution and fidelity, offering ten‑fold speed gains, open‑source access, and compatibility with SDXL, ControlNet, and ComfyUI.

AI acceleration · diffusion · model distillation
0 likes · 8 min read
Alibaba Cloud Big Data AI Platform
Feb 23, 2024 · Artificial Intelligence

How PAI‑TorchAcc Supercharges Large‑Model Training on Alibaba Cloud

PAI‑TorchAcc, an Alibaba Cloud AI platform accelerator, offers a seamless PyTorch interface that integrates HuggingFace models and employs LazyTensor‑based static graph conversion, multi‑strategy distributed training, and extensive GPU optimizations to dramatically boost throughput for 1B‑175B parameter models, surpassing PyTorch native and Megatron‑LM performance.

AI acceleration · Alibaba Cloud · GPU Optimization
0 likes · 13 min read
AntTech
Jan 9, 2024 · Artificial Intelligence

ATorch: Ant Group’s Open‑Source Distributed Training Acceleration Library for Large‑Scale AI Models

Ant Group’s newly open‑sourced ATorch library extends PyTorch with a layered architecture and automated resource‑aware strategies, raising compute utilization in large‑model training to as much as 60%, improving stability, and delivering significant throughput gains across multi‑node, multi‑GPU deployments.

AI acceleration · Distributed Training · PyTorch
0 likes · 6 min read
Alibaba Cloud Native
Dec 30, 2023 · Artificial Intelligence

How to Accelerate Stable Diffusion with TensorRT on Alibaba Cloud ACK

This guide explains how to set up Alibaba Cloud's ACK environment, install the Cloud Native AI Suite, configure TensorRT, and run Stable Diffusion with dramatically reduced latency and memory usage, including detailed commands, performance metrics, and reproducible code snippets.

AI acceleration · GPU inference · Stable Diffusion
0 likes · 7 min read
Architects' Tech Alliance
Dec 23, 2023 · Artificial Intelligence

Future Development Paths of Computing Power Technology (2023): Chip Architecture, Near‑Memory Computing, and Distributed xPU Systems

The article outlines the accelerating demand for high‑performance computing driven by AI, AR/VR, biotech and other workloads, examines the limits of Moore's law, and presents emerging solutions such as advanced chip architectures, chiplet integration, near‑memory/in‑memory computing, and distributed xPU‑based systems for scalable, efficient compute.

AI acceleration · Chiplet · Near-Memory Computing
0 likes · 11 min read
Architects' Tech Alliance
Sep 11, 2023 · Artificial Intelligence

Open Acceleration Specification AI Server Design Guide (2023): Architecture, OAM Modules, UBB Board, and System Design

The 2023 Open Acceleration Specification AI Server Design Guide details the hardware architecture, OAM module and UBB board specifications, cooling, management, fault diagnosis, and software platform needed to build high‑performance, scalable AI compute clusters for large‑model training.

AI acceleration · OAM · UBB board
0 likes · 10 min read
Architects' Tech Alliance
Sep 4, 2023 · Artificial Intelligence

Overview of AI Chip Types, Architectures, and Market Trends

The article explains the various AI‑capable chips such as CPUs, GPUs, FPGAs, NPUs, and TPUs, compares their performance and efficiency, describes heterogeneous CPU+xPU solutions, and provides market share data while highlighting the growing adoption of specialized AI accelerators.

AI acceleration · AI chips · CPU
0 likes · 7 min read
Architects' Tech Alliance
Mar 12, 2023 · Industry Insights

Who Leads the DPU Market? A Deep Dive into Global and Chinese Players

This article examines the highly concentrated DPU market, highlighting the dominant global vendors Nvidia, Broadcom and Intel, and provides a detailed analysis of emerging Chinese manufacturers—including NebulaX, Paratus, xFusion, Chiplet, and K2—covering their architectures, performance claims, and strategic positioning.

AI acceleration · DPU · Data Processing Unit
0 likes · 12 min read
Architects' Tech Alliance
Feb 8, 2023 · Artificial Intelligence

Computing‑in‑Memory (CiM) Technology: Concepts, History, Advantages, Classifications and Application Scenarios

This article provides a comprehensive overview of Computing‑in‑Memory technology, covering its definition, historical evolution, performance advantages over traditional von Neumann architectures, various technical classifications, storage‑media choices, market drivers, and its pivotal role in AI and big‑data workloads across edge, cloud and automotive domains.

AI acceleration · Big Data · Memory Architecture
0 likes · 17 min read
Alibaba Cloud Big Data AI Platform
Sep 1, 2022 · Artificial Intelligence

How Uni‑Fold + Alibaba PAI Boost Protein Structure Prediction to 6.6k Amino Acids

Uni‑Fold, inspired by DeepMind’s AlphaFold and now accelerated with Alibaba Cloud’s PAI platform, can predict protein structures of up to 6.6k amino acids, covering 99.992% of known sequences. It delivers ten‑minute inference for SARS‑CoV‑2 spike trimers and sets new performance benchmarks for AI‑driven structural biology.

AI acceleration · Alibaba PAI · Deep Learning
0 likes · 7 min read
Baidu Geek Talk
Aug 31, 2022 · Artificial Intelligence

Baidu Intelligent Cloud Launches Cloud-native AI 2.0 to Accelerate AI Engineering

Baidu Intelligent Cloud’s new Cloud‑native AI 2.0 platform tackles AI engineering bottlenecks by offering hybrid‑parallel large‑model training, flexible GPU virtualization, and an AI Accelerate Kit that boosts training efficiency over 50 % and cuts inference latency up to 63 %, raising GPU utilization from ~13 % to about 50 %.

AI · AI acceleration · GPU virtualization
0 likes · 15 min read
Xiaohongshu Tech REDtech
Jul 23, 2022 · Mobile Development

Xiaohongshu Deploys On‑Device Super‑Resolution with Huawei HMS Core for High‑Quality Short Videos

Xiaohongshu, partnering with Huawei HMS Core, now runs on‑device super‑resolution for short videos, instantly upscaling 540p to 1080p and enhancing 720p content using GPU/NPU via HiAI, cutting bandwidth and stutter while keeping power use low across hundreds of Huawei devices.

AI acceleration · Android NDK · Huawei HMS Core
0 likes · 9 min read
Baidu Tech Salon
Jun 28, 2022 · Artificial Intelligence

How Kunlun XPU‑R Redefines AI Compute: Architecture, Performance, and Future Trends

The article presents a detailed technical review of Kunlun Chip's XPU‑R AI accelerator, covering its evolution from early FPGA prototypes to the current 7nm, 256 TOPS chip, the architectural choices that address AI workload demands, performance advantages over CPUs/GPUs, and the product ecosystem supporting diverse AI scenarios.

AI acceleration · AI hardware · Chip Design
0 likes · 20 min read
Alibaba Terminal Technology
Apr 28, 2022 · Artificial Intelligence

How MNN’s Sparse Computing Boosts Mobile AI Inference Performance

This article details the design and implementation of sparse computation in Alibaba’s MNN inference engine, covering weight sparsity techniques, block‑sparse layouts, performance benchmarks on MobileNet models versus XNNPack, and real‑world deployment cases that demonstrate significant speedups and memory savings on mobile CPUs.

AI acceleration · MNN · block sparsity
0 likes · 16 min read
Alibaba Cloud Developer
Sep 25, 2019 · Artificial Intelligence

Alibaba Unveils Hanguang 800: The World's Fastest AI Chip Shattering Benchmarks

At the Apsara Conference in Hangzhou, Alibaba introduced its first self‑developed AI processor, the Hanguang 800, which delivers up to 78,563 inferences per second on ResNet‑50, four times faster than leading chips, and demonstrates remarkable energy efficiency, powering internal services and upcoming AI cloud offerings.

AI Chip · AI acceleration · Alibaba
0 likes · 4 min read
Architects' Tech Alliance
Jun 20, 2019 · Industry Insights

Why the Open Compute Project Is Shaping the Future of Data Centers

The Open Compute Project, backed by leading tech giants, is driving open‑hardware standards, accelerating AI and edge computing, and fostering collaboration across data‑center operators, telecoms, and hardware vendors, as highlighted by the upcoming OCP China Day conference.

AI acceleration · Data center · Edge Computing
0 likes · 11 min read
Architects' Tech Alliance
Feb 2, 2019 · Artificial Intelligence

An Overview of NVIDIA NVLink: Architecture, Topology, and Performance

This article explains NVIDIA's NVLink interconnect technology, covering its history, protocol layers, bandwidth advantages over PCIe, topologies such as the HGX-1/DGX-1 mesh, the NVSwitch extension, and performance gains for deep‑learning and high‑performance computing workloads.

AI acceleration · GPU interconnect · NVLink
0 likes · 7 min read
Tencent Cloud Developer
Aug 17, 2018 · Cloud Computing

FPGA Acceleration: Exploration and Practice for Data Centers and Cloud Services

In his 2018 Trusted Cloud Conference talk, Tencent FPGA expert Zhang Heng explained how the rapid growth of data and AI workloads drives data‑center and cloud operators to adopt FPGA acceleration for its high‑throughput, low‑latency, programmable performance, citing Tencent’s successes in image transcoding, content‑moderation, AI inference and gene‑sequencing, while outlining ecosystem challenges and future plans for scalable cloud‑FPGA services.

AI acceleration · Data center · FPGA
0 likes · 18 min read
Tencent Architect
Jul 30, 2018 · Artificial Intelligence

Four‑Minute ImageNet Training: Tencent’s AI Platform Sets a New World Record

Tencent’s intelligent machine‑learning platform set a world record by training AlexNet in 4 minutes and ResNet‑50 in 6.6 minutes on ImageNet, using large batch sizes, mixed‑precision, LARS optimization, hierarchical synchronization, gradient fusion, and pipeline I/O techniques to overcome accuracy and scalability challenges.

AI acceleration · Deep Learning · ImageNet
0 likes · 24 min read