Tagged articles
70 articles
SuanNi
Mar 14, 2026 · Industry Insights

How Meta’s MTIA Chips Achieved 25× Compute Boost in Just Two Years

This article analyzes Meta's rapid evolution of four generations of MTIA AI chips, detailing how modular hardware, inference‑first design, deep software integration, and aggressive iteration cycles delivered up to 30 PFLOPs of performance and dramatically reshaped the AI compute landscape.

AI chips · Hardware acceleration · Industry analysis
0 likes · 13 min read
Baidu Intelligent Cloud Tech Hub
Mar 6, 2026 · Artificial Intelligence

How Baidu’s End‑to‑End Quantization Stack Supercharges Large‑Model Inference on Kunlun XPU

Baidu Baige built a full‑stack quantization pipeline that integrates model‑level, framework‑level, and hardware‑level optimizations on the Kunlun XPU platform, enabling FP16/BF16 large models to be compressed to 25‑50% of their original size while boosting inference speed by 30‑50% and dramatically reducing memory consumption for enterprise deployments.

AI inference · Hardware acceleration · INT4
0 likes · 16 min read
Architecture & Thinking
Mar 1, 2026 · Artificial Intelligence

Why DeepSeek V4 Prioritizes Chinese Chips Over Nvidia – A Game‑Changer for AI Compute

DeepSeek’s upcoming V4 model breaks industry norms by prioritizing Huawei’s Ascend chips over Nvidia GPUs, offering over 30% performance gains, ultra‑long context windows, native multimodal abilities, and dramatically lower inference costs, signaling a shift toward autonomous AI compute in China.

AI compute · AI models · Chinese chips
0 likes · 6 min read
Weekly Large Model Application
Feb 27, 2026 · Industry Insights

Edge AI’s 2026 Boom: Taalas HC1’s Disruption and China’s Key Takeaways

The article explains how the Taalas HC1 edge‑AI chip, with an inference speed of 17,000 tokens/s, 90% lower power consumption, and 1/20 the cost of Nvidia H200 GPUs, shows that dedicated, non‑general‑purpose silicon can overcome latency, privacy, and expense barriers, making on‑device large‑model deployment essential in 2026 and offering a strategic roadmap for Chinese chip makers.

AI chips · China · Cost reduction
0 likes · 12 min read
Baidu Intelligent Cloud Tech Hub
Feb 6, 2026 · Artificial Intelligence

Accelerating GLM‑4.x Inference on Kunlun XPU with SGLang & vLLM

Baidu’s Baige team successfully adapted the GLM‑4.x series language models to the Kunlun XPU platform by leveraging SGLang and the vLLM‑Kunlun plugin, employing agile adaptation, precision alignment with torch_xray, and extensive performance tuning to achieve GPU‑level accuracy and superior inference speed.

AI · Hardware acceleration · XPU
0 likes · 6 min read
AntTech
Jan 14, 2026 · Artificial Intelligence

Boosting Secure AI: HAWK Accelerator and FHEFusion Compiler Break New Ground

This article highlights two cutting‑edge works from Ant Group’s research team—HAWK, a fixed‑word key decomposition switching accelerator that overcomes hardware challenges for FHE, and FHEFusion, a compiler framework that introduces operator fusion to dramatically speed CKKS‑based DNN inference—showcasing their designs, optimizations, and experimental gains.

Compiler Optimization · DNN inference · Fully Homomorphic Encryption
0 likes · 7 min read
Architects' Tech Alliance
Sep 15, 2025 · Artificial Intelligence

Why CPUs and GPUs Struggle with AI and How Specialized AI Chips Are Changing the Game

The article examines the limitations of traditional von‑Neumann CPUs and power‑hungry GPUs for modern AI workloads, explains the rise of ASIC and FPGA based AI accelerators, compares major industry solutions, and highlights why reconfigurable, low‑power AI chips are becoming essential for robotics and edge computing.

AI chips · ASIC · FPGA
0 likes · 11 min read
Architects' Tech Alliance
Jun 3, 2025 · Artificial Intelligence

Comprehensive Analysis of RDMA Technology: Principles, Features, Products, and Applications in HPC, AI, and Cloud Storage

The article provides an in‑depth technical overview of Remote Direct Memory Access (RDMA), covering its zero‑copy, kernel‑bypass, and protocol‑offload features, hardware and software ecosystems, and its impact on high‑performance computing, artificial intelligence, cloud storage, finance, and edge computing.

Hardware acceleration · High‑performance computing · Network Protocols
0 likes · 10 min read
AntTech
May 20, 2025 · Information Security

FAST and Neo: New Hardware Accelerators for Scalable Fully Homomorphic Encryption

The article reviews two recent ISCA 2025 papers—FAST and Neo—that introduce hardware and GPU‑based accelerators employing hoisting, KLSS, and Tensor Core optimizations to significantly boost the performance of fully homomorphic encryption workloads.

Cryptographic Optimization · Fully Homomorphic Encryption · GPU computing
0 likes · 6 min read
DataFunTalk
Mar 3, 2025 · Artificial Intelligence

FlightVGM: FPGA-Accelerated Inference for Video Generation Models Wins Best Paper at FPGA 2025

The FlightVGM paper, awarded Best Paper at FPGA 2025, details a novel FPGA-based inference IP for video generation models that leverages time‑space activation sparsity, mixed‑precision DSP58 extensions, and adaptive scheduling to achieve up to 1.30× performance and 4.49× energy‑efficiency gains over an NVIDIA 3090 GPU while preserving model accuracy.

AI · FPGA · Hardware acceleration
0 likes · 11 min read
Architects' Tech Alliance
Feb 24, 2025 · Artificial Intelligence

NSA: Hardware‑Optimized Sparse Attention Mechanism from DeepSeek, Peking University and University of Washington

The NSA mechanism introduces a three‑branch hardware‑optimized sparse attention architecture—token compression, token selection, and sliding window—combined with learnable gating to balance global and local context, dramatically improving inference speed and efficiency for long‑context large language models.

AI Architecture · DeepSeek · Hardware acceleration
0 likes · 5 min read
Architects' Tech Alliance
Feb 4, 2025 · Artificial Intelligence

Why AI Frameworks Are the Backbone of Modern AI – Spotlight on MindSpore

The article explains what AI frameworks are, why they act as the operating system of artificial intelligence, showcases real‑world uses in transportation and finance, and provides an in‑depth analysis of Huawei's MindSpore framework, highlighting its development experience, hardware optimization, deployment flexibility, and enterprise‑grade security features.

AI Framework · Deep Learning · Enterprise AI
0 likes · 7 min read
Architects' Tech Alliance
Dec 6, 2024 · Industry Insights

How GPU Virtualization Works: Layers, Techniques, and Real-World Use Cases

This article explains the fundamentals of GPU architecture, the need for GPU virtualization, and walks through user‑level, kernel‑level, hardware‑level, and full GPU virtualization techniques, illustrating each layer with diagrams and code examples while highlighting practical deployment scenarios.

GPU · Hardware acceleration · System Architecture
0 likes · 10 min read
AntTech
Oct 21, 2024 · Information Security

Second Homomorphic Encryption Computing Performance Optimization Forum – Hardware Accelerators

The second Homomorphic Encryption Computing Performance Optimization Forum, held on October 26 at the Summer Garden venue, gathers leading researchers to discuss hardware acceleration, cross‑disciplinary challenges, and recent advances in privacy‑preserving computation, presenting detailed abstracts and speaker bios for five technical sessions.

Hardware acceleration · conference
0 likes · 9 min read
Baobao Algorithm Notes
Aug 27, 2024 · Industry Insights

What Real‑World LLM Researchers Face: Scaling Limits, Data Bottlenecks, and Deployment Challenges

The author shares a candid account of recent large‑model experiments, highlighting why most labs struggle to exceed 100 B parameters, how data and hardware constraints shape model iteration, and the practical engineering, safety, and multimodal challenges that dictate real‑world LLM deployment.

AI industry · AI scaling · Hardware acceleration
0 likes · 6 min read
Architects' Tech Alliance
Jul 13, 2024 · Operations

How to Supercharge Kunpeng CPUs: Real‑World Performance Tuning Techniques

This article provides a comprehensive guide to optimizing Kunpeng‑based servers, covering hardware characteristics, matrix multiplication benchmarks, Von Neumann architecture insights, soft and hard acceleration, compiler and JDK tweaks, NUMA tuning, Nginx and OpenSSL acceleration, disk and network optimizations, application‑level tuning, and a step‑by‑step MariaDB performance‑tuning checklist.

CPU performance · Database Tuning · Hardware acceleration
0 likes · 16 min read
Open Source Linux
May 22, 2024 · Artificial Intelligence

Why GPUs Are the Powerhouse Behind Modern AI: A Deep Dive

This article explains how GPUs, with their parallel architecture and extensive software ecosystem, have become essential for accelerating AI training and inference, outperforming CPUs and shaping the future of artificial intelligence across various industries.

Deep Learning · GPU · Hardware acceleration
0 likes · 10 min read
360 Smart Cloud
Apr 3, 2024 · Backend Development

Understanding FFmpeg Hardware Acceleration Architecture and Implementation

FFmpeg provides a comprehensive, cross‑platform hardware acceleration framework that abstracts diverse GPU and dedicated video codec interfaces, defines HWContext types, device and frame contexts, and various codec configuration methods, enabling efficient video encoding, decoding, and filtering while addressing performance, compatibility, and pipeline complexity challenges.

GPU · Hardware acceleration · Multimedia
0 likes · 10 min read
vivo Internet Technology
Mar 13, 2024 · Operations

Optimizing VUA HTTPS Forwarding Performance with Intel QuickAssist Technology (QAT)

By integrating Intel QuickAssist hardware and AVX‑512 software acceleration into the VUA component of Vivo’s load‑balancing platform, the article demonstrates asynchronous OpenSSL offloading that boosts HTTPS forwarding throughput to roughly 44 000 QPS with QAT cards and 51 000 QPS with software, while preserving scalability and security.

HTTPS · Hardware acceleration · Intel QAT
0 likes · 12 min read
Baidu Intelligent Cloud Tech Hub
Oct 18, 2023 · Cloud Computing

How AI Is Redefining Cloud Computing: From Scale‑Up to Serverless

The talk explores how the rise of large AI models is transforming cloud computing architecture, workloads, and services—shifting from traditional virtualization to heterogeneous compute, massive scaling, serverless infrastructures, and new networking designs that together enable agile AI‑native applications.

AI-native · Distributed Training · Hardware acceleration
0 likes · 23 min read
DataFunSummit
Sep 8, 2023 · Artificial Intelligence

AI Compiler Forum at DataFun Summit 2023: Tile-Based Deep Learning Compilation, Graph Scheduling for Domain‑Specific Accelerators, and Triton on Hopper

The DataFun Summit 2023 AI Compiler Forum gathered leading researchers to present cutting‑edge techniques on tile‑based deep learning compilation, efficient graph scheduling for domain‑specific accelerators, large‑model deployment, and the latest advancements of OpenAI Triton on NVIDIA Hopper, offering practical insights for AI system developers.

AI compiler · Graph Scheduling · Hardware acceleration
0 likes · 8 min read
Architects' Tech Alliance
Sep 6, 2023 · Industry Insights

What Is a DPU and Why It’s the Next Big Chip for Data Centers

The article explains the role of Data Processing Units (DPUs) as the third major data‑center chip after CPUs and GPUs, outlines China Mobile’s DPU‑focused bare‑metal server initiative, and provides links to technical papers and white‑papers that detail DPU architecture, performance benchmarks, and industry trends.

Bare Metal Server · DPU · Data center
0 likes · 4 min read
Network Intelligence Research Center (NIRC)
Jun 24, 2023 · Artificial Intelligence

How DFX Achieves Low-Latency Multi-FPGA Acceleration for Transformer Text Generation

The article reviews the DFX system—a multi‑FPGA server that uses model‑parallelism and a ring‑topology interconnect to accelerate GPT‑2 text generation, showing 3.78× higher throughput, 3.99× better energy efficiency, and 8.21× greater cost‑effectiveness compared with a four‑GPU V100 baseline.

FPGA · GPT-2 · Hardware acceleration
0 likes · 6 min read
Xiaohongshu Tech REDtech
May 15, 2023 · Artificial Intelligence

GPU-Accelerated Inference Optimization for Large-Scale Machine Learning at Xiaohongshu

Xiaohongshu transformed its recommendation, advertising, and search inference pipeline by migrating to GPU‑centric hardware, deploying a custom TensorFlow‑Core Lambda service, and applying system‑level, virtualization, and compute‑level optimizations—including NUMA binding, kernel fusion, dynamic scaling, and FP16 quantization—achieving roughly 30× compute capacity growth, over 10% user‑metric gains, and more than 50% cluster‑resource savings.

GPU Optimization · Hardware acceleration · Machine Learning Inference
0 likes · 20 min read
Baidu Intelligent Cloud Tech Hub
May 15, 2023 · Cloud Computing

How Baidu’s UNP Platform Supercharges Load‑Balancing to 1 Tbps

This article explains the limitations of traditional X86‑DPDK load‑balancing gateways and how Baidu’s third‑generation Universal Networking Platform (UNP) combines programmable ASICs, CPUs, and FPGA acceleration to deliver multi‑terabit throughput, ultra‑low latency, and dramatically lower cost and power consumption.

Baidu Cloud · Hardware acceleration · Networking
0 likes · 11 min read
Architects' Tech Alliance
May 15, 2023 · Artificial Intelligence

AI ASIC Landscape: Google TPU Evolution, Intel Habana Gaudi 2, IBM AIU, and Samsung Warboy NPU

The article surveys the rapid entry of leading vendors into the AI ASIC market, detailing Google’s TPU generations, Intel’s acquisition of Habana Labs and the Gaudi 2 chip, IBM’s upcoming AIU, Samsung’s Warboy NPU, and the performance, architectural, and future trends of ASICs for AI inference and training.

AI ASIC · Gaudi · Hardware acceleration
0 likes · 11 min read
Tencent Tech
Apr 18, 2023 · Artificial Intelligence

How Tencent’s Zixiao AI Chip Supercharges Real‑Time Meeting Subtitles

Tencent’s home‑grown Zixiao AI inference chip, combined with the LightRuntime engine, dramatically reduces latency and cost for real‑time subtitles in Tencent Meeting, handling tens of thousands of concurrent audio streams while meeting sub‑second delay requirements through hardware‑software co‑optimizations and mixed‑precision model tuning.

Hardware acceleration · Real-time Speech Recognition · Tencent Meeting
0 likes · 16 min read
Architects' Tech Alliance
Jan 13, 2023 · Fundamentals

2022 DPU Development Analysis Report and Related Network Technologies

The 2022 DPU Development Analysis Report outlines the evolution of Data Processing Units from CPU/NP and FPGA‑CPU architectures to ASIC‑CPU designs, discusses RDMA high‑speed networking, data‑plane forwarding techniques, network programmability, and the emerging open DPU software ecosystem, highlighting their performance, power, and cost implications for modern data centers.

ASIC · DPU · Data Plane
0 likes · 14 min read
Architects' Tech Alliance
Nov 1, 2022 · Databases

2022 China Database Industry Report: Emerging Hardware and Architectural Innovations

The September 2022 China Database Industry Analysis report highlights a wave of hardware‑driven innovations—including multi‑core CPUs, heterogeneous GPUs/TPUs/DPUs, programmable FPGAs, CXL‑DDR5, persistent memory, NVMe‑oF, and RDMA‑based storage—that enable massive data storage and high‑concurrency real‑time computing across a range of novel database architectures and products.

GPU · Hardware acceleration · OLTP
0 likes · 10 min read
Qingyun Technology Community
Sep 15, 2022 · Cloud Computing

How GPU, VPU, and CPU Accelerate Cloud Video Transcoding: Architecture and Best Practices

This article explores the rapid growth of video traffic, explains why transcoding is essential, compares CPU, GPU, and VPU hardware for video processing, details the FFmpeg software stack, describes the design of a cloud‑native transcoding cluster, its scheduling, shard‑transcoding technique, and presents performance test results.

Distributed Systems · GPU Acceleration · Hardware acceleration
0 likes · 23 min read
NetEase Smart Enterprise Tech+
Sep 14, 2022 · Cloud Computing

How Intel’s End‑to‑End Audio‑Video Optimizations Power the Next Cloud Media Experience

The article reviews Intel’s end‑to‑end audio‑video optimization solutions presented at the 2022 NetEase Audio‑Video Technology Conference, covering market trends, hardware accelerators, software stacks, and future data‑center strategies that together enable high‑quality, cost‑effective streaming in the cloud era.

Hardware acceleration · Intel · Media Optimization
0 likes · 7 min read
Baidu Tech Salon
Jun 13, 2022 · Artificial Intelligence

Kunlun Core AI Chips: Making Computing Smarter

The 2022 Beijing Zhiyuan Conference report by Kunlun Core’s chip R&D director outlines AI chip market opportunities and challenges, describes the company’s shift from FPGA clusters to a programmable XPU‑R architecture built on a 7 nm process with 256 TOPS of INT8 performance, GDDR6 memory, and PCIe 4.0, and details current deployments and plans for third‑ and fourth‑generation chips.

AI Chip · AI accelerator · Chip Design
0 likes · 12 min read
Alibaba Cloud Infrastructure
May 31, 2022 · Information Security

Fidas: FPGA‑Based Comprehensive Offloading for Cloud Intrusion Detection (ISCA 2022 Full‑Score Paper)

The ISCA 2022 full‑score paper “Fidas: Fortifying the Cloud via Comprehensive FPGA‑based Offloading for Intrusion Detection” presents a novel FPGA‑accelerated IDS architecture that jointly offloads regex matching and traffic classification, achieving high flexibility, rapid rule updates, balanced load, and line‑rate performance in cloud data centers.

FPGA · Hardware acceleration · ISCA
0 likes · 7 min read
Baidu Geek Talk
Feb 7, 2022 · Mobile Development

Optimizing Video Playback: Soft/Hardware Decoding Strategies for Baidu Android App

The article evaluates software versus hardware video decoding for Baidu’s Android app, presents benchmark data showing surface‑mode hardware decoding’s superior efficiency, identifies compatibility and first‑frame latency challenges, and proposes a monitoring module plus seamless soft‑to‑hard decoder switching to achieve high hardware‑decode usage while maintaining fast startup and low error rates.

Android · Hardware acceleration · MediaCodec
0 likes · 11 min read
NetEase Cloud Music Tech Team
Dec 29, 2021 · Frontend Development

Understanding Browser Compositing Layers: A Guide to CSS Hardware Acceleration

The article explains how browsers build render trees and use GPU‑accelerated compositing layers—created by properties like transform, will‑change, or media elements—to improve performance, avoid repaint glitches such as iOS timer flicker, and offers best‑practice tips for using these layers efficiently without excess memory use.

Browser Rendering · CSS Optimization · GPU rendering
0 likes · 10 min read
AntTech
Dec 21, 2021 · Information Security

Hardware‑Software Integration Accelerates Privacy Computing: Technical Overview

The article explains how combining hardware and software solutions can address the data‑lifecycle security and cryptographic performance challenges of privacy computing, describing the underlying technology stack, acceleration techniques, and the integrated privacy‑computing appliance released by Ant Group.

Hardware acceleration · Privacy Computing · cryptography
0 likes · 13 min read
21CTO
Dec 9, 2021 · Artificial Intelligence

How Alibaba’s DAMO Academy Is Redefining AI with the First 3D‑Stacked Compute‑Memory Chip

On December 3, Alibaba’s DAMO Academy announced its first AI chip that integrates memory and compute using hybrid‑bond 3D stacking, promising ten‑fold performance gains and a 300× improvement in energy efficiency for AI workloads such as recommendation systems, and marking a shift from traditional von Neumann designs.

3D stacking · AI Chip · Compute-in-Memory
0 likes · 5 min read
Architects' Tech Alliance
Oct 20, 2021 · Fundamentals

Overview of the Specialized Data Processing Unit (DPU) Technology Whitepaper

The whitepaper from the Institute of Computing Technology, Chinese Academy of Sciences, provides a comprehensive analysis of DPU background, technical characteristics, reference architecture, application scenarios, and a comparative review of existing DPU products, highlighting its role in modern data‑center infrastructures.

DPU · Data Processing Unit · Data center
0 likes · 24 min read
Architects' Tech Alliance
Sep 5, 2021 · Fundamentals

Overview of Data Processing Units (DPUs) and Their Evolution in Data Centers

Data Processing Units (DPUs) have evolved from early I/O processors to modern programmable ASICs and FPGA-based accelerators, integrating networking, storage, and compute functions to offload workloads from CPUs, with contributions from companies like Fungible, Nvidia, Intel, and emerging Chinese firms, shaping data‑center and edge architectures.

DPU · Data center · FPGA
0 likes · 13 min read
WeChat Client Technology Team
Aug 10, 2021 · Mobile Development

How We Built a Cross‑Platform Hardware‑Accelerated Live‑Streaming SDK for WeChat Video Channels

This article details the design and implementation of a cross‑platform SDK that enables external hardware devices to stream live video on WeChat Video Channels, covering user authentication, network signaling, UI integration, audio‑video encoding, and hardware acceleration across Android, iOS, PC and embedded platforms.

Hardware acceleration · SDK · Video Encoding
0 likes · 11 min read
Architects' Tech Alliance
Jul 23, 2021 · Artificial Intelligence

2021 Overview of China’s Data Processing Unit (DPU) Industry

The article provides a comprehensive analysis of China’s DPU market in 2021, covering DPU definitions, classifications, technology roadmaps, industry chain, business models, key applications, competitive landscape, and future trends in data centers, edge computing, telecom, and autonomous driving.

China · DPU · Data Processing Unit
0 likes · 12 min read
NetEase Smart Enterprise Tech+
Jul 20, 2021 · Backend Development

How NetEase Cloud Accelerates Video Transcoding with Slice‑Based Parallelism

NetEase Cloud’s video transcoding service boosts processing speed by combining hardware acceleration, custom codecs, AMD EPYC servers, and a slice‑based parallel transcoding pipeline, while optimizing cluster task scheduling and handling straggler issues to achieve significant performance gains across large‑scale media workloads.

Distributed Processing · Hardware acceleration · Video Transcoding
0 likes · 16 min read
Architects' Tech Alliance
Jul 16, 2021 · Artificial Intelligence

AI Chip Landscape: GPUs, FPGAs, and ASICs for Deep Learning

The article explains how artificial intelligence relies on algorithms, compute and data, compares engineering and simulation methods, and details the roles, architectures, performance and energy characteristics of GPUs, FPGAs, and ASICs as the primary hardware accelerators for modern deep‑learning applications.

ASIC · Chip Design · Deep Learning
0 likes · 14 min read
DataFunTalk
Feb 3, 2021 · Artificial Intelligence

Towards Best Possible Deep Learning Acceleration on the Edge – A Compression-Compilation Co-Design Framework

The lecture presented by Assistant Professor Yanzhi Wang introduces a compression‑compilation co‑design framework (CoCoPIE) that achieves real‑time deep‑learning inference on edge devices through novel pruning and quantization techniques, delivering up to 180× speedup without accuracy loss.

AI · Deep Learning · Edge Computing
0 likes · 5 min read
JD Cloud Developers
Dec 21, 2020 · Artificial Intelligence

Weekly Tech Highlights: AI Chip, Cloud Forecasts, Docker M1 Preview & More

This week’s developer newsletter spotlights the Chinese Academy of Sciences’ pioneering GNN accelerator chip, IDC’s ten cloud computing predictions for China, the booming IoT market and 5G dominance, Docker’s M1‑compatible desktop preview, a carbon‑nanotube transistor breakthrough, IBM’s FHE initiative, and recent AI research on lifelong learning and reinforcement learning exploration.

Docker · Hardware acceleration · IoT
0 likes · 7 min read
Meituan Technology Team
Nov 5, 2020 · Databases

Database System Research and Future Trends – Talk by Prof. Zhou Xuan (ECNU)

In his 2020 talk, Prof. Zhou Xuan outlined the evolution of database systems, highlighted ECNU’s research on distributed transactions, hardware‑aware modular designs, HTAP and system decoupling, and argued that future databases will be increasingly modular, hardware‑adapted, and driven by close academia‑industry collaboration.

Hardware acceleration · cloud computing · database
0 likes · 38 min read
JD Tech Talk
Oct 28, 2020 · Backend Development

Performance Optimization of SSL/TLS in JD.com JDDLB Load Balancer Using Freescale Acceleration Cards

This article describes the architecture of JD.com’s JDDLB public‑traffic load balancer and details how offloading CPU‑intensive SSL/TLS cryptographic operations to Freescale C291 acceleration cards—via custom NGINX modules, OpenSSL Engine integration, and synchronous/asynchronous driver interfaces—significantly improves connection‑establishment rates and overall throughput.

Backend · Hardware acceleration · OpenSSL
0 likes · 30 min read
IT Xianyu
Sep 27, 2020 · Cloud Computing

The Rise of Cloud Computing: From Moore's Law to Alibaba's Shenlong Architecture

The article examines the end of Moore's Law, the rapid growth of cloud computing, the challenges of virtualization overhead, and how Alibaba's Shenlong architecture leverages hardware acceleration to revive performance gains and reshape the future of hardware‑software co‑evolution.

Alibaba · Hardware acceleration · Moore's Law
0 likes · 7 min read
DataFunTalk
Mar 24, 2020 · Databases

ByteDance’s Enhancements to RocksDB: LazyBuffer, Adaptive Map, KV Separation, Multi‑Index, Extreme Compression, and New Hardware Support

This article describes ByteDance’s extensive improvements to the RocksDB storage engine—including LazyBuffer, Adaptive Map‑based lazy compaction, KV separation, adaptive multi‑index support, extreme compression techniques, and hardware acceleration—to reduce amplification, improve performance, and lower costs for large‑scale database workloads.

Hardware acceleration · KV Separation · RocksDB
0 likes · 14 min read
Architects' Tech Alliance
Feb 8, 2020 · Cloud Computing

Demystifying FPGA: Architecture, Performance, and Microsoft's Data Center Deployment

FPGA, a reconfigurable hardware architecture, offers low latency and high efficiency compared to CPUs, GPUs, and ASICs, making it ideal for both compute‑intensive and communication‑intensive tasks, and Microsoft’s multi‑stage data‑center deployments illustrate its scalability, flexibility, and impact on cloud services.

Data center · FPGA · Hardware acceleration
0 likes · 21 min read
Architects' Tech Alliance
Oct 11, 2019 · Cloud Computing

Understanding FPGA: Architecture, Advantages, and Microsoft’s Data‑Center Deployments

This article explains what an FPGA (Field‑Programmable Gate Array) is and why it offers lower latency and higher energy efficiency than CPUs or GPUs for both compute‑intensive and communication‑intensive workloads, then details Microsoft’s three‑generation FPGA deployment strategy in its data‑center and cloud infrastructure.

Data center · FPGA · Hardware acceleration
0 likes · 20 min read
Qunar Tech Salon
Sep 5, 2019 · Artificial Intelligence

Implementing Bilinear Interpolation on FPGA for Neural Network Acceleration

The article explains the principles of bilinear interpolation, why it is needed for smooth image scaling in neural‑network layers such as Interp and Resize, and details FPGA‑specific optimizations—including lookup‑table based coefficient pre‑computation, two‑line BRAM caching, and index‑driven data swapping—to reduce DSP usage and improve throughput.

BRAM · Bilinear Interpolation · DSP
0 likes · 14 min read
Tencent Cloud Developer
Aug 17, 2018 · Cloud Computing

FPGA Acceleration: Exploration and Practice for Data Centers and Cloud Services

In his 2018 Trusted Cloud Conference talk, Tencent FPGA expert Zhang Heng explained how the rapid growth of data and AI workloads drives data‑center and cloud operators to adopt FPGA acceleration for its high‑throughput, low‑latency, programmable performance, citing Tencent’s successes in image transcoding, content‑moderation, AI inference and gene‑sequencing, while outlining ecosystem challenges and future plans for scalable cloud‑FPGA services.

AI acceleration · Data center · FPGA
0 likes · 18 min read
JD Retail Technology
Jul 16, 2018 · Mobile Development

Android RenderThread and Asynchronous Animation Rendering: Deep Dive

This article explains Android's RenderThread, its role in hardware-accelerated UI rendering, how it enables asynchronous animation via ViewPropertyAnimator, and provides code examples demonstrating RenderThread-driven animation that remains smooth even when the UI thread is blocked.

Android · Hardware acceleration · RenderThread
0 likes · 15 min read
Architects' Tech Alliance
Apr 18, 2018 · Fundamentals

Understanding GPU Architecture and Its Evolution

This article explains the historical development of graphics processing units, their internal structure, rendering pipeline, and how GPUs shifted graphics workloads from CPUs to specialized parallel hardware, highlighting key concepts such as vertex shaders, pixel shaders, SIMD architectures, and performance growth.

GPU · Hardware acceleration · Rendering Pipeline
0 likes · 11 min read
Alibaba Cloud Developer
Apr 9, 2018 · Databases

How FPGA Acceleration Supercharges X-Engine’s Compaction for 10× MySQL Performance

This article introduces Alibaba’s X‑Engine storage engine, the foundation of the next‑generation distributed database X‑DB, and explains how FPGA‑accelerated compaction and asynchronous scheduling dramatically improve write‑intensive OLTP performance, reduce CPU contention, and achieve up to 50 % throughput gains while maintaining fault tolerance.

FPGA · Hardware acceleration · LSM‑Tree
0 likes · 21 min read
AntTech
Jan 4, 2018 · Databases

Report on VLDB 2017 Conference: Insights and Highlights from Database Research

Written after attending VLDB 2017 in Munich, this report summarizes the conference’s broad coverage of database research, from new hardware‑accelerated prototypes and Spark‑based big‑data processing to Oracle and SAP HANA case studies, along with keynotes, notable papers, and reflections on industry trends and Chinese contributions.

Big Data · Hardware acceleration · VLDB
0 likes · 22 min read
Alibaba Cloud Infrastructure
Dec 11, 2017 · Operations

FPGA-Based High-Compression Image Encoding: Architecture, Optimization, and Performance Evaluation

This article describes a project that replaces CPU‑based image compression with an FPGA solution, detailing the system hierarchy, two‑phase development (function verification and performance boost), pipeline and frequency optimizations, software‑FPGA interaction, and a measured 25‑fold speedup over a 64‑core server.

FPGA · Hardware acceleration · High Compression
0 likes · 6 min read
Tencent Architect
Oct 20, 2017 · Artificial Intelligence

Design and Performance of a General‑Purpose FPGA CNN Accelerator for Real‑Time AI Services

This article presents a comprehensive overview of a universal FPGA‑based CNN accelerator, detailing its motivation, flexible architecture, compiler workflow, memory and compute unit designs, and performance comparisons that demonstrate significant latency and cost advantages over CPU and GPU solutions for real‑time AI inference.

AI inference · CNN acceleration · FPGA
0 likes · 13 min read
21CTO
Sep 13, 2017 · Mobile Development

Mastering Android Video Encoding: Choosing Encoders and Optimizing YUV Processing

This article examines Android video recording challenges, compares hardware (MediaCodec) and software (FFmpeg + x264/openh264) encoders, highlights device‑specific pitfalls such as color‑format support and alignment, and presents fast NEON‑based algorithms for scaling, rotation, and mirroring of YUV frames.

Android · Hardware acceleration · MediaCodec
0 likes · 12 min read
Meituan Technology Team
Jan 19, 2017 · Mobile Development

Understanding Hardware Acceleration in Android Applications

Hardware acceleration in Android shifts intensive floating‑point UI work from the CPU to the GPU by building DisplayLists on the CPU and rasterizing them on the GPU, allowing parallel processing, selective redraw of unchanged elements, and significantly higher frame rates for animations and complex graphics.

Android · CPU · DisplayList
0 likes · 14 min read