Tagged articles

hardware acceleration

74 articles · Page 1 of 1

Jun 28, 2026 · Artificial Intelligence

When the Memory Wall Locks AI Compute, Is HBM the Key or Another Lock?

The article analyzes how the growing memory‑wall bottleneck forces GPUs to idle while waiting for data, compares on‑chip SRAM and high‑bandwidth memory (HBM) as remedies, and examines HBM’s technical advantages, supply constraints, and divergent manufacturing routes that may turn it into a new limitation.

AI compute · GPU · HBM

0 likes · 6 min read

Old Zhang's AI Learning

Jun 15, 2026 · Artificial Intelligence

vLLM 0.23.0 Brings Faster Local LLM Deployment and Wider Hardware Support

Version 0.23.0 of the open‑source vLLM inference engine adds full DeepSeek‑V4 stability, Model Runner V2 coverage for Llama, Mistral, Qwen3 and new models, a production‑grade Rust front‑end, multi‑level KV‑cache offloading, extensive hardware optimizations across NVIDIA, AMD, Intel, TPU and RISC‑V, plus API enhancements, delivering up to 20% performance gains while simplifying deployment.

DeepSeek-V4 · KV cache offloading · LLM Inference

0 likes · 8 min read

Baidu Intelligent Cloud Tech Hub

Jun 1, 2026 · Cloud Computing

Cut Migration Time by 60%: Baidu Cloud Deploys Intel Xeon 6 QAT‑Accelerated Live VM Migration

The article analyzes the challenges of large‑scale live VM migration, introduces Intel Xeon 6 CPU‑integrated QAT hardware acceleration, compares pre‑ and post‑QAT workflows, and reports a 60% reduction in migration time, 20% CPU savings, and sub‑10 ms downtime in Baidu Smart Cloud production.

Cloud Computing · Intel QAT · Performance Optimization

0 likes · 10 min read

Machine Heart

May 31, 2026 · Artificial Intelligence

Can Low-Bit Models Cut Inference Costs Better Than Small Models?

The article analyzes how low‑bit quantization differs from simply using smaller LLMs, examines hardware‑level precision reduction, compares post‑training quantization with native low‑bit designs, and explains the runtime and testing requirements needed to achieve real inference cost savings.

LLM Inference · cost optimization · hardware acceleration

0 likes · 7 min read

SuanNi

Mar 14, 2026 · Industry Insights

How Meta’s MTIA Chips Achieved 25× Compute Boost in Just Two Years

This article analyzes Meta's rapid evolution of four generations of MTIA AI chips, detailing how modular hardware, inference‑first design, deep software integration, and aggressive iteration cycles delivered up to 30 PFLOPs of performance and dramatically reshaped the AI compute landscape.

AI chips · Industry Analysis · MTIA

0 likes · 13 min read

Baidu Intelligent Cloud Tech Hub

Mar 6, 2026 · Artificial Intelligence

How Baidu’s End‑to‑End Quantization Stack Supercharges Large‑Model Inference on Kunlun XPU

Baidu Baige built a full‑stack quantization pipeline that integrates model‑level, framework‑level, and hardware‑level optimizations on the Kunlun XPU platform, enabling FP16/BF16 large models to be compressed to 25‑50% of their original size while boosting inference speed by 30‑50% and dramatically reducing memory consumption for enterprise deployments.

AI inference · INT4 · INT8

0 likes · 16 min read

Architecture & Thinking

Mar 1, 2026 · Artificial Intelligence

Why DeepSeek V4 Prioritizes Chinese Chips Over Nvidia – A Game‑Changer for AI Compute

DeepSeek’s upcoming V4 model breaks industry norms by prioritizing Huawei’s Ascend chips over Nvidia GPUs, offering over 30% performance gains, ultra‑long context windows, native multimodal abilities, and dramatically lower inference costs, signaling a shift toward autonomous AI compute in China.

AI compute · AI models · Chinese chips

0 likes · 6 min read

Weekly Large Model Application

Feb 27, 2026 · Industry Insights

Edge AI’s 2026 Boom: Taalas HC1’s Disruption and China’s Key Takeaways

The article explains how the Taalas HC1 edge‑AI chip, with 17,000 tokens/s inference speed, 90% lower power and 1/20 the cost of Nvidia H200 GPUs, proves that dedicated, non‑general‑purpose silicon can overcome latency, privacy and expense barriers, making on‑device large‑model deployment essential in 2026 and offering a strategic roadmap for Chinese chip makers.

AI chips · China · Taalas HC1

0 likes · 12 min read

Baidu Intelligent Cloud Tech Hub

Feb 6, 2026 · Artificial Intelligence

Accelerating GLM‑4.x Inference on Kunlun XPU with SGLang & vLLM

Baidu’s Baige team successfully adapted the GLM‑4.x series language models to the Kunlun XPU platform by leveraging SGLang and the vLLM‑Kunlun plugin, employing agile adaptation, precision alignment with torch_xray, and extensive performance tuning to achieve GPU‑level accuracy and superior inference speed.

AI · XPU · hardware acceleration

0 likes · 6 min read

AntTech

Jan 14, 2026 · Artificial Intelligence

Boosting Secure AI: HAWK Accelerator and FHEFusion Compiler Break New Ground

This article highlights two cutting‑edge works from Ant Group’s research team—HAWK, a fixed‑word key decomposition switching accelerator that overcomes hardware challenges for FHE, and FHEFusion, a compiler framework that introduces operator fusion to dramatically speed CKKS‑based DNN inference—showcasing their designs, optimizations, and experimental gains.

DNN inference · Fully Homomorphic Encryption · Secure AI

0 likes · 7 min read

Architects' Tech Alliance

Nov 1, 2025 · Artificial Intelligence

Why Optical Computing Could Break the AI Power Wall – A Deep Dive

This article systematically reviews the background, core technologies, industry challenges, and practical progress of optical computing, highlighting its strategic value as a post‑Moore computing paradigm for future AI workloads.

AI hardware · hardware acceleration · optical computing

0 likes · 17 min read

Architects' Tech Alliance

Sep 15, 2025 · Artificial Intelligence

Why CPUs and GPUs Struggle with AI and How Specialized AI Chips Are Changing the Game

The article examines the limitations of traditional von‑Neumann CPUs and power‑hungry GPUs for modern AI workloads, explains the rise of ASIC and FPGA based AI accelerators, compares major industry solutions, and highlights why reconfigurable, low‑power AI chips are becoming essential for robotics and edge computing.

AI chips · ASIC · FPGA

0 likes · 11 min read

Architects' Tech Alliance

Jun 3, 2025 · Artificial Intelligence

Comprehensive Analysis of RDMA Technology: Principles, Features, Products, and Applications in HPC, AI, and Cloud Storage

The article provides an in‑depth technical overview of Remote Direct Memory Access (RDMA), covering its zero‑copy, kernel‑bypass, and protocol‑offload features, hardware and software ecosystems, and its impact on high‑performance computing, artificial intelligence, cloud storage, finance, and edge computing.

High-performance computing · Network Protocols · RDMA

0 likes · 10 min read

AntTech

May 20, 2025 · Information Security

FAST and Neo: New Hardware Accelerators for Scalable Fully Homomorphic Encryption

The article reviews two recent ISCA 2025 papers—FAST and Neo—that introduce hardware and GPU‑based accelerators employing hoisting, KLSS, and Tensor Core optimizations to significantly boost the performance of fully homomorphic encryption workloads.

Cryptographic Optimization · Fully Homomorphic Encryption · GPU computing

0 likes · 6 min read

DataFunTalk

Mar 3, 2025 · Artificial Intelligence

FlightVGM: FPGA-Accelerated Inference for Video Generation Models Wins Best Paper at FPGA 2025

The FlightVGM paper, awarded Best Paper at FPGA 2025, details a novel FPGA-based inference IP for video generation models that leverages time‑space activation sparsity, mixed‑precision DSP58 extensions, and adaptive scheduling to achieve up to 1.30× performance and 4.49× energy‑efficiency gains over an NVIDIA 3090 GPU while preserving model accuracy.

AI · FPGA · hardware acceleration

0 likes · 11 min read

Architects' Tech Alliance

Feb 24, 2025 · Artificial Intelligence

NSA: Hardware‑Optimized Sparse Attention Mechanism from DeepSeek, Peking University and University of Washington

The NSA mechanism introduces a three‑branch hardware‑optimized sparse attention architecture—token compression, token selection, and sliding window—combined with learnable gating to balance global and local context, dramatically improving inference speed and efficiency for long‑context large language models.

AI Architecture · DeepSeek · Sparse attention

0 likes · 5 min read

Architects' Tech Alliance

Feb 4, 2025 · Artificial Intelligence

Why AI Frameworks Are the Backbone of Modern AI – Spotlight on MindSpore

The article explains what AI frameworks are, why they act as the operating system of artificial intelligence, showcases real‑world uses in transportation and finance, and provides an in‑depth analysis of Huawei's MindSpore framework, highlighting its development experience, hardware optimization, deployment flexibility, and enterprise‑grade security features.

AI Framework · Enterprise AI · MindSpore

0 likes · 7 min read

Architects' Tech Alliance

Dec 6, 2024 · Industry Insights

How GPU Virtualization Works: Layers, Techniques, and Real-World Use Cases

This article explains the fundamentals of GPU architecture, the need for GPU virtualization, and walks through user‑level, kernel‑level, hardware‑level, and full GPU virtualization techniques, illustrating each layer with diagrams and code examples while highlighting practical deployment scenarios.

GPU · cloud infrastructure · hardware acceleration

0 likes · 10 min read

AntTech

Oct 21, 2024 · Information Security

Second Homomorphic Encryption Computing Performance Optimization Forum – Hardware Accelerators

The second Homomorphic Encryption Computing Performance Optimization Forum, held on October 26 at the Summer Garden venue, gathers leading researchers to discuss hardware acceleration, cross‑disciplinary challenges, and recent advances in privacy‑preserving computation, presenting detailed abstracts and speaker bios for five technical sessions.

conference · hardware acceleration

0 likes · 9 min read

Linux Code Review Hub

Sep 2, 2024 · Fundamentals

Understanding DMA and the RIFFA Architecture: Block vs Scatter‑Gather

This article explains DMA fundamentals, compares Block DMA and Scatter‑Gather DMA step by step, and evaluates the open‑source RIFFA PCIe framework, including its hardware flow, software components, board‑level performance tests, features, drawbacks, and licensing terms.

Block DMA · DMA · FPGA

0 likes · 18 min read

Baobao Algorithm Notes

Aug 27, 2024 · Industry Insights

What Real‑World LLM Researchers Face: Scaling Limits, Data Bottlenecks, and Deployment Challenges

The author shares a candid account of recent large‑model experiments, highlighting why most labs struggle to exceed 100 B parameters, how data and hardware constraints shape model iteration, and the practical engineering, safety, and multimodal challenges that dictate real‑world LLM deployment.

AI industry · AI scaling · LLM

0 likes · 6 min read

Architects' Tech Alliance

Jul 13, 2024 · Operations

How to Supercharge Kunpeng CPUs: Real‑World Performance Tuning Techniques

This article provides a comprehensive guide to optimizing Kunpeng‑based servers, covering hardware characteristics, matrix multiplication benchmarks, von Neumann architecture insights, soft and hard acceleration, compiler and JDK tweaks, NUMA tuning, Nginx and OpenSSL acceleration, disk and network optimizations, application‑level tuning, and a step‑by‑step MariaDB performance‑tuning checklist.

CPU performance · Database Tuning · Kunpeng

0 likes · 16 min read

Open Source Linux

May 22, 2024 · Artificial Intelligence

Why GPUs Are the Powerhouse Behind Modern AI: A Deep Dive

This article explains how GPUs, with their parallel architecture and extensive software ecosystem, have become essential for accelerating AI training and inference, outperforming CPUs and shaping the future of artificial intelligence across various industries.

GPU · artificial-intelligence · deep learning

0 likes · 10 min read

AI Cyberspace

May 15, 2024 · Cloud Computing

How Intel’s HDSLB Redefines High‑Performance L4 Load Balancing for Cloud and Edge

Intel’s HDSLB series introduces a high‑density, scalable L4 load balancer that leverages Intel Xeon CPU instructions and E810 NIC acceleration to deliver multi‑core linear performance, superior throughput, and robust features for cloud and edge networking, outperforming traditional LB solutions.

Intel · L4 · cloud-computing

0 likes · 16 min read

360 Smart Cloud

Apr 3, 2024 · Backend Development

Understanding FFmpeg Hardware Acceleration Architecture and Implementation

FFmpeg provides a comprehensive, cross‑platform hardware acceleration framework that abstracts diverse GPU and dedicated video codec interfaces, defines HWContext types, device and frame contexts, and various codec configuration methods, enabling efficient video encoding, decoding, and filtering while addressing performance, compatibility, and pipeline complexity challenges.

FFmpeg · GPU · Video Processing

0 likes · 10 min read

vivo Internet Technology

Mar 13, 2024 · Operations

Optimizing VUA HTTPS Forwarding Performance with Intel QuickAssist Technology (QAT)

By integrating Intel QuickAssist hardware and AVX‑512 software acceleration into the VUA component of Vivo’s load‑balancing platform, the article demonstrates asynchronous OpenSSL offloading that boosts HTTPS forwarding throughput to roughly 44 000 QPS with QAT cards and 51 000 QPS with software, while preserving scalability and security.

HTTPS · Intel QAT · SSL/TLS

0 likes · 12 min read

Baidu Intelligent Cloud Tech Hub

Oct 18, 2023 · Cloud Computing

How AI Is Redefining Cloud Computing: From Scale‑Up to Serverless

The talk explores how the rise of large AI models is transforming cloud computing architecture, workloads, and services—shifting from traditional virtualization to heterogeneous compute, massive scaling, serverless infrastructures, and new networking designs that together enable agile AI‑native applications.

AI-native · Cloud Computing · Serverless

0 likes · 23 min read

DataFunSummit

Sep 8, 2023 · Artificial Intelligence

AI Compiler Forum at DataFun Summit 2023: Tile-Based Deep Learning Compilation, Graph Scheduling for Domain‑Specific Accelerators, and Triton on Hopper

The DataFun Summit 2023 AI Compiler Forum gathered leading researchers to present cutting‑edge techniques on tile‑based deep learning compilation, efficient graph scheduling for domain‑specific accelerators, large‑model deployment, and the latest advancements of OpenAI Triton on NVIDIA Hopper, offering practical insights for AI system developers.

AI compiler · Graph Scheduling · Large Model Deployment

0 likes · 8 min read

Architects' Tech Alliance

Sep 6, 2023 · Industry Insights

What Is a DPU and Why It’s the Next Big Chip for Data Centers

The article explains the role of Data Processing Units (DPUs) as the third major data‑center chip after CPUs and GPUs, outlines China Mobile’s DPU‑focused bare‑metal server initiative, and provides links to technical papers and white‑papers that detail DPU architecture, performance benchmarks, and industry trends.

Bare Metal Server · Cloud Computing · DPU

0 likes · 4 min read

Network Intelligence Research Center (NIRC)

Jun 24, 2023 · Artificial Intelligence

How DFX Achieves Low-Latency Multi-FPGA Acceleration for Transformer Text Generation

The article reviews the DFX system—a multi‑FPGA server that uses model‑parallelism and a ring‑topology interconnect to accelerate GPT‑2 text generation, showing 3.78× higher throughput, 3.99× better energy efficiency, and 8.21× greater cost‑effectiveness compared with a four‑GPU V100 baseline.

FPGA · GPT-2 · Transformer

0 likes · 6 min read

Xiaohongshu Tech REDtech

May 15, 2023 · Artificial Intelligence

GPU-Accelerated Inference Optimization for Large-Scale Machine Learning at Xiaohongshu

Xiaohongshu transformed its recommendation, advertising, and search inference pipeline by migrating to GPU‑centric hardware, deploying a custom TensorFlow‑Core Lambda service, and applying system‑level, virtualization, and compute‑level optimizations—including NUMA binding, kernel fusion, dynamic scaling, and FP16 quantization—achieving roughly 30× compute capacity growth, over 10% user‑metric gains, and more than 50% cluster‑resource savings.

GPU Optimization · Machine Learning Inference · System Performance

0 likes · 20 min read

Baidu Intelligent Cloud Tech Hub

May 15, 2023 · Cloud Computing

How Baidu’s UNP Platform Supercharges Load‑Balancing to 1 Tbps

This article explains the limitations of traditional X86‑DPDK load‑balancing gateways and how Baidu’s third‑generation Universal Networking Platform (UNP) combines programmable ASICs, CPUs, and FPGA acceleration to deliver multi‑terabit throughput, ultra‑low latency, and dramatically lower cost and power consumption.

Baidu Cloud · Cloud Computing · UNP

0 likes · 11 min read

Architects' Tech Alliance

May 15, 2023 · Artificial Intelligence

AI ASIC Landscape: Google TPU Evolution, Intel Habana Gaudi 2, IBM AIU, and Samsung Warboy NPU

The article surveys the rapid entry of leading vendors into the AI ASIC market, detailing Google’s TPU generations, Intel’s acquisition of Habana Labs and the Gaudi 2 chip, IBM’s upcoming AIU, Samsung’s Warboy NPU, and the performance, architectural, and future trends of ASICs for AI inference and training.

AI ASIC · Gaudi · TPU

0 likes · 11 min read

Tencent Tech

Apr 18, 2023 · Artificial Intelligence

How Tencent’s Zixiao AI Chip Supercharges Real‑Time Meeting Subtitles

Tencent’s home‑grown Zixiao AI inference chip, combined with the LightRuntime engine, dramatically reduces latency and cost for real‑time subtitles in Tencent Meeting, handling tens of thousands of concurrent audio streams while meeting sub‑second delay requirements through hardware‑software co‑optimizations and mixed‑precision model tuning.

Real-time Speech Recognition · Tencent Meeting · hardware acceleration

0 likes · 16 min read

Laravel Tech Community

Mar 1, 2023 · Backend Development

FFmpeg 6.0 “Von Neumann” Release: New Codecs, Filters, and ABI Changes

FFmpeg 6.0 "Von Neumann" introduces a host of new decoders, encoders, filters, hardware‑accelerated AV1 support, ABI versioning, and numerous performance and API improvements, marking a major, more structured release cycle for the multimedia framework.

AV1 · FFmpeg · Filters

0 likes · 5 min read

Architects' Tech Alliance

Jan 13, 2023 · Fundamentals

2022 DPU Development Analysis Report and Related Network Technologies

The 2022 DPU Development Analysis Report outlines the evolution of Data Processing Units from CPU/NP and FPGA‑CPU architectures to ASIC‑CPU designs, discusses RDMA high‑speed networking, data‑plane forwarding techniques, network programmability, and the emerging open DPU software ecosystem, highlighting their performance, power, and cost implications for modern data centers.

ASIC · DPU · Data Plane

0 likes · 14 min read

Architects' Tech Alliance

Nov 1, 2022 · Databases

2022 China Database Industry Report: Emerging Hardware and Architectural Innovations

The September 2022 China Database Industry Analysis report highlights a wave of hardware‑driven innovations—including multi‑core CPUs, heterogeneous GPUs/TPUs/DPUs, programmable FPGAs, CXL‑DDR5, persistent memory, NVMe‑oF, and RDMA‑based storage—that enable massive data storage and high‑concurrency real‑time computing across a range of novel database architectures and products.

Databases · GPU · OLTP

0 likes · 10 min read

Qingyun Technology Community

Sep 15, 2022 · Cloud Computing

How GPU, VPU, and CPU Accelerate Cloud Video Transcoding: Architecture and Best Practices

This article explores the rapid growth of video traffic, explains why transcoding is essential, compares CPU, GPU, and VPU hardware for video processing, details the FFmpeg software stack, describes the design of a cloud‑native transcoding cluster, its scheduling, shard‑transcoding technique, and presents performance test results.

Cloud Computing · FFmpeg · GPU Acceleration

0 likes · 23 min read

NetEase Smart Enterprise Tech+

Sep 14, 2022 · Cloud Computing

How Intel’s End‑to‑End Audio‑Video Optimizations Power the Next Cloud Media Experience

The article reviews Intel’s end‑to‑end audio‑video optimization solutions presented at the 2022 NetEase Audio‑Video Technology Conference, covering market trends, hardware accelerators, software stacks, and future data‑center strategies that together enable high‑quality, cost‑effective streaming in the cloud era.

Cloud Computing · Intel · Media Optimization

0 likes · 7 min read

Baidu Tech Salon

Jun 13, 2022 · Artificial Intelligence

Kunlun Core AI Chips: Making Computing Smarter

The 2022 Beijing Zhiyuan Conference report by Kunlun Core’s chip R&D director outlines AI chip market opportunities and challenges, describes the company’s shift from FPGA clusters to a programmable XPU‑R architecture built on a 7 nm process with 256 TOPS of INT8 performance, GDDR6 memory and PCIe 4.0, and details current deployments and plans for third‑ and fourth‑generation chips.

AI accelerator · AI chip · GDDR6

0 likes · 12 min read

Alibaba Cloud Infrastructure

May 31, 2022 · Information Security

Fidas: FPGA‑Based Comprehensive Offloading for Cloud Intrusion Detection (ISCA 2022 Full‑Score Paper)

The ISCA 2022 full‑score paper “Fidas: Fortifying the Cloud via Comprehensive FPGA‑based Offloading for Intrusion Detection” presents a novel FPGA‑accelerated IDS architecture that jointly offloads regex matching and traffic classification, achieving high flexibility, rapid rule updates, balanced load, and line‑rate performance in cloud data centers.

FPGA · ISCA · Intrusion Detection

0 likes · 7 min read

Baidu Geek Talk

Feb 7, 2022 · Mobile Development

Optimizing Video Playback: Soft/Hardware Decoding Strategies for Baidu Android App

The article evaluates software versus hardware video decoding for Baidu’s Android app, presents benchmark data showing surface‑mode hardware decoding’s superior efficiency, identifies compatibility and first‑frame latency challenges, and proposes a monitoring module plus seamless soft‑to‑hard decoder switching to achieve high hardware‑decode usage while maintaining fast startup and low error rates.

Android · MediaCodec · Performance Optimization

0 likes · 11 min read

NetEase Cloud Music Tech Team

Dec 29, 2021 · Frontend Development

Understanding Browser Compositing Layers: A Guide to CSS Hardware Acceleration

The article explains how browsers build render trees and use GPU‑accelerated compositing layers—created by properties like transform, will‑change, or media elements—to improve performance, avoid repaint glitches such as iOS timer flicker, and offers best‑practice tips for using these layers efficiently without excess memory use.

CSS Optimization · GPU rendering · browser rendering

0 likes · 10 min read

AntTech

Dec 21, 2021 · Information Security

Hardware‑Software Integration Accelerates Privacy Computing: Technical Overview

The article explains how combining hardware and software solutions can address the data‑lifecycle security and cryptographic performance challenges of privacy computing, describing the underlying technology stack, acceleration techniques, and the integrated privacy‑computing appliance released by Ant Group.

Data Security · Privacy Computing · cryptography

0 likes · 13 min read

21CTO

Dec 9, 2021 · Artificial Intelligence

How Alibaba’s DAMO Academy Is Redefining AI with the First 3D‑Stacked Compute‑Memory Chip

On December 3, Alibaba’s DAMO Academy announced its first AI chip that integrates memory and compute using hybrid‑bond 3D stacking, promising ten‑fold performance gains and a 300× improvement in energy efficiency for AI workloads such as recommendation systems, and marking a shift from traditional von Neumann designs.

3D stacking · AI chip · Compute-in-Memory

0 likes · 5 min read

Architects' Tech Alliance

Oct 20, 2021 · Fundamentals

Overview of the Specialized Data Processing Unit (DPU) Technology Whitepaper

The whitepaper from the Institute of Computing Technology, Chinese Academy of Sciences, provides a comprehensive analysis of DPU background, technical characteristics, reference architecture, application scenarios, and a comparative review of existing DPU products, highlighting its role in modern data‑center infrastructures.

DPU · Data Center · Data Processing Unit

0 likes · 24 min read

Architects' Tech Alliance

Sep 5, 2021 · Fundamentals

Overview of Data Processing Units (DPUs) and Their Evolution in Data Centers

Data Processing Units (DPUs) have evolved from early I/O processors to modern programmable ASICs and FPGA-based accelerators, integrating networking, storage, and compute functions to offload workloads from CPUs, with contributions from companies like Fungible, Nvidia, Intel, and emerging Chinese firms, shaping data‑center and edge architectures.

DPU · Data Center · FPGA

0 likes · 13 min read

WeChat Client Technology Team

Aug 10, 2021 · Mobile Development

How We Built a Cross‑Platform Hardware‑Accelerated Live‑Streaming SDK for WeChat Video Channels

This article details the design and implementation of a cross‑platform SDK that enables external hardware devices to stream live video on WeChat Video Channels, covering user authentication, network signaling, UI integration, audio‑video encoding, and hardware acceleration across Android, iOS, PC and embedded platforms.

Video Encoding · hardware acceleration · sdk

0 likes · 11 min read

Architects' Tech Alliance

Jul 23, 2021 · Artificial Intelligence

2021 Overview of China’s Data Processing Unit (DPU) Industry

The article provides a comprehensive analysis of China’s DPU market in 2021, covering DPU definitions, classifications, technology roadmaps, industry chain, business models, key applications, competitive landscape, and future trends in data centers, edge computing, telecom, and autonomous driving.

China · DPU · Data Center

0 likes · 12 min read

NetEase Smart Enterprise Tech+

Jul 20, 2021 · Backend Development

How NetEase Cloud Accelerates Video Transcoding with Slice‑Based Parallelism

NetEase Cloud’s video transcoding service boosts processing speed by combining hardware acceleration, custom codecs, AMD EPYC servers, and a slice‑based parallel transcoding pipeline, while optimizing cluster task scheduling and handling straggler issues to achieve significant performance gains across large‑scale media workloads.

Distributed Processing · Task scheduling · Video Transcoding

0 likes · 16 min read

Architects' Tech Alliance

Jul 16, 2021 · Artificial Intelligence

AI Chip Landscape: GPUs, FPGAs, and ASICs for Deep Learning

The article explains how artificial intelligence relies on algorithms, compute and data, compares engineering and simulation methods, and details the roles, architectures, performance and energy characteristics of GPUs, FPGAs, and ASICs as the primary hardware accelerators for modern deep‑learning applications.

ASIC · FPGA · GPU

0 likes · 14 min read

DataFunTalk

Feb 3, 2021 · Artificial Intelligence

Towards Best Possible Deep Learning Acceleration on the Edge – A Compression-Compilation Co-Design Framework

The lecture presented by Assistant Professor Yanzhi Wang introduces a compression‑compilation co‑design framework (CoCoPIE) that achieves real‑time deep‑learning inference on edge devices through novel pruning and quantization techniques, delivering up to 180× speedup without accuracy loss.

AI · deep learning · edge computing

0 likes · 5 min read

JD Cloud Developers

Dec 21, 2020 · Artificial Intelligence

Weekly Tech Highlights: AI Chip, Cloud Forecasts, Docker M1 Preview & More

This week’s developer newsletter spotlights the Chinese Academy of Sciences’ pioneering GNN accelerator chip, IDC’s ten cloud computing predictions for China, the booming IoT market and 5G dominance, Docker’s M1‑compatible desktop preview, a carbon‑nanotube transistor breakthrough, IBM’s FHE initiative, and recent AI research on lifelong learning and reinforcement learning exploration.

Docker · IoT · artificial-intelligence

0 likes · 7 min read

Architects' Tech Alliance

Nov 5, 2020 · Fundamentals

SmartNICs: Types, Advantages, and Design Considerations for Data Center Acceleration

The article provides a comprehensive overview of SmartNICs, describing their three main architectures—multicore ASIC, FPGA‑based, and FPGA‑enhanced—detailing performance benefits, design trade‑offs, and example feature extensions for modern data‑center networking workloads.

ASIC · Data Center · FPGA

0 likes · 12 min read

Meituan Technology Team

Nov 5, 2020 · Databases

Database System Research and Future Trends – Talk by Prof. Zhou Xuan (ECNU)

In his 2020 talk, Prof. Zhou Xuan outlined the evolution of database systems, highlighted ECNU’s research on distributed transactions, hardware‑aware modular designs, HTAP and system decoupling, and argued that future databases will be increasingly modular, hardware‑adapted, and driven by close academia‑industry collaboration.

Cloud Computing · database · hardware acceleration

0 likes · 38 min read

JD Tech Talk

Oct 28, 2020 · Backend Development

Performance Optimization of SSL/TLS in JD.com JDDLB Load Balancer Using Freescale Acceleration Cards

This article describes the architecture of JD.com’s JDDLB public‑traffic load balancer and details how offloading CPU‑intensive SSL/TLS cryptographic operations to Freescale C291 acceleration cards—via custom NGINX modules, OpenSSL Engine integration, and synchronous/asynchronous driver interfaces—significantly improves connection‑establishment rates and overall throughput.

OpenSSL · Performance Optimization · SSL/TLS

0 likes · 30 min read

IT Xianyu

Sep 27, 2020 · Cloud Computing

The Rise of Cloud Computing: From Moore's Law to Alibaba's Shenlong Architecture

The article examines the end of Moore's Law, the rapid growth of cloud computing, the challenges of virtualization overhead, and how Alibaba's Shenlong architecture leverages hardware acceleration to revive performance gains and reshape the future of hardware‑software co‑evolution.

Alibaba · Cloud Computing · Moore's Law

0 likes · 7 min read

DataFunTalk

Mar 24, 2020 · Databases

ByteDance’s Enhancements to RocksDB: LazyBuffer, Adaptive Map, KV Separation, Multi‑Index, Extreme Compression, and New Hardware Support

This article describes ByteDance’s extensive improvements to the RocksDB storage engine—including LazyBuffer, Adaptive Map‑based lazy compaction, KV separation, adaptive multi‑index support, extreme compression techniques, and hardware acceleration—to reduce amplification, improve performance, and lower costs for large‑scale database workloads.

Compaction · Indexing · KV Separation

0 likes · 14 min read

Architects' Tech Alliance

Feb 8, 2020 · Cloud Computing

Demystifying FPGA: Architecture, Performance, and Microsoft's Data Center Deployment

FPGA, a reconfigurable hardware architecture, offers lower latency and higher efficiency than CPUs and GPUs and more flexibility than ASICs, making it well suited to both compute‑intensive and communication‑intensive tasks; Microsoft’s multi‑stage data‑center deployments illustrate its scalability, flexibility, and impact on cloud services.

Data Center · FPGA · Network Virtualization

0 likes · 21 min read

Architects' Tech Alliance

Oct 11, 2019 · Cloud Computing

Understanding FPGA: Architecture, Advantages, and Microsoft’s Data‑Center Deployments

This article explains what FPGA (Field‑Programmable Gate Array) is, why it offers lower latency and higher energy efficiency than CPUs or GPUs for both compute‑intensive and communication‑intensive workloads, and details Microsoft’s three‑generation FPGA deployment strategy in its data‑center and cloud infrastructure.

Data Center · FPGA · hardware acceleration

0 likes · 20 min read

Architects' Tech Alliance

Sep 8, 2019 · Fundamentals

An Overview of FPGA Technology: History, Architecture, Development Process, and Applications

This article provides a comprehensive overview of FPGA technology, covering its definition, historical development, major manufacturers, internal architecture, development workflow, challenges, and typical application scenarios such as data centers, telecommunications, and AI acceleration.

Data Center · FPGA · artificial-intelligence

0 likes · 12 min read

Architects' Tech Alliance

Sep 5, 2019 · Fundamentals

GPU Origin, Architecture, and Acceleration Technologies (CUDA & OpenCL)

This article explains the history and origin of GPUs, compares CPU and GPU architectures, describes the GPU processing pipeline, and introduces acceleration technologies such as CUDA and OpenCL, highlighting their programming models, supported languages, and key performance metrics.

CUDA · GPU · Graphics Processing

0 likes · 14 min read

Qunar Tech Salon

Sep 5, 2019 · Artificial Intelligence

Implementing Bilinear Interpolation on FPGA for Neural Network Acceleration

The article explains the principles of bilinear interpolation, why it is needed for smooth image scaling in neural‑network layers such as Interp and Resize, and details FPGA‑specific optimizations—including lookup‑table based coefficient pre‑computation, two‑line BRAM caching, and index‑driven data swapping—to reduce DSP usage and improve throughput.

BRAM · Bilinear Interpolation · DSP

0 likes · 14 min read

Efficient Ops

Oct 17, 2018 · Cloud Computing

How OpenStack Cyborg Unifies Management of GPUs, FPGAs, ASICs and Other Accelerators

The OpenStack Cyborg project provides a generic framework that lets cloud platforms discover, schedule, and control proprietary accelerators such as GPUs, FPGAs, ASICs and SoCs, solving resource waste in AI, NFV, edge and HPC workloads.

FPGA · GPU · OpenStack

0 likes · 5 min read

Tencent Cloud Developer

Aug 17, 2018 · Cloud Computing

FPGA Acceleration: Exploration and Practice for Data Centers and Cloud Services

In his 2018 Trusted Cloud Conference talk, Tencent FPGA expert Zhang Heng explained how the rapid growth of data and AI workloads drives data‑center and cloud operators to adopt FPGA acceleration for its high‑throughput, low‑latency, programmable performance, citing Tencent’s successes in image transcoding, content‑moderation, AI inference and gene‑sequencing, while outlining ecosystem challenges and future plans for scalable cloud‑FPGA services.

AI acceleration · Data Center · FPGA

0 likes · 18 min read

JD Retail Technology

Jul 16, 2018 · Mobile Development

Android RenderThread and Asynchronous Animation Rendering: Deep Dive

This article explains Android's RenderThread, its role in hardware-accelerated UI rendering, how it enables asynchronous animation via ViewPropertyAnimator, and provides code examples demonstrating RenderThread-driven animation that remains smooth even when the UI thread is blocked.

Android · RenderThread · UI Performance

0 likes · 15 min read

Alibaba Cloud Developer

Jun 15, 2018 · Cloud Computing

How Alibaba Cloud’s New F3 FPGA Instance Revolutionizes Cloud Acceleration

Alibaba Cloud introduces the F3 FPGA instance, a dual‑chip VU9P‑based accelerator that combines a unified HDK/SDK platform, secure IP marketplace, and ultra‑high‑speed interconnects to make FPGA acceleration more accessible, flexible, and cost‑effective for cloud users.

Alibaba Cloud · Cloud Computing · FPGA

0 likes · 14 min read

Architects' Tech Alliance

Apr 18, 2018 · Fundamentals

Understanding GPU Architecture and Its Evolution

This article explains the historical development of graphics processing units, their internal structure, rendering pipeline, and how GPUs shifted graphics workloads from CPUs to specialized parallel hardware, highlighting key concepts such as vertex shaders, pixel shaders, SIMD architectures, and performance growth.

GPU · Rendering Pipeline · computer fundamentals

0 likes · 11 min read

Alibaba Cloud Developer

Apr 9, 2018 · Databases

How FPGA Acceleration Supercharges X-Engine’s Compaction for 10× MySQL Performance

This article introduces Alibaba’s X‑Engine storage engine, the foundation of the next‑generation distributed database X‑DB, and explains how FPGA‑accelerated compaction and asynchronous scheduling improve write‑intensive OLTP performance, reduce CPU contention, and achieve up to 50% throughput gains while maintaining fault tolerance.

Compaction · FPGA · LSM‑Tree

0 likes · 21 min read

AntTech

Jan 4, 2018 · Databases

Report on VLDB 2017 Conference: Insights and Highlights from Database Research

Written after attending VLDB 2017 in Munich, this report summarizes the conference’s broad coverage of database research, from new hardware‑accelerated prototypes and Spark‑based big‑data processing to Oracle and SAP HANA case studies, keynotes, and notable papers, and reflects on industry trends and Chinese contributions.

Big Data · Query Optimization · VLDB

0 likes · 22 min read

Alibaba Cloud Infrastructure

Dec 11, 2017 · Operations

FPGA-Based High-Compression Image Encoding: Architecture, Optimization, and Performance Evaluation

This article describes a project that replaces CPU‑based image compression with an FPGA solution, detailing the system hierarchy, two‑phase development (function verification and performance boost), pipeline and frequency optimizations, software‑FPGA interaction, and a measured 25‑fold speedup over a 64‑core server.

FPGA · High Compression · Parallel Encoding

0 likes · 6 min read

Tencent Architect

Oct 20, 2017 · Artificial Intelligence

Design and Performance of a General‑Purpose FPGA CNN Accelerator for Real‑Time AI Services

This article presents a comprehensive overview of a universal FPGA‑based CNN accelerator, detailing its motivation, flexible architecture, compiler workflow, memory and compute unit designs, and performance comparisons that demonstrate significant latency and cost advantages over CPU and GPU solutions for real‑time AI inference.

AI inference · CNN acceleration · FPGA

0 likes · 13 min read

21CTO

Sep 13, 2017 · Mobile Development

Mastering Android Video Encoding: Choosing Encoders and Optimizing YUV Processing

This article examines Android video recording challenges, compares hardware (MediaCodec) and software (FFmpeg + x264/openh264) encoders, highlights device‑specific pitfalls such as color‑format support and alignment, and presents fast NEON‑based algorithms for scaling, rotation, and mirroring of YUV frames.

Android · FFmpeg · MediaCodec

0 likes · 12 min read

Meituan Technology Team

Jan 19, 2017 · Mobile Development

Understanding Hardware Acceleration in Android Applications

Hardware acceleration in Android shifts intensive floating‑point UI work from the CPU to the GPU: DisplayLists are built on the CPU and rasterized on the GPU, which lets the two work in parallel, lets the system redraw only the elements that changed, and yields significantly higher frame rates for animations and complex graphics.

Android · CPU · DisplayList

0 likes · 14 min read
