Tag

hardware acceleration


Architects' Tech Alliance
Jun 3, 2025 · Artificial Intelligence

Comprehensive Analysis of RDMA Technology: Principles, Features, Products, and Applications in HPC, AI, and Cloud Storage

The article provides an in‑depth technical overview of Remote Direct Memory Access (RDMA), covering its zero‑copy, kernel‑bypass, and protocol‑offload features, hardware and software ecosystems, and its impact on high‑performance computing, artificial intelligence, cloud storage, finance, and edge computing.

Artificial Intelligence · High Performance Computing · RDMA
0 likes · 10 min read
AntTech
May 20, 2025 · Information Security

FAST and Neo: New Hardware Accelerators for Scalable Fully Homomorphic Encryption

The article reviews two recent ISCA 2025 papers—FAST and Neo—that introduce hardware and GPU‑based accelerators employing hoisting, KLSS, and Tensor Core optimizations to significantly boost the performance of fully homomorphic encryption workloads.

Cryptographic Optimization · Fully Homomorphic Encryption · GPU computing
0 likes · 6 min read
DataFunTalk
Mar 3, 2025 · Artificial Intelligence

FlightVGM: FPGA-Accelerated Inference for Video Generation Models Wins Best Paper at FPGA 2025

The FlightVGM paper, awarded Best Paper at FPGA 2025, details a novel FPGA-based inference IP for video generation models that leverages spatio-temporal activation sparsity, mixed-precision DSP58 extensions, and adaptive scheduling to achieve up to 1.30× performance and 4.49× energy-efficiency gains over an NVIDIA 3090 GPU while preserving model accuracy.

AI · FPGA · Mixed Precision
0 likes · 11 min read
Architects' Tech Alliance
Feb 24, 2025 · Artificial Intelligence

NSA: Hardware‑Optimized Sparse Attention Mechanism from DeepSeek, Peking University and University of Washington

The NSA mechanism introduces a three‑branch hardware‑optimized sparse attention architecture—token compression, token selection, and sliding window—combined with learnable gating to balance global and local context, dramatically improving inference speed and efficiency for long‑context large language models.

AI architecture · DeepSeek · Sparse Attention
0 likes · 5 min read
Architects' Tech Alliance
Nov 26, 2024 · Artificial Intelligence

Get Ready for a Shakeout in Edge NPUs

The article examines the rapid growth and increasing complexity of edge AI NPUs, discussing challenges in software and hardware acceleration, supply‑chain constraints, and the need for integrated engine solutions to sustain performance and power efficiency.

NPU · edge AI · hardware acceleration
0 likes · 9 min read
AntTech
Oct 21, 2024 · Information Security

Second Homomorphic Encryption Computing Performance Optimization Forum – Hardware Accelerators

The second Homomorphic Encryption Computing Performance Optimization Forum, held on October 26 at the Summer Garden venue, gathers leading researchers to discuss hardware acceleration, cross‑disciplinary challenges, and recent advances in privacy‑preserving computation, presenting detailed abstracts and speaker bios for five technical sessions.

Conference · cryptography · hardware acceleration
0 likes · 9 min read
360 Smart Cloud
Apr 3, 2024 · Backend Development

Understanding FFmpeg Hardware Acceleration Architecture and Implementation

FFmpeg provides a comprehensive, cross‑platform hardware acceleration framework that abstracts diverse GPU and dedicated video codec interfaces, defines HWContext types, device and frame contexts, and various codec configuration methods, enabling efficient video encoding, decoding, and filtering while addressing performance, compatibility, and pipeline complexity challenges.

Backend Development · FFmpeg · GPU
0 likes · 10 min read
vivo Internet Technology
Mar 13, 2024 · Operations

Optimizing VUA HTTPS Forwarding Performance with Intel QuickAssist Technology (QAT)

By integrating Intel QuickAssist hardware and AVX-512 software acceleration into the VUA component of vivo's load-balancing platform, the article demonstrates asynchronous OpenSSL offloading that boosts HTTPS forwarding throughput to roughly 44 000 QPS with QAT cards and 51 000 QPS with AVX-512 software acceleration, while preserving scalability and security.

HTTPS · Intel QAT · SSL/TLS
0 likes · 12 min read
DataFunSummit
Sep 8, 2023 · Artificial Intelligence

AI Compiler Forum at DataFun Summit 2023: Tile-Based Deep Learning Compilation, Graph Scheduling for Domain‑Specific Accelerators, and Triton on Hopper

The DataFun Summit 2023 AI Compiler Forum gathered leading researchers to present cutting‑edge techniques on tile‑based deep learning compilation, efficient graph scheduling for domain‑specific accelerators, large‑model deployment, and the latest advancements of OpenAI Triton on NVIDIA Hopper, offering practical insights for AI system developers.

AI Compiler · Graph Scheduling · Large Model Deployment
0 likes · 8 min read
Xiaohongshu Tech REDtech
May 15, 2023 · Artificial Intelligence

GPU-Accelerated Inference Optimization for Large-Scale Machine Learning at Xiaohongshu

Xiaohongshu transformed its recommendation, advertising, and search inference pipeline by migrating to GPU‑centric hardware, deploying a custom TensorFlow‑Core Lambda service, and applying system‑level, virtualization, and compute‑level optimizations—including NUMA binding, kernel fusion, dynamic scaling, and FP16 quantization—achieving roughly 30× compute capacity growth, over 10% user‑metric gains, and more than 50% cluster‑resource savings.

GPU optimization · Large Models · Machine Learning Inference
0 likes · 20 min read
Architects' Tech Alliance
May 15, 2023 · Artificial Intelligence

AI ASIC Landscape: Google TPU Evolution, Intel Habana Gaudi 2, IBM AIU, and Samsung Warboy NPU

The article surveys the rapid entry of leading vendors into the AI ASIC market, detailing Google’s TPU generations, Intel’s acquisition of Habana Labs and the Gaudi 2 chip, IBM’s upcoming AIU, Samsung’s Warboy NPU, and the performance, architectural, and future trends of ASICs for AI inference and training.

AI ASIC · Gaudi · TPU
0 likes · 11 min read
Tencent Tech
Apr 18, 2023 · Artificial Intelligence

How Tencent’s Zixiao AI Chip Supercharges Real‑Time Meeting Subtitles

Tencent’s home‑grown Zixiao AI inference chip, combined with the LightRuntime engine, dramatically reduces latency and cost for real‑time subtitles in Tencent Meeting, handling tens of thousands of concurrent audio streams while meeting sub‑second delay requirements through hardware‑software co‑optimizations and mixed‑precision model tuning.

AI inference · Tencent Meeting · hardware acceleration
0 likes · 16 min read
Laravel Tech Community
Mar 1, 2023 · Backend Development

FFmpeg 6.0 “Von Neumann” Release: New Codecs, Filters, and ABI Changes

FFmpeg 6.0 "Von Neumann" introduces a host of new decoders, encoders, filters, hardware‑accelerated AV1 support, ABI versioning, and numerous performance and API improvements, marking a major, more structured release cycle for the multimedia framework.

AV1 · FFmpeg · Filters
0 likes · 5 min read
Architects' Tech Alliance
Jan 13, 2023 · Fundamentals

2022 DPU Development Analysis Report and Related Network Technologies

The 2022 DPU Development Analysis Report outlines the evolution of Data Processing Units from CPU/NP and FPGA‑CPU architectures to ASIC‑CPU designs, discusses RDMA high‑speed networking, data‑plane forwarding techniques, network programmability, and the emerging open DPU software ecosystem, highlighting their performance, power, and cost implications for modern data centers.

ASIC · DPU · Data Plane
0 likes · 14 min read
Xiaohongshu Tech REDtech
Jan 10, 2023 · Artificial Intelligence

AI‑Driven Video Coding: Expert Q&A on Intelligent Compression, Standards, and Future Directions

Experts Wang Shenshe and Chen Jing discuss how deep-learning-based video coding is reshaping traditional compression: it offers modest quality gains but still faces theoretical, hardware, and standardization hurdles. They also debate hybrid versus end-to-end designs, rate control, 3-D support, and the balance between human-centric perception and machine-oriented efficiency.

AI · Video Coding · compression standards
0 likes · 16 min read
Architects' Tech Alliance
Nov 1, 2022 · Databases

2022 China Database Industry Report: Emerging Hardware and Architectural Innovations

The September 2022 China Database Industry Analysis report highlights a wave of hardware-driven innovations, including multi-core CPUs, heterogeneous GPUs, TPUs, and DPUs, programmable FPGAs, CXL and DDR5, persistent memory, NVMe-oF, and RDMA-based storage, that enable massive data storage and high-concurrency real-time computing across a range of novel database architectures and products.

GPU · OLTP · databases
0 likes · 10 min read
Baidu Tech Salon
Jun 13, 2022 · Artificial Intelligence

Kunlun Core AI Chips: Making Computing Smarter

The 2022 Beijing Zhiyuan Conference report by Kunlun Core’s chip R&D director outlines AI chip market opportunities and challenges, describes the company’s shift from FPGA clusters to a programmable XPU-R architecture built on a 7 nm process with 256 TOPS of INT8 performance, GDDR6 memory, and PCIe 4.0, and details current deployments and plans for third- and fourth-generation chips.

AI accelerator · AI chip · GDDR6
0 likes · 12 min read
Alibaba Cloud Infrastructure
May 31, 2022 · Information Security

Fidas: FPGA‑Based Comprehensive Offloading for Cloud Intrusion Detection (ISCA 2022 Full‑Score Paper)

The ISCA 2022 full‑score paper “Fidas: Fortifying the Cloud via Comprehensive FPGA‑based Offloading for Intrusion Detection” presents a novel FPGA‑accelerated IDS architecture that jointly offloads regex matching and traffic classification, achieving high flexibility, rapid rule updates, balanced load, and line‑rate performance in cloud data centers.

FPGA · ISCA · Intrusion detection
0 likes · 7 min read
Baidu Geek Talk
Feb 7, 2022 · Mobile Development

Optimizing Video Playback: Soft/Hardware Decoding Strategies for Baidu Android App

The article evaluates software versus hardware video decoding for Baidu’s Android app, presents benchmark data showing that hardware decoding in surface mode is the most efficient option, identifies compatibility and first-frame latency challenges, and proposes a monitoring module plus seamless software-to-hardware decoder switching to achieve high hardware-decode usage while maintaining fast startup and low error rates.

FFmpeg · MediaCodec · android
0 likes · 11 min read
NetEase Cloud Music Tech Team
Dec 29, 2021 · Frontend Development

Understanding Browser Compositing Layers: A Guide to CSS Hardware Acceleration

The article explains how browsers build render trees and use GPU‑accelerated compositing layers—created by properties like transform, will‑change, or media elements—to improve performance, avoid repaint glitches such as iOS timer flicker, and offers best‑practice tips for using these layers efficiently without excess memory use.

CSS Optimization · GPU rendering · browser rendering
0 likes · 10 min read