Tagged articles
DataFunSummit
May 17, 2026 · Artificial Intelligence

How Agentic Architecture Powers Next‑Generation Recommendation and Search Systems

The article reviews cutting‑edge AI search and recommendation techniques—including Alibaba Cloud's Agentic RAG, Huawei Noah's LLM‑enhanced recommender, Baidu's generative ranking model GRAB, and Elasticsearch‑based vector RAG—detailing their challenges, architectural evolutions, performance gains, and real‑world deployment results.

AI search · Agentic RAG · Elasticsearch
6 min read
Machine Heart
May 12, 2026 · Industry Insights

Guanglun Intelligence, Google, and NVIDIA Co‑Define Physical AI Simulation Standards

The article argues that as AI shifts from a compute‑driven to a data‑driven era, large‑scale physical simulation becomes the CUDA‑like foundation for physical AI, and details how global leaders—including NVIDIA, Google DeepMind, Disney Research, and China’s Guanglun Intelligence—are racing to set unified simulation standards through the open‑source Newton engine.

GPU Acceleration · Guanglun Intelligence · Industry standards
16 min read
DataFunTalk
May 5, 2026 · Artificial Intelligence

Agent Architecture in Action: Building Next‑Gen Recommendation and Search Systems

This article reviews cutting‑edge AI search and recommendation techniques—including Alibaba Cloud's Agentic RAG, Huawei Noah's LLM‑enhanced recommendation pipeline, and Baidu's generative ranking model GRAB—detailing their architectural evolution, multimodal retrieval strategies, GPU acceleration, and measured performance gains.

AI search · Agentic RAG · GPU Acceleration
6 min read
DataFunSummit
May 4, 2026 · Artificial Intelligence

Inside Alibaba Cloud AI Search: Agentic RAG Architecture and Multi‑Agent Techniques

Alibaba Cloud AI Search tackles high‑concurrency, multimodal, and multi‑hop queries by evolving its Agentic RAG architecture from a single agent to a coordinated multi‑agent system that integrates planning, retrieval, and generation. The system leverages hybrid vector‑text‑DB‑graph recall, GPU‑accelerated indexing, quantization, NL2SQL, and multimodal search, and the article backs these with performance data and real‑world case studies.

AI search · Agentic RAG · Alibaba Cloud
6 min read
Machine Heart
May 4, 2026 · Artificial Intelligence

Mega MoE vs SonicMoE: Which Will Lead the Next AI Speed Race?

SonicMoE, a new ultra‑fast Mixture‑of‑Experts model from Tri Dao and Ion Stoica’s team, achieves peak throughput on Nvidia Blackwell GPUs, outperforms DeepSeek’s DeepGEMM, and introduces algorithmic redesigns that decouple activation memory from expert granularity while fusing I/O‑aware kernels for up to double the speed of existing MoE frameworks.

AI Performance · Blackwell · GPU Acceleration
12 min read
AI Architecture Path
May 2, 2026 · Artificial Intelligence

Warp Open‑Sources Its AI Terminal: GPU‑Accelerated UI, Agentic Development, 50K+ Stars

Warp, the AI‑native terminal built in Rust with a custom GPU‑accelerated UI framework, has been open‑sourced on GitHub, quickly surpassing 50 000 stars; the article details its development history, open‑source motivations, architecture, core features, installation options, and a comparative analysis with iTerm2 and Ghostty.

AI terminal · Agentic Development Environment · GPU Acceleration
12 min read
AI Engineering
Apr 28, 2026 · Artificial Intelligence

Insanely Fast Whisper Speeds Audio Transcription 19× with Flash Attention 2

The open‑source Insanely Fast Whisper CLI tool leverages Flash Attention 2 to accelerate OpenAI Whisper transcription 19‑fold, cutting the time to transcribe 2.5 hours of audio from 31 minutes to just 98 seconds on an NVIDIA A100, while preserving accuracy and adding multilingual, speaker‑diarization, and precise timestamp features.

CLI tool · Flash Attention 2 · GPU Acceleration
4 min read
Bighead's Algorithm Notes
Apr 13, 2026 · Artificial Intelligence

FactorMiner: Tsinghua’s Self‑Evolving Agent with Skill and Experience Memory for Alpha Factor Mining

FactorMiner is a lightweight, flexible self‑evolving agent framework that combines a modular skill architecture with structured experience memory, using a Ralph loop to guide search, reduce redundancy, and build a diverse, high‑quality alpha factor library that outperforms baselines across A‑share and cryptocurrency markets while leveraging GPU‑accelerated evaluation.

Alpha Factor Mining · Experience Memory · FactorMiner
13 min read
IT Services Circle
Apr 9, 2026 · Operations

Exploring Warp: A GPU‑Accelerated Rust Terminal with Built‑in AI

This article introduces Warp, a Rust‑based modern terminal that leverages GPU acceleration, supports multiple shells, offers AI assistance via models like DeepSeek, and provides step‑by‑step installation, configuration, and usage guidance across Windows, macOS, and Linux.

GPU Acceleration · Rust · cross-platform
6 min read
Old Zhang's AI Learning
Mar 27, 2026 · Artificial Intelligence

vLLM’s Four Major 2026 Updates: Semantic Router Athena, Nemotron 3 Super, P‑EAGLE, and Model Runner V2

The March 2026 vLLM release bundle introduces four substantial upgrades—Semantic Router v0.2 Athena, NVIDIA Nemotron 3 Super, the parallel speculative decoding P‑EAGLE, and a completely re‑architected Model Runner V2—each backed by concrete benchmarks, architectural diagrams, and code examples that demonstrate how the engine evolves from a pure inference engine to a full‑stack AI serving platform.

GPU Acceleration · Model Runner V2 · Nemotron-3-Super
17 min read
AI Engineering
Mar 11, 2026 · Artificial Intelligence

Run Claude Code Locally with Qwen 3.5 to Skip Anthropic API Costs

This guide shows how to replace Anthropic's API by running a local Qwen 3.5 model with llama.cpp and pointing Claude Code at it via ANTHROPIC_BASE_URL. It covers hardware checks, build steps, model download, server launch, speed‑fix tips, and usage instructions for secure, cost‑free development.

Anthropic API · Claude Code · GPU Acceleration
8 min read
Network Intelligence Research Center (NIRC)
Jan 25, 2026 · Artificial Intelligence

RecFlow Breaks DLRM Inference Bottleneck with Fine-Grained GPU Parallelism

RecFlow, a new inference engine from Beijing University of Posts and Telecommunications and Meituan, tackles the resource mismatch of DLRM models by coordinating embedding and DNN operators at the intra‑SM level and introducing interference‑aware adaptive scheduling and incremental batching, achieving up to 9.34× higher throughput on RTX 3090.

DLRM · Fine-grained parallelism · GPU Acceleration
7 min read
Design Hub
Jan 17, 2026 · Artificial Intelligence

FLUX.2 Klein Generates Images in Under a Second and Unlocks Midjourney‑Style Prompts

The article reviews Black Forest Labs' FLUX.2 Klein model, highlighting its sub‑second 1024×1024 image generation, low‑VRAM requirements, four‑step inference speedups, and competitive quality versus SD3 and Midjourney V6, while also sharing Midjourney‑style prompt examples for creative design.

AI image generation · FLUX.2 · GPU Acceleration
8 min read
Data Party THU
Dec 20, 2025 · Artificial Intelligence

Master 20 Essential PyTorch Concepts: From Tensors to Model Deployment

This guide walks you through 20 fundamental PyTorch concepts—including tensor creation, operations, autograd, model building, data loading, GPU acceleration, and best‑practice tricks—providing clear code snippets and step‑by‑step explanations so you can quickly prototype, train, and deploy neural networks.

Deep Learning · GPU Acceleration · Model Training
16 min read
DataFunSummit
Dec 19, 2025 · Artificial Intelligence

How Agentic RAG, LLM‑Powered Recommendations, and Generative Ranking Transform AI Search and Ads

This article surveys cutting‑edge AI techniques—including Alibaba Cloud's Agentic RAG for multimodal search, Huawei Noah's LLM‑enhanced recommendation evolution, and Baidu's generative ranking (GRAB) for ads—detailing their architectures, optimization tricks, performance gains, and real‑world deployment results.

AI search · Agentic RAG · GPU Acceleration
9 min read
Big Data Technology & Architecture
Dec 10, 2025 · Big Data

What’s New in Apache Spark 4.0? Deep Dive into 2025 Core Updates

The 2025 release of Apache Spark 4.0 brings a comprehensive overhaul—including default ANSI SQL mode, full SQL scripting support, a new Real‑Time streaming mode, adaptive query execution, dynamic memory management, and GPU‑accelerated MLlib—significantly boosting performance, reliability, and developer productivity across big‑data workloads.

Apache Spark · Big Data · GPU Acceleration
9 min read
Alibaba Cloud Big Data AI Platform
Nov 6, 2025 · Artificial Intelligence

How GPU‑Accelerated NN‑Descent Boosts Vector Search Speed by Up to 13×

This article explains how unstructured multimedia data is transformed into vectors for similarity search, introduces GPU parallelism and the NN‑Descent algorithm to replace traditional HNSW indexing in OpenSearch, and presents benchmark results showing up to a thirteen‑fold speed improvement while maintaining comparable recall.

GPU Acceleration · NN-Descent · OpenSearch
12 min read
21CTO
Oct 20, 2025 · Artificial Intelligence

Real-Time Frame Model (RTFM): Single‑GPU World Model Redefines 3D Generation

World Labs unveiled RTFM, a real‑time frame model that runs on a single H100 GPU, generating persistent, interactive 3D worlds from 2D images without explicit 3D representations, highlighting the growing computational demands of generative world models and their potential to reshape AI-driven spatial intelligence.

3D generation · Diffusion Transformer · GPU Acceleration
9 min read
Efficient Ops
Oct 14, 2025 · Artificial Intelligence

Unlock High‑Throughput LLM Inference with vLLM: Install, Run, and Optimize

This guide explains what vLLM is, how its PagedAttention architecture boosts LLM throughput, provides step‑by‑step installation commands, showcases core examples for text generation, chat, embedding and classification, and details advanced performance features such as quantization, LoRA support, and distributed parallelism.

GPU Acceleration · LLM inference · Python
8 min read
Raymond Ops
Sep 17, 2025 · Cloud Native

Enable GPU Acceleration in Docker and Kubernetes with NVIDIA Toolkit

This guide walks through checking the system environment, installing the NVIDIA Docker plugin, configuring Docker to use the NVIDIA runtime, verifying GPU access, deploying the NVIDIA device plugin in a Kubernetes cluster, creating a GPU‑enabled pod, and testing GPU‑accelerated video processing with FFmpeg.

Container Toolkit · Docker · GPU
12 min read
Data STUDIO
Sep 8, 2025 · Artificial Intelligence

CuPy vs NumPy: Achieving Over 10× Speedup with GPU Acceleration

The article explains how replacing NumPy with the GPU‑compatible CuPy library can dramatically accelerate array computations, walks through installation prerequisites, demonstrates benchmark scripts showing up to ten‑fold speed improvements, and discusses data type effects, custom kernels, and hybrid CPU‑GPU workflows for large‑scale data processing.

Benchmark · CUDA · CuPy
21 min read
JavaEdge
Jun 30, 2025 · Artificial Intelligence

How GPULlama3.java Brings GPU‑Accelerated Llama 3 to Pure Java

GPULlama3.java, released by the University of Manchester's Beehive Lab, is the first native Java implementation of Llama 3 that leverages TornadoVM to automatically accelerate inference on GPUs without writing CUDA or native code, supporting NVIDIA, Intel and Apple Silicon back‑ends and modern Java 21 features.

AI · GPU Acceleration · Java
7 min read
DataFunSummit
Jun 12, 2025 · Artificial Intelligence

How Alibaba Cloud’s AI Search Evolves with Agentic RAG and Multi‑Model Innovations

This article details Alibaba Cloud AI Search’s development journey, covering its dual product lines, the evolution of Agentic RAG technology, multi‑agent architectures, vector retrieval breakthroughs, GPU‑accelerated indexing, NL2SQL capabilities, deployment models, and future directions for AI‑driven search solutions.

AI search · GPU Acceleration · OpenSearch
33 min read
Architects' Tech Alliance
May 8, 2025 · Industry Insights

How AI Storage Is Redefining Data‑Compute Synergy: Trends, Tech, and Roadmap

This article analyses the emergence of AI‑focused storage, detailing its ultra‑high bandwidth, concurrency, scale and low‑latency characteristics, the architectural shift from layered to fused designs, the specific performance and data‑management demands of training and inference, and a three‑phase roadmap for future storage innovations.

AI storage · GPU Acceleration · High‑performance computing
12 min read
DeWu Technology
Feb 17, 2025 · Artificial Intelligence

Optimizing Large Model Inference: High‑Performance Frameworks and Techniques

The article reviews high‑performance inference strategies for large language models such as DeepSeek‑R1, detailing CPU‑GPU process separation, Paged and Radix Attention, Chunked Prefill, output‑length reduction, tensor‑parallel multi‑GPU scaling, and speculative decoding, each shown to markedly boost throughput and cut latency in real deployments.

AI · Distributed inference · GPU Acceleration
22 min read
Architect's Alchemy Furnace
Feb 5, 2025 · Artificial Intelligence

Deploy DeepSeek R1 Locally with Ollama: Step‑by‑Step Guide for Windows & Linux

This article provides a comprehensive guide to locally deploying DeepSeek R1 models using Ollama on Windows and Linux, covering model variants, hardware requirements, installation steps, command‑line operations, visual client options, usage examples, performance tuning, and best‑practice recommendations for developers and enterprises.

AI model · DeepSeek · Docker
10 min read
DevOps
Jan 6, 2025 · Artificial Intelligence

Ten Popular Large Language Model Deployment Engines and Tools: Features, Advantages, and Limitations

This article reviews ten mainstream LLM deployment solutions—including WebLLM, LM Studio, Ollama, vLLM, LightLLM, OpenLLM, HuggingFace TGI, GPT4ALL, llama.cpp, and Triton Inference Server—detailing their technical characteristics, strengths, drawbacks, and example deployment workflows for both personal and enterprise environments.

AI inference · GPU Acceleration · LLM
16 min read
Alibaba Cloud Big Data AI Platform
Dec 18, 2024 · Artificial Intelligence

Can GPU Graph Algorithms Boost Vector Search Performance by 10×?

This article explains how OpenSearch's GPU‑accelerated vector search leverages parallel graph algorithms to achieve up to tenfold speed improvements over CPU solutions, detailing ANNS techniques, performance benchmarks, and practical GPU specifications for high‑QPS AI applications.

GPU Acceleration · OpenSearch · approximate nearest neighbor
11 min read
AntTech
Nov 29, 2024 · Artificial Intelligence

AI Inference with Trusted Execution Environment: HyperGPU and DistMSM Accelerated Zero‑Knowledge Proofs Win 2024 Financial Cipher Cup Innovation Award

The award‑winning solution combines a GPU‑accelerated TEE framework (HyperGPU) and a multi‑GPU zkSNARK acceleration scheme (DistMSM) to provide fast, privacy‑preserving AI inference proofs, earning the third‑place Innovation Team prize at the 2024 Financial Cipher Cup competition.

AI · DistMSM · Financial Cipher
6 min read
Baidu Geek Talk
Oct 22, 2024 · Big Data

How Baidu’s DATAPILOT Uses NVIDIA RAPIDS to Supercharge SQL Analytics

Baidu’s DATAPILOT platform combines natural‑language interaction with GPU‑accelerated Spark‑RAPIDS to turn complex, multi‑table SQL queries into seconds‑fast results, boosting ad‑revenue analysis efficiency by up to five‑fold while reducing infrastructure costs.

Apache Spark · Baidu · Big Data
10 min read
Baidu Geek Talk
Oct 9, 2024 · Artificial Intelligence

How Baidu’s Baige 4.0 Architecture Redefines AI Compute Efficiency

This article analyzes Baidu's Baige 4.0 AI infrastructure, detailing its four‑layer architecture, XMAN 5.0 hardware, HPN network, BCCL communication library, and AIAK inference upgrades, and explains how these innovations address large‑model training and inference challenges while boosting performance, utilization, and cost efficiency.

AI Infrastructure · Cluster Management · GPU Acceleration
16 min read
DataFunSummit
Jun 17, 2024 · Artificial Intelligence

Strategies for Reducing Cost and Improving Efficiency in Recommendation Systems with Alibaba Cloud PAI‑Rec

This article discusses how Alibaba Cloud’s AI platform PAI‑Rec reduces recommendation system costs and boosts efficiency by optimizing training resources and leveraging the FeatureStore, EasyRec, and TorchEasyRec frameworks. It details workflow stages, feature consistency, GPU acceleration, componentized model configuration, and practical deployment timelines.

AI Platform · Feature Store · GPU Acceleration
14 min read
DataFunTalk
Jun 3, 2024 · Artificial Intelligence

Deploying Speech AI Services Quickly with NVIDIA Riva

This article explains how to use NVIDIA Riva to rapidly deploy speech AI services, covering Riva's overview, Chinese ASR model updates, TTS capabilities, customization options, the Quickstart tool, and a Q&A session that clarifies deployment, model fine‑tuning, and integration with NeMo and Triton.

ASR · GPU Acceleration · NVIDIA Riva
13 min read
Baidu Intelligent Cloud Tech Hub
Apr 24, 2024 · Artificial Intelligence

How to Build and Accelerate Multi‑Chip AI Clusters for Large‑Model Training

With AI training demands outgrowing single‑chip GPU clusters, this article explains how to construct and speed up heterogeneous AI clusters—combining GPUs, Kunlun, and Ascend chips—by addressing interconnect, distributed parallel strategies, and specialized acceleration suites to achieve high MFU and efficient large‑model training.

AI clustering · Distributed Training · GPU Acceleration
15 min read
Didi Tech
Apr 16, 2024 · Artificial Intelligence

Optimizing DSP Deep Model Latency by Externalizing Feature Processing with EzFeaFly

By externalizing feature processing with the EzFeaFly tool and feeding a dense index/value tensor directly to the GPU, the DSP platform decouples feature transformation from model inference, cutting instance usage by ~40%, reducing inference latency 70‑80%, and achieving over 60% end‑to‑end latency improvement while lowering costs.

DSP · GPU Acceleration · Python
11 min read
Bilibili Tech
Mar 5, 2024 · Game Development

Bilibili Color Space Conversion Engine for Video Processing

Bilibili's color space conversion engine processes user‑uploaded videos with varied color parameters into a unified format, using layered filters, precomputed optimizations, CPU and CUDA implementations, handling transformations, quantization, chroma subsampling, matrix conversion, transfer functions, gamut and tone mapping, HDR dynamic metadata, and achieving high performance for millions of users.

GPU Acceleration · HDR · color space
19 min read
DataFunTalk
Jan 31, 2024 · Artificial Intelligence

Introduction to NVIDIA TensorRT-LLM Inference Framework

TensorRT-LLM is NVIDIA's scalable inference framework for large language models that combines TensorRT compilation, fast kernels, multi‑GPU parallelism, low‑precision quantization, and a PyTorch‑like API to deliver high‑performance LLM serving with extensive customization and future‑focused enhancements.

GPU Acceleration · LLM inference · Nvidia
12 min read
JD Retail Technology
Jan 30, 2024 · Artificial Intelligence

Next-Generation Multi‑GPU Synchronous Training Architecture for Large‑Scale Sparse Recommendation Models

The article details JD Retail's evolution from TensorFlow‑based sparse training to a custom high‑performance parameter server and a fully GPU‑accelerated, multi‑node, multi‑card synchronous training framework that leverages GPU‑RDMA, two‑level CPU‑DRAM/GPU‑HBM caching, and pipeline parallelism to overcome storage, I/O, and compute challenges of trillion‑parameter recommendation systems.

AI Infrastructure · GPU Acceleration · Parameter Server
12 min read
JD Retail Technology
Jan 25, 2024 · Artificial Intelligence

Optimizing High‑Concurrency Online Inference for Recommendation Models with Distributed Heterogeneous Computing and GPU Acceleration

This article describes how JD Retail's advertising technology team tackled the high‑compute demands of modern recommendation models by designing a distributed graph‑partitioned heterogeneous computing framework, introducing TensorBatch request aggregation, leveraging deep‑learning compiler bucketing and asynchronous compilation, and implementing a multi‑stream GPU architecture to dramatically improve online inference throughput and latency.

Deep Learning Compiler · GPU Acceleration · distributed computing
13 min read
DataFunSummit
Nov 19, 2023 · Artificial Intelligence

Overview of NVIDIA Merlin for Recommendation Systems

This article introduces NVIDIA's Merlin suite, covering product overview, Merlin Models & Systems, the TensorFlow Distributed Embedding (TFDE) plugin, the Hierarchical‑KV library, and the Hierarchical Parameter Server (HPS), while highlighting their architecture, performance benefits, and ease of integration for large‑scale recommendation workloads.

Distributed Embedding · GPU Acceleration · Hierarchical KV
13 min read
DaTaobao Tech
May 24, 2023 · Mobile Development

Understanding and Optimizing Mobile Page Performance and Jank

Effective mobile page performance requires identifying three jank types—screen tearing, frame drops, and long unresponsiveness—monitoring metrics such as response time, animation latency, idle time, and SM, understanding the CPU‑GPU rendering pipeline, and applying optimizations like hardware acceleration, transform‑based animations, reduced layout thrashing, task slicing, and GPU‑friendly techniques.

Browser Rendering · GPU Acceleration · Jank
13 min read
Bilibili Tech
Apr 21, 2023 · Artificial Intelligence

Design and Optimization of Bilibili's Large-Scale Video Duplicate Detection System

Bilibili built a massive video‑duplicate detection platform that trains a self‑supervised ResNet‑50 feature extractor, removes black borders, and uses a two‑stage ANN‑plus‑segment‑level matching pipeline accelerated by custom GPU decoding and inference, boosting duplicate rejection 7.5×, recall 3.75×, and cutting manual misses from 65 to 5 per day.

Deep Learning · GPU Acceleration · feature extraction
19 min read
Baidu Intelligent Cloud Tech Hub
Apr 17, 2023 · Artificial Intelligence

How NVIDIA’s GPU‑Powered AI is Revolutionizing Drug Discovery and Genomics

The article outlines NVIDIA’s CLARA platform, BioNeMo framework, and GPU‑accelerated tools such as CLARA Parabricks and RAPIDS, demonstrating how AI and high‑performance computing dramatically speed up drug‑target identification, molecular generation, protein structure prediction, and high‑throughput DNA/RNA sequencing, with benchmarks showing up to 80‑fold acceleration.

AI drug discovery · BioNeMo · CLARA
11 min read
DataFunSummit
Apr 9, 2023 · Artificial Intelligence

PGLBox: An Industrial-Scale GPU‑Accelerated Graph Learning Framework

This article introduces the development trends of graph learning frameworks, explains GPU acceleration techniques such as UVA and multi‑GPU pipelines, details the design of the PaddlePaddle Graph Learning (PGL) framework and its large‑scale engine PGLBox, and demonstrates how these technologies enable industrial‑grade graph representation learning with billions of nodes and edges.

GPU Acceleration · PGLBox · PaddlePaddle
18 min read
Tencent Cloud Developer
Mar 22, 2023 · Artificial Intelligence

How AngelPTM Cuts Large Model Training Costs with ZeRO-Cache Optimizations

This article analyzes Tencent's AngelPTM framework, detailing its ZeRO-Cache strategy, unified storage management, multi‑stream async execution, SSD tiered storage, and performance benchmarks that show up to 95% larger model capacity and over 44% speedup compared to community solutions.

AI Infrastructure · GPU Acceleration · Memory Optimization
12 min read
Alibaba Cloud Big Data AI Platform
Mar 20, 2023 · Artificial Intelligence

How HybridBackend Supercharged Ximalaya’s Recommendation Engine with GPU Acceleration

This article details how Ximalaya’s AI Cloud adopted the open‑source HybridBackend framework to overcome sparse data access and distributed training bottlenecks, achieving multi‑GPU utilization gains, faster model training, and significant cost reductions across its recommendation services.

Distributed Training · GPU Acceleration · HybridBackend
9 min read
Baidu Geek Talk
Mar 9, 2023 · Industry Insights

How Baidu’s ERNIE‑ViLG 2.0 and PaddlePaddle Boost AI Painting Performance

This article analyzes Baidu’s ERNIE‑ViLG 2.0 and PaddlePaddle‑optimized Stable Diffusion models, presenting benchmark comparisons, hardware‑specific speed and memory gains, and the underlying inference optimizations that enable low‑cost, high‑throughput AI‑generated image creation.

AI painting · AIGC · GPU Acceleration
9 min read
DataFunTalk
Feb 11, 2023 · Artificial Intelligence

Accelerating Computer Vision Pipelines with CV-CUDA: Reducing Complexity and Performance Bottlenecks

This article explains how moving image preprocessing and post‑processing to GPU with the open‑source CV‑CUDA library dramatically reduces system complexity, eliminates CPU‑GPU bottlenecks, and delivers up to thirty‑fold performance gains for computer‑vision workloads across training and inference stages.

CV-CUDA · Computer Vision · Deep Learning
16 min read
Baidu Geek Talk
Dec 27, 2022 · Artificial Intelligence

How to Supercharge AI Model Training: Bottlenecks and Cutting‑Edge Acceleration Techniques

This article systematically examines the major performance bottlenecks in AI model training, explains the underlying hardware and software causes, and presents a comprehensive set of acceleration strategies—including data‑loading optimizations, compute‑side enhancements, communication tricks, and the AIAK‑Training suite—backed by real‑world case studies and quantitative results.

AI training · AIAK-Training · Distributed Training
33 min read
Baidu Intelligent Cloud Tech Hub
Dec 27, 2022 · Artificial Intelligence

How to Supercharge AI Inference: End‑to‑End Acceleration Strategies and Baidu’s AIAK‑Inference

This article presents a comprehensive analysis of AI inference bottlenecks, explores industry acceleration techniques such as model simplification, operator fusion, and single‑operator optimization, and details Baidu Cloud's AIAK‑Inference suite with practical demos showing up to 90% latency reduction.

AI inference · AIAK-Inference · Baidu Cloud
16 min read
Baidu Intelligent Cloud Tech Hub
Dec 22, 2022 · Artificial Intelligence

How to Supercharge AI Model Training: Bottlenecks and Acceleration Techniques

This article systematically analyzes the main performance bottlenecks in AI model training, explains why acceleration is essential, and presents current hardware‑ and software‑based solutions—including data‑loading optimizations, operator fusion, mixed‑precision and Tensor Core usage, as well as distributed communication strategies—followed by real‑world case studies of Baidu's AIAK‑Training suite that demonstrate significant speed‑ups.

AI training · Distributed Training · GPU Acceleration
0 likes · 31 min read
Architects Research Society
Nov 30, 2022 · Artificial Intelligence

A Comprehensive Overview of Machine Learning Tools and Libraries

An extensive survey ranks and compares a wide range of machine learning libraries and frameworks—both deep and shallow learning—detailing their languages, types, GPU acceleration, distributed computing capabilities, and typical academic and industrial applications, based on Google search popularity as of May.

Deep Learning · GPU Acceleration · distributed computing
0 likes · 20 min read
Bilibili Tech
Nov 8, 2022 · Industry Insights

BANG Engine: Multi‑Level Pipelines & GPU Acceleration for Faster Video Transcoding

To meet Bilibili’s demanding live and on‑demand video transcoding needs, the BANG engine combines a multi‑stage pipeline architecture, frame‑block and multi‑frame parallelism, SIMD‑based CPU acceleration, and TensorRT/TensorFlow GPU inference, offering configurable string‑based pipelines that dramatically increase throughput while simplifying integration.

Bilibili · GPU Acceleration · TensorRT
0 likes · 18 min read
Cloud Native Technology Community
Nov 7, 2022 · Cloud Computing

How Edge Computing Is Transforming Automotive Manufacturing

This article explores how edge computing, combined with cloud-native technologies, 5G, and GPU acceleration, enables real‑time data processing, intelligent inspection, digital twins, and autonomous driving in the automotive industry, outlining practical architectures, hardware choices, and deployment patterns.

5G · Cloud Native · Edge Computing
0 likes · 19 min read
DataFunTalk
Oct 31, 2022 · Artificial Intelligence

NVIDIA Merlin HugeCTR: System Overview, Architecture, and Performance

This article introduces NVIDIA Merlin's HugeCTR recommendation system framework. It covers Merlin's three main modules (NVTabular, HugeCTR, and Triton) and details model‑parallel embedding handling, CUDA kernel fusion, mixed‑precision training, hierarchical parameter server inference, the Sparse Operation Kit for TensorFlow, performance benchmarks, and practical deployment considerations.

Embedding · GPU Acceleration · HugeCTR
0 likes · 19 min read
MaGe Linux Operations
Oct 7, 2022 · Fundamentals

Boost Python Performance 100× with Taichi: Real‑World Speedup Examples

Discover how importing the Taichi library can accelerate Python code by up to 100 times, with detailed examples ranging from prime counting and longest common subsequence dynamic programming to reaction‑diffusion simulations, including performance metrics, GPU support, and concise code snippets.

GPU Acceleration · Numerical Computing · Python performance
0 likes · 8 min read
Qingyun Technology Community
Sep 15, 2022 · Cloud Computing

How GPU, VPU, and CPU Accelerate Cloud Video Transcoding: Architecture and Best Practices

This article explores the rapid growth of video traffic, explains why transcoding is essential, compares CPU, GPU, and VPU hardware for video processing, and details the FFmpeg software stack. It then describes the design of a cloud‑native transcoding cluster, including its scheduling and shard‑transcoding technique, and presents performance test results.

Distributed Systems · GPU Acceleration · Hardware acceleration
0 likes · 23 min read
DataFunSummit
Sep 4, 2022 · Artificial Intelligence

Sparse Features in Machine Learning: Challenges, NVIDIA Ampere Structured Sparsity, Knowledge Distillation, and GAN Model Compression

This talk explores the challenges and opportunities of leveraging sparsity in machine learning models, covering fine‑grained and coarse‑grained sparsity, NVIDIA Ampere’s 2:4 structured sparsity, knowledge‑distillation techniques for converting unstructured to structured sparsity, and model compression strategies for generative adversarial networks.

Deep Learning · GAN · GPU Acceleration
0 likes · 14 min read
Meituan Technology Team
Jul 6, 2022 · Artificial Intelligence

Engineering Practices for Large-Scale Deep Learning Models in Meituan Takeaway Advertising

The article details Meituan's engineering journey from small DNNs to hundred‑gigabyte deep learning models for food‑delivery ads, analyzing online latency and offline efficiency challenges and presenting distributed storage, CPU/GPU acceleration, OpenVINO, TensorRT, CodeGen, and data‑pipeline optimizations that dramatically improve throughput, memory usage, and sample‑building speed.

CPU acceleration · Deep Learning · GPU Acceleration
0 likes · 45 min read
21CTO
Jun 6, 2022 · Fundamentals

How a Peking University Student Dominated Global EDA Competitions and Won ACM’s Top Student Award

Guo Zizheng, a Peking University Turing Class senior, secured first place in the ACM Student Research Competition, published eight first‑author EDA papers, pioneered GPU‑accelerated static timing analysis, and earned the prestigious Student May Fourth Medal, highlighting China's rising talent in chip design automation.

Academic Awards · EDA · GPU Acceleration
0 likes · 7 min read
IT Services Circle
May 8, 2022 · Information Security

An Introduction to Hashcat: Features, Usage, and Command Options

This article introduces Hashcat, the world’s fastest password‑recovery tool, outlines its extensive feature set, provides the project’s GitHub address, and explains how to download, install, and run basic commands with common options for various hash types and attack modes.

GPU Acceleration · Hashcat · command-line
0 likes · 4 min read
DataFunTalk
Apr 22, 2022 · Artificial Intelligence

Inference Optimization Techniques and GPU Parallel Acceleration for Tencent Intelligent Dialogue Models

This article presents a comprehensive overview of inference optimization methods—including model pruning, quantization, knowledge distillation, caching, instruction‑set acceleration, and operator fusion—and details a GPU‑centric parallel acceleration methodology with CUDA basics, performance‑analysis tools, theoretical limits, and practical case studies, all illustrated with real‑world examples from Tencent's intelligent dialogue products.

GPU Acceleration · Operator fusion · caching
0 likes · 18 min read
Alipay Experience Technology
Oct 13, 2021 · Artificial Intelligence

How ant‑tfjs Boosts Web AI Inference: WebGL, Wasm, and GPU Optimizations

This article examines high‑performance web computing for TensorFlow.js models, comparing tfjs and ant‑tfjs on WebGL, Wasm, and GPU backends, and details a series of optimizations—including pre‑encoding, shader handling, graph fusion, vectorization, and memory layout—that double inference speed on mobile devices.

Frontend AI · GPU Acceleration · TensorFlow.js
0 likes · 11 min read
政采云技术
Sep 28, 2021 · Frontend Development

Browser Rendering: Reflow and Repaint

This article explains the browser rendering pipeline, the concepts of reflow and repaint, how they are triggered, their performance impact, and provides practical techniques such as minimizing layout thrashing, using GPU‑accelerated properties, and leveraging requestAnimationFrame to optimize front‑end performance.

Browser Rendering · GPU Acceleration · Performance Optimization
0 likes · 17 min read
Alimama Tech
Sep 8, 2021 · Artificial Intelligence

Engineering Optimizations for Large‑Scale Advertising Recall Models: Full‑Cache Scoring and Index Flattening

Alimama's advertising platform modernized its Tree‑based Deep Model in two ways: a dual‑tower full‑library DNN with aggressive pre‑filtering and custom GPU TopK kernels, and a flattened‑tree model that retains beam search with multi‑head attention. These are combined with memory‑aware tricks such as attention swapping, softmax approximation, tiled‑matmul splitting, TensorCore batching, INT8 quantization, and cache‑resident ad vectors, enabling multi‑fold latency reductions with minimal recall loss.

Beam Search · GPU Acceleration · Model Optimization
0 likes · 15 min read
MaGe Linux Operations
Jul 26, 2021 · Fundamentals

Boost NumPy Performance 10× with CuPy: GPU Acceleration Guide

This article explains how CuPy mirrors NumPy's API to run array and matrix operations on NVIDIA GPUs, providing step‑by‑step installation, code examples, and benchmark results that demonstrate speedups ranging from 10× to over 700× compared to CPU‑only NumPy.

CUDA · CuPy · GPU Acceleration
0 likes · 5 min read
Architects Research Society
May 31, 2021 · Artificial Intelligence

Comprehensive Survey of Machine Learning Tools and Libraries

This article presents a detailed overview and ranking of numerous machine learning tools and libraries, distinguishing deep and shallow learning approaches, highlighting language support, GPU acceleration, and distributed computing capabilities, and provides insights into their academic and industrial usage.

GPU Acceleration · distributed computing · shallow learning
0 likes · 9 min read
Python Programming Learning Circle
May 19, 2021 · Operations

Introducing WSLg: Full Linux GUI Support on Windows 10

Microsoft’s recent preview of WSLg brings full Linux GUI support to Windows 10, enabling users to install and run desktop Linux distributions, launch IDEs and GUI applications with audio, microphone, and GPU acceleration, and seamlessly integrate development workflows across Windows and Linux environments.

GPU Acceleration · IDE integration · Linux GUI
0 likes · 4 min read
ITPUB
Apr 29, 2021 · Operations

Run Linux GUI Apps on Windows 10 with WSLg: Full Guide and Architecture

Microsoft's WSLg update lets Windows 10 users install and run Linux desktop environments, IDEs, audio‑enabled apps, and GPU‑accelerated software directly on the host, with automatic backend services and a clear architecture overview.

Audio support · GPU Acceleration · Linux GUI
0 likes · 5 min read
DataFunTalk
Apr 28, 2021 · Big Data

Accelerating Apache Spark 3.0 with NVIDIA RAPIDS: Architecture, Performance Gains, and New Features

This article explains how NVIDIA's RAPIDS Accelerator leverages GPUs to speed up Apache Spark 3.0 workloads, detailing the underlying architecture, benchmark results on TPC‑DS and recommendation models, required configuration changes, supported operators, shuffle optimizations, and the enhancements introduced in versions 0.2 and 0.3.

Apache Spark · Big Data · GPU Acceleration
0 likes · 19 min read
Alibaba Cloud Infrastructure
Apr 22, 2021 · Artificial Intelligence

Alibaba Cloud Breaks MLPerf Inference Performance Records with Zhenduan Heterogeneous Computing Platform

Alibaba Cloud's Zhenduan heterogeneous computing acceleration platform achieved historic breakthroughs in the MLPerf inference benchmark, processing over 1.07 million images per second on 8 NVIDIA A100 GPUs, setting multiple first‑place records and dramatically improving e‑commerce recommendation speed and overall AI workload efficiency.

AI inference · Alibaba Cloud · GPU Acceleration
0 likes · 7 min read
iQIYI Technical Product Team
Feb 5, 2021 · Artificial Intelligence

Efficient General‑Purpose Frame Extraction for AI Video Inference Services

The paper presents a unified, high‑performance frame‑extraction framework that dynamically selects CPU or GPU decoding, leverages multithreaded and CUDA‑accelerated pipelines, keeps frames in memory, and achieves up to ten‑fold latency reductions for diverse AI video‑inference tasks.

AI video inference · CPU optimization · GPU Acceleration
0 likes · 14 min read
Suning Technology
Sep 17, 2020 · Artificial Intelligence

Unlocking Retail Innovation: 3D Digital Storebuilding with Multi‑Camera Vision

This article explores how 3D digital storebuilding integrates multiple visual sensors, GPU acceleration, and advanced camera calibration to create high‑precision, real‑time digital twins of retail spaces, enabling fine‑grained lifecycle management and immersive customer experiences.

3D reconstruction · GPU Acceleration · camera calibration
0 likes · 15 min read
WecTeam
Sep 11, 2020 · Frontend Development

Simulate Life with WebGL and Speed Up Pages Using CSS content-visibility

This week’s WecTeam Front‑end newsletter showcases a WebGL implementation of Conway’s Game of Life that treats the GPU as a parallel for‑loop accelerator, and introduces the CSS content-visibility:auto property, which lets browsers defer layout and rendering of off‑screen elements to dramatically improve initial page load performance.

Content-Visibility · Conway's Game of Life · GPU Acceleration
0 likes · 2 min read
Liangxu Linux
May 30, 2020 · Operations

WSL 2 Brings Linux GUI Apps and GPU Acceleration to Windows 10

Microsoft’s Windows Subsystem for Linux 2 adds a full Linux kernel, native GUI application support, GPU hardware acceleration for AI workloads, and a simplified installation command, marking a major step toward tighter Windows‑Linux integration in the upcoming Windows 10 2004 update.

GPU Acceleration · GUI · Subsystem
0 likes · 5 min read
Alibaba Cloud Developer
Apr 16, 2020 · Artificial Intelligence

How Mars Supercharges Numpy, Pandas, and Scikit‑Learn with Parallel and GPU Acceleration

This article explains how the Mars framework enables parallel and distributed execution of core Python data‑science libraries (NumPy, pandas, and scikit‑learn) while integrating with RAPIDS for GPU acceleration, and demonstrates its performance advantages through code examples and benchmark results.

GPU Acceleration · Mars · NumPy
0 likes · 16 min read
Alibaba Cloud Developer
Jun 12, 2019 · Artificial Intelligence

How Alibaba’s PAISoar Accelerates Deep Learning: 101× Speedup on 128 GPUs

Alibaba engineers detail the PAISoar distributed training framework, showing how RDMA‑optimized hardware, Ring AllReduce algorithms, and user‑friendly APIs boost deep‑learning models—like the GreenNet CNN—to 101‑fold speedups on 128 GPUs, dramatically reducing training time from days to under a day.

AI Infrastructure · Deep Learning · Distributed Training
0 likes · 17 min read
Alibaba Cloud Developer
Dec 18, 2018 · Databases

Inside Alibaba AnalyticDB: Architecture, Core Technologies, and Real‑Time Data Warehouse Innovations

This article provides an in‑depth technical overview of Alibaba's AnalyticDB, covering the challenges of massive real‑time analytics, the cloud‑native multi‑tenant architecture, data model, import/export capabilities, high‑performance SQL parser, the Xuanwu storage engine, Xihe compute engine, optimizer, GPU acceleration, and elastic scaling features.

AnalyticDB · GPU Acceleration · SQL Parser
0 likes · 38 min read
MaGe Linux Operations
Nov 22, 2018 · Artificial Intelligence

Accelerating TensorFlow Deep Learning: GPU & Distributed Training Techniques

This article explains how to speed up TensorFlow deep‑learning model training using single‑GPU acceleration, multi‑GPU parallelism, and distributed TensorFlow on Kubernetes, covering device placement, session parameters, synchronous vs asynchronous training modes, and practical code examples to improve performance and scalability.

Deep Learning · Distributed Training · GPU Acceleration
0 likes · 10 min read
Tencent TDS Service
Jul 12, 2018 · Artificial Intelligence

How to Engineer MobileNet for Efficient Image Classification on Mobile Devices

This article details the engineering of MobileNet V1 for image classification on mobile terminals, covering its depthwise separable convolution architecture, data collection and preprocessing, model training with transfer learning, TensorFlow Lite conversion, deployment on iOS/Android, and GPU acceleration techniques for faster inference.

Deep Learning · GPU Acceleration · Mobile Deployment
0 likes · 19 min read
21CTO
Jan 29, 2018 · Fundamentals

How Tencent Cut Hundreds of Gigabytes of Bandwidth with Advanced Image Compression

This article reviews the evolution of image formats such as JPEG, WebP, HEVC, and Tencent's proprietary WXAM and SHARP, explains psychovisual JPEG optimization with Guetzli, details GPU‑accelerated performance tweaks, and shows how these techniques saved terabytes of bandwidth and reduced user download latency across Tencent's massive image platform.

Bandwidth Reduction · GPU Acceleration · Guetzli
0 likes · 14 min read
Tencent Architect
Jan 27, 2018 · Fundamentals

Advances in Image Compression: From JPEG to WebP, HEVC, WXAM, SHARP, and Guetzli Optimizations at Tencent TPS

The article reviews recent developments in image compression formats such as JPEG, WebP, HEVC, and Tencent's proprietary WXAM/SHARP, explains Guetzli's perceptual encoding, details extensive GPU‑based performance optimizations, and demonstrates how these techniques dramatically reduce bandwidth usage in Tencent's massive image storage platform.

GPU Acceleration · Guetzli · JPEG
0 likes · 13 min read
Tencent IMWeb Frontend Team
Dec 7, 2017 · Frontend Development

How to Achieve Smooth 60 FPS Web Animations on Low‑End Devices

This article explains why 60 FPS is the benchmark for fluid web animations, shows how to measure frame rates with requestAnimationFrame, compares CSS and JavaScript animation performance on TV‑box hardware, and provides a step‑by‑step optimization guide using GPU acceleration, will‑change, and dev‑tools.

GPU Acceleration · Performance Optimization · Web animation
0 likes · 16 min read
MaGe Linux Operations
Apr 19, 2017 · Artificial Intelligence

Accelerate TensorFlow Deep Learning with GPU, Multi‑GPU, and Distributed Training

This article explains how to speed up TensorFlow deep‑learning model training by using a single GPU, configuring session parameters, assigning operations to specific devices, employing multi‑GPU parallelism, and leveraging distributed TensorFlow on Kubernetes, while also discussing synchronous versus asynchronous training modes and practical best practices.

Deep Learning · Distributed Training · GPU Acceleration
0 likes · 11 min read
ITPUB
Sep 6, 2016 · Artificial Intelligence

Deep Learning Platforms: From Google’s DistBelief to Open‑Source MXNet and TensorFlow

The article reviews the evolution, challenges, and commercial and open‑source deep learning platforms—including DistBelief, COTS, Adam, MXNet, TensorFlow, and Petuum—while highlighting real‑world applications such as image recognition, recommendation, sentiment analysis, and crowd monitoring.

AI applications · Distributed Training · GPU Acceleration
0 likes · 10 min read
ITPUB
Jan 19, 2016 · Databases

Surprising PostgreSQL Features That Redefine What a Database Can Do

This article showcases seven remarkable PostgreSQL extensions—including multi‑master replication, Greenplum MPP OLAP, pg_shard/FDW sharding, PostGIS 3D GIS, GPU‑accelerated PG‑Strom, PipelineDB streaming, and the versatile FDW interface—illustrating how they enable high‑availability, massive analytics, geographic intelligence, and real‑time data processing.

Database Extensions · FDW · GIS
0 likes · 5 min read