Tagged articles

inference engine

14 articles · Page 1 of 1

Jul 2, 2026 · Artificial Intelligence

vLLM 0.24.0 Release: New Features for Faster, Memory‑Efficient Large‑Model Deployment

The vLLM 0.24.0 update adds MiniMax‑M3, DeepSeek‑V4, DiffusionGemma support, a Streaming Parser Engine, and a new device_ids parameter, delivering faster inference, lower memory use, and broader hardware compatibility for large‑model deployments.

DeepSeek-V4 · DiffusionGemma · MiniMax M3

0 likes · 9 min read

AI Large-Model Wave and Transformation Guide

Jun 8, 2026 · Artificial Intelligence

Designing a High‑Reliability Cognitive Reasoning System with Ontology‑Based Architecture

The article presents a detailed architecture for a high‑reliability cognitive reasoning system that combines logical inference, semantic constraints, and a seven‑layer defense to achieve efficient deduction and strict error prevention across critical domains such as medical diagnosis and financial risk control.

Knowledge Graph · Ontology · cognitive reasoning

0 likes · 6 min read

Lao Guo's Learning Space

May 11, 2026 · Artificial Intelligence

Redis Creator Releases Pure‑C Engine That Makes DeepSeek V4 Run Fast on Mac

Redis founder antirez unveiled ds4.c, a pure‑C inference engine that leverages Objective‑C and Metal to run DeepSeek V4 locally on Mac devices. It delivers about 27 tokens/s on an M3 Ultra, far slower than GPU servers, but offers a dependency‑free, on‑device solution that keeps data private.

AI · C# · DeepSeek

0 likes · 8 min read

DeepHub IMBA

Apr 4, 2026 · Artificial Intelligence

Building Mini-vLLM from Scratch: KV‑Cache, Dynamic Batching, and Distributed Inference

This article walks through constructing Mini-vLLM, a from‑scratch LLM inference engine that tackles the O(N²) attention cost with KV‑cache, boosts throughput via dynamic batching, adds observability with Prometheus/Grafana, supports gRPC, and scales across multiple workers, with benchmark numbers demonstrating its CPU‑only performance.

Docker · Dynamic Batching · KV cache

0 likes · 12 min read

JavaEdge

Jun 27, 2025 · Artificial Intelligence

Why Inference Engines Are Essential for Deploying Large Language Models in Production

The article explains what inference engines are, why they are needed beyond raw Python scripts, and outlines best practices such as model quantization, batching, and parallelism, while comparing popular open‑source and commercial options for production AI workloads.

AI Deployment · Batching · LLM

0 likes · 14 min read

DataFunSummit

Dec 24, 2024 · Artificial Intelligence

Considerations and Practices for Domesticating Large‑Model Inference Engines

This article examines the importance of domestic large‑model inference engines, compares Chinese and international chips, evaluates four architectural approaches, discusses practical challenges such as performance loss and model support, and outlines future expectations for high‑performance, heterogeneous‑chip inference solutions.

Domestic Chip · Performance Optimization · inference engine

0 likes · 9 min read

DataFunSummit

Sep 11, 2023 · Artificial Intelligence

Challenges and Insights for Deploying Large Models on Edge with MNN

The talk presents an overview of the MNN inference engine, outlines the end‑to‑end workflow for deploying large language models on mobile devices, discusses technical challenges and practical solutions, and concludes with future directions for edge AI deployment.

AI · Edge deployment · MNN

0 likes · 2 min read

OPPO Kernel Craftsman

Oct 28, 2022 · Artificial Intelligence

ShaderNN: A GPU Shader‑Based Lightweight Inference Engine for Mobile AI Applications

ShaderNN is an open‑source, sub‑2 MB GPU‑shader inference engine that runs TensorFlow, PyTorch and ONNX models directly on mobile graphics textures via OpenGL fragment and compute shaders, delivering real‑time, low‑power AI for image‑heavy tasks while eliminating third‑party dependencies and achieving speed gains of up to 90%.

GPU · Shader · inference engine

0 likes · 11 min read

ByteDance Terminal Technology

Jul 29, 2022 · Artificial Intelligence

Pitaya: ByteDance’s End‑Side AI Engineering Platform Overview

Pitaya, built by ByteDance’s Client AI and MLX teams, is a comprehensive end‑side AI engineering platform that provides a full workflow from model development and data preparation to deployment, monitoring, and federated learning, supporting large‑scale commercial scenarios across multiple apps.

AI platform · edge AI · federated learning

0 likes · 14 min read

DataFunTalk

Apr 14, 2022 · Artificial Intelligence

PaddlePaddle Deep Learning Platform: Architecture, Core Technologies, and Real‑World Applications

The article presents a comprehensive overview of Baidu's open‑source deep learning platform PaddlePaddle, detailing its full‑stack architecture and core technologies such as unified dynamic‑static graphs, large‑scale distributed training, multi‑platform inference, an extensive model zoo, and hardware adaptation, and showcasing a real‑world deployment case in power‑grid monitoring.

AI Framework · PaddlePaddle · distributed training

0 likes · 15 min read

DaTaobao Tech

Mar 11, 2022 · Artificial Intelligence

How Alibaba’s MNN Engine Achieves 350% CPU Speedup and Sparse Acceleration

Alibaba’s MNN, a lightweight high‑performance deep‑learning inference engine, earned top honors in China’s 2022 “Science & Innovation China” awards. It delivers gains such as a 350% speedup on x86 CPUs and 2.1‑2.3× acceleration on ARM with sparse models, plus integrated OpenCV/NumPy functionality for edge AI deployment.

AI Deployment · Alibaba · MNN

0 likes · 4 min read

Alibaba Terminal Technology

Feb 3, 2021 · Frontend Development

How Front-End AI Inference Engines Achieve Real-Time Smart Recognition

This article explains on‑device machine learning concepts, compares front‑end inference engines such as TensorFlow.js, ONNX.js and WebDNN across CPU, WASM and WebGL, and presents practical optimization techniques like vectorization, memory layout, graph fusion and mixed‑precision to boost performance for real‑time applications.

frontend · inference engine · machine learning

0 likes · 11 min read

Alibaba Cloud Developer

Jul 2, 2019 · Artificial Intelligence

How MNN Powers Mobile AI: Inside Alibaba’s Open‑Source Inference Engine

Alibaba’s MNN (Mobile Neural Network) engine, now open‑sourced on GitHub, showcases how a lightweight, end‑side deep‑learning inference framework tackles fragmentation, optimizes model conversion, scheduling, and execution across diverse devices, delivering significant performance gains for mobile and IoT AI applications.

MNN · Model Optimization · Operator fusion

0 likes · 15 min read

Alibaba Cloud Developer

Aug 30, 2017 · Artificial Intelligence

How Alibaba’s Knowledge Graph Powers Real‑Time Product Governance with AI

Alibaba’s massive product knowledge graph combines billions of triples, AI‑driven inference, and semantic reasoning to enable millisecond‑level, explainable detection of illegal or counterfeit items across its e‑commerce ecosystem, improving platform governance and consumer experience.

AI · Alibaba · Knowledge Graph

0 likes · 8 min read
