Tagged articles

TVM

8 articles

Machine Learning Algorithms & Natural Language Processing

May 7, 2026 · Artificial Intelligence

How TileLang Enables Efficient Small Operators in Large LLMs (DeepSeek V4 Report)

The article analyzes TileLang, the DSL behind DeepSeek V4, showing how its Fragment and Parallel abstractions, host‑side code generation via TVM‑FFI, and Z3 prover integration let developers implement fused small operators that match hand‑written performance while being faster to develop and easier to maintain.

DeepSeek · GPU compiler · LLM

11 min read

Network Intelligence Research Center (NIRC)

Nov 24, 2025 · Artificial Intelligence

Simplifying AI Operator Development with TileLang DSL

TileLang is a Python‑style DSL built on TVM that separates algorithm logic from hardware scheduling, offers interfaces for users from beginner to expert level, and supports multiple GPU and CPU backends. Its performance is on par with or better than that of existing AI kernels, as demonstrated on GEMM, FlashAttention, and other benchmarks.

AI operators · GEMM · GPU

10 min read

Network Intelligence Research Center (NIRC)

Sep 3, 2025 · Artificial Intelligence

Understanding AI Compilers: A TVM Example

The article explains how AI compilers transform high‑level models into efficient hardware code. It uses TVM to illustrate operator optimization, automated scheduling, and the end‑to‑end compilation workflow, with concrete code examples and performance considerations.

AI compiler · Deep Learning · TVM

8 min read

DaTaobao Tech

Jul 15, 2022 · Artificial Intelligence

Edge AI Model Evaluation and Optimization with TensorFlow, JAX, and TVM

The article demonstrates how to evaluate, compress, and convert deep‑learning models for edge devices using TensorFlow, JAX, and TVM. It covers an MNIST training benchmark showing faster training on an iPhone, scripts for measuring FLOPs, conversion to TFLite, ONNX, and CoreML, and TVM compilation with auto‑tuning, reporting speedups of up to 50% on mobile NPU hardware.

JAX · TVM · TensorFlow

29 min read

Alibaba Terminal Technology

Jun 22, 2022 · Artificial Intelligence

How Fast Can Your Smartphone Run ML Models? Exploring Edge AI Optimization

This article examines how much compute modern mobile devices offer for machine learning, compares training times on a MacBook and an iPhone, and explains model evaluation metrics such as FLOPs. It also gives step‑by‑step guides for converting and optimizing models for edge deployment with TensorFlow, PyTorch, ONNX, JAX, and TVM.

JAX · Model Optimization · TVM

29 min read

Meituan Technology Team

Mar 3, 2022 · Artificial Intelligence

GPU Optimization Practices for Meituan Delivery Search and Recommendation Model Inference

Meituan’s delivery search and recommendation service migrated from separate CPU‑only models to a unified multi‑task model running on a heterogeneous CPU‑GPU architecture. The team combined system‑level placement, All‑On‑GPU lookup, FP16 mixed precision, operator fusion, and TensorRT and TVM compilation. Together these changes delivered roughly a fourfold increase in inference throughput without increasing cost.

GPU · TVM · TensorFlow

24 min read

Meituan Technology Team

Sep 9, 2021 · Artificial Intelligence

GPU Optimization Practices for CTR Models at Meituan

Meituan accelerates CTR model inference by fusing operators with TVM, optimizing CPU‑GPU data transfers, manually tuning high‑frequency subgraphs, and dynamically offloading workloads. On Tesla T4 GPUs this yields throughput gains of up to tenfold while latency stays stable, rising only modestly beyond 128 QPS. Compilation remains slow, however, and support for large models still needs improvement.

CTR · Deep Learning · GPU

16 min read

Ctrip Technology

Jul 23, 2020 · Artificial Intelligence

Inference Performance Optimization for AI Applications: Methods, Case Studies, and Future Directions

This article examines the challenges of deep learning inference and outlines general optimization methodologies, including both system‑level and model‑level techniques. It presents practical case studies such as optimizing a Transformer translation model, and discusses future trends in automated compilation and performance tuning for AI services.

AI inference · Deep Learning · Performance Optimization

15 min read