Tagged articles
38 articles
Old Zhang's AI Learning
May 16, 2026 · Artificial Intelligence

vLLM 0.21.0 Arrives: Speculative Decoding Now Supports Reasoning Models

The vLLM 0.21.0 release brings five major updates—including Transformers v4 deprecation, a C++20 build requirement, KV offload with hybrid memory, speculative decoding that respects thinking budgets, and a Blackwell token‑speed backend—while offering detailed upgrade guidance for different user groups.

C++20 · Inference · KV cache
0 likes · 12 min read
Old Zhang's AI Learning
Apr 28, 2026 · Artificial Intelligence

vLLM 0.20 Arrives with DeepSeek V4 Support – What’s New?

The vLLM 0.20.0 release dramatically upgrades the inference engine with DeepSeek V4 support, default CUDA 13, PyTorch 2.11, Transformers v5 compatibility, FlashAttention 4 MLA prefill, TurboQuant 2‑bit KV cache, an online quantization front‑end, IR enhancements, Model Runner V2 features, and a slew of new models, while providing detailed installation and upgrade guidance.

CUDA 13 · DeepSeek-V4 · FlashAttention
0 likes · 10 min read
Machine Heart
Apr 25, 2026 · Artificial Intelligence

ICLR 2026 Award Winners: Two Outstanding Papers and Alec Radford’s Classic Work Honored with Test‑of‑Time Award

The ICLR 2026 conference announced its award winners: two Outstanding Papers ("Transformers are Inherently Succinct" and "LLMs Get Lost In Multi‑Turn Conversation"), an Honorable Mention, and two Test‑of‑Time awards for the seminal DCGAN and DDPG papers. The conference received about 19,000 submissions, with a 28% acceptance rate.

Generative Adversarial Networks · ICLR 2026 · Test of Time
0 likes · 9 min read
Machine Heart
Apr 23, 2026 · Artificial Intelligence

First Survey of Attention Sink: From Utilization and Understanding to Elimination in Transformers

This survey reviews over 180 papers on the Attention Sink phenomenon in Transformers, outlining its three-stage evolution—from early exploitation to mechanistic interpretation and finally strategic mitigation—while detailing utilization tactics, theoretical explanations, removal techniques, and promising future research directions.

Attention Sink · Mitigation · Transformers
0 likes · 9 min read
Baobao Algorithm Notes
Feb 25, 2026 · Artificial Intelligence

Exploring Qwen 3.5: Small‑Scale MoE Models, Architecture, and Deployment Guides

This article reviews the three open‑source Qwen 3.5 models (a 35B MoE, a 122B MoE, and a 27B dense version), detailing their parameter layouts, core attention designs, context length, inference performance, and hardware requirements, and provides step‑by‑step code examples for loading them with Hugging Face Transformers and vLLM.

AI · MoE · Model Deployment
0 likes · 10 min read
Old Zhang's AI Learning
Feb 9, 2026 · Artificial Intelligence

GLM-5 Emerges First, Built on DeepSeek Tech, Triggering a 40% Stock Surge

An anonymous OpenRouter model dubbed "Pony Alpha" was verified as the new 745B‑parameter GLM-5, which reuses the DeepSeek‑V3 architecture and supports sparse attention and multi‑token prediction. The news has already caused a near‑40% jump in Zhipu AI’s stock, and the article hints at upcoming integration into the Transformers library.

DeepSeek · GLM-5 · MoE
0 likes · 3 min read
Data Party THU
Dec 18, 2025 · Artificial Intelligence

How Diffusion Models and Transformers Power the Next Generation of AI Video Generation

AI video generation now turns textual prompts into high‑quality clips using diffusion models and transformer‑based architectures; this article explains the underlying mathematics, training objectives, spatio‑temporal encoding, breakthroughs like consistent motion and physical realism, and discusses the technology’s opportunities and inherent risks.

AI video generation · Spatio-temporal modeling · Transformers
0 likes · 11 min read
Code Mala Tang
Oct 9, 2025 · Artificial Intelligence

Fine‑Tune a Language Model for Band Trivia with Hugging Face PEFT

This tutorial walks through installing Python dependencies, preparing a JSON‑based QA dataset, and using Hugging Face's PEFT library to fine‑tune a small FLAN‑T5 model so it can answer questions about AC/DC and other bands without passing knowledge at inference time.

FAQ model · Hugging Face · LLM fine-tuning
0 likes · 12 min read
Code Ape Tech Column
Aug 29, 2025 · Backend Development

Master Spring Integration: Build Scalable Message‑Driven Systems with Ease

This article introduces Spring Integration, explains its core concepts such as messages, channels, endpoints, adapters, filters, and transformers, compares it with traditional middleware, and provides detailed XML and Java configuration examples for channels, endpoints, adapters, transformers, routers, integration patterns, interceptors, and a practical order‑processing workflow.

Java · Message Channels · Spring Integration
0 likes · 21 min read
Baobao Algorithm Notes
Aug 1, 2025 · Artificial Intelligence

Unlocking Qwen3-Coder-30B: Features, Fast Start, and Agentic Coding Guide

The article introduces Qwen3‑Coder‑30B‑A3B‑Instruct (aka Qwen3‑Coder‑Flash), detailing its architecture, 256K‑to‑1M token context, agentic coding capabilities, installation steps with Transformers, sample code for tool use, optimal sampling parameters, and deployment tips across various runtimes.

AI coding assistant · Agentic Coding · Deep Learning
0 likes · 6 min read
AI Frontier Lectures
Jul 24, 2025 · Artificial Intelligence

State Space Models vs Transformers: Uncovering the Real Trade‑offs in Sequence Modeling

This article analyzes the fundamental differences between state space models (SSM) and Transformer architectures, highlighting their three core components, training efficiency, memory handling, tokenization impact, and empirical performance trade‑offs, and argues why SSMs can outperform Transformers on many sequence tasks.

AI Architecture · Sequence Modeling · Transformers
0 likes · 19 min read
Network Intelligence Research Center (NIRC)
Jul 13, 2025 · Artificial Intelligence

Getting Started with Hugging Face Transformers Trainer

This guide walks through the Trainer API in Hugging Face Transformers, explaining its core features such as configurable training loops, mixed‑precision and gradient‑accumulation support, and seamless distributed training via Accelerate and DeepSpeed, and provides a step‑by‑step example of converting a simple PyTorch CNN model to use Trainer.

Accelerate · DeepSpeed · Distributed Training
0 likes · 7 min read
AI Algorithm Path
Jun 28, 2025 · Artificial Intelligence

Implementing Greedy and Beam Decoding for Large Language Models from Scratch

This article walks through the mechanics of greedy search and beam search in large language models, demonstrates both methods with GPT‑2 on the prompt "I have a dream", visualizes the decoding trees, compares their scores, and discusses the trade‑offs between efficiency and output quality.

Beam Search · GPT-2 · Greedy Search
0 likes · 16 min read
Architect's Alchemy Furnace
May 7, 2025 · Artificial Intelligence

Which LLM Inference Engine Reigns Supreme? A Deep Dive into Transformers, vLLM, Llama.cpp, SGLang, MLX and Ollama

This article provides a comprehensive comparison of seven popular large‑language‑model inference engines—Transformers, vLLM, Llama.cpp, SGLang, MLX, Ollama and others—detailing their core features, performance characteristics, hardware compatibility, concurrency support, and ideal use‑cases, plus practical installation guidance for Xinference.

Inference · LLM · MLX
0 likes · 17 min read
Rare Earth Juejin Tech Community
Feb 12, 2025 · Frontend Development

UnoCSS Installation, Basic Usage, Presets, Transformers, and Common Tips

This article provides a comprehensive guide to UnoCSS, covering installation in Vue 3 + Vite and Nuxt 3 projects, basic syntax and interactive documentation, Iconify SVG integration, various presets and transformers, as well as practical shortcuts, responsive design, safelist handling, custom rules, theming, and dark‑mode support.

CSS Utility · Nuxt · Transformers
0 likes · 22 min read
DaTaobao Tech
Nov 13, 2024 · Artificial Intelligence

Understanding Neural Networks and Transformers: Principles, Implementation, and Applications

The article surveys neural networks from basic neuron operations and loss functions through deep architectures to the Transformer model, detailing embeddings, positional encoding, self‑attention, multi‑head attention, residual links, and encoder‑decoder design, and includes PyTorch code examples for linear regression, translation, and fine‑tuning Hugging Face’s MiniRBT for text classification.

AI · Attention Mechanism · Deep Learning
0 likes · 44 min read
AntTech
Nov 13, 2024 · Artificial Intelligence

Nimbus: Secure and Efficient Two‑Party Inference for Transformers

The article introduces Nimbus, a novel two‑party privacy‑preserving inference framework for Transformer models that accelerates linear‑layer matrix multiplication and activation‑function evaluation through an outer‑product encoding and distribution‑aware polynomial approximation, achieving 2.7‑4.7× speedup over prior work while maintaining model accuracy.

Transformers · Two-Party Computation · cryptography
0 likes · 6 min read
Data Thinking Notes
Sep 19, 2024 · Artificial Intelligence

Why AI Has Only a Seven-Year History—and What AI+ Means for the Future

In this speech, Wang Jian reflects on the evolution of artificial intelligence, arguing that modern AI is fundamentally different from its early concepts, emphasizing the pivotal roles of data, models, and infrastructure, and exploring the transformative impact of AI+, transformers, and cloud platforms on future innovation.

AI Infrastructure · AI+ · Transformers
0 likes · 18 min read
Open Source Tech Hub
Aug 22, 2024 · Artificial Intelligence

Unlock AI Power in PHP: A Hands‑On Guide to TransformersPHP

TransformersPHP brings Hugging Face’s Transformer models to PHP, enabling developers to run thousands of pre‑trained NLP models locally for tasks like text generation, summarisation, and translation, with simple installation, ONNX‑based execution, and a Python‑like pipeline API.

AI · NLP · ONNX
0 likes · 8 min read
Baobao Algorithm Notes
Jun 14, 2024 · Artificial Intelligence

Boost LLM Speed: How KV Cache Quantization Cuts Memory While Preserving Quality

This article explains Hugging Face's KV cache quantization technique, detailing how it reduces memory usage for long‑context LLM generation, the underlying quantization methods, implementation steps in 🤗 Transformers, benchmark results versus fp16, and the trade‑offs between speed, memory, and accuracy.

LLM · Memory Optimization · Transformers
0 likes · 15 min read
NewBeeNLP
Feb 11, 2024 · Industry Insights

What 2023 Taught Us About LLMs and AI‑Guided Optimization

The author reviews a year of rapid progress in large language models, highlighting breakthrough papers such as Positional Interpolation, StreamingLLM, Deja Vu, and RLCD, and discusses how AI‑guided optimization techniques like SurCo, LANCER, and GenCo are reshaping research and industry applications.

AI Optimization · LLM · Transformers
0 likes · 13 min read
21CTO
Jan 31, 2024 · Artificial Intelligence

Unlocking LLaVA: A Hands‑On Guide to the Open‑Source Visual Language Model

This article introduces LLaVA (Large Language and Vision Assistant), an open‑source multimodal model that aims to replicate GPT‑4V capabilities, explains its architecture, training process, and key features, and provides step‑by‑step instructions for using the web demo, running it locally via Ollama or Hugging Face, and building a simple Gradio chatbot with code examples.

Gradio · LLaVA · Multimodal AI
0 likes · 11 min read
Baobao Algorithm Notes
Oct 19, 2023 · Artificial Intelligence

Efficient LLM Deployment: Low‑Precision, Flash Attention, and Architecture Tricks

This article reviews the main memory and compute challenges of deploying large language models and presents practical solutions—including low‑precision arithmetic, flash attention, advanced positional embeddings, key‑value caching, and quantization techniques—backed by code examples and performance measurements on models such as OctoCoder.

Flash Attention · LLM · Transformers
0 likes · 35 min read
21CTO
Sep 14, 2023 · Artificial Intelligence

Unlocking Falcon 180B: The World’s Most Powerful Open‑Source LLM

Falcon 180B, the newly released 180‑billion‑parameter open‑source LLM from TII, outperforms Llama 2 and rivals top commercial models across numerous benchmarks, offers free commercial use, and comes with detailed hardware requirements, prompt formats, and ready‑to‑run code examples for developers.

AI model · Falcon 180B · Hardware Requirements
0 likes · 9 min read
21CTO
Aug 16, 2023 · Artificial Intelligence

Top Python Libraries for Building Generative AI Apps: A Quick Reference

This cheat‑sheet summarizes the leading Python libraries for creating generative AI applications—covering OpenAI, Transformers, Gradio, LangChain, LlamaIndex and more—providing a concise, practical guide for both beginners and seasoned developers.

Gradio · LangChain · LlamaIndex
0 likes · 3 min read
Rare Earth Juejin Tech Community
Jul 22, 2023 · Artificial Intelligence

Building an Image Classification Model with Transformers and TensorFlow: Theory, Code, and Practice

This article explains how to leverage computer‑vision techniques and deep‑learning frameworks such as Transformers and TensorFlow to build a complete image‑classification pipeline, covering the underlying RGB and CNN principles, model architecture, data preparation, training, and inference with runnable Python code.

CNN · Image Classification · Python
0 likes · 15 min read
Architect
Jul 1, 2023 · Artificial Intelligence

Comprehensive Guide to Text Generation Decoding Strategies with HuggingFace Transformers

This tutorial explores various text generation decoding methods—including greedy search, beam search, top‑k/top‑p sampling, sample‑and‑rank, and group beam search—explaining their principles, providing detailed Python code examples, and comparing their use in modern large language models.

Beam Search · Greedy Search · Sampling
0 likes · 59 min read
Tencent Cloud Developer
Jun 1, 2023 · Artificial Intelligence

A Comprehensive Guide to Decoding Strategies for Text Generation with HuggingFace Transformers

This guide thoroughly explains the major decoding strategies for neural text generation in HuggingFace Transformers—including greedy, beam, diverse beam, sampling, top‑k, top‑p, sample‑and‑rank, beam sampling, and group beam search—detailing their principles, Python implementations with LogitsProcessor components, workflow diagrams, comparative analysis, and references to original research.

Beam Search · Sampling · Text Generation
0 likes · 60 min read
Architect
Apr 24, 2023 · Artificial Intelligence

MOSS 003: Open‑Source Large Language Model Development, Training Data, and Plugin‑Enabled Deployment

The article details the evolution of the open‑source MOSS series—from OpenChat 001 to MOSS 003—covering data collection, fine‑tuning procedures, multilingual capabilities, plugin architecture, example code for inference, and upcoming releases, providing a comprehensive technical overview for AI practitioners.

AI · MOSS · Plugins
0 likes · 11 min read
Architect
Feb 19, 2023 · Artificial Intelligence

Training a Positive Review Generator with RLHF and PPO

This article demonstrates how to apply Reinforcement Learning from Human Feedback (RLHF) using a sentiment‑analysis model as a reward function and Proximal Policy Optimization (PPO) to fine‑tune a language model that generates positive product reviews, complete with code snippets and experimental results.

Language Model · PPO · RLHF
0 likes · 10 min read
Code DAO
May 19, 2022 · Artificial Intelligence

Semi‑Supervised Training Methods for Transformers

This article explains an end‑to‑end semi‑supervised training pipeline for Transformer‑based NLP models, detailing the unsupervised language‑model pre‑training, supervised fine‑tuning, and the internal architecture of embeddings, encoder layers, and downstream tasks such as text classification and NER.

BERT · Fine-tuning · Masked Language Model
0 likes · 9 min read
DataFunTalk
Jan 3, 2022 · Artificial Intelligence

Top AI Stories of 2021: Large‑Scale Pretrained Models, Transformers, Multimodal AI, and Emerging Challenges

The article reviews the 2021 AI landscape, highlighting the race for ever‑larger pretrained models, the dominance of Transformers across modalities, the promise and limits of large models, the rise of multimodal systems, regulatory considerations, and the still‑nascent progress in reinforcement learning.

AI Governance · AI industry · Multimodal AI
0 likes · 12 min read
Code DAO
Dec 26, 2021 · Artificial Intelligence

Building a Vector‑Based Movie Recommendation System with Transformers

This tutorial walks through constructing a movie recommendation engine by downloading a dataset, cleaning and de‑duplicating entries, encoding plot summaries into vectors with transformer models, and performing nearest‑neighbor searches using scikit‑learn, while handling misspellings with Levenshtein distance.

Levenshtein distance · Transformers · movie recommendation
0 likes · 8 min read
Meituan Technology Team
Apr 15, 2021 · Artificial Intelligence

Meituan Technical Team Shares CVPR 2021 Pre-lecture: Five Papers on Video Instance Segmentation, Facial Expression Recognition, Real-time Semantic Segmentation, Weakly Supervised Semantic Segmentation, and Multi-source Domain Adaptation

At a CVPR 2021 pre‑lecture, Meituan’s Visual Intelligence Center showcased five cutting‑edge papers—VisTR transformer‑based video instance segmentation, a feature‑decomposition facial expression recognizer, an accelerated BiSeNet for real‑time semantic segmentation, an embedded discriminative attention mechanism for weakly supervised segmentation, and a partial‑feature selection framework for multi‑source domain adaptation—highlighting the company’s large AI R&D team, university collaborations, real‑world deployment across its services, and ongoing recruitment.

AI · CVPR2021 · Facial Expression Recognition
0 likes · 10 min read