Tagged articles

Transformers

39 articles · Page 1 of 1

Jun 3, 2026 · Artificial Intelligence

Fundamentals of NLP: Core Tasks, Tool Setup, and Hands‑On Projects

This article introduces the basics of Natural Language Processing: core tasks such as language understanding and generation, common applications, essential linguistic analyses, and environment setup with Python libraries. It gives hands‑on code examples for preprocessing, POS tagging, NER, sentiment analysis with both classical and transformer models, and text generation with GPT‑2, then discusses challenges and Rust‑centric integration strategies.

NLP · Python · Sentiment Analysis

0 likes · 13 min read

Old Zhang's AI Learning

May 16, 2026 · Artificial Intelligence

vLLM 0.21.0 Arrives: Speculative Decoding Now Supports Reasoning Models

The vLLM 0.21.0 release brings five major updates—including Transformers v4 deprecation, a C++20 build requirement, KV offload with hybrid memory, speculative decoding that respects thinking budgets, and a Blackwell token‑speed backend—while offering detailed upgrade guidance for different user groups.

C++20 · KV cache · Transformers

0 likes · 12 min read

Old Zhang's AI Learning

Apr 28, 2026 · Artificial Intelligence

vLLM 0.20 Arrives with DeepSeek V4 Support – What’s New?

The vLLM 0.20.0 release dramatically upgrades the inference engine with DeepSeek V4 support, default CUDA 13, PyTorch 2.11, Transformers v5 compatibility, FlashAttention 4 MLA prefill, TurboQuant 2‑bit KV cache, an online quantization front‑end, IR enhancements, Model Runner V2 features, and a slew of new models, while providing detailed installation and upgrade guidance.

CUDA 13 · DeepSeek-V4 · FlashAttention

0 likes · 10 min read

Machine Heart

Apr 25, 2026 · Artificial Intelligence

ICLR 2026 Award Winners: Two Outstanding Papers and Alec Radford’s Classic Work Honored with Test‑of‑Time Award

The ICLR 2026 conference announced its award winners, highlighting two Outstanding Papers—"Transformers are Inherently Succinct" and "LLMs Get Lost In Multi‑Turn Conversation"—an Honorable Mention, and two Test‑of‑Time awards for the seminal DCGAN and DDPG papers, after receiving about 19,000 submissions with a 28% acceptance rate.

Generative Adversarial Networks · ICLR 2026 · Test of Time

0 likes · 9 min read

Machine Heart

Apr 23, 2026 · Artificial Intelligence

First Survey of Attention Sink: From Utilization and Understanding to Elimination in Transformers

This survey reviews over 180 papers on the Attention Sink phenomenon in Transformers, outlining its three-stage evolution—from early exploitation to mechanistic interpretation and finally strategic mitigation—while detailing utilization tactics, theoretical explanations, removal techniques, and promising future research directions.

Attention Sink · Transformers · mitigation

0 likes · 9 min read

Baobao Algorithm Notes

Feb 25, 2026 · Artificial Intelligence

Exploring Qwen 3.5: Small‑Scale MoE Models, Architecture, and Deployment Guides

This article reviews the three open‑source Qwen 3.5 models—including a 35B MoE, a 122B MoE, and a 27B dense version—detailing their parameter layouts, core attention designs, context length, inference performance, hardware requirements, and provides step‑by‑step code examples for loading them with Hugging Face Transformers and vLLM.

AI · Large Language Model · MoE

0 likes · 10 min read

Old Zhang's AI Learning

Feb 9, 2026 · Artificial Intelligence

GLM-5 Emerges First, Built on DeepSeek Tech, Triggering a 40% Stock Surge

An anonymous OpenRouter model dubbed "Pony Alpha" was verified as the new 745B‑parameter GLM-5, which reuses DeepSeek‑V3 architecture, supports sparse attention and multi‑token prediction, and has already caused a near‑40% jump in Zhipu AI’s stock while hinting at upcoming integration into the Transformers library.

DeepSeek · GLM-5 · Large Language Model

0 likes · 3 min read

Old Zhang's AI Learning

Feb 9, 2026 · Artificial Intelligence

Qwen 3.5 Emerges; ByteDance and DeepSeek Set to Release Flagship LLMs for Spring Festival

The LMSYS Chatbot Arena now shows Qwen 3.5 (codenamed Karp-001/002) alongside ByteDance's Pisces‑llm models and DeepSeek‑V4, with new Transformers configs and hints of an Active‑3B MoE architecture, suggesting a fresh wave of flagship large language models arriving for the Spring Festival.

ByteDance · DeepSeek · MoE

0 likes · 4 min read

Data Party THU

Dec 18, 2025 · Artificial Intelligence

How Diffusion Models and Transformers Power the Next Generation of AI Video Generation

AI video generation now turns textual prompts into high‑quality clips using diffusion models and transformer‑based architectures; this article explains the underlying mathematics, training objectives, spatio‑temporal encoding, breakthroughs like consistent motion and physical realism, and discusses the technology’s opportunities and inherent risks.

AI video generation · Diffusion Models · Spatio-temporal modeling

0 likes · 11 min read

Code Mala Tang

Oct 9, 2025 · Artificial Intelligence

Fine‑Tune a Language Model for Band Trivia with Hugging Face PEFT

This tutorial walks through installing Python dependencies, preparing a JSON‑based QA dataset, and using Hugging Face's PEFT library to fine‑tune a small FLAN‑T5 model so it can answer questions about AC/DC and other bands without passing knowledge at inference time.

FAQ model · Hugging Face · LLM fine-tuning

0 likes · 12 min read

Code Ape Tech Column

Aug 29, 2025 · Backend Development

Master Spring Integration: Build Scalable Message‑Driven Systems with Ease

This article introduces Spring Integration, explains its core concepts such as messages, channels, endpoints, adapters, filters, and transformers, compares it with traditional middleware, and provides detailed XML and Java configuration examples for channels, endpoints, adapters, transformers, routers, integration patterns, interceptors, and a practical order‑processing workflow.

Enterprise Integration · Java · Message Channels

0 likes · 21 min read

Baobao Algorithm Notes

Aug 1, 2025 · Artificial Intelligence

Unlocking Qwen3-Coder-30B: Features, Fast Start, and Agentic Coding Guide

The article introduces Qwen3‑Coder‑30B‑A3B‑Instruct (aka Qwen3‑Coder‑Flash), detailing its architecture, 256K‑to‑1M token context, agentic coding capabilities, installation steps with Transformers, sample code for tool use, optimal sampling parameters, and deployment tips across various runtimes.

AI coding assistant · Large Language Model · Qwen3

0 likes · 6 min read

AI Frontier Lectures

Jul 24, 2025 · Artificial Intelligence

State Space Models vs Transformers: Uncovering the Real Trade‑offs in Sequence Modeling

This article analyzes the fundamental differences between state space models (SSM) and Transformer architectures, highlighting their three core components, training efficiency, memory handling, tokenization impact, and empirical performance trade‑offs, and argues why SSMs can outperform Transformers on many sequence tasks.

AI Architecture · Tokenization · Transformers

0 likes · 19 min read

Network Intelligence Research Center (NIRC)

Jul 13, 2025 · Artificial Intelligence

Getting Started with Hugging Face Transformers Trainer

This guide walks through the Trainer API in Hugging Face Transformers, explaining its core features such as configurable training loops, mixed‑precision and gradient‑accumulation support, and seamless distributed training via Accelerate and DeepSpeed, and provides a step‑by‑step example of converting a simple PyTorch CNN model to use Trainer.

Accelerate · DeepSpeed · Hugging Face

0 likes · 7 min read

AI Algorithm Path

Jun 28, 2025 · Artificial Intelligence

Implementing Greedy and Beam Decoding for Large Language Models from Scratch

This article walks through the mechanics of greedy search and beam search in large language models, demonstrates both methods with GPT‑2 on the prompt "I have a dream", visualizes the decoding trees, compares their scores, and discusses the trade‑offs between efficiency and output quality.

Beam Search · GPT-2 · Greedy Search

0 likes · 16 min read

AIWalker

May 22, 2025 · Artificial Intelligence

174 Innovative Attention Mechanism Tweaks Backed by Leading Researchers (Yao Qizhi Included)

This article surveys 174 recent attention‑mechanism modifications from 2024‑2025, summarizing each paper’s core idea, performance gains, and code links while offering practical guidance on selecting and integrating the right attention variant for tasks such as object detection.

AI · Transformers · attention mechanisms

0 likes · 9 min read

Architect's Alchemy Furnace

May 7, 2025 · Artificial Intelligence

Which LLM Inference Engine Reigns Supreme? A Deep Dive into Transformers, vLLM, Llama.cpp, SGLang, MLX and Ollama

This article provides a comprehensive comparison of seven popular large‑language‑model inference engines—Transformers, vLLM, Llama.cpp, SGLang, MLX, Ollama and others—detailing their core features, performance characteristics, hardware compatibility, concurrency support, and ideal use‑cases, plus practical installation guidance for Xinference.

LLM · MLX · SGLang

0 likes · 17 min read

Rare Earth Juejin Tech Community

Feb 12, 2025 · Frontend Development

UnoCSS Installation, Basic Usage, Presets, Transformers, and Common Tips

This article provides a comprehensive guide to UnoCSS, covering installation in Vue 3 + Vite and Nuxt 3 projects, basic syntax and interactive documentation, Iconify SVG integration, various presets and transformers, as well as practical shortcuts, responsive design, safelist handling, custom rules, theming, and dark‑mode support.

CSS Utility · Nuxt · Transformers

0 likes · 22 min read

DaTaobao Tech

Nov 13, 2024 · Artificial Intelligence

Understanding Neural Networks and Transformers: Principles, Implementation, and Applications

The article surveys neural networks from basic neuron operations and loss functions through deep architectures to the Transformer model, detailing embeddings, positional encoding, self‑attention, multi‑head attention, residual links, and encoder‑decoder design, and includes PyTorch code examples for linear regression, translation, and fine‑tuning Hugging Face’s MiniRBT for text classification.

AI · Attention Mechanism · NLP

0 likes · 44 min read

AntTech

Nov 13, 2024 · Artificial Intelligence

Nimbus: Secure and Efficient Two‑Party Inference for Transformers

The article introduces Nimbus, a novel two‑party privacy‑preserving inference framework for Transformer models that accelerates linear‑layer matrix multiplication and activation‑function evaluation through an outer‑product encoding and distribution‑aware polynomial approximation, achieving 2.7‑4.7× speedup over prior work while maintaining model accuracy.

Transformers · Two-Party Computation · cryptography

0 likes · 6 min read

Data Thinking Notes

Sep 19, 2024 · Artificial Intelligence

Why AI Has Only a Seven-Year History—and What AI+ Means for the Future

In this speech, Wang Jian reflects on the evolution of artificial intelligence, arguing that modern AI is fundamentally different from its early concepts, emphasizing the pivotal roles of data, models, and infrastructure, and exploring the transformative impact of AI+, transformers, and cloud platforms on future innovation.

AI Infrastructure · AI+ · Cloud Computing

0 likes · 18 min read

Open Source Tech Hub

Aug 22, 2024 · Artificial Intelligence

Unlock AI Power in PHP: A Hands‑On Guide to TransformersPHP

TransformersPHP brings Hugging Face’s Transformer models to PHP, enabling developers to run thousands of pre‑trained NLP models locally for tasks like text generation, summarisation, and translation, with simple installation, ONNX‑based execution, and a Python‑like pipeline API.

AI · NLP · ONNX

0 likes · 8 min read

Baobao Algorithm Notes

Jun 14, 2024 · Artificial Intelligence

Boost LLM Speed: How KV Cache Quantization Cuts Memory While Preserving Quality

This article explains Hugging Face's KV cache quantization technique, detailing how it reduces memory usage for long‑context LLM generation, the underlying quantization methods, implementation steps in 🤗 Transformers, benchmark results versus fp16, and the trade‑offs between speed, memory, and accuracy.

LLM · Memory optimization · Quantization

0 likes · 15 min read

DaTaobao Tech

May 27, 2024 · Artificial Intelligence

Sampling Strategies for Large Language Models: Greedy, Beam, Top‑K, Top‑p, and Temperature

The article explains how greedy search, beam search, Top‑K, Top‑p (nucleus) sampling, and temperature each shape large language model generation, comparing their effects on repetition, diversity, and creativity, and provides concise TensorFlow‑based code examples illustrating these inference‑time strategies.

AI · LLM · Python

0 likes · 15 min read

NewBeeNLP

Feb 11, 2024 · Industry Insights

What 2023 Taught Us About LLMs and AI‑Guided Optimization

The author reviews a year of rapid progress in large language models, highlighting breakthrough papers such as Positional Interpolation, StreamingLLM, Deja Vu, and RLCD, and discusses how AI‑guided optimization techniques like SurCo, LANCER, and GenCo are reshaping research and industry applications.

LLM · Transformers · ai-optimization

0 likes · 13 min read

21CTO

Jan 31, 2024 · Artificial Intelligence

Unlocking LLaVA: A Hands‑On Guide to the Open‑Source Visual Language Model

This article introduces LLaVA (Large Language and Vision Assistant), an open‑source multimodal model that replicates GPT‑4V capabilities, explains its architecture, training process, and key features, and provides step‑by‑step instructions for using the web demo, running it locally via Ollama or Hugging Face, and building a simple Gradio chatbot with code examples.

Gradio · LLaVA · Multimodal AI

0 likes · 11 min read

Rare Earth Juejin Tech Community

Dec 15, 2023 · Artificial Intelligence

AIGC Tutorial: Tokenization, POS Tagging, and Named Entity Recognition with Transformers, NLTK, and spaCy

This tutorial introduces AIGC concepts and walks through practical implementations of tokenization, part‑of‑speech tagging, and named entity recognition using the Transformers library, NLTK, and spaCy on Google Colab, complete with code snippets and visual results.

AIGC · NLP · NLTK

0 likes · 10 min read

Baobao Algorithm Notes

Oct 19, 2023 · Artificial Intelligence

Efficient LLM Deployment: Low‑Precision, Flash Attention, and Architecture Tricks

This article reviews the main memory and compute challenges of deploying large language models and presents practical solutions—including low‑precision arithmetic, flash attention, advanced positional embeddings, key‑value caching, and quantization techniques—backed by code examples and performance measurements on models such as OctoCoder.

Flash Attention · LLM · Quantization

0 likes · 35 min read

21CTO

Sep 14, 2023 · Artificial Intelligence

Unlocking Falcon 180B: The World’s Most Powerful Open‑Source LLM

Falcon 180B, the newly released 180‑billion‑parameter open‑source LLM from TII, outperforms Llama 2 and rivals top commercial models across numerous benchmarks, offers free commercial use, and comes with detailed hardware requirements, prompt formats, and ready‑to‑run code examples for developers.

AI model · Falcon 180B · Hardware Requirements

0 likes · 9 min read

21CTO

Aug 16, 2023 · Artificial Intelligence

Top Python Libraries for Building Generative AI Apps: A Quick Reference

This cheat‑sheet summarizes the leading Python libraries for creating generative AI applications—covering OpenAI, Transformers, Gradio, LangChain, LlamaIndex and more—providing a concise, practical guide for both beginners and seasoned developers.

Gradio · LangChain · LlamaIndex

0 likes · 3 min read

Rare Earth Juejin Tech Community

Jul 22, 2023 · Artificial Intelligence

Building an Image Classification Model with Transformers and TensorFlow: Theory, Code, and Practice

This article explains how to leverage computer‑vision techniques and deep‑learning frameworks such as Transformers and TensorFlow to build a complete image‑classification pipeline, covering the underlying RGB and CNN principles, model architecture, data preparation, training, and inference with runnable Python code.

CNN · Python · TensorFlow

0 likes · 15 min read

Architect

Jul 1, 2023 · Artificial Intelligence

Comprehensive Guide to Text Generation Decoding Strategies with HuggingFace Transformers

This tutorial explores various text generation decoding methods—including greedy search, beam search, top‑k/top‑p sampling, sample‑and‑rank, and group beam search—explaining their principles, providing detailed Python code examples, and comparing their use in modern large language models.

Beam Search · Greedy Search · HuggingFace

0 likes · 59 min read

Tencent Cloud Developer

Jun 1, 2023 · Artificial Intelligence

A Comprehensive Guide to Decoding Strategies for Text Generation with HuggingFace Transformers

This guide thoroughly explains the major decoding strategies for neural text generation in HuggingFace Transformers—including greedy, beam, diverse beam, sampling, top‑k, top‑p, sample‑and‑rank, beam sampling, and group beam search—detailing their principles, Python implementations with LogitsProcessor components, workflow diagrams, comparative analysis, and references to original research.

Beam Search · HuggingFace · Text Generation

0 likes · 60 min read

Architect

Apr 24, 2023 · Artificial Intelligence

MOSS 003: Open‑Source Large Language Model Development, Training Data, and Plugin‑Enabled Deployment

The article details the evolution of the open‑source MOSS series—from OpenChat 001 to MOSS 003—covering data collection, fine‑tuning procedures, multilingual capabilities, plugin architecture, example code for inference, and upcoming releases, providing a comprehensive technical overview for AI practitioners.

AI · Large Language Model · MOSS

0 likes · 11 min read

Architect

Feb 19, 2023 · Artificial Intelligence

Training a Positive Review Generator with RLHF and PPO

This article demonstrates how to apply Reinforcement Learning from Human Feedback (RLHF) using a sentiment‑analysis model as a reward function and Proximal Policy Optimization (PPO) to fine‑tune a language model that generates positive product reviews, complete with code snippets and experimental results.

Language Model · PPO · RLHF

0 likes · 10 min read

Code DAO

May 19, 2022 · Artificial Intelligence

Semi‑Supervised Training Methods for Transformers

This article explains an end‑to‑end semi‑supervised training pipeline for Transformer‑based NLP models, detailing the unsupervised language‑model pre‑training, supervised fine‑tuning, and the internal architecture of embeddings, encoder layers, and downstream tasks such as text classification and NER.

BERT · Masked Language Model · NLP

0 likes · 9 min read

DataFunTalk

Jan 3, 2022 · Artificial Intelligence

Building a Vector‑Based Movie Recommendation System with Transformers

This tutorial walks through constructing a movie recommendation engine by downloading a dataset, cleaning and de‑duplicating entries, encoding plot summaries into vectors with transformer models, and performing nearest‑neighbor searches using scikit‑learn, while handling misspellings with Levenshtein distance.

Levenshtein distance · Pandas · Scikit-learn

0 likes · 8 min read

Meituan Technology Team

Apr 15, 2021 · Artificial Intelligence

Meituan Technical Team Shares CVPR 2021 Pre-lecture: Five Papers on Video Instance Segmentation, Facial Expression Recognition, Real-time Semantic Segmentation, Weakly Supervised Semantic Segmentation, and Multi-source Domain Adaptation

At a CVPR 2021 pre‑lecture, Meituan’s Visual Intelligence Center showcased five cutting‑edge papers—VisTR transformer‑based video instance segmentation, a feature‑decomposition facial expression recognizer, an accelerated BiSeNet for real‑time semantic segmentation, an embedded discriminative attention mechanism for weakly supervised segmentation, and a partial‑feature selection framework for multi‑source domain adaptation—highlighting the company’s large AI R&D team, university collaborations, real‑world deployment across its services, and ongoing recruitment.

AI · CVPR2021 · Domain Adaptation

0 likes · 10 min read
