Tagged articles
383 articles
AI2ML AI to Machine Learning
Sep 30, 2025 · Artificial Intelligence

Dynamic Multimodal Video Generation: Prioritizing Stability and High Quality

The article surveys the evolution of video generation models—from early GANs and DCGAN to diffusion‑based approaches like Stable Diffusion and DiT—highlighting how stability, high quality, massive compute, and multimodal data pipelines are shaping the current and future paths of dynamic multimodal video generation.

Latent Diffusion · Multimodal AI · Stable Diffusion
0 likes · 7 min read
HyperAI Super Neural
Sep 30, 2025 · Artificial Intelligence

OnePiece: Applying LLM‑Style Reasoning to Item‑ID Sequences for Generative Recommendation

The article presents the OnePiece framework, which injects LLM‑style context engineering and latent reasoning into item‑ID based search‑and‑recommendation models. It details the design choices, training tricks, and attention analysis, and reports online gains of around 1% in GMV and ad revenue, offering a thorough technical dissection of generative recommendation in industrial settings.

Context Engineering · Generative Recommendation · LLM Reasoning
0 likes · 31 min read
Volcano Engine Developer Services
Sep 28, 2025 · Artificial Intelligence

Demystifying AI Jargon: A Beginner’s Guide to Large Language Models

This guide breaks down the complex terminology of large language models—explaining tokens, transformers, self‑attention, RAG, scaling laws, dense vs. sparse architectures, and training stages—using clear analogies and step‑by‑step explanations so readers can confidently understand and work with modern AI systems.

AI fundamentals · Model Training · RAG
0 likes · 35 min read
Wu Shixiong's Large Model Academy
Sep 26, 2025 · Artificial Intelligence

Crack Large-Model Interviews: Master Positional Encoding, Residuals, LayerNorm & FFN

Preparing for a large-model interview? This guide explains why interviewers probe seemingly minor components (positional encoding, residual connections, layer normalization, and feed-forward networks), covers each technique's purpose and variants, and shows how to answer confidently, with practical tips and a learning roadmap.

FFN · Interview Tips · LayerNorm
0 likes · 8 min read
Wu Shixiong's Large Model Academy
Sep 25, 2025 · Artificial Intelligence

Master Self-Attention & Multi-Head Attention for Large Model Interviews

This guide breaks down the core logic, computation steps, formulas, and common interview questions about Self‑Attention and Multi‑Head Attention in Transformers, offering concrete explanations, dimensional examples, and practical answering techniques to help candidates ace large‑model algorithm interviews.

Deep Learning · Interview Tips · Self-Attention
0 likes · 8 min read
Data Party THU
Sep 21, 2025 · Artificial Intelligence

Building a Mini‑DeepSeek‑V3: Transformer Block and MTP Implementation on Limited Compute

This article walks through the design and implementation of a Mini‑DeepSeek‑V3 language model, detailing how to assemble the core Transformer block, integrate Multi‑Token Prediction (MTP) modules, construct the overall architecture, and compute the combined loss—all using modest GPU resources and a single‑card or DDP training setup.

AI · DeepSeek · MTP
0 likes · 12 min read
Bighead's Algorithm Notes
Sep 16, 2025 · Artificial Intelligence

Paper Review: HGTS‑Former – A Hierarchical Hypergraph Transformer for Multivariate Time‑Series Analysis

The HGTS‑Former model introduces a hierarchical hypergraph backbone combined with a Transformer to capture high‑order and dynamic dependencies in multivariate time‑series data, and experimental results on eight datasets show it consistently outperforms state‑of‑the‑art methods in both long‑term forecasting and interpolation tasks.

HGTS-Former · Hypergraph · Transformer
0 likes · 11 min read
Architect
Sep 16, 2025 · Artificial Intelligence

Why Transformers Outperform RNNs: A Beginner’s Guide to Attention and Architecture

This article introduces the Transformer architecture, explaining its attention mechanism, encoder‑decoder design, training and inference processes, and why it surpasses RNN‑based models, while also covering common applications and variations in natural language processing.

Deep Learning · Model architecture · NLP
0 likes · 13 min read
Sohu Smart Platform Tech Team
Sep 12, 2025 · Artificial Intelligence

How AI is Revolutionizing Video Creation: From Text‑to‑Video to Real‑Time Editing

This article systematically explores the technical evolution, core principles, and emerging innovations of AI‑generated video, covering generation methods, GAN and diffusion models, transformer‑based DiT architectures, efficiency‑boosting NCR, audio‑visual V2A integration, and real‑world applications across media, education, and commerce.

AI video generation · GAN · NCR
0 likes · 25 min read
Architects Research Society
Sep 4, 2025 · Artificial Intelligence

Choosing the Right Generative AI Model: Transformers, Diffusion, GANs & RNNs Explained

This article outlines the four dominant generative AI architectures—Transformers, diffusion models, GANs, and RNNs—explaining their core mechanisms, key capabilities, and typical application domains such as chatbots, image creation, deep‑fake media, and time‑series analysis, helping readers choose the right model for their needs.

AI applications · GAN · RNN
0 likes · 3 min read
Data Party THU
Sep 3, 2025 · Artificial Intelligence

Unlocking Large Model Secrets: Transformers, MoE, Fine‑Tuning, RAG & KV Caching

This article provides a comprehensive technical overview of today’s large‑model ecosystem, covering the Transformer architecture, Mixture‑of‑Experts extensions, five fine‑tuning methods, the evolution from traditional RAG to agentic RAG, classic agent design patterns, diverse text‑chunking strategies, and the KV‑cache optimization that accelerates inference.

Agentic AI · Fine‑tuning · KV cache
0 likes · 13 min read
Data Party THU
Sep 2, 2025 · Artificial Intelligence

Inside Large Action Models (LAMs): Architecture, Code, and Enterprise Automation

This article provides a comprehensive technical analysis of Large Action Models (LAMs), detailing their neuro‑symbolic architecture, core components such as LAMProcessor, NeuroSymbolicLayer, ActionExecutor, and learning modules, and demonstrates how they enable intelligent, end‑to‑end automation of enterprise tasks.

AI automation · Enterprise AI · Neuro-symbolic AI
0 likes · 30 min read
Bighead's Algorithm Notes
Aug 26, 2025 · Artificial Intelligence

SSPT: Custom Pre‑training Tasks for Stock Data Boost Stock Selection Performance

This article reviews the SSPT paper, which introduces three stock‑specific pre‑training tasks—stock code classification, sector classification, and moving‑average prediction—built on a two‑layer Transformer, and demonstrates through extensive experiments across five market datasets that these tasks consistently improve cumulative return and Sharpe ratio over baselines.

Financial AI · Time Series · Transformer
0 likes · 11 min read
AntTech
Aug 21, 2025 · Artificial Intelligence

How the Mixture-of-Queries Transformer Tackles Camouflaged Instance Segmentation

The IJCAI 2025 paper showcase introduces the Mixture‑of‑Queries Transformer, a novel model that combines frequency‑domain feature enhancement with collaborative query decoding to achieve state‑of‑the‑art camouflaged instance segmentation across multiple datasets.

Computer Vision · IJCAI 2025 · Transformer
0 likes · 4 min read
Wu Shixiong's Large Model Academy
Aug 20, 2025 · Artificial Intelligence

Mastering Large‑Model Interview Questions: MHA, KV‑Cache, Scaled Dot‑Product, and Speculative Decoding

This guide walks through common large‑model interview challenges, including a hands‑on implementation of multi‑head attention with KV‑cache, the mathematical reason for scaling by sqrt(dₖ), a concise speculative decoding algorithm, and systematic debugging steps for NaN loss during training.

KV cache · Large Model Interview · Multi‑Head Attention
0 likes · 14 min read
Tencent Cloud Developer
Aug 19, 2025 · Artificial Intelligence

Demystifying LLMs: From Transformers to Agents, Prompts, and Function Calling

This article explains the fundamentals of large language models, covering transformer self‑attention, prompt engineering, API usage with temperature and tool parameters, function calling, agent architectures, the Model Context Protocol (MCP), Agent‑to‑Agent (A2A) communication, and future AI programming roles.

A2A · AI agents · Function Calling
0 likes · 11 min read
Qborfy AI
Aug 12, 2025 · Artificial Intelligence

What Powers Large Language Models? A Deep Dive into LLM Architecture and Scaling

This article explains how massive Transformer‑based large language models compress text data into mathematical representations, why scale, self‑attention, and training paradigms enable emergent general intelligence, and walks through tokenization, embedding, multi‑layer attention, architecture choices, energy costs, and hallucination mitigation.

AI · Embedding · LLM
0 likes · 6 min read
Qborfy AI
Aug 8, 2025 · Artificial Intelligence

Why Transformers Revolutionized AI: A Deep Dive into Self‑Attention

This article explains how the Transformer model replaces sequential RNN processing with parallel self‑attention, detailing its core components, positional encoding, encoder‑decoder workflow, industry impact, and surprising facts such as training speed gains and energy efficiency.

AI · Deep Learning · Model architecture
0 likes · 5 min read
Alibaba Cloud Developer
Aug 6, 2025 · Artificial Intelligence

How Transformers Revolutionize Sequence Modeling: From RNN Limits to Self‑Attention Mastery

This article explains why Transformer models surpass traditional RNN‑based seq2seq architectures by introducing self‑attention, multi‑head attention, and positional encoding, detailing the inner workings of encoders, decoders, and attention mechanisms, and comparing their advantages and limitations across NLP and vision tasks.

GRU · LSTM · RNN
0 likes · 30 min read
Data Party THU
Aug 5, 2025 · Artificial Intelligence

Why State Space Models May Outperform Transformers: A Deep Dive

The article provides a comprehensive technical analysis of state space models (SSM) versus Transformers, covering their core mechanisms, three essential design factors, training efficiency, scaling behavior, tokenization debates, and experimental evidence that highlights the trade‑offs and potential advantages of SSMs in modern AI systems.

Mamba · State Space Model · Transformer
0 likes · 21 min read
Baobao Algorithm Notes
Aug 4, 2025 · Artificial Intelligence

Why GPT‑OSS Chooses a 64‑Dimensional Attention Head and 2880 Hidden Size

This article analyzes the surprising design choices of the rumored GPT‑OSS 120B model, explaining the rationale behind a 64‑dimensional attention head, the equal hidden and intermediate sizes, and other quirky parameters such as MLP bias and KV‑sink SWA, backed by theoretical formulas and empirical benchmarks.

Attention Head · GPT-OSS · MLP Ratio
0 likes · 13 min read
Data Thinking Notes
Jul 30, 2025 · Artificial Intelligence

Tracing the Evolution of Large Language Models: Key Papers and Breakthroughs

This article reviews the most influential papers in large language model research since 2017, covering foundational works such as the Transformer, GPT‑3, BERT, scaling laws, and recent innovations like FlashAttention, Mamba, and QLoRA, highlighting their core contributions and impact on AI development.

AI research · Model Optimization · Transformer
0 likes · 28 min read
Data Party THU
Jul 29, 2025 · Artificial Intelligence

Can 2‑Simplicial Attention Outperform Standard Transformers? A Deep Dive

This article reviews Meta's rotation‑invariant 2‑simplicial attention, explains its trilinear formulation and windowed implementation, analyzes its impact on scaling laws compared with standard dot‑product attention, and presents experimental results showing when the new mechanism offers advantages.

2-simplicial attention · Meta · Neural architecture
0 likes · 12 min read
Tech Freedom Circle
Jul 17, 2025 · Artificial Intelligence

DeepSeek V3 Architecture Deep Dive: MoE, MLA, DualPipe, FP8 Mixed Precision & Multi‑Token Prediction

This article provides a detailed technical analysis of DeepSeek‑V3, covering its MoE architecture, the novel Multi‑head Latent Attention (MLA) mechanism, the DualPipe pipeline‑parallel algorithm, mixed‑precision FP8 training, and the Multi‑Token Prediction (MTP) inference improvements that together boost performance and efficiency.

DeepSeek · Distributed Training · DualPipe
0 likes · 44 min read
AI Frontier Lectures
Jul 10, 2025 · Artificial Intelligence

Can 2‑Simplicial Attention Redefine Transformer Scaling Laws?

A recent Meta paper introduces a rotation‑invariant 2‑simplicial attention mechanism, demonstrates its superior scaling‑law coefficients over standard dot‑product attention, and provides experimental evidence of improved token efficiency and model performance under constrained token budgets.

2-simplicial · Meta · Transformer
0 likes · 11 min read
High Availability Architecture
Jul 9, 2025 · Artificial Intelligence

How LLMs Evolved from GPT‑4 to Agentic AI: Trends, Techniques, and Future Directions

This article analyzes the rapid evolution of large language models from the GPT‑4 era through efficiency‑focused sparsity and attention innovations, to inference‑time reasoning and tool‑using agents, highlighting key architectures, benchmark breakthroughs, competitive strategies, and emerging research directions toward embodied AI.

Agentic AI · LLM · Transformer
0 likes · 24 min read
Alibaba Cloud Developer
Jul 8, 2025 · Artificial Intelligence

From GPT‑4 to Thinking Models: How LLM Architecture Evolved After 2023

This article traces the evolution of large language models from the GPT‑4 era through 2024‑2025, highlighting the shift from pure scaling to efficiency‑focused architectures, the rise of reasoning‑centric "thinking" models, and the emergence of agentic capabilities that enable tool use and real‑world interaction.

LLM · Transformer · agents
0 likes · 27 min read
IT Services Circle
Jul 6, 2025 · Artificial Intelligence

Why Transformers Train Like Any Neural Network: Backpropagation Explained

This article demystifies how Transformers are trained by showing that all their linear layers have learnable weights and biases, and that the attention mechanism—including softmax and dot‑product operations—is fully differentiable and updated via standard back‑propagation.

Backpropagation · Deep Learning · PyTorch
0 likes · 7 min read
AI Algorithm Path
Jul 5, 2025 · Artificial Intelligence

Beginner’s Guide to Vision‑Language Models Day 7: How CLIP Achieves Joint Visual‑Language Understanding

This article explains CLIP’s dual‑encoder architecture—using a Vision Transformer for images and a Transformer for text—how both encoders map inputs into a shared embedding space, the role of cosine similarity, and the InfoNCE contrastive loss that drives joint visual‑language learning.

CLIP · InfoNCE · Multi-modal Embedding
0 likes · 8 min read
Amap Tech
Jun 30, 2025 · Artificial Intelligence

SeqGrowGraph: Chain-of-Graph Expansion for Precise Lane Topology

SeqGrowGraph introduces a novel chain-of-graph expansion framework that incrementally builds lane topology graphs using a Transformer-based autoregressive model, achieving state‑of‑the‑art performance on large autonomous‑driving datasets such as nuScenes and Argoverse 2 by accurately modeling complex road structures.

Computer Vision · Sequence Modeling · Transformer
0 likes · 10 min read
Cognitive Technology Team
Jun 29, 2025 · Artificial Intelligence

Understanding Transformers: Core Mechanics Behind Modern AI Models

This article demystifies the Transformer architecture for beginners, explaining its relationship to large models, the self‑attention and multi‑head attention mechanisms, positional encoding, and the roles of Encoder and Decoder components, using clear analogies and visual diagrams to aid comprehension.

Deep Learning · Encoder-Decoder · Positional Encoding
0 likes · 20 min read
ITFLY8 Architecture Home
Jun 24, 2025 · Artificial Intelligence

How Transformers and Mixture-of-Experts Power Large Language Models

This article explores the role of Transformers and Mixture‑of‑Experts in large models, outlines five fine‑tuning methods, compares traditional and agentic RAG, presents classic agent design patterns, text‑chunking strategies, levels of intelligent agent systems, and explains KV‑caching techniques.

Fine-tuning · Mixture of Experts · RAG
0 likes · 2 min read
Programmer Xu Shu
Jun 23, 2025 · Artificial Intelligence

From Bag‑of‑Words to ChatGPT: How Large Language Models Evolved

Tracing the evolution of large language models—from early bag‑of‑words techniques, through word embeddings, RNNs, attention mechanisms, Transformers, BERT, and GPT—this article explains each breakthrough, its limitations, and how they culminated in ChatGPT’s conversational AI.

AI evolution · ChatGPT · Transformer
0 likes · 12 min read
MaGe Linux Operations
Jun 15, 2025 · Artificial Intelligence

Mastering Transformers: Key Extensions and Optimization Techniques Explained

This comprehensive guide walks you through the Transformer architecture—from its encoder‑decoder structure and self‑attention mechanism to multi‑head attention, positional embeddings, and practical PyTorch implementations—providing clear visualizations and code examples for deep learning practitioners.

Deep Learning · PyTorch · Self-Attention
0 likes · 22 min read
Open Source Linux
Jun 12, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: The Evolution of Large Language Models (2017‑2025)

This article chronicles the rapid development of large language models from the 2017 Transformer breakthrough through the rise of BERT, GPT‑3, multimodal models, alignment techniques like RLHF, and finally the cost‑efficient DeepSeek‑R1 in 2025, highlighting key innovations, scaling trends, and real‑world impacts.

AI Alignment · Deep Learning · Model Scaling
0 likes · 26 min read
AI Algorithm Path
Jun 8, 2025 · Artificial Intelligence

Autoregressive vs Diffusion Language Models: Principles, Trade‑offs, and Future Directions

The article compares autoregressive and diffusion language models, detailing their mathematical foundations, training and inference pipelines, performance trade‑offs such as speed, coherence and diversity, and explores hybrid approaches and emerging research directions for more efficient and controllable text generation.

AI research · Text Generation · Transformer
0 likes · 17 min read
ITFLY8 Architecture Home
Jun 5, 2025 · Artificial Intelligence

Why Large Models Are Redefining Software: The Four AI Tech Drivers

The article explains how rapid AI advances and the AI Agent architecture are reshaping software development, outlines four key technical drivers (embedding, Transformer scaling laws, scenario Moore's law, and LLM OS), and discusses the security, professionalism, and responsibility challenges enterprises face when deploying AI‑native applications.

AI Architecture · Embedding · Enterprise AI
0 likes · 6 min read
Data Thinking Notes
Jun 2, 2025 · Artificial Intelligence

Why Pre‑Training Powers Modern AI: From Theory to Real‑World Applications

Pre‑training enables AI models to first acquire a universal knowledge map from massive unlabelled text, then quickly adapt to specific tasks with minimal labelled data, offering superior generalization, reduced annotation costs, and versatile applications across chatbots, content creation, retrieval, coding assistance, and more.

AI applications · Transformer · large language models
0 likes · 14 min read
Architect
May 18, 2025 · Artificial Intelligence

How Much GPU Memory Can One Model Use? A Deep Dive into Transformer Memory Accounting

This article breaks down GPU memory consumption for large Transformer models, explains how to estimate each component—parameters, optimizer state, activations, gradients—and shows how parallelism, mixed precision, and recomputation strategies can dramatically reduce the footprint.

AI training · GPU Memory · Memory Optimization
0 likes · 14 min read
Meituan Technology Team
May 15, 2025 · Artificial Intelligence

How Meituan’s MTGR Framework Achieved 65× Faster Inference with Scaling Laws

Meituan’s recommendation team introduced the MTGR framework, aligning traditional DLRM features with a unified HSTU‑based Transformer to explore scaling laws, delivering a 65‑fold FLOPs boost, 12% lower inference cost, and record gains in online CTR and order volume across its food‑delivery platform.

Inference Optimization · Large-Scale Training · MTGR
0 likes · 26 min read
Data Thinking Notes
Apr 29, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: How LLMs Evolved to 2025

This article chronicles the evolution of large language models from the 2017 Transformer breakthrough through BERT, GPT series, multimodal models, and recent cost‑efficient innovations like DeepSeek‑R1, highlighting key architectures, training methods, alignment techniques, and their transformative impact on AI applications.

AI Alignment · Transformer · large language models
0 likes · 29 min read
AI Frontier Lectures
Apr 27, 2025 · Artificial Intelligence

How Jeff Dean’s Vision Shaped Modern AI: From Neural Nets to Gemini

Jeff Dean’s 2024 ETH Zurich talk traces fifteen years of AI breakthroughs—from the rise of neural networks and back‑propagation, through large‑scale distributed training, TPUs, Transformers, sparse MoE models, and advanced prompting techniques—showing how scaling compute, data, and clever software have driven today’s powerful Gemini models.

AI · Chain-of-Thought · Distillation
0 likes · 18 min read
Didi Tech
Apr 24, 2025 · Artificial Intelligence

Algorithmic Foundations and Evolution of Natural Language Processing

The article surveys the Algorithmic Foundations of Engineering R&D series, tracing NLP’s evolution from rule‑based systems to today’s multimodal large‑model era, reviewing core machine‑learning and deep‑learning techniques, transformer breakthroughs, representation learning, optimization methods, and emerging research such as retrieval‑augmented generation and AI agents.

AI · NLP · Transformer
0 likes · 43 min read
Tencent Technical Engineering
Apr 16, 2025 · Artificial Intelligence

Understanding Transformer Architecture for Chinese‑English Translation: A Practical Guide

This practical guide walks through the full Transformer architecture for Chinese‑to‑English translation, detailing encoder‑decoder structure, tokenization and embeddings, batch handling with padding and masks, positional encodings, parallel teacher‑forcing, self‑ and multi‑head attention, and the complete forward and back‑propagation training steps.

Positional Encoding · PyTorch · Self-Attention
0 likes · 26 min read
AI Frontier Lectures
Apr 13, 2025 · Artificial Intelligence

How HINT’s Hierarchical Multi‑Head Attention Boosts Image Restoration Quality

The paper introduces HINT, a Transformer‑based image restoration model that employs Hierarchical Multi‑Head Attention (HMHA) and a Query‑Key Cache Updating (QKCU) module to eliminate attention redundancy, achieving superior PSNR/SSIM scores across low‑light enhancement, dehazing, desnowing, denoising, and deraining tasks while maintaining low model complexity.

Computer Vision · Hierarchical Attention · Image Restoration
0 likes · 10 min read
AI Algorithm Path
Apr 10, 2025 · Artificial Intelligence

Beginner-Friendly Guide to Understanding Large Language Models

This article walks readers through the fundamentals of large language models, covering what tokens are, how tokenization works, the conversion of tokens to numeric IDs, the transformer architecture—including positional encoding, self‑attention, feed‑forward networks and softmax—and explains how these components enable next‑token prediction.

Embedding · LLM · Self-Attention
0 likes · 9 min read
AI Frontier Lectures
Apr 1, 2025 · Artificial Intelligence

Can SpargeAttn Accelerate Any Model Without Training? A Deep Dive

This article reviews the SpargeAttn paper, describing how a training‑free sparse attention mechanism achieves 4‑7× inference speedup across language, video, and image models while preserving end‑to‑end accuracy, and outlines its challenges, algorithmic solutions, implementation details, and experimental results.

GPU Optimization · Quantized Inference · SpargeAttn
0 likes · 7 min read
Architects' Tech Alliance
Mar 31, 2025 · Artificial Intelligence

A Comprehensive History of Large Language Models from the Transformer Era (2017) to DeepSeek‑R1 (2025)

This article reviews the evolution of large language models from the 2017 Transformer breakthrough through BERT, GPT series, alignment techniques, multimodal extensions, open‑weight releases, and the cost‑efficient DeepSeek‑R1 in 2025, highlighting key technical advances, scaling trends, and their societal impact.

AI Alignment · LLM evolution · Multimodal AI
0 likes · 26 min read
AntTech
Mar 26, 2025 · Artificial Intelligence

BodyGen: A Bio‑Inspired Embodied Co‑Design Framework for Autonomous Robot Evolution

BodyGen, a new embodied co‑design framework presented at ICLR 2025, enables robots to autonomously evolve their morphology and control policies using reinforcement learning and transformer‑based networks, achieving up to 60% performance gains with a lightweight 1.43M‑parameter model. Its code is publicly released.

Embodied AI · Transformer · co-design
0 likes · 10 min read
AI Algorithm Path
Mar 19, 2025 · Artificial Intelligence

Understanding Multimodal Large Language Models: Part 1

This article explains the fundamentals of multimodal large language models, covering their definition, typical applications, two main architectural approaches—unified embedding decoder and cross‑modal attention—along with detailed component breakdowns, a PyTorch implementation of image‑patch projection, and training considerations, ending with a discussion of trade‑offs between the methods.

Cross-Attention · Image Encoder · Linear Projection
0 likes · 14 min read
IT Services Circle
Mar 19, 2025 · Artificial Intelligence

ByteDance’s AI Video Generation Model Goku, Streamer‑Sales Live‑Selling Model, and MimicTalk 3D Talking‑Head Project

ByteDance and partners open‑source three AI projects—Goku for high‑quality text‑to‑video generation, Streamer‑Sales for multimodal live‑selling LLMs, and MimicTalk for rapid 3D talking‑head creation—detailing their core features, underlying transformer‑based architectures, training pipelines, and public repositories.

AI video generation · Transformer · Virtual digital human
0 likes · 5 min read
AIWalker
Mar 14, 2025 · Artificial Intelligence

Dynamic Tanh Lets He Kaiming and LeCun Drop Transformer Normalization in 9 Lines

Researchers He Kaiming, Yann LeCun and colleagues propose a 9‑line Dynamic Tanh (DyT) layer that replaces LayerNorm/RMSNorm in Transformers, showing comparable or superior accuracy across vision, language, speech and DNA tasks while also reducing inference latency on modern GPUs.

AI research · Deep Learning · Dynamic Tanh
0 likes · 18 min read
Network Intelligence Research Center (NIRC)
Mar 12, 2025 · Artificial Intelligence

How Sparse Autoencoders Uncover Monosemantic Features in Large Language Models

The article reviews the paper ‘Towards Monosemanticity: Decomposing Language Models With Dictionary Learning’, showing how Anthropic’s sparse autoencoders extract interpretable, monosemantic concepts from transformer layers, enable controlled generation, and reveal trade‑offs such as data‑intensive training and potential performance impacts.

Dictionary Learning · Feature Control · LLM Interpretability
0 likes · 9 min read
AIWalker
Mar 11, 2025 · Artificial Intelligence

MobileMamba: Lightweight Multi‑Receptive‑Field Backbone Beats Existing Mamba Models

MobileMamba introduces a three‑stage, lightweight backbone with a multi‑receptive‑field feature‑interaction module that combines wavelet‑enhanced Mamba, multi‑kernel depthwise convolutions, and redundant‑mapping reduction, delivering up to 83.6% ImageNet Top‑1 accuracy while running 21× faster than LocalVim and 3.3× faster than EfficientVMamba.

Benchmark · CNN · Mamba
0 likes · 10 min read
NewBeeNLP
Mar 11, 2025 · Artificial Intelligence

How DeepSeek’s New Architecture Redefines LLM Efficiency and Performance

This article analyzes DeepSeek’s recent breakthroughs—including the Multi‑Head Latent Attention (MLA), Group Relative Policy Optimization (GRPO), and a refined Mixture‑of‑Experts design—along with its three‑stage training pipeline, RL‑only R1‑Zero variant, and benchmark comparisons against GPT‑4o‑Mini and Llama 3.1, highlighting both gains and remaining challenges.

DeepSeek · LLM · Mixture of Experts
0 likes · 18 min read
Cognitive Technology Team
Mar 10, 2025 · Artificial Intelligence

Understanding Transformers: From NLP Challenges to Architecture and Core Mechanisms

This article explains the evolution of natural language processing, the limitations of rule‑based, statistical, and recurrent neural network models, and then introduces the Transformer architecture—covering word and position embeddings, self‑attention, multi‑head attention, Add & Norm, feed‑forward layers, and encoder‑decoder design—to help beginners grasp why Transformers solve key NLP problems.

AI · NLP · Self-Attention
0 likes · 15 min read
Alibaba Cloud Developer
Mar 10, 2025 · Artificial Intelligence

Why Transformers Revolutionized NLP: From Problems to Solutions

This article explains the historical challenges of natural language processing, from rule‑based and statistical models to recurrent networks and their limitations, then introduces the Transformer architecture, its self‑attention mechanism, multi‑head attention, and supporting layers, illustrating how it overcomes previous issues and enables efficient parallel training.

NLP · Self-Attention · Transformer
0 likes · 16 min read
AI Frontier Lectures
Mar 7, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: Tracing the Evolution of Large Language Models (2017‑2025)

This article chronicles the rapid development of large language models from the 2017 Transformer breakthrough through successive milestones such as BERT, GPT‑3, ChatGPT, multimodal GPT‑4 variants, open‑weight releases, and the cost‑efficient DeepSeek‑R1, highlighting key architectural innovations, training paradigms, alignment techniques, and industry impact.

Cost‑Efficient Inference · Model Alignment · Reasoning Models
0 likes · 27 min read
Cognitive Technology Team
Mar 7, 2025 · Artificial Intelligence

From Word Embeddings to Large Language Models: A Comprehensive Overview of AI Model Evolution

This article traces the development of AI models—from early word embeddings like Word2Vec and ELMo, through transformer‑based encoders such as BERT and decoder‑only models like GPT‑1/2/3, to recent multimodal systems and scaling laws—explaining their architectures, training methods, and impact on modern AI applications.

AI · Embedding · Transformer
0 likes · 22 min read
Baobao Algorithm Notes
Mar 6, 2025 · Artificial Intelligence

Alibaba Unveils QwQ-32B: A 32‑Billion‑Parameter Inference Model with Agent Capabilities

Alibaba has open‑sourced QwQ‑32B, a 32.5‑billion‑parameter transformer reasoning model that rivals top models like DeepSeek‑R1 and o1‑mini, integrates agent abilities for tool use and critical thinking, and has a low barrier to running inference; the article also covers its technical specifications and RL‑based training details.

Alibaba · Transformer · agent capabilities
0 likes · 4 min read
AIWalker
Mar 6, 2025 · Artificial Intelligence

How SCMHSA Improves Transformer Next‑Frame Prediction by Reducing Semantic Dilution

The paper introduces a Semantic‑Concentrated Multi‑Head Self‑Attention (SCMHSA) module and a new embedding‑space loss to address semantic dilution and loss‑target mismatch in Transformer‑based video next‑frame prediction, demonstrating significant PSNR and MSE gains across four benchmark datasets.

Computer Vision · Embedding Loss · SCMHSA
0 likes · 23 min read
JD Cloud Developers
Mar 5, 2025 · Artificial Intelligence

How GLM’s Autoregressive Blank‑Filling Beats BERT, T5, and GPT

GLM introduces a universal language model that combines autoregressive blank‑filling with 2D positional encoding and span‑shuffle training, achieving superior performance over BERT, T5, and GPT across NLU, conditional and unconditional generation tasks, as demonstrated on SuperGLUE and other benchmarks.

Language Model · NLU · Transformer
0 likes · 29 min read
Architect
Mar 2, 2025 · Artificial Intelligence

Demystifying Mixture of Experts: How MoE Boosts LLMs and Vision Models

This article explains the Mixture of Experts (MoE) architecture, detailing experts, routers, dense vs. sparse layers, load‑balancing strategies such as KeepTopK, auxiliary loss, capacity constraints, the Switch Transformer simplification, and how MoE is applied to both language and vision models, illustrated with concrete examples and parameter counts.

Mixture of Experts · MoE · Sparse Models
0 likes · 17 min read
IT Architects Alliance
Feb 26, 2025 · Artificial Intelligence

DeepSeek Large Model: Core Architecture, Key Technologies, and Training Strategies

The article provides an in‑depth overview of DeepSeek’s large language model, detailing its mixture‑of‑experts and Transformer foundations, novel attention mechanisms, load‑balancing, multi‑token prediction, FP8 mixed‑precision training, and various training regimes such as knowledge distillation and reinforcement learning.

DeepSeek · FP8 · MLA
0 likes · 18 min read
21CTO
Feb 24, 2025 · Artificial Intelligence

From Transformers to DeepSeek-R1: Evolution of Large Language Models

This article chronicles the rapid development of large language models since the 2017 introduction of the Transformer architecture, covering BERT, the GPT series, multimodal systems, and the cost‑effective DeepSeek‑R1, and highlighting key innovations, scaling trends, alignment techniques, and their transformative impact across AI research and industry.

AI evolution · DeepSeek · LLM History
0 likes · 23 min read
AIWalker
Feb 20, 2025 · Artificial Intelligence

Transfusion: A Single Model for Unified Image Generation and Understanding

Transfusion is a 7B‑parameter transformer that jointly trains language modeling and diffusion losses on mixed text‑image data, enabling seamless text generation, image generation, and image understanding within one model and outperforming prior multimodal approaches such as Chameleon across multiple benchmarks.

AI research · Language Modeling · Transformer
0 likes · 20 min read
AIWalker
Feb 19, 2025 · Artificial Intelligence

DeepSeek’s NSA Attention Cuts Inference Time 11× – CEO Liang Co‑author

DeepSeek introduces Native Sparse Attention (NSA), a sparse attention mechanism that combines dynamic hierarchical sparsity, coarse‑grained token compression, and fine‑grained token selection to achieve up to 11.6× faster inference, lower pre‑training cost, and superior benchmark performance across general, long‑context, and chain‑of‑thought tasks.

Benchmark · DeepSeek · LLM optimization
0 likes · 9 min read
IT Architects Alliance
Feb 15, 2025 · Artificial Intelligence

DeepSeek: Architecture, Core Technologies, Training Strategies, and Comparative Analysis

The article provides an in‑depth overview of DeepSeek's transformer‑based foundation, Mixture‑of‑Experts architecture, novel attention mechanisms, multi‑token prediction, FP8 mixed‑precision training, knowledge distillation, reinforcement‑learning approaches, and compares its performance and cost advantages against leading models such as GPT and Gemini.

AI model architecture · DeepSeek · FP8 training
0 likes · 29 min read
Architect
Feb 13, 2025 · Artificial Intelligence

How to Build a Mini ChatGPT on a Single GPU with MiniMind

This article provides a comprehensive, step‑by‑step guide to training and fine‑tuning a miniature large‑language model called MiniMind, covering lightweight model design, open‑source training pipelines, required datasets, tokenizer options, and deployment via a web UI, all using PyTorch on modest hardware.

AI · LLM · MiniMind
0 likes · 11 min read
Architects' Tech Alliance
Feb 12, 2025 · Industry Insights

How DeepSeek Is Redefining China’s AI Landscape in 2025

A 2025 research framework on DeepSeek finds that its V3 and R1 models, built on the Transformer with MLA and DeepSeek MoE technologies, are improving training efficiency, reshaping domestic AI valuations, and positioning open‑source AI as a disruptive force in the global market.

AI models · China AI · DeepSeek
0 likes · 5 min read
AI Algorithm Path
Feb 12, 2025 · Artificial Intelligence

Essential DeepSeek‑R1 Reading List: Papers Behind the 2025 Hottest LLM

This article compiles a curated reading list of foundational and recent research papers—from the original Transformer to chain‑of‑thought, mixture‑of‑experts, and reinforcement‑learning studies—that together explain the breakthroughs behind DeepSeek‑R1 and guide readers through the technical evolution of modern large language models.

DeepSeek · Mixture of Experts · Research Papers
0 likes · 15 min read
vivo Internet Technology
Feb 12, 2025 · Artificial Intelligence

Bidirectional Optimization of NLLB-200 and ChatGPT for Low-Resource Language Translation

The paper proposes a bidirectional optimization framework that fine‑tunes the low‑resource NLLB‑200 translation model with LoRA using data generated by ChatGPT, while also translating low‑resource prompts with NLLB before feeding them to LLMs, thereby improving multilingual translation quality yet requiring careful validation of noisy synthetic data.

Fine-tuning · LLM · LoRA
0 likes · 28 min read
AI Algorithm Path
Feb 9, 2025 · Artificial Intelligence

Understanding Multi-Token Prediction in DeepSeek‑R1 Architecture

This article dissects the Multi‑Token Prediction (MTP) technique used in DeepSeek‑R1, contrasting it with traditional next‑token prediction, detailing Meta’s MTP design, DeepSeek’s adapted architecture, loss weighting, and why MTP is applied only during training to boost efficiency and model capability.

DeepSeek · MTP · Model architecture
0 likes · 9 min read
IT Architects Alliance
Feb 8, 2025 · Artificial Intelligence

Inside DeepSeek: How Its Innovative Architecture Redefines AI Performance

This article examines DeepSeek's advanced Transformer‑based architecture, dynamic routing, MoE system, multi‑stage training, efficient inference, multimodal capabilities, real‑world applications, technical challenges, and future prospects, providing a comprehensive technical analysis of the model's strengths and limitations.

AI Architecture · DeepSeek · Model Optimization
0 likes · 15 min read
JavaEdge
Feb 6, 2025 · Artificial Intelligence

Why Training Transformers Faces an Impossible Triangle of Speed, Performance, and Cost

The article explains the “impossible triangle” in Transformer training, showing how speed, model performance, and computational cost cannot all be optimized simultaneously, and uses analogies and real‑world examples like GPT‑4 to illustrate the necessary trade‑offs.

Deep Learning · Model Training · Performance Tradeoff
0 likes · 7 min read
DaTaobao Tech
Jan 22, 2025 · Artificial Intelligence

AI Trends 2025: Paths to AGI, Scaling Law Evolution, and Industry Impact

The article surveys the AI revolution driven by foundation models and an evolving Scaling Law, outlining four AGI pathways—large models, intelligent robots, brain‑computer interfaces, and digital life—while highlighting transformer‑based convergence, generative‑first‑principle breakthroughs like DeepSeek‑V3, and transformative industry impacts ranging from consumer robots to Medical 2.0, personalized education, and digital‑simulation platforms such as NVIDIA’s Omniverse.

AGI · AI · AI industry
0 likes · 23 min read
Baobao Algorithm Notes
Jan 15, 2025 · Artificial Intelligence

How Multi-Token Prediction Boosts LLM Training and Inference Efficiency

This article reviews the evolution of Multi‑Token Prediction (MTP) techniques—from early blockwise parallel decoding to Meta's and DeepSeek's implementations—explaining their architectures, training and inference workflows, and the speed‑up gains they offer for large language models.

DeepSeek · Inference Acceleration · LLM
0 likes · 20 min read
AIWalker
Jan 10, 2025 · Artificial Intelligence

How a Simplified Transformer Enables Lightweight CLIP Training on a Single RTX3090

This paper presents SiCLIP, a framework that simplifies the Transformer architecture, combines weight‑sharing, multi‑stage knowledge distillation, and a novel pair‑matching loss with synthetic captions to train a competitive CLIP model using only one RTX3090 GPU and 1 TB of storage, achieving state‑of‑the‑art data‑size‑parameter‑accuracy trade‑offs.

CLIP · Lightweight Training · Synthetic Captions
0 likes · 19 min read
DevOps
Dec 19, 2024 · Artificial Intelligence

Yann LeCun Discusses AI, Self‑Supervised Learning, and the Future of AGI

Yann LeCun, in a half‑hour interview with Indian entrepreneur Nikhil Kamath, explains the fundamentals of artificial intelligence, critiques current transformer models, describes self‑supervised learning, outlines his joint‑embedding predictive architecture, and shares his vision for AGI, open‑source ecosystems, and the role of PhDs for AI entrepreneurs.

AGI · Transformer · artificial intelligence
0 likes · 16 min read
AntTech
Dec 6, 2024 · Artificial Intelligence

Nimbus: Secure and Efficient Two‑Party Inference for Transformers

The paper introduces Nimbus, a two‑party privacy‑preserving inference framework for Transformer models that leverages a client‑side outer‑product linear‑layer protocol and distribution‑aware polynomial approximations for non‑linear layers, achieving up to five‑fold speedups with negligible accuracy loss.

Homomorphic Encryption · Performance Optimization · Transformer
0 likes · 15 min read
Cognitive Technology Team
Nov 20, 2024 · Artificial Intelligence

Fundamentals and Implementation of Neural Networks and Transformers with PyTorch Examples

This article provides a comprehensive overview of neural network fundamentals, loss functions, activation functions, embedding techniques, attention mechanisms, multi‑head attention, residual networks, and the full Transformer encoder‑decoder architecture, illustrated with detailed PyTorch code and a practical MiniRBT fine‑tuning case for Chinese text classification.

AI · PyTorch · Transformer
0 likes · 49 min read
NewBeeNLP
Nov 18, 2024 · Artificial Intelligence

How to Optimize Multi-Head Attention: From MQA to FlashAttention and Beyond

This article examines techniques for shrinking the KV cache and accelerating attention in transformer models, including MQA, GQA, MLA, sliding‑window and linear attention, FlashAttention, PagedAttention, and Ring Attention, as well as mixed‑precision training and ZeRO parallelism, with code snippets, implementation details, and practical trade‑offs.

FlashAttention · KV cache · Model Parallelism
0 likes · 17 min read
Infra Learning Club
Oct 30, 2024 · Artificial Intelligence

How GPT-3 Evolved: From Transformer Roots to Massive Language Models

The article traces the development of the GPT series, from the 2017 Transformer breakthrough through GPT‑1, GPT‑2, and the 175‑billion‑parameter GPT‑3 to later models like Codex and ChatGPT, highlighting key papers, architectural choices, and the surprising role of OpenAI’s decoder‑only approach.

GPT-3 · Google · Language Model
0 likes · 4 min read
Tencent Advertising Technology
Oct 17, 2024 · Artificial Intelligence

Long Sequence Modeling for Advertising Recommendation: TIN, Disentangled Side‑Info TIN, Stacked TIN, and Target‑aware SASRec

This article presents a comprehensive solution for heterogeneous long‑behavior sequence modeling in advertising recommendation, introducing the TIN backbone, Disentangled Side‑Info TIN, Stacked TIN, and Target‑aware SASRec, along with platform‑level optimizations that enable million‑scale sequences while delivering significant online performance gains.

Advertising · Deep Learning · Performance Optimization
0 likes · 15 min read
DataFunTalk
Oct 1, 2024 · Artificial Intelligence

From Early AI to Superintelligence: Challenges and Prospects

The article reviews the evolution of artificial intelligence from early statistical models through deep learning and Transformer architectures, examines current breakthroughs like multimodal models, and discusses the technical, computational, and safety challenges that must be overcome before achieving artificial superintelligence (ASI).

AI · Superintelligence · Transformer
0 likes · 8 min read
Architect's Alchemy Furnace
Sep 16, 2024 · Artificial Intelligence

Why Transformers Revolutionize AI: From Basics to Advanced Applications

This article explains what AI Transformers are, why they matter, their key components and mechanisms, various applications ranging from language processing to bioinformatics, and how they differ from traditional neural networks, providing a comprehensive overview of Transformer architecture and its impact on modern AI research.

AI · Deep Learning · Self-Attention
0 likes · 20 min read
Sohu Tech Products
Sep 11, 2024 · Artificial Intelligence

How RoPE and FlashAttention Empower GLM-4-Plus for Long-Text Mastery

This article explains the core mechanisms of Transformer models, details the Rotary Position Embedding (RoPE) and FlashAttention techniques for handling long sequences, introduces the GLM-4-Plus series, and presents an empirical evaluation on the THUCNews dataset showing its superior long-text performance.

FlashAttention · GLM-4-Plus · Long Text
0 likes · 13 min read
Baobao Algorithm Notes
Sep 9, 2024 · Artificial Intelligence

How MoSLoRA Reinvents Low‑Rank Adaptation with Mixer Matrices

This article analyzes the Mixture‑of‑Subspaces in Low‑Rank Adaptation (MoSLoRA) paper, explaining its motivation, design choices that replace LoRA's gate with a mixer matrix, connections to multi‑head attention, experimental findings on LLaMA‑3 fine‑tuning, and theoretical proofs of its re‑parameterization properties.

AI · LoRA · Mixture of Experts
0 likes · 12 min read
Ops Development & AI Practice
Aug 20, 2024 · Artificial Intelligence

How ERobot Redefines No-Code AI Automation with Natural Language

The article examines Hugging Face's ERobot, an AI model that leverages Transformer-based pre‑trained models to execute a wide range of automation tasks through natural‑language commands, discusses its technical foundations, real‑world applications, future prospects, and the challenges it must overcome.

Hugging Face · No-code · Transformer
0 likes · 8 min read
21CTO
Aug 11, 2024 · Artificial Intelligence

Demystifying LLMs: How Tokens, Training, and Transformers Power Generative AI

This article explains the fundamentals of large language models, covering tokenization, probability prediction, Markov chain basics, training data limitations, context windows, and the transition to neural network architectures like Transformers, while providing Python examples and insights into model scaling and the illusion of intelligence.

AI · LLM · Neural Networks
0 likes · 18 min read
Architect
Aug 11, 2024 · Artificial Intelligence

Understanding Large Language Models: Tokens, Tokenization, and the Evolution from Markov Chains to Transformers

This article explains how generative AI models work by demystifying tokens, tokenization with tools like tiktoken, simple Markov‑chain training, the limitations of small context windows, and how modern LLMs use neural networks, transformers and attention mechanisms to predict the next token.

LLM · Markov chain · Transformer
0 likes · 20 min read
DaTaobao Tech
Aug 7, 2024 · Artificial Intelligence

Overview of Large Model Development, AIGC Practices, and Prompt Engineering

The article surveys the rapid emergence of large AI models and AIGC, explains core concepts like AI, AGI, and LLMs, details prompt‑engineering techniques such as chain‑of‑thought, outlines a seven‑layer AIGC stack, discusses technical and ethical challenges, and highlights future multimodal and industry‑specific applications.

AI · AIGC · LLM
0 likes · 25 min read