Tagged articles

383 articles

Page 2 of 4

Sep 30, 2025 · Artificial Intelligence

Dynamic Multimodal Video Generation: Prioritizing Stability and High Quality

The article surveys the evolution of video generation models—from early GANs and DCGAN to diffusion‑based approaches like Stable Diffusion and DiT—highlighting how stability, high quality, massive compute, and multimodal data pipelines are shaping the current and future paths of dynamic multimodal video generation.

Latent Diffusion · Multimodal AI · Stable Diffusion

0 likes · 7 min read

HyperAI Super Neural

Sep 30, 2025 · Artificial Intelligence

OnePiece: Applying LLM‑Style Reasoning to Item‑ID Sequences for Generative Recommendation

The article presents the OnePiece framework, which injects LLM‑style context engineering and latent reasoning into item‑ID‑based search and recommendation models. It details the design choices, training tricks, and attention analysis, and reports online gains of about 1% in GMV and ad revenue, offering a thorough technical dissection of generative recommendation in industrial settings.

Context Engineering · Generative Recommendation · LLM Reasoning

0 likes · 31 min read

Volcano Engine Developer Services

Sep 28, 2025 · Artificial Intelligence

Demystifying AI Jargon: A Beginner’s Guide to Large Language Models

This guide breaks down the complex terminology of large language models—explaining tokens, transformers, self‑attention, RAG, scaling laws, dense vs. sparse architectures, and training stages—using clear analogies and step‑by‑step explanations so readers can confidently understand and work with modern AI systems.

AI fundamentals · Model Training · RAG

0 likes · 35 min read

Wu Shixiong's Large Model Academy

Sep 26, 2025 · Artificial Intelligence

Crack Large-Model Interviews: Master Positional Encoding, Residuals, LayerNorm & FFN

Preparing for a large-model interview? This guide explains why interviewers probe seemingly minor components (positional encoding, residual connections, layer normalization, and feed-forward networks), covers each technique's purpose, its variants, and how to answer questions about it confidently, and adds practical tips and a learning roadmap to boost your chances.

FFN · Interview Tips · LayerNorm

0 likes · 8 min read

Wu Shixiong's Large Model Academy

Sep 25, 2025 · Artificial Intelligence

Master Self-Attention & Multi-Head Attention for Large Model Interviews

This guide breaks down the core logic, computation steps, formulas, and common interview questions about Self‑Attention and Multi‑Head Attention in Transformers, offering concrete explanations, dimensional examples, and practical answering techniques to help candidates ace large‑model algorithm interviews.

Deep Learning · Interview Tips · Self-Attention

0 likes · 8 min read
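
A minimal sketch of the two computations this guide covers, written for illustration rather than taken from the article: scaled dot-product attention, softmax(QKᵀ/√d_k)·V, and a toy multi-head wrapper. As a simplifying assumption, each head uses its own slice of the input directly as Q, K and V; a real layer applies learned W_Q, W_K, W_V projections and an output projection W_O. Plain Python lists keep it dependency-free.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q: n x d_k, K: m x d_k, V: m x d_v as lists of rows; returns n x d_v."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query with every key, divided by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row = attention-weighted sum of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

def multi_head_self_attention(X, num_heads):
    """Toy multi-head self-attention: head h attends over its own d_model / num_heads
    slice of X (identity projections, an assumption), then heads are concatenated."""
    d_model = len(X[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        Xh = [row[h * d_head:(h + 1) * d_head] for row in X]
        heads.append(scaled_dot_product_attention(Xh, Xh, Xh))
    # Concatenate per-head outputs so each token gets back d_model features.
    return [[x for head in heads for x in head[i]] for i in range(len(X))]
```

Splitting d_model = 4 across 2 heads gives each head a 2-dimensional subspace; this per-head bookkeeping (d_head = d_model / h) is the kind of detail interview questions tend to probe.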

Data Party THU

Sep 21, 2025 · Artificial Intelligence

Building a Mini‑DeepSeek‑V3: Transformer Block and MTP Implementation on Limited Compute

This article walks through the design and implementation of a Mini‑DeepSeek‑V3 language model, detailing how to assemble the core Transformer block, integrate Multi‑Token Prediction (MTP) modules, construct the overall architecture, and compute the combined loss—all using modest GPU resources and a single‑card or DDP training setup.

AI · DeepSeek · MTP

0 likes · 12 min read
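
A hedged sketch of how a combined MTP loss can be assembled, assuming the weighting described in the DeepSeek-V3 technical report (the mini-model in the article may differ): the depth-k module predicts the token k positions further ahead than the main head, and the D MTP losses are averaged and scaled by λ before being added to the main next-token loss. The helper names and probability values are hypothetical.

```python
import math

def mean_cross_entropy(prob_rows, targets):
    """Mean negative log-likelihood of each target token under its predicted distribution."""
    return -sum(math.log(row[t]) for row, t in zip(prob_rows, targets)) / len(targets)

def mtp_targets(tokens, depth):
    """Depth 0 is the ordinary next-token head (position i predicts tokens[i + 1]);
    the depth-k MTP module predicts tokens[i + 1 + k], so the last k positions have no target."""
    return tokens[1 + depth:]

def combined_loss(main_loss, mtp_losses, lam):
    """L = L_main + (lam / D) * sum_k L_MTP^k over the D MTP depths."""
    D = len(mtp_losses)
    return main_loss + (lam / D) * sum(mtp_losses)
```

Averaging over D keeps the auxiliary signal on a similar scale regardless of how many MTP depths are used, while λ controls how strongly it competes with the main objective.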

Bighead's Algorithm Notes

Sep 16, 2025 · Artificial Intelligence

Paper Review: HGTS‑Former – A Hierarchical Hypergraph Transformer for Multivariate Time‑Series Analysis

The HGTS‑Former model introduces a hierarchical hypergraph backbone combined with a Transformer to capture high‑order and dynamic dependencies in multivariate time‑series data, and experimental results on eight datasets show it consistently outperforms state‑of‑the‑art methods in both long‑term forecasting and interpolation tasks.

HGTS-Former · Hypergraph · Transformer

0 likes · 11 min read

Architect

Sep 16, 2025 · Artificial Intelligence

Why Transformers Outperform RNNs: A Beginner’s Guide to Attention and Architecture

This article introduces the Transformer architecture, explaining its attention mechanism, encoder‑decoder design, training and inference processes, and why it surpasses RNN‑based models, while also covering common applications and variations in natural language processing.

Deep Learning · Model architecture · NLP

0 likes · 13 min read

Sohu Smart Platform Tech Team

Sep 12, 2025 · Artificial Intelligence

How AI is Revolutionizing Video Creation: From Text‑to‑Video to Real‑Time Editing

This article systematically explores the technical evolution, core principles, and emerging innovations of AI‑generated video, covering generation methods, GAN and diffusion models, transformer‑based DiT architectures, efficiency‑boosting NCR, audio‑visual V2A integration, and real‑world applications across media, education, and commerce.

AI video generation · GAN · NCR

0 likes · 25 min read

Architects Research Society

Sep 4, 2025 · Artificial Intelligence

Choosing the Right Generative AI Model: Transformers, Diffusion, GANs & RNNs Explained

This article outlines the four dominant generative AI architectures—Transformers, diffusion models, GANs, and RNNs—explaining their core mechanisms, key capabilities, and typical application domains such as chatbots, image creation, deep‑fake media, and time‑series analysis, helping readers choose the right model for their needs.

AI applications · GAN · RNN

0 likes · 3 min read

Data Party THU

Sep 3, 2025 · Artificial Intelligence

Unlocking Large Model Secrets: Transformers, MoE, Fine‑Tuning, RAG & KV Caching

This article provides a comprehensive technical overview of today’s large‑model ecosystem, covering the Transformer architecture, Mixture‑of‑Experts extensions, five fine‑tuning methods, the evolution from traditional RAG to agentic RAG, classic agent design patterns, diverse text‑chunking strategies, and the KV‑cache optimization that accelerates inference.

Agentic AI · Fine‑tuning · KV cache

0 likes · 13 min read
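
To make the KV-cache idea concrete, here is a minimal single-head sketch (hypothetical helper names, toy 2-d weights, not the article's code). Both decoders produce identical causal attention outputs, but the cached version projects each token's key and value once instead of recomputing them for the whole prefix at every step.

```python
import math

def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, K, V):
    # One query attending over all rows of K / V (scaled dot-product, softmax weights).
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in K]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, V)) / z for j in range(len(V[0]))]

def decode_without_cache(xs, Wq, Wk, Wv):
    outs, kv_projections = [], 0
    for t in range(len(xs)):
        # Recompute keys and values for the entire prefix at every step.
        K = [matvec(Wk, x) for x in xs[:t + 1]]
        V = [matvec(Wv, x) for x in xs[:t + 1]]
        kv_projections += t + 1
        outs.append(attend(matvec(Wq, xs[t]), K, V))
    return outs, kv_projections

def decode_with_cache(xs, Wq, Wk, Wv):
    K_cache, V_cache, outs, kv_projections = [], [], [], 0
    for x in xs:
        # Project only the newest token and append its key/value to the cache.
        K_cache.append(matvec(Wk, x))
        V_cache.append(matvec(Wv, x))
        kv_projections += 1
        outs.append(attend(matvec(Wq, x), K_cache, V_cache))
    return outs, kv_projections
```

For n generated tokens the key/value projection count drops from n(n+1)/2 to n; the trade-off is that K and V for every past token (per layer and per head in a real model) must stay in memory.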

Data Party THU

Sep 2, 2025 · Artificial Intelligence

Inside Large Action Models (LAMs): Architecture, Code, and Enterprise Automation

This article provides a comprehensive technical analysis of Large Action Models (LAMs), detailing their neuro‑symbolic architecture, core components such as LAMProcessor, NeuroSymbolicLayer, ActionExecutor, and learning modules, and demonstrates how they enable intelligent, end‑to‑end automation of enterprise tasks.

AI automation · Enterprise AI · Neuro-symbolic AI

0 likes · 30 min read

Bighead's Algorithm Notes

Aug 26, 2025 · Artificial Intelligence

SSPT: Custom Pre‑training Tasks for Stock Data Boost Stock Selection Performance

This article reviews the SSPT paper, which introduces three stock‑specific pre‑training tasks—stock code classification, sector classification, and moving‑average prediction—built on a two‑layer Transformer, and demonstrates through extensive experiments across five market datasets that these tasks consistently improve cumulative return and Sharpe ratio over baselines.

Financial AI · Time Series · Transformer

0 likes · 11 min read

AntTech

Aug 21, 2025 · Artificial Intelligence

How the Mixture-of-Queries Transformer Tackles Camouflaged Instance Segmentation

Part of an IJCAI 2025 paper showcase, the article introduces the Mixture‑of‑Queries Transformer, a novel model that combines frequency‑domain feature enhancement with collaborative query decoding to achieve state‑of‑the‑art camouflaged instance segmentation across multiple datasets.

Computer Vision · IJCAI 2025 · Transformer

0 likes · 4 min read

Wu Shixiong's Large Model Academy

Aug 20, 2025 · Artificial Intelligence

Mastering Large‑Model Interview Questions: MHA, KV‑Cache, Scaled Dot‑Product, and Speculative Decoding

This guide walks through common large‑model interview challenges, including a hands‑on implementation of multi‑head attention with KV‑cache, the mathematical reason for scaling by sqrt(dₖ), a concise speculative decoding algorithm, and systematic debugging steps for NaN loss during training.

KV cache · Large Model Interview · Multi‑Head Attention

0 likes · 14 min read
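
A minimal sketch of speculative decoding in its greedy form, assuming toy "models" that simply map a token prefix to their greedy next token (the `Model` alias and the toy functions are hypothetical). A draft model proposes k tokens, the target model verifies them, and the longest agreeing prefix is kept plus one token from the target, so the output matches plain greedy decoding with the target. Sampled decoding instead accepts each draft token with probability min(1, p_target/p_draft), which this sketch does not implement.

```python
from typing import Callable, List

# Hypothetical toy interface: a "model" maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]

def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(model(seq))
    return seq

def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        # 1) The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2) The target model checks each proposed position
        #    (a single batched forward pass in a real system).
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            accepted.append(expected)
            if tok != expected:
                break  # first disagreement: keep the target's token, drop the rest
        else:
            # Every draft token was accepted: the target contributes one bonus token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq[:goal]
```

Each loop iteration appends at least one target-approved token, so decoding always makes progress; the speedup comes from accepting several draft tokens per expensive target pass when the two models agree.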

Tencent Cloud Developer

Aug 19, 2025 · Artificial Intelligence

Demystifying LLMs: From Transformers to Agents, Prompts, and Function Calling

This article explains the fundamentals of large language models, covering transformer self‑attention, prompt engineering, API usage with temperature and tool parameters, function calling, agent architectures, the Model Context Protocol (MCP), Agent‑to‑Agent (A2A) communication, and future AI programming roles.

A2A · AI agents · Function Calling

0 likes · 11 min read

Qborfy AI

Aug 12, 2025 · Artificial Intelligence

What Powers Large Language Models? A Deep Dive into LLM Architecture and Scaling

This article explains how massive Transformer‑based large language models compress text data into mathematical representations, why scale, self‑attention, and training paradigms enable emergent general intelligence, and walks through tokenization, embedding, multi‑layer attention, architecture choices, energy costs, and hallucination mitigation.

AI · Embedding · LLM

0 likes · 6 min read

Qborfy AI

Aug 8, 2025 · Artificial Intelligence

Why Transformers Revolutionized AI: A Deep Dive into Self‑Attention

This article explains how the Transformer model replaces sequential RNN processing with parallel self‑attention, detailing its core components, positional encoding, encoder‑decoder workflow, industry impact, and surprising facts such as training speed gains and energy efficiency.

AI · Deep Learning · Model architecture

0 likes · 5 min read

Alibaba Cloud Developer

Aug 6, 2025 · Artificial Intelligence

How Transformers Revolutionize Sequence Modeling: From RNN Limits to Self‑Attention Mastery

This article explains why Transformer models surpass traditional RNN‑based seq2seq architectures by introducing self‑attention, multi‑head attention, and positional encoding, detailing the inner workings of encoders, decoders, and attention mechanisms, and comparing their advantages and limitations across NLP and vision tasks.

GRU · LSTM · RNN

0 likes · 30 min read

Data Party THU

Aug 5, 2025 · Artificial Intelligence

Why State Space Models May Outperform Transformers: A Deep Dive

The article provides a comprehensive technical analysis of state space models (SSM) versus Transformers, covering their core mechanisms, three essential design factors, training efficiency, scaling behavior, tokenization debates, and experimental evidence that highlights the trade‑offs and potential advantages of SSMs in modern AI systems.

Mamba · State Space Model · Transformer

0 likes · 21 min read
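
A toy illustration of the core SSM mechanism, using a scalar linear time-invariant state space model with made-up parameters (not from the article): the same system can be run step by step as a recurrence (constant state per token at inference) or unrolled into a causal convolution (parallel training). Mamba-style selective SSMs make the parameters input-dependent, which removes the fixed kernel and is why they rely on a parallel scan instead; that variant is not shown.

```python
def ssm_recurrent(xs, a, b, c):
    """Scalar linear SSM run as a recurrence:
    h_t = a * h_{t-1} + b * x_t,  y_t = c * h_t."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x   # state update carries a compressed summary of the past
        ys.append(c * h)    # readout
    return ys

def ssm_convolutional(xs, a, b, c):
    """The same time-invariant system unrolled into a causal convolution
    with kernel K[j] = c * a**j * b, which can be computed in parallel."""
    K = [c * a ** j * b for j in range(len(xs))]
    return [sum(K[j] * xs[t - j] for j in range(t + 1)) for t in range(len(xs))]
```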

Baobao Algorithm Notes

Aug 4, 2025 · Artificial Intelligence

Why GPT‑OSS Chooses a 64‑Dimensional Attention Head and 2880 Hidden Size

This article analyzes the surprising design choices of the rumored GPT‑OSS 120B model, explaining the rationale behind a 64‑dimensional attention head, the equal hidden and intermediate sizes, and other quirky parameters such as MLP bias and KV‑sink SWA, backed by theoretical formulas and empirical benchmarks.

Attention Head · GPT-OSS · MLP Ratio

0 likes · 13 min read

Data Thinking Notes

Jul 30, 2025 · Artificial Intelligence

Tracing the Evolution of Large Language Models: Key Papers and Breakthroughs

This article reviews the most influential papers in large language model research since 2017, covering foundational works such as the Transformer, GPT‑3, BERT, scaling laws, and recent innovations like FlashAttention, Mamba, and QLoRA, highlighting their core contributions and impact on AI development.

AI research · Model Optimization · Transformer

0 likes · 28 min read

Data Party THU

Jul 29, 2025 · Artificial Intelligence

Can 2‑Simplicial Attention Outperform Standard Transformers? A Deep Dive

This article reviews Meta's rotation‑invariant 2‑simplicial attention, explains its trilinear formulation and windowed implementation, analyzes its impact on scaling laws compared with standard dot‑product attention, and presents experimental results showing when the new mechanism offers advantages.

2-simplicial attention · Meta · Neural architecture

0 likes · 12 min read

AsiaInfo Technology: New Tech Exploration

Jul 25, 2025 · Artificial Intelligence

How to Master Time Series Forecasting for Cloud CPU Anomaly Detection

This article systematically explores the principles and mathematics behind ARIMA, XGBoost, LSTM, and Transformer models, compares their strengths and weaknesses, and demonstrates a complete end‑to‑end workflow for detecting CPU resource anomalies in a cloud service environment.

ARIMA · LSTM · Transformer

0 likes · 12 min read

Tech Freedom Circle

Jul 17, 2025 · Artificial Intelligence

DeepSeek V3 Architecture Deep Dive: MoE, MLA, DualPipe, FP8 Mixed Precision & Multi‑Token Prediction

This article provides a detailed technical analysis of DeepSeek‑V3, covering its MoE architecture, the novel Multi‑head Latent Attention (MLA) mechanism, the DualPipe pipeline‑parallel algorithm, FP8 mixed‑precision training, and the Multi‑Token Prediction (MTP) inference improvements that together boost performance and efficiency.

DeepSeek · Distributed Training · DualPipe

0 likes · 44 min read

AI Frontier Lectures

Jul 10, 2025 · Artificial Intelligence

Can 2‑Simplicial Attention Redefine Transformer Scaling Laws?

A recent Meta paper introduces a rotation‑invariant 2‑simplicial attention mechanism, demonstrates its superior scaling‑law coefficients over standard dot‑product attention, and provides experimental evidence of improved token efficiency and model performance under constrained token budgets.

2-simplicial · Meta · Transformer

0 likes · 11 min read

High Availability Architecture

Jul 9, 2025 · Artificial Intelligence

How LLMs Evolved from GPT‑4 to Agentic AI: Trends, Techniques, and Future Directions

This article analyzes the rapid evolution of large language models from the GPT‑4 era through efficiency‑focused sparsity and attention innovations, to inference‑time reasoning and tool‑using agents, highlighting key architectures, benchmark breakthroughs, competitive strategies, and emerging research directions toward embodied AI.

Agentic AI · LLM · Transformer

0 likes · 24 min read

Alibaba Cloud Developer

Jul 8, 2025 · Artificial Intelligence

From GPT‑4 to Thinking Models: How LLM Architecture Evolved After 2023

This article traces the evolution of large language models from the GPT‑4 era through 2024‑2025, highlighting the shift from pure scaling to efficiency‑focused architectures, the rise of reasoning‑centric "thinking" models, and the emergence of agentic capabilities that enable tool use and real‑world interaction.

LLM · Transformer · agents

0 likes · 27 min read

IT Services Circle

Jul 6, 2025 · Artificial Intelligence

Why Transformers Train Like Any Neural Network: Backpropagation Explained

This article demystifies how Transformers are trained by showing that all their linear layers have learnable weights and biases, and that the attention mechanism—including softmax and dot‑product operations—is fully differentiable and updated via standard back‑propagation.

Backpropagation · Deep Learning · PyTorch

0 likes · 7 min read

AI Algorithm Path

Jul 5, 2025 · Artificial Intelligence

Beginner’s Guide to Vision‑Language Models Day 7: How CLIP Achieves Joint Visual‑Language Understanding

This article explains CLIP’s dual‑encoder architecture—using a Vision Transformer for images and a Transformer for text—how both encoders map inputs into a shared embedding space, the role of cosine similarity, and the InfoNCE contrastive loss that drives joint visual‑language learning.

CLIP · InfoNCE · Multi-modal Embedding

0 likes · 8 min read
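
A minimal sketch of CLIP's symmetric InfoNCE objective, assuming the image and text encoders have already produced embeddings: embeddings are L2-normalized so dot products are cosine similarities, the similarity matrix is divided by a temperature, and cross-entropy is applied in both directions with the matching pair on the diagonal as the target. The fixed temperature of 0.07 is an assumption here; CLIP learns this value during training.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(logits, target):
    # -log softmax(logits)[target], computed with the log-sum-exp trick.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_z - logits[target]

def clip_infonce_loss(image_embs, text_embs, temperature=0.07):
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine(image i, text j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image i should match text i: rows give image->text, columns give text->image.
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    loss_t2i = sum(cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2
```

Every other item in the batch acts as a negative, so the loss is near zero only when each image is far more similar to its own caption than to any other caption, and vice versa.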

Amap Tech

Jun 30, 2025 · Artificial Intelligence

SeqGrowGraph: Chain-of-Graph Expansion for Precise Lane Topology

SeqGrowGraph introduces a novel chain-of-graph expansion framework that incrementally builds lane topology graphs using a Transformer-based autoregressive model, achieving state‑of‑the‑art performance on large autonomous‑driving datasets such as nuScenes and Argoverse 2 by accurately modeling complex road structures.

Computer Vision · Sequence Modeling · Transformer

0 likes · 10 min read

Cognitive Technology Team

Jun 29, 2025 · Artificial Intelligence

Understanding Transformers: Core Mechanics Behind Modern AI Models

This article demystifies the Transformer architecture for beginners, explaining its relationship to large models, the self‑attention and multi‑head attention mechanisms, positional encoding, and the roles of Encoder and Decoder components, using clear analogies and visual diagrams to aid comprehension.

Deep Learning · Encoder-Decoder · Positional Encoding

0 likes · 20 min read
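
The sinusoidal positional encoding from the original Transformer paper can be sketched in a few lines: even dimensions use sin(pos / 10000^(2i/d_model)), odd dimensions use the matching cos, and the resulting vectors are added to the token embeddings. This is an illustrative version, not the article's code; many modern models use learned or rotary encodings instead.

```python
import math

def sinusoidal_positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table where
    PE[pos][2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos][2i+1] = cos(pos / 10000**(2i / d_model))."""
    table = []
    for pos in range(num_positions):
        row = []
        for j in range(d_model):
            i = j // 2  # dimensions 2i and 2i+1 share one frequency
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(token_embeddings):
    """Add the encoding element-wise to a sequence of token embeddings."""
    pe = sinusoidal_positional_encoding(len(token_embeddings), len(token_embeddings[0]))
    return [[e + p for e, p in zip(emb, pos)] for emb, pos in zip(token_embeddings, pe)]
```

Low dimensions oscillate quickly and high dimensions slowly, so each position gets a distinct pattern that self-attention, which is otherwise order-blind, can use.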

ITFLY8 Architecture Home

Jun 24, 2025 · Artificial Intelligence

How Transformers and Mixture-of-Experts Power Large Language Models

This article explores the role of Transformers and Mixture‑of‑Experts in large models, outlines five fine‑tuning methods, compares traditional and agentic RAG, presents classic agent design patterns, text‑chunking strategies, levels of intelligent agent systems, and explains KV‑caching techniques.

Fine-tuning · Mixture of Experts · RAG

0 likes · 2 min read

Programmer Xu Shu

Jun 23, 2025 · Artificial Intelligence

From Bag‑of‑Words to ChatGPT: How Large Language Models Evolved

Tracing the evolution of large language models—from early bag‑of‑words techniques, through word embeddings, RNNs, attention mechanisms, Transformers, BERT, and GPT—this article explains each breakthrough, its limitations, and how they culminated in ChatGPT’s conversational AI.

AI evolution · ChatGPT · Transformer

0 likes · 12 min read

MaGe Linux Operations

Jun 15, 2025 · Artificial Intelligence

Mastering Transformers: Key Extensions and Optimization Techniques Explained

This comprehensive guide walks you through the Transformer architecture—from its encoder‑decoder structure and self‑attention mechanism to multi‑head attention, positional embeddings, and practical PyTorch implementations—providing clear visualizations and code examples for deep learning practitioners.

Deep Learning · PyTorch · Self-Attention

0 likes · 22 min read

Open Source Linux

Jun 12, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: The Evolution of Large Language Models (2017‑2025)

This article chronicles the rapid development of large language models from the 2017 Transformer breakthrough through the rise of BERT, GPT‑3, multimodal models, alignment techniques like RLHF, and finally the cost‑efficient DeepSeek‑R1 in 2025, highlighting key innovations, scaling trends, and real‑world impacts.

AI Alignment · Deep Learning · Model Scaling

0 likes · 26 min read

AI Algorithm Path

Jun 8, 2025 · Artificial Intelligence

Autoregressive vs Diffusion Language Models: Principles, Trade‑offs, and Future Directions

The article compares autoregressive and diffusion language models, detailing their mathematical foundations, training and inference pipelines, performance trade‑offs such as speed, coherence and diversity, and explores hybrid approaches and emerging research directions for more efficient and controllable text generation.

AI research · Text Generation · Transformer

0 likes · 17 min read

ITFLY8 Architecture Home

Jun 5, 2025 · Artificial Intelligence

Why Large Models Are Redefining Software: The Four AI Tech Drivers

The article explains how rapid AI advances and the AI Agent architecture are reshaping software development, outlines four key technical drivers (embedding, Transformer scaling laws, scenario Moore's law, and LLM OS), and discusses the security, professionalism, and responsibility challenges enterprises face when deploying AI‑native applications.

AI Architecture · Embedding · Enterprise AI

0 likes · 6 min read

Data Thinking Notes

Jun 2, 2025 · Artificial Intelligence

Why Pre‑Training Powers Modern AI: From Theory to Real‑World Applications

Pre‑training enables AI models to first acquire a universal knowledge map from massive unlabelled text, then quickly adapt to specific tasks with minimal labelled data, offering superior generalization, reduced annotation costs, and versatile applications across chatbots, content creation, retrieval, coding assistance, and more.

AI applications · Transformer · large language models

0 likes · 14 min read

AI Large Model Application Practice

May 30, 2025 · Artificial Intelligence

Why Layer Normalization Stabilizes Transformers: A Deep Dive

This article explains the mathematical foundation of layer normalization, why it is needed for deep neural networks like Transformers, how scaling (γ) and bias (β) parameters restore important signal variations, and practical placement tips for stable training.

Bias · Deep Learning · Layer Normalization

0 likes · 8 min read
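
A minimal sketch of layer normalization for a single token's feature vector, illustrating the formula the article discusses: normalize to zero mean and unit variance across features, then apply the learnable per-feature scale γ (`gamma`) and shift β (`beta`) so the network can restore useful magnitudes. The ε value of 1e-5 is a common default, assumed here.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize one token's features to zero mean / unit variance, then scale by
    gamma and shift by beta (both learnable, one value per feature)."""
    d = len(x)
    mean = sum(x) / d
    var = sum((xi - mean) ** 2 for xi in x) / d  # biased (population) variance
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (xi - mean) * inv_std + b for xi, g, b in zip(x, gamma, beta)]
```

With γ = 1 and β = 0 the output is simply the standardized input; learning other values lets the layer undo the normalization wherever that helps.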

Architect

May 18, 2025 · Artificial Intelligence

How Much GPU Memory Can One Model Use? A Deep Dive into Transformer Memory Accounting

This article breaks down GPU memory consumption for large Transformer models, explains how to estimate each component—parameters, optimizer state, activations, gradients—and shows how parallelism, mixed precision, and recomputation strategies can dramatically reduce the footprint.

AI training · GPU Memory · Memory Optimization

0 likes · 14 min read
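
A back-of-the-envelope sketch of the parameter-related part of the accounting, assuming mixed-precision training with Adam and the common breakdown popularized by the ZeRO paper: 2 bytes for fp16/bf16 weights, 2 bytes for gradients, and 12 bytes for the fp32 master weights plus Adam's two moment estimates, i.e. 16 bytes per parameter. Activations are excluded, and the even split across GPUs is an idealized ZeRO-3-style assumption.

```python
def training_state_bytes(num_params, weight_bytes=2, grad_bytes=2, optimizer_bytes=12):
    """Model-state memory for mixed-precision Adam, excluding activations.
    Defaults: half-precision weights (2 B) + half-precision grads (2 B)
    + fp32 master weights, momentum and variance (4 + 4 + 4 = 12 B)."""
    return num_params * (weight_bytes + grad_bytes + optimizer_bytes)

def sharded_bytes_per_gpu(num_params, num_gpus):
    """Idealized full sharding: weights, grads and optimizer states split evenly."""
    return training_state_bytes(num_params) / num_gpus
```

For a 7B-parameter model this gives 112 GB of model states before any activations, more than a single 80 GB GPU holds; split evenly over 8 GPUs it is 14 GB each, which is why sharding, recomputation and lower-precision optimizer states matter.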

Meituan Technology Team

May 15, 2025 · Artificial Intelligence

How Meituan’s MTGR Framework Achieved 65× Faster Inference with Scaling Laws

Meituan’s recommendation team introduced the MTGR framework, aligning traditional DLRM features with a unified HSTU‑based Transformer to explore scaling laws, delivering a 65‑fold FLOPs boost, 12% lower inference cost, and record gains in online CTR and order volume across its food‑delivery platform.

Inference Optimization · Large-Scale Training · MTGR

0 likes · 26 min read

AI Frontier Lectures

May 6, 2025 · Artificial Intelligence

Can Convolution Replace Self‑Attention for Efficient Image Super‑Resolution?

The paper proposes ESC, a lightweight image super‑resolution network that emulates Transformer self‑attention using large‑kernel and dynamic convolutions, achieving higher PSNR with significantly lower latency and memory consumption, making it suitable for mobile deployment.

Deep Learning · Transformer · convolutional attention

0 likes · 12 min read

Data Thinking Notes

Apr 29, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: How LLMs Evolved to 2025

This article chronicles the evolution of large language models from the 2017 Transformer breakthrough through BERT, GPT series, multimodal models, and recent cost‑efficient innovations like DeepSeek‑R1, highlighting key architectures, training methods, alignment techniques, and their transformative impact on AI applications.

AI Alignment · Transformer · large language models

0 likes · 29 min read

AI Frontier Lectures

Apr 27, 2025 · Artificial Intelligence

How Jeff Dean’s Vision Shaped Modern AI: From Neural Nets to Gemini

Jeff Dean’s 2024 ETH Zurich talk traces fifteen years of AI breakthroughs—from the rise of neural networks and back‑propagation, through large‑scale distributed training, TPUs, Transformers, sparse MoE models, and advanced prompting techniques—showing how scaling compute, data, and clever software have driven today’s powerful Gemini models.

AI · Chain-of-Thought · Distillation

0 likes · 18 min read

Didi Tech

Apr 24, 2025 · Artificial Intelligence

Algorithmic Foundations and Evolution of Natural Language Processing

The article surveys the Algorithmic Foundations of Engineering R&D series, tracing NLP’s evolution from rule‑based systems to today’s multimodal large‑model era, reviewing core machine‑learning and deep‑learning techniques, transformer breakthroughs, representation learning, optimization methods, and emerging research such as retrieval‑augmented generation and AI agents.

AI · NLP · Transformer

0 likes · 43 min read

Tencent Technical Engineering

Apr 16, 2025 · Artificial Intelligence

Understanding Transformer Architecture for Chinese‑English Translation: A Practical Guide

This practical guide walks through the full Transformer architecture for Chinese‑to‑English translation, detailing encoder‑decoder structure, tokenization and embeddings, batch handling with padding and masks, positional encodings, parallel teacher‑forcing, self‑ and multi‑head attention, and the complete forward and back‑propagation training steps.

Positional Encoding · PyTorch · Self-Attention

0 likes · 26 min read

AI Frontier Lectures

Apr 13, 2025 · Artificial Intelligence

How HINT’s Hierarchical Multi‑Head Attention Boosts Image Restoration Quality

The paper introduces HINT, a Transformer‑based image restoration model that employs Hierarchical Multi‑Head Attention (HMHA) and a Query‑Key Cache Updating (QKCU) module to eliminate attention redundancy, achieving superior PSNR/SSIM scores across low‑light enhancement, dehazing, desnowing, denoising, and deraining tasks while maintaining low model complexity.

Computer Vision · Hierarchical Attention · Image Restoration

0 likes · 10 min read

AI Algorithm Path

Apr 10, 2025 · Artificial Intelligence

Beginner-Friendly Guide to Understanding Large Language Models

This article walks readers through the fundamentals of large language models, covering what tokens are, how tokenization works, the conversion of tokens to numeric IDs, the transformer architecture—including positional encoding, self‑attention, feed‑forward networks and softmax—and explains how these components enable next‑token prediction.

Embedding · LLM · Self-Attention

0 likes · 9 min read

Ops Development & AI Practice

Apr 3, 2025 · Artificial Intelligence

What Powers LLMs? Unpacking Transformers, Architectures, and Context Windows

This article explains the core Transformer architecture behind large language models, compares encoder‑decoder and decoder‑only designs, and dives into the crucial concept of the context window, including its limits, examples, and ongoing research to extend it.

AI Architecture · Context Window · LLM

0 likes · 10 min read

AI Frontier Lectures

Apr 1, 2025 · Artificial Intelligence

Can SpargeAttn Accelerate Any Model Without Training? A Deep Dive

This article reviews the SpargeAttn paper, describing how a training‑free sparse attention mechanism achieves 4‑7× inference speedup across language, video, and image models while preserving end‑to‑end accuracy, and outlines its challenges, algorithmic solutions, implementation details, and experimental results.

GPU Optimization · Quantized Inference · SpargeAttn

0 likes · 7 min read

Architects' Tech Alliance

Mar 31, 2025 · Artificial Intelligence

A Comprehensive History of Large Language Models from the Transformer Era (2017) to DeepSeek‑R1 (2025)

This article reviews the evolution of large language models from the 2017 Transformer breakthrough through BERT, GPT series, alignment techniques, multimodal extensions, open‑weight releases, and the cost‑efficient DeepSeek‑R1 in 2025, highlighting key technical advances, scaling trends, and their societal impact.

AI Alignment · LLM evolution · Multimodal AI

0 likes · 26 min read

AntTech

Mar 26, 2025 · Artificial Intelligence

BodyGen: A Bio‑Inspired Embodied Co‑Design Framework for Autonomous Robot Evolution

BodyGen, a new embodied co‑design framework presented at ICLR 2025, enables robots to autonomously evolve their morphology and control policies using reinforcement learning and transformer‑based networks, achieving up to 60% performance gains with a lightweight 1.43M‑parameter model, and its code is publicly released.

Embodied AI · Transformer · co-design

0 likes · 10 min read

AI Algorithm Path

Mar 19, 2025 · Artificial Intelligence

Understanding Multimodal Large Language Models: Part 1

This article explains the fundamentals of multimodal large language models, covering their definition, typical applications, two main architectural approaches—unified embedding decoder and cross‑modal attention—along with detailed component breakdowns, a PyTorch implementation of image‑patch projection, and training considerations, ending with a discussion of trade‑offs between the methods.

Cross-Attention · Image Encoder · Linear Projection

0 likes · 14 min read
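
For the multimodal LLM article above, the image‑patch projection step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the article's PyTorch code: the 2×2 patch size, the toy weights, and the names `patchify` and `project_patches` are hypothetical. Each flattened patch is mapped to one embedding vector by a linear layer, giving the "image tokens" that a unified‑embedding decoder would place alongside text‑token embeddings.

```python
from typing import List

Matrix = List[List[float]]


def patchify(image: Matrix, patch: int) -> List[List[float]]:
    """Split an H x W grayscale image into non-overlapping patch x patch tiles,
    each flattened row by row, in raster order."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            flat = [image[top + r][left + c] for r in range(patch) for c in range(patch)]
            patches.append(flat)
    return patches


def linear(x: List[float], weight: Matrix, bias: List[float]) -> List[float]:
    """y = W x + b, with W stored as (out_dim x in_dim)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b for row, b in zip(weight, bias)]


def project_patches(image: Matrix, patch: int, weight: Matrix, bias: List[float]) -> Matrix:
    """Turn an image into a sequence of patch embeddings ("image tokens")."""
    return [linear(p, weight, bias) for p in patchify(image, patch)]


if __name__ == "__main__":
    img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # 4x4 toy image
    # Toy 2-dim projection: output 0 sums the patch, output 1 copies its first pixel.
    W = [[1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0]]
    tokens = project_patches(img, 2, W, [0.0, 0.0])
    print(len(tokens), tokens[0])
```

Real systems use learned weights, RGB channels, and usually a vision encoder before the projection; the sketch only shows the reshape‑then‑project flow.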

IT Services Circle

Mar 19, 2025 · Artificial Intelligence

ByteDance’s AI Video Generation Model Goku, Streamer‑Sales Live‑Selling Model, and MimicTalk 3D Talking‑Head Project

ByteDance and partners open‑source three AI projects—Goku for high‑quality text‑to‑video generation, Streamer‑Sales for multimodal live‑selling LLMs, and MimicTalk for rapid 3D talking‑head creation—detailing their core features, underlying transformer‑based architectures, training pipelines, and public repositories.

AI video generation · Transformer · Virtual digital human

0 likes · 5 min read

AIWalker

Mar 14, 2025 · Artificial Intelligence

Dynamic Tanh Lets He Kaiming and LeCun Drop Transformer Normalization in 9 Lines

Researchers Kaiming He, Yann LeCun and colleagues propose Dynamic Tanh (DyT), a 9‑line layer that replaces LayerNorm/RMSNorm in Transformers, showing comparable or superior accuracy across vision, language, speech and DNA tasks while also reducing inference latency on modern GPUs.

AI research · Deep Learning · Dynamic Tanh

0 likes · 18 min read
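
As a rough sketch of the DyT idea, the layer computes `gamma * tanh(alpha * x) + beta` element‑wise, with one learnable scalar `alpha` and per‑channel `gamma` and `beta`, instead of normalizing each token by its mean and variance. The version below uses plain Python floats rather than tensors and omits training; `alpha_init = 0.5` is an assumed default, and `layer_norm` is included only for comparison.

```python
import math
from typing import List


class DyT:
    """Dynamic Tanh sketch: y = gamma * tanh(alpha * x) + beta, element-wise.

    alpha is a single scalar; gamma and beta are per-channel vectors.
    Unlike LayerNorm, no mean or variance is computed across channels.
    """

    def __init__(self, dim: int, alpha_init: float = 0.5):
        self.alpha = alpha_init      # learnable scalar in a real model
        self.gamma = [1.0] * dim     # per-channel scale, initialised to 1
        self.beta = [0.0] * dim      # per-channel shift, initialised to 0

    def __call__(self, x: List[float]) -> List[float]:
        return [g * math.tanh(self.alpha * v) + b
                for v, g, b in zip(x, self.gamma, self.beta)]


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Reference LayerNorm without affine parameters, for comparison."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


if __name__ == "__main__":
    x = [0.1, -2.0, 5.0, 40.0]   # one token's hidden vector with an outlier
    print(DyT(dim=4)(x))         # the outlier is squashed towards gamma (= 1)
    print(layer_norm(x))
```

The tanh keeps roughly linear behavior for small inputs while bounding large activations, which is the squashing effect DyT relies on.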

Network Intelligence Research Center (NIRC)

Mar 12, 2025 · Artificial Intelligence

How Sparse Autoencoders Uncover Monosemantic Features in Large Language Models

The article reviews the paper ‘Towards Monosemanticity: Decomposing Language Models With Dictionary Learning’, showing how Anthropic’s sparse autoencoders extract interpretable, monosemantic concepts from transformer layers, enable controlled generation, and reveal trade‑offs such as data‑intensive training and potential performance impacts.

Dictionary Learning · Feature Control · LLM Interpretability

0 likes · 9 min read
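
The dictionary‑learning setup can be sketched as a one‑hidden‑layer sparse autoencoder applied to a model's activations. This minimal version assumes `features = ReLU(W_enc (x - b_dec) + b_enc)` and `x_hat = W_dec features + b_dec`, with a loss of mean squared reconstruction error plus an L1 penalty on the features; the toy weights and the `l1_coeff` value are hypothetical, and training (gradient updates, decoder‑norm constraints) is omitted.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sae_forward(x: Vector, w_enc: Matrix, b_enc: Vector,
                w_dec: Matrix, b_dec: Vector) -> Tuple[Vector, Vector]:
    """Encode an activation vector into (ideally sparse) features and reconstruct it."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    features = [max(0.0, p) for p in pre]     # ReLU: inactive features are exactly 0
    x_hat = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, x_hat


def sae_loss(x: Vector, features: Vector, x_hat: Vector, l1_coeff: float) -> float:
    """Reconstruction MSE plus an L1 penalty that pushes features towards sparsity."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    l1 = sum(abs(f) for f in features)
    return mse + l1_coeff * l1


if __name__ == "__main__":
    # Overcomplete toy dictionary: 3 features for a 2-dim activation space.
    W_e = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    W_d = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
    f, xh = sae_forward([2.0, 0.0], W_e, [0.0] * 3, W_d, [0.0, 0.0])
    print(f, xh, sae_loss([2.0, 0.0], f, xh, 0.1))
```

Because there are more features than input dimensions, the L1 term is what encourages each input to be explained by only a few active features.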

AIWalker

Mar 11, 2025 · Artificial Intelligence

MobileMamba: Lightweight Multi‑Receptive‑Field Backbone Beats Existing Mamba Models

MobileMamba introduces a three‑stage, lightweight backbone with a multi‑receptive‑field feature‑interaction module that combines wavelet‑enhanced Mamba, multi‑kernel depthwise convolutions, and redundant‑mapping reduction, delivering up to 83.6% ImageNet Top‑1 accuracy while running 21× faster than LocalVim and 3.3× faster than EfficientVMamba.

Benchmark · CNN · Mamba

0 likes · 10 min read

NewBeeNLP

Mar 11, 2025 · Artificial Intelligence

How DeepSeek’s New Architecture Redefines LLM Efficiency and Performance

This article analyzes DeepSeek’s recent breakthroughs—including the Multi‑Head Latent Attention (MLA), Group Relative Policy Optimization (GRPO), and a refined Mixture‑of‑Experts design—along with its three‑stage training pipeline, RL‑only R1‑Zero variant, and benchmark comparisons against GPT‑4o‑Mini and Llama 3.1, highlighting both gains and remaining challenges.

DeepSeek · LLM · Mixture of Experts

0 likes · 18 min read
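
For the GRPO part, the main difference from PPO is the baseline: rewards of several answers sampled for the same prompt are normalized within that group, so no separate value network is needed. The sketch below covers only this advantage computation, not the clipped policy‑ratio objective or the KL penalty; using the population standard deviation and an `eps` term are assumptions, since implementations differ.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalise each sampled answer's reward against its own group.

    The baseline is the mean reward of the G answers sampled for one prompt,
    which replaces the learned critic used in PPO.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)   # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four answers to one prompt, scored 1 if correct and 0 otherwise.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

With binary rewards such as `[1, 0, 0, 1]`, correct answers get an advantage of about +1 and incorrect ones about -1; a group where every answer scores the same contributes zero advantage.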

Cognitive Technology Team

Mar 10, 2025 · Artificial Intelligence

Understanding Transformers: From NLP Challenges to Architecture and Core Mechanisms

This article explains the evolution of natural language processing, the limitations of rule‑based, statistical, and recurrent neural network models, and then introduces the Transformer architecture—covering word and position embeddings, self‑attention, multi‑head attention, Add & Norm, feed‑forward layers, and encoder‑decoder design—to help beginners grasp why Transformers solve key NLP problems.

AI · NLP · Self-Attention

0 likes · 15 min read
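
Position information is injected by adding a position vector to each word embedding. The article may use learned or fixed encodings; the sketch below assumes the fixed sinusoidal scheme from the original Transformer paper, where even dimensions use sine and odd dimensions use cosine at geometrically decreasing frequencies. Function names are illustrative.

```python
import math
from typing import List


def sinusoidal_position_encoding(seq_len: int, d_model: int) -> List[List[float]]:
    """Fixed position encodings:
    PE[pos][2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos][2i+1] = cos(pos / 10000^(2i / d_model))
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):          # i is the even index (2i in the formula)
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def add_positions(token_embeddings: List[List[float]]) -> List[List[float]]:
    """First-layer input = word embedding + position encoding, element-wise."""
    d_model = len(token_embeddings[0])
    pe = sinusoidal_position_encoding(len(token_embeddings), d_model)
    return [[t + p for t, p in zip(tok, pos)] for tok, pos in zip(token_embeddings, pe)]


if __name__ == "__main__":
    for row in sinusoidal_position_encoding(3, 4):
        print([round(v, 4) for v in row])
```

Without this step, self‑attention would treat a sentence as an unordered set of tokens.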

Alibaba Cloud Developer

Mar 10, 2025 · Artificial Intelligence

Why Transformers Revolutionized NLP: From Problems to Solutions

This article explains the historical challenges of natural language processing, from rule‑based and statistical models to recurrent networks and their limitations, then introduces the Transformer architecture, its self‑attention mechanism, multi‑head attention, and supporting layers, illustrating how it overcomes previous issues and enables efficient parallel training.

NLP · Self-Attention · Transformer

0 likes · 16 min read
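
The core self‑attention step, softmax(QKᵀ/√d_k)·V, fits in a few lines. In this minimal sketch the input rows are passed directly as Q, K and V, an illustrative simplification: a real layer first applies learned projections, and multi‑head attention runs several such computations in parallel on split dimensions.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)                                  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V, computed one query row at a time.

    Each query scores every key directly, so all positions can be processed
    in parallel rather than step by step as in an RNN.
    """
    d_k = len(k[0])
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # three toy token vectors
    for row in scaled_dot_product_attention(x, x, x):
        print([round(val, 3) for val in row])
```

The division by √d_k keeps dot products from growing with dimension, which would otherwise push the softmax toward near one‑hot weights.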

AI Frontier Lectures

Mar 7, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: Tracing the Evolution of Large Language Models (2017‑2025)

This article chronicles the rapid development of large language models from the 2017 Transformer breakthrough through successive milestones such as BERT, GPT‑3, ChatGPT, multimodal GPT‑4 variants, open‑weight releases, and the cost‑efficient DeepSeek‑R1, highlighting key architectural innovations, training paradigms, alignment techniques, and industry impact.

Cost‑Efficient Inference · Model Alignment · Reasoning Models

0 likes · 27 min read

Cognitive Technology Team

Mar 7, 2025 · Artificial Intelligence

From Word Embeddings to Large Language Models: A Comprehensive Overview of AI Model Evolution

This article traces the development of AI models—from early word embeddings like Word2Vec and ELMo, through transformer‑based encoders such as BERT and decoder‑only models like GPT‑1/2/3, to recent multimodal systems and scaling laws—explaining their architectures, training methods, and impact on modern AI applications.

AI · Embedding · Transformer

0 likes · 22 min read

Baobao Algorithm Notes

Mar 6, 2025 · Artificial Intelligence

Alibaba Unveils QwQ-32B: A 32‑Billion‑Parameter Reasoning Model with Agent Capabilities

Alibaba has open‑sourced QwQ‑32B, a 32.5‑billion‑parameter transformer reasoning model that rivals top models such as DeepSeek‑R1 and o1‑mini. It integrates agent abilities for tool use and critical thinking and has a low hardware barrier for inference; the article also covers its technical specifications and RL‑based training details.

Alibaba · Transformer · agent capabilities

0 likes · 4 min read

AIWalker

Mar 6, 2025 · Artificial Intelligence

How SCMHSA Improves Transformer Next‑Frame Prediction by Reducing Semantic Dilution

The paper introduces a Semantic‑Concentrated Multi‑Head Self‑Attention (SCMHSA) module and a new embedding‑space loss to address semantic dilution and loss‑target mismatch in Transformer‑based video next‑frame prediction, demonstrating significant PSNR gains and MSE reductions across four benchmark datasets.

Computer Vision · Embedding Loss · SCMHSA

0 likes · 23 min read

JD Cloud Developers

Mar 5, 2025 · Artificial Intelligence

How GLM’s Autoregressive Blank‑Filling Beats BERT, T5, and GPT

GLM (General Language Model) combines autoregressive blank infilling with 2D positional encoding and span‑shuffle training, achieving better performance than BERT, T5, and GPT across NLU, conditional generation, and unconditional generation tasks, as demonstrated on SuperGLUE and other benchmarks.

Language Model · NLU · Transformer

0 likes · 29 min read
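
GLM's 2D positional encoding can be sketched as follows, assuming the scheme described in the GLM paper: Part A is the text with each sampled span replaced by `[MASK]`, and Part B holds the spans, each prefixed by a start token. The first position id is a token's position in Part A (for span tokens, the position of their `[MASK]`); the second is 0 in Part A and 1..n within a span. The token string `[S]`, the 1‑indexed positions, and the `b_order` parameter (standing in for the random span shuffle) are illustrative choices, and the attention mask (bidirectional over Part A, causal over Part B) is not shown.

```python
from typing import List, Optional, Tuple

MASK, START = "[MASK]", "[S]"


def build_glm_input(tokens: List[str], spans: List[Tuple[int, int]],
                    b_order: Optional[List[int]] = None):
    """Return (input_tokens, pos1, pos2) for a GLM-style blank-infilling example.

    spans: sorted, non-overlapping (start, end) half-open ranges to blank out.
    b_order: order of the spans in Part B (GLM shuffles them randomly).
    """
    # Part A: corrupted text, each span collapsed into a single [MASK].
    part_a, mask_pos, i = [], [], 0
    for start, end in spans:
        part_a.extend(tokens[i:start])
        part_a.append(MASK)
        mask_pos.append(len(part_a))                 # 1-indexed position of this [MASK]
        i = end
    part_a.extend(tokens[i:])
    pos1 = list(range(1, len(part_a) + 1))           # position within the corrupted text
    pos2 = [0] * len(part_a)                         # Part A: intra-span position is 0

    # Part B: the blanked-out spans, each prefixed with a start token.
    part_b = []
    order = range(len(spans)) if b_order is None else b_order
    for idx in order:
        start, end = spans[idx]
        span_tokens = [START] + tokens[start:end]
        part_b.extend(span_tokens)
        pos1.extend([mask_pos[idx]] * len(span_tokens))  # all point at their [MASK]
        pos2.extend(range(1, len(span_tokens) + 1))      # 1..len within the span
    return part_a + part_b, pos1, pos2


if __name__ == "__main__":
    toks = ["x1", "x2", "x3", "x4", "x5", "x6"]
    for row in build_glm_input(toks, [(2, 3), (4, 6)], b_order=[1, 0]):
        print(row)
```

The second position id is what lets the model know how far into a span it is generating, without revealing the span's length in Part A.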

Architect

Mar 2, 2025 · Artificial Intelligence

Demystifying Mixture of Experts: How MoE Boosts LLMs and Vision Models

This article explains the Mixture of Experts (MoE) architecture, detailing experts, routers, dense vs. sparse layers, load‑balancing strategies such as KeepTopK, auxiliary loss, capacity constraints, the Switch Transformer simplification, and how MoE is applied to both language and vision models, illustrated with concrete examples and parameter counts.

Mixture of Experts · MoE · Sparse Models

0 likes · 17 min read
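
Top‑k routing can be sketched in plain Python: keep the k largest router logits, set the rest to negative infinity so they get zero weight after the softmax, and sum only the selected experts' outputs. Router logits are passed in directly here for brevity (a real router computes them from the token with a learned linear layer); the function names are hypothetical, and load‑balancing losses and capacity limits are omitted.

```python
import math
from typing import Callable, List


def top_k_gating(logits: List[float], k: int) -> List[float]:
    """KeepTopK-style routing: keep the k largest logits, mask the rest to -inf,
    then softmax, so unselected experts receive a weight of exactly 0."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    masked = [logits[i] if i in top else -math.inf for i in range(len(logits))]
    m = max(masked)
    exps = [math.exp(x - m) for x in masked]     # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x: List[float], router_logits: List[float],
              experts: List[Callable[[List[float]], List[float]]], k: int = 2) -> List[float]:
    """Sparse MoE forward pass: only the selected experts are evaluated."""
    gates = top_k_gating(router_logits, k)
    out = [0.0] * len(x)
    for gate, expert in zip(gates, experts):
        if gate == 0.0:
            continue                             # unselected expert: no compute spent
        y = expert(x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    print(top_k_gating([2.0, 0.0, 1.0, -1.0], k=2))
```

This is why an MoE model's total parameter count can be much larger than the parameters actually used per token.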

IT Architects Alliance

Feb 26, 2025 · Artificial Intelligence

DeepSeek Large Model: Core Architecture, Key Technologies, and Training Strategies

The article provides an in‑depth overview of DeepSeek’s large language model, detailing its mixture‑of‑experts and Transformer foundations, novel attention mechanisms, load‑balancing, multi‑token prediction, FP8 mixed‑precision training, and various training regimes such as knowledge distillation and reinforcement learning.

DeepSeek · FP8 · MLA

0 likes · 18 min read

21CTO

Feb 24, 2025 · Artificial Intelligence

From Transformers to DeepSeek-R1: Evolution of Large Language Models

This article chronicles the rapid development of large language models since the Transformer architecture was introduced in 2017, including BERT, the GPT series, multimodal systems, and the cost‑effective DeepSeek‑R1, highlighting key innovations, scaling trends, alignment techniques, and their transformative impact across AI research and industry.

AI evolution · DeepSeek · LLM History

0 likes · 23 min read

AIWalker

Feb 20, 2025 · Artificial Intelligence

Transfusion: A Single Model for Unified Image Generation and Understanding

Transfusion is a 7B‑parameter transformer that jointly trains language modeling and diffusion losses on mixed text‑image data, enabling seamless text generation, image generation, and image understanding within one model and outperforming prior multimodal approaches such as Chameleon across multiple benchmarks.

AI research · Language Modeling · Transformer

0 likes · 20 min read

AIWalker

Feb 19, 2025 · Artificial Intelligence

DeepSeek’s NSA Attention Cuts Inference Time 11×, Co‑authored by CEO Liang

DeepSeek introduces the NSA sparse attention mechanism, combining dynamic hierarchical sparsity, coarse token compression and fine token selection to achieve up to 11.6× faster inference, lower pre‑training cost, and superior benchmark performance across general, long‑context, and chain‑of‑thought tasks.

Benchmark · DeepSeek · LLM optimization

0 likes · 9 min read

Ops Development & AI Practice

Feb 16, 2025 · Artificial Intelligence

Why FlashAttention Supercharges Qwen Models: A Technical Deep Dive

This article explains the FlashAttention algorithm and its memory‑efficient tiling and recomputation techniques, shows how enabling the flash_attn flag substantially speeds up Qwen‑series large models, and outlines the hardware and software requirements as well as potential trade‑offs.

FlashAttention · GPU Optimization · PyTorch

0 likes · 8 min read
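
The online‑softmax trick behind FlashAttention's tiling can be sketched for a single query row: keys and values are processed block by block while only a running maximum, a running softmax denominator, and an unnormalized output are kept, and earlier partial results are rescaled whenever the maximum grows. This pure‑Python illustration covers the arithmetic only; it does not model GPU SRAM tiling, query blocking, or backward‑pass recomputation, and the function names are hypothetical.

```python
import math
from typing import List


def attention_row_naive(q: List[float], keys: List[List[float]],
                        values: List[List[float]]) -> List[float]:
    """Reference: materialise the full score row, softmax it, weight the values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / total for j in range(len(values[0]))]


def attention_row_tiled(q: List[float], keys: List[List[float]],
                        values: List[List[float]], block: int) -> List[float]:
    """Same result, visiting keys/values one block at a time (online softmax)."""
    scale = 1.0 / math.sqrt(len(q))
    m, l = -math.inf, 0.0                  # running max and running denominator
    acc = [0.0] * len(values[0])           # unnormalised output accumulator
    for start in range(0, len(keys), block):
        k_blk, v_blk = keys[start:start + block], values[start:start + block]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_blk]
        m_new = max(m, max(scores))
        correction = math.exp(m - m_new)   # rescales earlier blocks; 0.0 on the first block
        p = [math.exp(s - m_new) for s in scores]
        l = l * correction + sum(p)
        acc = [a * correction + sum(pi * v[j] for pi, v in zip(p, v_blk))
               for j, a in enumerate(acc)]
        m = m_new
    return [a / l for a in acc]


if __name__ == "__main__":
    q = [0.3, -1.2, 0.5]
    keys = [[0.1 * i, -0.2 * i, 0.05 * i * i] for i in range(7)]
    values = [[float(i), float(i % 3)] for i in range(7)]
    print(attention_row_naive(q, keys, values))
    print(attention_row_tiled(q, keys, values, block=3))
```

Both functions return the same vector up to floating‑point rounding, but the tiled version never holds more than one block of scores at a time.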

IT Architects Alliance

Feb 15, 2025 · Artificial Intelligence

DeepSeek: Architecture, Core Technologies, Training Strategies, and Comparative Analysis

The article provides an in‑depth overview of DeepSeek's transformer‑based foundation, Mixture‑of‑Experts architecture, novel attention mechanisms, multi‑token prediction, FP8 mixed‑precision training, knowledge distillation, reinforcement‑learning approaches, and compares its performance and cost advantages against leading models such as GPT and Gemini.

AI model architecture · DeepSeek · FP8 training

0 likes · 29 min read

Architect

Feb 13, 2025 · Artificial Intelligence

How to Build a Mini ChatGPT on a Single GPU with MiniMind

This article provides a comprehensive, step‑by‑step guide to training and fine‑tuning a miniature large‑language model called MiniMind, covering lightweight model design, open‑source training pipelines, required datasets, tokenizer options, and deployment via a web UI, all using PyTorch on modest hardware.

AI · LLM · MiniMind

0 likes · 11 min read

Architects' Tech Alliance

Feb 12, 2025 · Industry Insights

How DeepSeek Is Redefining China’s AI Landscape in 2025

The 2025 DeepSeek research framework report argues that DeepSeek’s V3 and R1 models, built on the Transformer with MLA and DeepSeek MoE, are improving training efficiency, reshaping the valuation of domestic AI companies, and positioning open‑source AI as a disruptive force in the global market.

AI models · China AI · DeepSeek

0 likes · 5 min read

AI Algorithm Path

Feb 12, 2025 · Artificial Intelligence

Essential DeepSeek‑R1 Reading List: Papers Behind 2025’s Hottest LLM

This article compiles a curated reading list of foundational and recent research papers—from the original Transformer to chain‑of‑thought, mixture‑of‑experts, and reinforcement‑learning studies—that together explain the breakthroughs behind DeepSeek‑R1 and guide readers through the technical evolution of modern large language models.

DeepSeek · Mixture of Experts · Research Papers

0 likes · 15 min read

vivo Internet Technology

Feb 12, 2025 · Artificial Intelligence

Bidirectional Optimization of NLLB-200 and ChatGPT for Low-Resource Language Translation

The paper proposes a bidirectional optimization framework: it fine‑tunes the NLLB‑200 translation model with LoRA on ChatGPT‑generated data for low‑resource languages, and it translates low‑resource prompts with NLLB before feeding them to LLMs. This improves multilingual translation quality but requires careful validation of the noisy synthetic data.

Fine-tuning · LLM · LoRA

0 likes · 28 min read

Alibaba Cloud Native

Feb 10, 2025 · Cloud Native

How to Deploy DeepSeek‑R1‑Distill Models on Alibaba Cloud CAP (Ollama & Transformer)

This guide walks you through deploying various DeepSeek‑R1‑Distill models on Alibaba Cloud's Serverless AI platform CAP, covering supported models, deployment options (Ollama and Transformer), step‑by‑step template and model‑service setups, validation methods, and tips for adding custom models.

AI · CAP · DeepSeek

0 likes · 10 min read

AI Algorithm Path

Feb 9, 2025 · Artificial Intelligence

Understanding Multi-Token Prediction in DeepSeek‑R1 Architecture

This article dissects the Multi‑Token Prediction (MTP) technique used in DeepSeek‑R1, contrasting it with traditional next‑token prediction, detailing Meta’s MTP design, DeepSeek’s adapted architecture, loss weighting, and why MTP is applied only during training to boost efficiency and model capability.

DeepSeek · MTP · Model architecture

0 likes · 9 min read
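
To make the MTP training targets concrete, the sketch below assumes a DeepSeek-V3 style setup: the main model predicts token t+1 at position t, the k‑th MTP module predicts token t+k+1, and the D per‑depth losses are averaged and scaled by a weight λ before being added to the main loss. The loss values are placeholders and the function names are hypothetical; since the extra modules only add training signal, this is consistent with MTP being used during training only.

```python
from typing import List


def mtp_targets(tokens: List[int], depth: int) -> List[List[int]]:
    """Targets per prediction depth, indexed by position t.

    Depth 0 is ordinary next-token prediction (t -> t+1); MTP module k
    (k = 1..depth) is trained to predict token t+k+1. Positions without a
    valid target are dropped, so deeper modules get shorter target lists.
    """
    return [tokens[k + 1:] for k in range(depth + 1)]


def combined_loss(main_loss: float, mtp_losses: List[float], lam: float) -> float:
    """Total = main loss + (lambda / D) * sum of the D per-depth MTP losses."""
    if not mtp_losses:
        return main_loss
    d = len(mtp_losses)
    return main_loss + (lam / d) * sum(mtp_losses)


if __name__ == "__main__":
    print(mtp_targets([10, 11, 12, 13, 14], depth=2))
    print(combined_loss(2.0, [3.0, 5.0], lam=0.3))
```

Averaging over D keeps the extra signal's total weight controlled by λ alone, regardless of how many prediction depths are used.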

IT Architects Alliance

Feb 8, 2025 · Artificial Intelligence

Inside DeepSeek: How Its Innovative Architecture Redefines AI Performance

This article examines DeepSeek's advanced Transformer‑based architecture, dynamic routing, MoE system, multi‑stage training, efficient inference, multimodal capabilities, real‑world applications, technical challenges, and future prospects, providing a comprehensive technical analysis of the model's strengths and limitations.

AI Architecture · DeepSeek · Model Optimization

0 likes · 15 min read

JavaEdge

Feb 6, 2025 · Artificial Intelligence

Why Training Transformers Faces an Impossible Triangle of Speed, Performance, and Cost

The article explains the “impossible triangle” in Transformer training, showing how speed, model performance, and computational cost cannot all be optimized simultaneously, and uses analogies and real‑world examples like GPT‑4 to illustrate the necessary trade‑offs.

Deep Learning · Model Training · Performance Tradeoff

0 likes · 7 min read

DaTaobao Tech

Jan 22, 2025 · Artificial Intelligence

AI Trends 2025: Paths to AGI, Scaling Law Evolution, and Industry Impact

The article surveys the AI revolution driven by foundation models and an evolving Scaling Law, outlining four AGI pathways—large models, intelligent robots, brain‑computer interfaces, and digital life—while highlighting transformer‑based convergence, generative‑first‑principle breakthroughs like DeepSeek‑V3, and transformative industry impacts ranging from consumer robots to Medical 2.0, personalized education, and digital‑simulation platforms such as NVIDIA’s Omniverse.

AGI · AI · AI industry

0 likes · 23 min read

Baobao Algorithm Notes

Jan 15, 2025 · Artificial Intelligence

How Multi-Token Prediction Boosts LLM Training and Inference Efficiency

This article reviews the evolution of Multi‑Token Prediction (MTP) techniques—from early blockwise parallel decoding to Meta's and DeepSeek's implementations—explaining their architectures, training and inference workflows, and the speed‑up gains they offer for large language models.

DeepSeek · Inference Acceleration · LLM

0 likes · 20 min read

AIWalker

Jan 10, 2025 · Artificial Intelligence

How a Simplified Transformer Enables Lightweight CLIP Training on a Single RTX3090

This paper presents SiCLIP, a framework that simplifies the Transformer architecture, combines weight‑sharing, multi‑stage knowledge distillation, and a novel pair‑matching loss with synthetic captions to train a competitive CLIP model using only one RTX3090 GPU and 1 TB of storage, achieving state‑of‑the‑art data‑size‑parameter‑accuracy trade‑offs.

CLIP · Lightweight Training · Synthetic Captions

0 likes · 19 min read

DevOps

Dec 19, 2024 · Artificial Intelligence

Yann LeCun Discusses AI, Self‑Supervised Learning, and the Future of AGI

Yann LeCun, in a half‑hour interview with Indian entrepreneur Nikhil Kamath, explains the fundamentals of artificial intelligence, critiques current transformer models, describes self‑supervised learning, outlines his joint‑embedding predictive architecture, and shares his vision for AGI, open‑source ecosystems, and the role of PhDs for AI entrepreneurs.

AGI · Transformer · artificial intelligence

0 likes · 16 min read

AntTech

Dec 6, 2024 · Artificial Intelligence

Nimbus: Secure and Efficient Two‑Party Inference for Transformers

The paper introduces Nimbus, a two‑party privacy‑preserving inference framework for Transformer models that leverages a client‑side outer‑product linear‑layer protocol and distribution‑aware polynomial approximations for non‑linear layers, achieving up to five‑fold speedups with negligible accuracy loss.

Homomorphic Encryption · Performance Optimization · Transformer

0 likes · 15 min read

Rare Earth Juejin Tech Community

Dec 2, 2024 · Artificial Intelligence

Building a Simple Chatbot with Alibaba Tongyi Large Language Model: Fundamentals and Implementation

This article introduces the basic concepts of supervised and unsupervised machine learning, explains the core mechanisms of large language models such as Transformers, and provides a step‑by‑step guide with code to build a simple chatbot using Alibaba's Tongyi LLM via Spring Boot.

Alibaba · Chatbot · LLM

0 likes · 11 min read

DataFunSummit

Nov 24, 2024 · Artificial Intelligence

AI-Driven Forecasting in Modern Supply Chains: Methods, Models, and Practical Guidance

The article explains how modern supply chain forecasting has shifted from qualitative expert judgment to quantitative AI-driven methods such as DeepAR, ensemble learning, and Transformers, and outlines the skills needed for practitioners to build effective predictive models.

AI · DeepAR · Supply Chain

0 likes · 10 min read

Cognitive Technology Team

Nov 20, 2024 · Artificial Intelligence

Fundamentals and Implementation of Neural Networks and Transformers with PyTorch Examples

This article provides a comprehensive overview of neural network fundamentals, loss functions, activation functions, embedding techniques, attention mechanisms, multi‑head attention, residual networks, and the full Transformer encoder‑decoder architecture, illustrated with detailed PyTorch code and a practical MiniRBT fine‑tuning case for Chinese text classification.

AI · PyTorch · Transformer

0 likes · 49 min read

NewBeeNLP

Nov 18, 2024 · Artificial Intelligence

How to Optimize Multi-Head Attention: From MQA to FlashAttention and Beyond

This article examines techniques for compressing the KV cache and accelerating attention in transformer models, including MQA, GQA, MLA, sliding‑window and linear attention, FlashAttention, PagedAttention, and ring attention, as well as mixed‑precision training and ZeRO parallelism, providing code snippets, implementation details, and practical trade‑offs.

FlashAttention · KV cache · Model Parallelism

0 likes · 17 min read
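
A quick way to see why MQA and GQA shrink the KV cache is to compute its size: keys and values are cached per layer, per KV head, per token. The sketch below uses hypothetical dimensions chosen only for illustration (32 layers, 32 query heads of size 128, a 4,096‑token context, 2‑byte fp16 elements) and also shows how GQA maps query heads onto shared KV heads.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    """Cache size = 2 (K and V) x layers x kv_heads x head_dim x tokens x element size."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """GQA mapping: consecutive groups of query heads share one KV head.
    n_kv_heads == n_q_heads gives standard MHA; n_kv_heads == 1 gives MQA."""
    assert n_q_heads % n_kv_heads == 0
    return q_head // (n_q_heads // n_kv_heads)


if __name__ == "__main__":
    gib = 1024 ** 3
    for name, kv in (("MHA", 32), ("GQA-8", 8), ("MQA", 1)):
        print(name, kv_cache_bytes(32, kv, 128, 4096) / gib, "GiB")
```

With these numbers the cache takes 2 GiB per sequence under MHA, 512 MiB with 8 KV heads (GQA), and 64 MiB with a single KV head (MQA), which is why fewer KV heads allow longer contexts or larger batches.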

Infra Learning Club

Oct 30, 2024 · Artificial Intelligence

How GPT-3 Evolved: From Transformer Roots to Massive Language Models

The article traces the development of the GPT series, from the 2017 Transformer breakthrough through GPT‑1, GPT‑2, and the 175‑billion‑parameter GPT‑3 to later models such as Codex and ChatGPT, highlighting key papers, architectural choices, and the surprising role of OpenAI’s decoder‑only approach.

GPT-3 · Google · Language Model

0 likes · 4 min read

Tencent Advertising Technology

Oct 17, 2024 · Artificial Intelligence

Long Sequence Modeling for Advertising Recommendation: TIN, Disentangled Side‑Info TIN, Stacked TIN, and Target‑aware SASRec

This article presents a comprehensive solution for heterogeneous long‑behavior sequence modeling in advertising recommendation, introducing the TIN backbone, Disentangled Side‑Info TIN, Stacked TIN, and Target‑aware SASRec, along with platform‑level optimizations that enable million‑scale sequences while delivering significant online performance gains.

Advertising · Deep Learning · Performance Optimization

0 likes · 15 min read

DataFunTalk

Oct 1, 2024 · Artificial Intelligence

From Early AI to Superintelligence: Challenges and Prospects

The article reviews the evolution of artificial intelligence from early statistical models through deep learning and Transformer architectures, examines current breakthroughs like multimodal models, and discusses the technical, computational, and safety challenges that must be overcome before achieving artificial superintelligence (ASI).

AI · Superintelligence · Transformer

0 likes · 8 min read

Architect's Alchemy Furnace

Sep 16, 2024 · Artificial Intelligence

Why Transformers Revolutionize AI: From Basics to Advanced Applications

This article explains what AI Transformers are, why they matter, their key components and mechanisms, various applications ranging from language processing to bioinformatics, and how they differ from traditional neural networks, providing a comprehensive overview of Transformer architecture and its impact on modern AI research.

AI · Deep Learning · Self-Attention

0 likes · 20 min read

Sohu Tech Products

Sep 11, 2024 · Artificial Intelligence

How RoPE and FlashAttention Empower GLM-4-Plus for Long-Text Mastery

This article explains the core mechanisms of Transformer models, details the Rotary Position Embedding (RoPE) and FlashAttention techniques for handling long sequences, introduces the GLM-4-Plus series, and presents an empirical evaluation on the THUCNews dataset showing its superior long-text performance.

FlashAttention · GLM-4-Plus · Long Text

0 likes · 13 min read
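
RoPE can be sketched as rotating each pair of dimensions by an angle proportional to the token position, with a different frequency per pair. The check below illustrates the property that makes it attractive for long context: the dot product between a rotated query and a rotated key depends only on their relative offset. The base of 10000 and the adjacent‑pair layout are assumptions (some implementations pair dimension i with i + d/2 instead), and FlashAttention is not modeled here.

```python
import math
from typing import List


def rope(x: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate each adjacent pair (x[2i], x[2i+1]) by pos * theta_i,
    where theta_i = base^(-2i/d)."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = [0.0] * d
    for i in range(0, d, 2):                 # i is the even index (2i in the formula)
        theta = base ** (-i / d)
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q, k = [0.3, -1.0, 2.0, 0.5], [1.2, 0.4, -0.7, 0.9]
    # Same offset (3 positions apart) gives the same attention score.
    print(dot(rope(q, 5), rope(k, 2)), dot(rope(q, 13), rope(k, 10)))
```

Because rotations preserve vector length, RoPE changes only the relative angles between queries and keys, not their norms.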

Baobao Algorithm Notes

Sep 9, 2024 · Artificial Intelligence

How MoSLoRA Reinvents Low‑Rank Adaptation with Mixer Matrices

This article analyzes the Mixture‑of‑Subspaces in Low‑Rank Adaptation (MoSLoRA) paper, explaining its motivation, its design choice of fusing LoRA's subspaces with a learnable mixer matrix rather than a gate, its connections to multi‑head attention, experimental findings on LLaMA‑3 fine‑tuning, and theoretical proofs of its re‑parameterization properties.

AI · LoRA · Mixture of Experts

0 likes · 12 min read
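
The difference between LoRA and a mixer‑based update can be sketched with small matrices: vanilla LoRA adds ΔW = BA, while a MoSLoRA‑style update inserts an r×r learnable mixer M to give ΔW = BMA, which reduces to LoRA when M is the identity. Because ΔW is still a single matrix, it can be merged into the frozen weight after training. This pure‑Python sketch omits the usual α/r scaling and all training; the matrices are toy values.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def identity(r: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(r)] for i in range(r)]


def lora_delta(b: Matrix, a: Matrix) -> Matrix:
    """Vanilla LoRA update: delta_W = B A (B: d_out x r, A: r x d_in)."""
    return matmul(b, a)


def moslora_delta(b: Matrix, mixer: Matrix, a: Matrix) -> Matrix:
    """Mixer-based update: delta_W = B M A, where the r x r mixer M lets every
    down-projection subspace feed every up-projection subspace."""
    return matmul(matmul(b, mixer), a)


if __name__ == "__main__":
    A = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]   # r = 2, d_in = 3
    B = [[1.0, 0.0], [2.0, 1.0]]              # d_out = 2, r = 2
    print(lora_delta(B, A))
    print(moslora_delta(B, [[0.0, 1.0], [1.0, 0.0]], A))
```

The mixer adds only r² parameters, which is negligible next to A and B for typical ranks.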

Ops Development & AI Practice

Aug 20, 2024 · Artificial Intelligence

How ERobot Redefines No-Code AI Automation with Natural Language

The article examines Hugging Face's ERobot, an AI model that leverages Transformer-based pre‑trained models to execute a wide range of automation tasks through natural‑language commands, discusses its technical foundations, real‑world applications, future prospects, and the challenges it must overcome.

Hugging Face · No-code · Transformer

0 likes · 8 min read

21CTO

Aug 11, 2024 · Artificial Intelligence

Demystifying LLMs: How Tokens, Training, and Transformers Power Generative AI

This article explains the fundamentals of large language models, covering tokenization, probability prediction, Markov chain basics, training data limitations, context windows, and the transition to neural network architectures like Transformers, while providing Python examples and insights into model scaling and the illusion of intelligence.

AI · LLM · Neural Networks

0 likes · 18 min read
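
The Markov‑chain baseline discussed above can be sketched as a bigram model: count which token follows which, turn the counts into probabilities, and sample one token at a time. This is an illustrative sketch on whitespace‑split words, not the article's own code; its one‑token context window is exactly the limitation that motivates neural models with longer context.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigram(tokens: List[str]) -> Dict[str, Dict[str, float]]:
    """Count successors of each token and normalise counts into probabilities.
    The 'context window' is a single previous token."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {prev: {tok: c / sum(ctr.values()) for tok, c in ctr.items()}
            for prev, ctr in counts.items()}


def generate(model: Dict[str, Dict[str, float]], start: str, length: int,
             seed: int = 0) -> List[str]:
    """Sample a continuation token by token from the transition table."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        dist = model.get(out[-1])
        if not dist:                     # token never seen with a successor: stop
            break
        tokens, probs = zip(*dist.items())
        out.append(rng.choices(tokens, weights=probs, k=1)[0])
    return out


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ran".split()
    model = train_bigram(corpus)
    print(model["the"])                  # cat: 2/3, mat: 1/3
    print(" ".join(generate(model, "on", 5)))
```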

Architect

Aug 11, 2024 · Artificial Intelligence

Understanding Large Language Models: Tokens, Tokenization, and the Evolution from Markov Chains to Transformers

This article explains how generative AI models work by demystifying tokens, tokenization with tools like tiktoken, simple Markov‑chain training, the limitations of small context windows, and how modern LLMs use neural networks, transformers and attention mechanisms to predict the next token.

LLM · Markov chain · Transformer

0 likes · 20 min read
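
Tokenizers such as tiktoken are based on byte‑pair encoding (BPE). A toy, character‑level version of BPE training repeatedly merges the most frequent adjacent pair. This sketch is a simplification: it works on characters rather than bytes, skips the regex pre‑splitting that production tokenizers apply, and breaks ties by first occurrence.

```python
from collections import Counter
from typing import List, Tuple


def most_frequent_pair(seq: List[str]) -> Tuple[str, str]:
    """Most common adjacent pair; max() keeps the first pair seen among ties."""
    pairs = Counter(zip(seq, seq[1:]))
    return max(pairs, key=pairs.get)


def merge_pair(seq: List[str], pair: Tuple[str, str]) -> List[str]:
    """Replace every non-overlapping occurrence of `pair`, left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def train_bpe(text: str, num_merges: int):
    """Return the final token sequence and the learned merge list, in order."""
    seq, merges = list(text), []
    for _ in range(num_merges):
        if len(seq) < 2:
            break
        pair = most_frequent_pair(seq)
        merges.append(pair)
        seq = merge_pair(seq, pair)
    return seq, merges


if __name__ == "__main__":
    tokens, merges = train_bpe("aaabdaaabac", 3)
    print(merges)   # learned merges, most frequent first
    print(tokens)
```

At encoding time the learned merges are replayed in the same order on new text, so frequent substrings become single tokens.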

DaTaobao Tech

Aug 7, 2024 · Artificial Intelligence

Overview of Large Model Development, AIGC Practices, and Prompt Engineering

The article surveys the rapid emergence of large AI models and AIGC, explains core concepts like AI, AGI, and LLMs, details prompt‑engineering techniques such as chain‑of‑thought, outlines a seven‑layer AIGC stack, discusses technical and ethical challenges, and highlights future multimodal and industry‑specific applications.

AI · AIGC · LLM

0 likes · 25 min read
