Tagged articles

383 articles

Jul 31, 2024 · Artificial Intelligence

Quantitative Analysis of Transformer Architecture and Llama Model Performance

This engineering‑focused document reviews transformer fundamentals, derives precise FLOP and memory formulas for attention and feed‑forward layers, defines the MFU performance metric, analyzes memory components and parallelism strategies, examines recent architecture variants such as MQA, GQA, sliding‑window attention and MoE, and provides practice problems applying these calculations.

GPU computing · Transformer · ai

0 likes · 30 min read

Baidu Intelligent Cloud Tech Hub

Jul 25, 2024 · Artificial Intelligence

How Transformers Work: From Tensor Basics to GPU Performance Analysis

This article provides a comprehensive, engineer‑focused breakdown of transformer architecture—including tensor fundamentals, matrix multiplication, GPU theoretical compute, attention and FFN mechanics, quantitative parameter and FLOP analysis, performance metrics like MFU, parallelism strategies, variant optimizations, and practical exercise questions—offering clear insight into large‑model efficiency and scaling.

FFN · GPU performance · Transformer

0 likes · 33 min read

JavaEdge

Jul 22, 2024 · Artificial Intelligence

What Is a Transformer and Why It’s Transforming AI?

This article explains the fundamentals of transformer models, why they outperform earlier neural networks, their core components such as self‑attention and positional encoding, practical use cases across language and biology, and how they differ from RNNs, CNNs, and other architectures.

Deep Learning · Self-Attention · Sequence-to-Sequence

0 likes · 20 min read

Practical DevOps Architecture

Jun 28, 2024 · Artificial Intelligence

Large Model (LLM) Training Curriculum – Weekly Topics and Resources

This article outlines a five‑week large‑model training curriculum, detailing weekly topics such as transformer fundamentals, encoder‑decoder architectures, self‑attention, LoRA fine‑tuning, and quantization, along with associated video lectures and PDF slide decks for developers.

LLM · LoRA · Transformer

0 likes · 3 min read

JD Cloud Developers

Jun 25, 2024 · Artificial Intelligence

Why Do Large Language Models Output Text Word‑by‑Word? Inside the Transformer Mechanics

This article explains the fundamental architecture of large language models, from the dual file nature of parameters and code, through neural network basics, perceptrons, and weight training, to the Transformer’s tokenization, positional encoding, self‑attention, and inference processes, illustrated with diagrams and examples.

Neural Network · Self-Attention · Transformer

0 likes · 22 min read

JD Tech Talk

Jun 25, 2024 · Artificial Intelligence

Understanding Large Language Models: From Parameters to Transformer Architecture

This article explains the fundamental concepts behind large language models, including their two-file structure, training process, neural network basics, perceptron examples, weight and threshold calculations, the TensorFlow Playground, and a detailed walkthrough of the Transformer architecture with tokenization, positional encoding, self‑attention, normalization, and feed‑forward layers.

Large Language Models · Neural Networks · Self-Attention

0 likes · 20 min read

Ops Development & AI Practice

Jun 22, 2024 · Artificial Intelligence

Why Transformers Revolutionized AI: From NLP to Vision and Speech

Transformers, introduced in 2017, have reshaped neural networks by leveraging attention mechanisms to outperform RNNs and CNNs across NLP, computer vision, and speech tasks, offering parallel processing, long‑range dependency capture, and versatile applications such as translation, text generation, image classification, and speech recognition.

Attention Mechanism · Computer Vision · Deep Learning

0 likes · 6 min read

Continuous Delivery 2.0

Jun 18, 2024 · Artificial Intelligence

Google's ML‑Enhanced Code Completion Improves Developer Productivity

Google's research demonstrates that integrating a transformer‑based machine‑learning model with a rule‑based semantic engine for code completion reduces developers' coding iteration time by 6%, increases accepted suggestions to 25‑34%, and completes over 3% of code, highlighting significant productivity gains across multiple programming languages.

IDE · Transformer · code completion

0 likes · 6 min read

Rare Earth Juejin Tech Community

Jun 12, 2024 · Artificial Intelligence

A Simple Introduction to the Transformer Model

This article provides a comprehensive, beginner-friendly explanation of the Transformer architecture, covering its encoder‑decoder structure, self‑attention, multi‑head attention, positional encoding, residual connections, decoding process, final linear and softmax layers, and training considerations, illustrated with numerous diagrams and code snippets.

Deep Learning · Neural Networks · Self-Attention

0 likes · 24 min read

JD Tech

Jun 7, 2024 · Artificial Intelligence

Understanding Attention Mechanisms, Self‑Attention, and Multi‑Head Attention in Transformers

This article explains the fundamentals of attention mechanisms, including their biological inspiration and the evolution from early visual attention to modern self‑attention in Transformers. It then details the scaled dot‑product calculations, positional encoding, and multi‑head attention, illustrating how these concepts enable efficient parallel processing of sequence data.

Positional Encoding · Self-Attention · Transformer

0 likes · 12 min read

Alibaba Cloud Big Data AI Platform

May 30, 2024 · Artificial Intelligence

How Pathformer Redefines Multi-Scale Time Series Forecasting with Adaptive Pathways

Pathformer, a new multi‑scale Transformer model introduced by Alibaba Cloud’s big‑data team and East China Normal University, leverages adaptive pathways to jointly model time resolution and time distance, achieving state‑of‑the‑art forecasting performance and strong generalization across cloud resource workloads and public datasets.

Multi-Scale · Transformer · adaptive pathways

0 likes · 7 min read

Architect's Guide

May 13, 2024 · Artificial Intelligence

Understanding the Core Principles of Transformer Architecture

This article explains how Transformer models work by detailing the encoder‑decoder structure, self‑attention, multi‑head attention, positional encoding, and feed‑forward networks, and shows their applications in machine translation, recommendation systems, and large language models.

Attention Mechanism · Deep Learning · Transformer

0 likes · 11 min read

NewBeeNLP

May 9, 2024 · Artificial Intelligence

How UniSAR Unifies Search and Recommendation with Fine‑Grained User Behavior Modeling

This article summarizes the UniSAR framework, which models four types of fine‑grained user transitions between search and recommendation, demonstrates its effectiveness on public datasets, and shows how joint learning improves both search relevance and recommendation quality.

Cross-Attention · Search · Transformer

0 likes · 4 min read

Baobao Algorithm Notes

May 5, 2024 · Artificial Intelligence

Deep Dive into Transformer Mechanics: Scaling, Q/K Projections, FFNs, and More

This article provides concise technical explanations for 25 common questions about Transformer models, covering scaled dot‑product attention scaling, separate Q/K projections, feed‑forward network design, attention variants, normalization, LoRA versus full‑parameter training, KV‑cache, pre‑ and post‑norm, computational cost analysis, and advanced position‑encoding techniques.

LLM · LoRA · Transformer

0 likes · 25 min read

Alipay Experience Technology

Apr 28, 2024 · Artificial Intelligence

Beyond Sora: Exploring Cutting-Edge Video Reconstruction Techniques

This article surveys recent advances in video reconstruction sparked by OpenAI's Sora, examines the technical challenges of unified latent representations, long‑sequence consistency, and variable resolution, and reviews a range of transformer‑based, diffusion, and masked‑generation models together with their code implementations and future research roadmaps.

Generation · Latent Space · Transformer

0 likes · 35 min read

ITPUB

Apr 20, 2024 · Artificial Intelligence

Unveiling GPT-4’s Magic: How Large Language Models Learn, Reason, and Translate – A Kid‑Friendly Story

This article uses a playful dialogue to demystify how large language models like GPT‑4 work, covering data collection, vectorization, the transformer’s attention mechanism, position encoding, training stages, multilingual translation, reasoning puzzles, and alignment, all illustrated through the tale of a curious learner named Wuming.

Artificial Intelligence · Attention Mechanism · Transformer

0 likes · 50 min read

Top Architect

Apr 18, 2024 · Artificial Intelligence

Understanding Transformers: Architecture, Attention Mechanism, Training and Inference

This article provides a comprehensive overview of Transformer models, covering their attention-based architecture, encoder-decoder structure, training procedures including teacher forcing, inference workflow, advantages over RNNs, and various applications in natural language processing such as translation, summarization, and classification.

Attention Mechanism · Deep Learning · Inference

0 likes · 11 min read

21CTO

Apr 17, 2024 · Artificial Intelligence

How Sora Generates High‑Quality Text‑to‑Video: A Deep Dive into Its Architecture

This article breaks down OpenAI's Sora text‑to‑video model, exploring its overall structure, visual encoder‑decoder, Spacetime Latent Patch, transformer‑based diffusion, long‑time consistency strategies, training techniques, and the technical choices that enable variable resolution, aspect ratios, and up to 60‑second video generation.

AI video generation · Latent Diffusion · Sora

0 likes · 50 min read

Architect

Apr 16, 2024 · Artificial Intelligence

Unraveling Sora: How OpenAI Might Build a 60‑Second Video Generator

This article dissects the possible architecture of OpenAI's Sora video model, tracing its visual encoder‑decoder, Spacetime Latent Patch, transformer‑based diffusion backbone, long‑time consistency strategies, and training pipeline, while comparing alternatives such as MAGVIT‑v2, TECO, NaViT, and FDM to reveal why each design choice may have been made.

AI Architecture · Latent Diffusion · Sora

0 likes · 51 min read

AI Algorithm Path

Apr 5, 2024 · Artificial Intelligence

Master CNN, RNN, GAN, and Transformer Architectures in One Guide

This article provides a friendly, step‑by‑step overview of five core deep‑learning architectures—CNN, RNN, GAN, Transformers, and encoder‑decoder—explaining their structures, key components, and typical use cases in image and natural‑language processing.

CNN · Deep Learning · Encoder-Decoder

0 likes · 12 min read

Architect

Mar 28, 2024 · Artificial Intelligence

Understanding OpenAI's Sora Video Generation Model: Architecture, Workflow, and Core Technologies

This article explains OpenAI's Sora video generation model, detailing its latent diffusion foundation, video compression network, spacetime patch representation, Diffusion Transformer processing, and decoding pipeline, while also reviewing related Stable Diffusion and Transformer concepts that enable high‑quality text‑to‑video synthesis.

Deep Learning · Latent Diffusion · Sora

0 likes · 17 min read

DevOps

Mar 26, 2024 · Artificial Intelligence

OpenAI’s Sora: A One‑Minute Text‑to‑Video Diffusion Transformer Model

OpenAI’s newly released Sora model demonstrates one‑minute text‑to‑video generation using a diffusion‑based transformer architecture that operates on spatiotemporal patches, compresses visual data into latent codes, and builds on a wide range of prior video generation research, while the article also advertises a DevOps certification program.

OpenAI · Sora · Transformer

0 likes · 8 min read

Architect

Mar 26, 2024 · Artificial Intelligence

Why Transformers Outperform RNNs: A Deep Dive into Architecture and Training

This article explains the Transformer model’s core architecture, self‑attention mechanism, encoder‑decoder workflow, training with teacher forcing, inference steps, and why it surpasses RNNs and CNNs, while also outlining its major NLP applications.

Attention Mechanism · Inference · Model Training

0 likes · 14 min read

NewBeeNLP

Mar 22, 2024 · Artificial Intelligence

Unraveling Sora: How OpenAI Might Build Its Text‑to‑Video Engine

This article provides a step‑by‑step technical analysis of OpenAI’s Sora model, examining its possible overall architecture, video encoder‑decoder design, Spacetime Latent Patch mechanism, transformer‑based diffusion process, training strategies, and long‑term consistency techniques, while grounding each speculation in publicly available reports and related research.

AI analysis · Sora · Transformer

0 likes · 50 min read

DataFunTalk

Mar 21, 2024 · Artificial Intelligence

A Detailed Technical Analysis of Sora: Architecture, Key Components, and Potential Implementation

This article provides a comprehensive, easy‑to‑understand breakdown of Sora’s possible architecture—including its visual encoder‑decoder, Spacetime Latent Patch, transformer‑based diffusion model, long‑time consistency strategies, training techniques, and how it supports variable resolution and duration video generation.

AI Architecture · Sora · Spacetime Patch

0 likes · 49 min read

TAL Education Technology

Mar 20, 2024 · Artificial Intelligence

Understanding AI: From Brain Differences to Data Science Practices and Large Model Applications

This article explains why current AI cannot achieve self‑awareness, outlines data‑science steps for large models—including preprocessing, exploratory analysis, modeling, and evaluation—then surveys general and vertical applications of large language models and details a complete machine‑learning workflow with transformer fine‑tuning techniques.

Applications · Data Science · Fine-tuning

0 likes · 14 min read

Architect

Mar 19, 2024 · Artificial Intelligence

How Transformers Power Modern NLP: A Deep Dive into Encoder‑Decoder Mechanics

This article explains the core principles of Transformer models—covering input embeddings, self‑attention, multi‑head attention, positional encoding, feed‑forward networks, and decoder strategies—using concrete examples like "The cat sat on the mat" and "The quick brown fox jumps over the lazy dog" to illustrate each step.

Encoder-Decoder · Feed-Forward Network · NLP

0 likes · 13 min read

Ops Development & AI Practice

Mar 17, 2024 · Artificial Intelligence

Why the Transformer Model Revolutionized AI and How It Works

This article explains the Transformer architecture, its self‑attention mechanism, encoder‑decoder design, and the profound impact it has had on natural language processing, computer vision, and large‑scale language models like GPT.

AI Architecture · Deep Learning · NLP

0 likes · 6 min read

Alibaba Cloud Big Data AI Platform

Mar 15, 2024 · Artificial Intelligence

Why Arithmetic Feature Interaction Is Key to Deep Tabular Learning

Researchers from Alibaba Cloud AI and Zhejiang University present AMFormer, a Transformer‑based model that incorporates arithmetic feature interaction, demonstrating superior fine‑grained modeling, sample efficiency, and generalization on synthetic and real‑world tabular datasets, establishing a new state‑of‑the‑art in deep tabular learning.

AMFormer · Deep Learning · Transformer

0 likes · 12 min read

DeWu Technology

Mar 13, 2024 · Artificial Intelligence

Extending Context Length in LLaMA Models: Structures, Challenges, and Techniques

The article reviews LLaMA’s Transformer and RoPE architecture, explains why its context windows (4K‑128K tokens) are limited, and evaluates industry‑proven extension techniques—including linear, NTK‑aware, and YaRN interpolation plus LongLoRA sparse attention—while addressing memory and quadratic‑cost challenges and presenting a KubeAI workflow for fine‑tuning and deployment.

LLaMA · LongLoRA · RoPE

0 likes · 17 min read

NetEase Smart Enterprise Tech+

Mar 12, 2024 · Artificial Intelligence

How Advanced Video AI Transforms Content Moderation and Retrieval

This article explores how modern video AI techniques—ranging from transformer‑based classification to semi‑supervised retrieval and token‑halting acceleration—enable efficient, accurate detection of prohibited content and fast, scalable video search in the era of short‑form media.

AI moderation · Semi-supervised Learning · Transformer

0 likes · 18 min read

DeWu Technology

Mar 11, 2024 · Artificial Intelligence

Understanding OpenAI's Sora Video Generation Model: Diffusion, Transformers, and Latent Space

OpenAI's Sora video generation model builds on latent diffusion: a video compression encoder‑decoder produces latent codes, spatio‑temporal patches of those codes are tokenized and processed by a diffusion‑trained Transformer conditioned on DALL·E‑style text annotations, and the output is decoded into high‑resolution videos up to a minute long.

Latent Diffusion · Sora · Transformer

0 likes · 18 min read

NewBeeNLP

Mar 7, 2024 · Artificial Intelligence

How Sora is Redefining Large Vision Models: A Deep Dive into Technology, Limits, and Opportunities

This comprehensive review examines Sora, the first model capable of generating minute‑long, high‑quality videos from text, covering its historical background, core diffusion‑Transformer architecture, data preprocessing strategies, prompt engineering techniques, diverse applications, and the ethical and technical limitations that shape its future.

Multimodal AI · Prompt engineering · Sora

0 likes · 28 min read

Sohu Tech Products

Mar 6, 2024 · Artificial Intelligence

Analysis of OpenAI Sora: Data Engineering, Network Architecture, and World Model Implications

OpenAI’s Sora video model unifies image and video data into latent spacetime patches via a VAE, trains on original resolutions with GPT‑4‑expanded captions, employs a Diffusion Transformer backbone for patch‑wise denoising, and demonstrates 3D‑consistent, long‑term world‑model capabilities that hint at a unified computer‑vision paradigm and steps toward AGI.

AI research · OpenAI Sora · Transformer

0 likes · 9 min read

Architects' Tech Alliance

Feb 25, 2024 · Artificial Intelligence

How Sora Redefined Video Generation: Breakthroughs and Industry Impact

The article provides an in‑depth technical analysis of OpenAI's Sora, highlighting its 60‑second 1080p video generation capability, its novel patch‑vectorization and Transformer training pipeline that leverages GPT‑generated prompts for multimodal alignment, and its potential to become a universal video‑generation base model that could reshape the AI industry.

AGI · Multimodal AI · Sora

0 likes · 6 min read

Rare Earth Juejin Tech Community

Feb 23, 2024 · Artificial Intelligence

Google’s Open‑Source Gemma Large Language Model: Architecture, Performance, and Community Reception

Google has released the open‑source Gemma LLM series (2B and 7B parameters) built on Gemini‑style architecture, offering free, commercial‑ready models that run on notebooks, support JAX/PyTorch/TensorFlow, outperform many open‑source peers, and have quickly sparked extensive community testing and discussion.

Artificial Intelligence · Gemma · Google

0 likes · 5 min read

Architect

Feb 22, 2024 · Artificial Intelligence

Sora: OpenAI’s Text‑to‑Video Model – Principles, Impact, and Outlook

The article provides a comprehensive technical overview of OpenAI’s Sora text‑to‑video model, explaining its background, underlying diffusion‑Transformer architecture, key breakthroughs, potential industry impacts, success factors, limitations, and future prospects for AI‑generated video content.

Diffusion Models · OpenAI · Sora

0 likes · 15 min read

CSS Magic

Feb 20, 2024 · Artificial Intelligence

OpenAI’s Sora Video Model Is Hyped—But Here Are the Flaws OpenAI Itself Acknowledges

The article walks through OpenAI’s own admission of Sora’s shortcomings—such as unrealistic physics, misplaced spatial details, and erratic object behavior—by showcasing concrete demo failures, additional observations, and technical notes about its diffusion‑based, transformer architecture and metadata embedding.

AI limitations · OpenAI · Sora

0 likes · 7 min read

21CTO

Feb 17, 2024 · Artificial Intelligence

How OpenAI’s Sora Is Pushing Video Generation to New Frontiers

OpenAI’s Sora model demonstrates large‑scale text‑conditional video generation using a diffusion transformer that operates on spatiotemporal patches, supporting variable durations, resolutions, and aspect ratios while showcasing emergent simulation abilities, flexible sampling, and multimodal editing capabilities, though it still has notable limitations.

AI research · Diffusion Models · Multimodal

0 likes · 19 min read

Architect

Feb 16, 2024 · Artificial Intelligence

Can OpenAI’s Sora Redefine Text‑to‑Video Generation? An In‑Depth Technical Review

OpenAI’s newly unveiled Sora model transforms short text prompts into up‑to‑one‑minute high‑definition videos, showcasing advanced diffusion‑Transformer architecture, improved occlusion handling, and detailed visual fidelity, while the article examines its technical breakthroughs, compares it to earlier models, and discusses emerging safety and misuse concerns.

AI Safety · Diffusion Models · OpenAI

0 likes · 12 min read

Huawei Cloud Developer Alliance

Dec 29, 2023 · Artificial Intelligence

Unlocking LLaMA2: Key Architecture Insights and Deployment Tricks

This recap of the MindSpore public course reviews LLaMA2 fundamentals, compares its Transformer structure, details upgrades from LLaMA1, explains core components like RMSNorm, RoPE, KV‑Cache, Grouped‑Query Attention and SwiGLU, outlines industry LLM optimization methods, and previews the upcoming lecture on the Pengcheng Brain 200B model.

Llama2 · MindSpore · Transformer

0 likes · 5 min read

Rare Earth Juejin Tech Community

Dec 20, 2023 · Artificial Intelligence

BERT Model Overview: Inputs, Encoder, Fine‑tuning, and Variants

This article explains BERT's WordPiece tokenization, input embeddings (token, segment, and position embeddings), encoder architecture for Base and Large models, fine‑tuning strategies for various NLP tasks, and introduces popular variants such as RoBERTa and ALBERT.

BERT · NLP · Transformer

0 likes · 12 min read

Rare Earth Juejin Tech Community

Dec 8, 2023 · Artificial Intelligence

Simplifying Transformer Blocks: Removing Residual Connections, LayerNorm, and Other Components without Losing Performance

A recent ETH Zurich paper shows that standard Transformer blocks can be drastically simplified by removing residual connections, LayerNorm, projection and value parameters, and even MLP sub‑block components, achieving up to 16% fewer parameters and comparable training speed and downstream performance on both GPT‑style decoders and BERT models.

Deep Learning · LLM · Transformer

0 likes · 11 min read

Amap Tech

Dec 4, 2023 · Artificial Intelligence

End-to-End BEV+Transformer Perception and Modeling for High-Definition Map Production

By fusing LiDAR point clouds and camera images into a unified bird's‑eye‑view space and applying Transformer‑based perception, multi‑sensor fusion, and graph‑diffusion modeling, the proposed BEV+Transformer framework automatically detects and smooths ground‑level line features and signs for high‑definition maps with centimeter‑level accuracy, boosting production efficiency and reducing cost.

BEV · HD map · Sensor Fusion

0 likes · 20 min read

Rare Earth Juejin Tech Community

Dec 4, 2023 · Artificial Intelligence

An Overview of BERT: Architecture, Pre‑training Tasks, Comparisons, and Applications

This article provides a comprehensive English overview of BERT, covering its original paper, model architecture, pre‑training objectives (Masked Language Model and Next Sentence Prediction), differences from ELMo, GPT and vanilla Transformers, parameter counts, main contributions, and a range of NLP application scenarios such as text classification, sentiment analysis, NER, and machine translation.

BERT · NLP · Next Sentence Prediction

0 likes · 16 min read

Rare Earth Juejin Tech Community

Nov 26, 2023 · Artificial Intelligence

Overview of T5 (Text-to-Text Transfer Transformer): Architecture, Variants, Experiments, and Applications

This article provides a comprehensive overview of Google's T5 model, detailing its unified text‑to‑text formulation, encoder‑decoder architecture, three model variants, attention mask designs, training strategies, model sizes, experimental results, and key contributions to natural language processing.

Artificial Intelligence · NLP · T5

0 likes · 14 min read

DataFunSummit

Nov 20, 2023 · Artificial Intelligence

Personalized Title Generation and Automatic Cover Image Synthesis for Content Feeds

This article presents a comprehensive overview of personalized title generation—covering keyword‑based, click‑sequence‑based, and author‑style‑based methods using transformer and LSTM models—and describes an end‑to‑end pipeline for automatic cover image synthesis that combines image restoration, Seq2Seq key‑phrase extraction, object detection, and layout generation to improve user engagement in information‑flow scenarios.

NLP · Transformer · ai

0 likes · 12 min read

Rare Earth Juejin Tech Community

Nov 15, 2023 · Artificial Intelligence

Understanding the Transformer Architecture: Encoder, Decoder, and Attention Mechanisms

This article explains the Transformer model, comparing it with RNNs, detailing its encoder‑decoder structure, multi‑head and scaled dot‑product attention, embedding layers, feed‑forward networks, and the final linear‑softmax output, supplemented with diagrams and code examples.

Artificial Intelligence · Deep Learning · Encoder-Decoder

0 likes · 10 min read

Rare Earth Juejin Tech Community

Nov 12, 2023 · Artificial Intelligence

A Comprehensive Introduction to RNN, LSTM, Attention Mechanisms, and Transformers for Large Language Models

This article provides a thorough overview of large language models, explaining the relationship between NLP and LLMs, the evolution from RNN to LSTM, the fundamentals of attention mechanisms, and the architecture and operation of Transformer models, all illustrated with clear examples and diagrams.

Artificial Intelligence · LSTM · NLP

0 likes · 25 min read

Baidu Tech Salon

Nov 10, 2023 · Artificial Intelligence

Baidu Search Deep Learning Model Architecture and Optimization Practices

Baidu's Search Architecture team details how its deep‑learning models have evolved to deliver direct answer results via semantic embeddings, describes a massive online inference pipeline that rewrites queries, ranks relevance, and classifies types, and outlines optimization techniques—including data I/O, CPU/GPU balancing, pruning, quantization, and distillation—to achieve high‑throughput, low‑latency search.

Baidu · GPU Optimization · Inference System

0 likes · 13 min read

NetEase Media Technology Team

Nov 6, 2023 · Artificial Intelligence

Overview of Sequential Recommendation Models

The article surveys sequential recommendation models from early non-deep approaches like FPMC, through RNN-based GRU4Rec and CNN-based Caser, to Transformer-based methods such as SASRec, BERT4Rec, TiSASRec, and recent contrastive-learning techniques, recommending SASRec or its variants for production use.

Deep Learning · Transformer · contrastive learning

0 likes · 17 min read

DaTaobao Tech

Oct 25, 2023 · Artificial Intelligence

Prompt Engineering, LLM Supervised Fine‑Tuning, and Mobile Tmall AI Assistant Application

The article explains prompt engineering techniques, supervised fine‑tuning of large language models, and their practical deployment in the Mobile Tmall AI shopping assistant, detailing ChatGPT’s generation steps, Transformer architecture, prompt clarity, delimiters, role‑play, few‑shot and chain‑of‑thought prompting, SFT versus pre‑training, LoRA adapters, data collection, Qwen‑14B training configuration, SDK‑based inference, and comprehensive evaluation.

AI Assistant · LLM fine-tuning · Model Deployment

0 likes · 14 min read

Rare Earth Juejin Tech Community

Oct 21, 2023 · Artificial Intelligence

Understanding LSTM, ELMO, and Transformer Models for Natural Language Processing

This article explains the principles and structure of LSTM networks, introduces the ELMo contextual embedding model with its two‑stage process of pre‑training followed by downstream use, and gives an overview of the Transformer architecture, highlighting the role each plays in modern NLP tasks.

Deep Learning · ELMo · LSTM

0 likes · 12 min read

DataFunTalk

Oct 16, 2023 · Artificial Intelligence

Personalized Title Generation and Automatic Cover Image Synthesis for Information‑Flow Scenarios

This article presents technical approaches for generating personalized article titles and automatically synthesizing cover images, covering keyword‑based, click‑sequence‑based, and author‑style‑based title models, as well as image restoration, key‑information extraction, object detection, and layout generation techniques to improve user engagement in recommendation and search systems.

AI recommendation · LSTM · Transformer

0 likes · 11 min read

Alibaba Cloud Big Data AI Platform

Oct 7, 2023 · Artificial Intelligence

How Alibaba Cloud’s New Transformers and Model Fingerprinting Are Shaping ICCV 2023

Alibaba Cloud’s PAI platform showcased three breakthrough papers at ICCV 2023—including the Scale‑Aware Modulation Transformer for efficient vision backbones, the Stable‑DINO detection transformer with improved matching, and a non‑invasive fingerprinting method for deep image‑restoration models—highlighting its growing impact in AI research.

Image Restoration · Model Fingerprinting · Transformer

0 likes · 10 min read

Zhuanzhuan Tech

Sep 28, 2023 · Artificial Intelligence

Evolution of Language Models and an Overview of the GPT Series

This article surveys the development of natural language processing from early rule‑based systems through statistical n‑gram models, neural language models, RNNs, LSTMs, ELMo, Transformers and BERT, and then details the architecture, training methods, advantages and limitations of the GPT‑1, GPT‑2, GPT‑3, ChatGPT and GPT‑4 models, concluding with a discussion of future challenges and references.

Artificial Intelligence · Deep Learning · GPT

0 likes · 30 min read

Kuaishou Large Model

Sep 27, 2023 · Artificial Intelligence

DVIS: Decoupled Framework that Sets New SOTA in Video Instance Segmentation

DVIS introduces a decoupled video instance segmentation framework that splits the task into segmentation, tracking, and refinement modules, achieving state-of-the-art performance across VIS, VPS, and VSS benchmarks while maintaining low computational overhead, and demonstrates robustness in both online and offline settings.

Computer Vision · Deep Learning · Transformer

0 likes · 12 min read

DaTaobao Tech

Sep 27, 2023 · Artificial Intelligence

FlashAttention-2: Efficient Attention Algorithm for Transformer Acceleration and AIGC Applications

FlashAttention‑2 is an IO‑aware exact attention algorithm that cuts GPU HBM traffic through tiling and recomputation. It reduces non‑matmul FLOPs, adds parallelism along the sequence dimension, and improves warp‑level work distribution, delivering up to 2× speedup over FlashAttention and near‑GEMM efficiency. This enables longer‑context Transformer training and inference, and the article applies it to AIGC workloads with fastunet at negligible accuracy loss.

AIGC · Attention optimization · Deep Learning

0 likes · 20 min read

MaGe Linux Operations

Sep 25, 2023 · Artificial Intelligence

How ChatGPT Works: Inside the Neural Network That Generates Human‑Like Text

Stephen Wolfram explains the inner workings of ChatGPT, covering its transformer architecture, probability‑based word selection, training on massive text corpora, the role of embeddings, neural network layers, attention mechanisms, and the challenges of modeling language, offering a deep technical overview for AI enthusiasts.

ChatGPT · Neural Networks · Transformer

0 likes · 80 min read

Ant R&D Efficiency

Sep 19, 2023 · Artificial Intelligence

From the Turing Test to GPT‑4: A Historical Overview of Chatbots and Deep Learning

From Turing’s 1950 imitation game to GPT‑4’s multimodal vision‑language capabilities, the field has evolved from simple rule‑based programs like ELIZA and PARRY, through statistical learning and the 2017 Transformer breakthrough, to large-scale generative models that achieve fluent conversation yet still grapple with hallucination and true understanding.

Artificial Intelligence · Chatbot History · Deep Learning

0 likes · 25 min read

Alipay Experience Technology

Sep 12, 2023 · Artificial Intelligence

Demystifying ChatGPT: From Transformer Basics to Business Applications

This article offers a clear overview of large language models from the perspective of an engineer who does not work on algorithms. It explains ChatGPT’s Generative Pre‑trained Transformer (GPT) foundation, core mechanisms such as attention, and practical prompt‑engineering tips, and shows how enterprises can integrate LLMs into data analysis, intelligent customer service, and other business workflows while noting the associated risks.

AI applications · ChatGPT · Prompt engineering

0 likes · 28 min read

Open Source Linux

Sep 8, 2023 · Artificial Intelligence

How ChatGPT Works: Inside the Neural Network That Generates Human‑Like Text

This article explains the inner workings of ChatGPT, covering how large language models predict the next token using probability distributions, the role of embeddings, the transformer architecture with attention heads, training methods, loss functions, and why such a massive neural network can produce coherent, human‑like language.

ChatGPT · Language Model · Neural Networks

0 likes · 79 min read

NetEase Cloud Music Tech Team

Sep 6, 2023 · Artificial Intelligence

Timbre‑Guided TG‑Critic and Transformer‑Based TrOMR: AI Advances in Music Evaluation

This article reviews two recent AI research papers from NetEase Cloud Music Lab: TG‑Critic, a timbre‑guided, reference‑free singing evaluation model that classifies vocal performance using only audio, and TrOMR, a Transformer‑based end‑to‑end polyphonic optical music recognition system that improves note‑sequence prediction and dataset realism.

Audio Analysis · Deep Learning · Music Evaluation

0 likes · 6 min read

Alibaba Cloud Developer

Sep 4, 2023 · Artificial Intelligence

Hands‑On Building a Transformer from Scratch with PyTorch

This tutorial walks you through implementing a full Transformer model in PyTorch, starting from basic linear‑regression code, adding attention mechanisms, multi‑head attention, encoder‑decoder architecture, training loops, and inference, all reinforced with practical debugging tips.

Deep Learning · NLP · PyTorch

0 likes · 17 min read

php Courses

Aug 26, 2023 · Artificial Intelligence

Understanding Generative AI: Concepts, Common Models, and Development Guide

Generative AI, a branch of artificial intelligence that creates novel content such as text, images, and music, works by learning patterns from training data, with common models including GANs, VAEs, autoregressive and Transformer-based architectures, and its development involves task definition, data preparation, model design, training, evaluation, and ethical considerations.

Artificial Intelligence · GAN · Model Development

0 likes · 8 min read

Network Intelligence Research Center (NIRC)

Aug 22, 2023 · Artificial Intelligence

LONGNET: Extending Transformers to Over 1 Billion Tokens

LONGNET introduces dilated attention to enable Transformers to process sequences exceeding one billion tokens with linear computational cost, preserving performance on shorter inputs and demonstrating strong results on long‑sequence modeling and standard language tasks.

Dilated Attention · LONGNET · Language Modeling

0 likes · 6 min read

Network Intelligence Research Center (NIRC)

Aug 19, 2023 · Artificial Intelligence

Detecting Time‑Series Anomalies with the Anomaly Transformer’s Association Discrepancy

The article explains how the Anomaly Transformer leverages prior‑ and series‑association discrepancies, a learnable Gaussian kernel, and a Minimax training strategy to distinguish normal from abnormal points in time‑series data, achieving state‑of‑the‑art results on five benchmark datasets.

Association Discrepancy · Minimax Training · SOTA

0 likes · 6 min read

Model Perspective

Jul 31, 2023 · Artificial Intelligence

From RNN to ChatGPT: How AIGC Evolved with Transformers and Large Models

This article traces the evolution of AI‑generated content (AIGC) from early RNN‑based Seq2Seq models through the transformative impact of the Transformer architecture, covering key milestones such as UniLM, T5, BART, the GPT series, InstructGPT, and the emergence of ChatGPT.

AI content generation · AIGC · GPT

0 likes · 9 min read

Rare Earth Juejin Tech Community

Jul 31, 2023 · Artificial Intelligence

Overview of Deep Neural Network Architectures

This article provides a comprehensive overview of deep neural network families, introducing twelve major architectures—including Feedforward, CNN, RNN, LSTM, DBN, GAN, Autoencoder, Residual, Capsule, Transformer, Attention, and Deep Reinforcement Learning—explaining their principles, structures, training methods, and offering Python/TensorFlow/PyTorch code examples.

CNN · Deep Learning · GAN

0 likes · 29 min read

Rare Earth Juejin Tech Community

Jul 30, 2023 · Artificial Intelligence

ChatGPT Technical Analysis Series – Part 2: GPT1, GPT2, and GPT3 (Encoder vs Decoder, Zero‑Shot, and Scaling)

This article reviews the evolution of the GPT family from GPT‑1 to GPT‑3, comparing encoder and decoder architectures, explaining the shift from supervised fine‑tuning to zero‑shot and few‑shot learning, and highlighting the architectural and training innovations that enabled large‑scale language models.

Fine-tuning · GPT · LLM

0 likes · 13 min read

Network Intelligence Research Center (NIRC)

Jul 29, 2023 · Artificial Intelligence

Getting Started with GPT: How Generative Pre‑Training and Discriminative Fine‑Tuning Work

This article explains GPT's two‑stage learning—unsupervised generative pre‑training on large raw corpora followed by discriminative fine‑tuning on labeled tasks—detailing the underlying Transformer decoder architecture, loss functions, and task‑specific input transformations.

Fine-tuning · GPT · Generative Pre‑Training

0 likes · 5 min read

Rare Earth Juejin Tech Community

Jul 27, 2023 · Artificial Intelligence

Implementing Text‑Based Image Search Using OCR, Transformers, and Vector Databases

This article explains how to build a text‑to‑image search system by first extracting text with OCR, then storing image paths and textual embeddings in a SQLite or Milvus vector database, and finally improving retrieval with Transformer‑based sentence embeddings and image‑captioning models.

Milvus · OCR · Python

0 likes · 16 min read

Sohu Tech Products

Jul 26, 2023 · Artificial Intelligence

Attention Mechanism, Transformer Architecture, and BERT: An In-Depth Overview

This article provides a comprehensive overview of the attention mechanism, its mathematical foundations, the transformer model architecture—including encoder and decoder components—and the BERT pre‑training model, detailing their principles, implementations, and applications in natural language processing.

Attention Mechanism · BERT · Encoder-Decoder

0 likes · 13 min read

AsiaInfo Technology: New Tech Exploration

Jul 19, 2023 · Artificial Intelligence

How ChatGPT Illuminates the Future Evolution of Data Intelligence

The article examines the rise of artificial general intelligence since 2022, analyzes ChatGPT and other multimodal large models, explains the Transformer architecture, discusses multimodal semantic alignment for AGI, and proposes a four‑level data‑intelligence framework—data, information, knowledge, wisdom—offering a roadmap for future development.

AGI · Artificial Intelligence · Data Intelligence

0 likes · 16 min read

Nightwalker Tech

Jul 19, 2023 · Artificial Intelligence

Step‑by‑Step Implementation of Transformer Blocks, Attention, Normalization, Feed‑Forward, Encoder and Decoder in PyTorch

This article provides a comprehensive tutorial on building the core components of a Transformer model—including multi‑head attention, layer normalization, feed‑forward networks, encoder and decoder layers—and assembles them into a complete PyTorch implementation, supplemented with explanatory diagrams and runnable code.

Decoder · Deep Learning · Encoder

0 likes · 13 min read

Nightwalker Tech

Jul 18, 2023 · Artificial Intelligence

Implementing the Input Processing Layer of a Transformer Model: Tokenization, Embedding, and Positional Encoding

This article explains how to build the input processing stage of a Transformer—including tokenization with Hugging Face tokenizers, token‑to‑embedding conversion using BERT models, custom BPE tokenizers, and positional encoding—providing complete Python code examples and test results.

BPE · Embedding · Positional Encoding

0 likes · 14 min read

Rare Earth Juejin Tech Community

Jul 12, 2023 · Artificial Intelligence

Comprehensive Guide to Vision Transformer (ViT): Architecture, Patch Tokenization, Embedding, Fine‑tuning, and Performance

This article provides an in‑depth, English‑language overview of Vision Transformer (ViT), covering its Transformer‑based architecture, patch‑to‑token conversion, token and position embeddings, fine‑tuning strategies such as 2‑D interpolation, experimental results versus CNNs, and the model’s broader significance for multimodal AI research.

Computer Vision · Deep Learning · Fine‑tuning

0 likes · 25 min read

Network Intelligence Research Center (NIRC)

Jun 24, 2023 · Artificial Intelligence

How DFX Achieves Low-Latency Multi-FPGA Acceleration for Transformer Text Generation

The article reviews the DFX system—a multi‑FPGA server that uses model‑parallelism and a ring‑topology interconnect to accelerate GPT‑2 text generation, showing 3.78× higher throughput, 3.99× better energy efficiency, and 8.21× greater cost‑effectiveness compared with a four‑GPU V100 baseline.

FPGA · GPT-2 · Hardware acceleration

0 likes · 6 min read

Rare Earth Juejin Tech Community

Jun 11, 2023 · Artificial Intelligence

Comprehensive Technical Overview of GPT Series, Transformers, and Emerging Capabilities in Large Language Models

This article provides a detailed technical review of the evolution of GPT models, the Transformer architecture, large language model training methods, emergent abilities such as in‑context learning and chain‑of‑thought, multimodal extensions, and the challenges of data, scaling, and alignment, offering a holistic view for researchers and practitioners.

GPT · InstructGPT · Multimodal

0 likes · 28 min read

Network Intelligence Research Center (NIRC)

Jun 5, 2023 · Artificial Intelligence

How DETR and Its Successors Evolve: A Deep Dive into the DETR Series for Object Detection

This article reviews the original DETR model, analyzes its strengths and weaknesses, and then examines two major follow‑up works, Deformable‑DETR and DAB‑DETR, explaining how they modify the attention mechanism, using deformable attention (inspired by deformable convolutions) and dynamic anchor boxes to accelerate convergence and improve small‑object detection.

DAB-DETR · DETR · Deformable-DETR

0 likes · 12 min read

Architects' Tech Alliance

May 15, 2023 · Artificial Intelligence

How Transformer Powers ChatGPT: A Deep Dive into Attention and Architecture

This article provides a comprehensive analysis of the Transformer model behind ChatGPT, covering its origin, core mechanisms such as embedding, positional encoding, self‑attention, multi‑head attention, a step‑by‑step translation example, and the broader implications for AI research and industry.

AI Architecture · Attention Mechanism · ChatGPT

0 likes · 19 min read

Full-Stack Trendsetter

May 15, 2023 · Artificial Intelligence

Do You Really Understand ChatGPT, the Era‑Defining AI?

This article explains what ChatGPT is and how it builds on natural language processing and the Transformer-based GPT series, details its growth in model size, architectural enhancements, and multilingual support, and walks through the tokenization-to-generation pipeline that enables coherent AI-driven conversations.

ChatGPT · Deep Learning · GPT-3

0 likes · 8 min read

Rare Earth Juejin Tech Community

May 8, 2023 · Artificial Intelligence

Understanding the Principles Behind ChatGPT: NLP, Transformers, and Reinforcement Learning

This article explains how ChatGPT works by covering the fundamentals of natural language processing, generative language models, deep learning, the Transformer architecture, attention mechanisms, few‑shot learning, and the reinforcement‑learning techniques that align its outputs with human preferences.

ChatGPT · NLP · Reinforcement Learning

0 likes · 24 min read

DataFunSummit

May 6, 2023 · Artificial Intelligence

The Convergence of NLP and Computer Vision: Unified Neural Architectures and Pre‑training Strategies

This talk reviews the recent trend of unifying natural‑language processing and computer‑vision models through shared transformer architectures, masked‑image‑modeling pre‑training, brain‑inspired prediction mechanisms, and practical benefits such as knowledge sharing, multimodal applications, and cost efficiency, while highlighting the evolution of Swin Transformer and its next‑generation variants.

NLP · Transformer · Unified Architecture

0 likes · 20 min read

21CTO

Apr 27, 2023 · Artificial Intelligence

Demystifying Transformers: A Step‑by‑Step Guide to Self‑Attention and Architecture

This article explains the Transformer model—from its encoder‑decoder structure and self‑attention mechanism to multi‑head attention, positional encoding, residual connections, training loss, and inference strategies—providing a clear, visual walkthrough for readers new to modern NLP architectures.

Deep Learning · Self-Attention · Transformer

0 likes · 21 min read

Kuaishou Tech

Apr 26, 2023 · Artificial Intelligence

Dual-Interest Decomposition Head Attention for Sequence Recommendation with Positive and Negative Feedback

The paper proposes a dual‑interest decomposition head‑attention model that uses a feedback‑aware encoding layer, a factorized head attention mechanism, and separate positive/negative interest towers to improve sequence recommendation performance on short‑video and e‑commerce datasets.

Feedback · Sequence Modeling · Transformer

0 likes · 8 min read

Nightwalker Tech

Apr 26, 2023 · Artificial Intelligence

Understanding GPT: Meaning, Evolution, and Training Process

This article explains what GPT (Generative Pre‑trained Transformer) is, traces its development from early neural networks to the latest GPT‑4 models, and details the three‑stage training pipeline of unsupervised pre‑training, supervised fine‑tuning, and reinforcement learning from human feedback.

GPT · Transformer

0 likes · 15 min read

JD Tech

Apr 20, 2023 · Artificial Intelligence

Comprehensive Overview of ChatGPT: AI Background, Technical Foundations, and Commercial Applications

This extensive report examines ChatGPT’s origins, the evolution of artificial intelligence and natural language processing, details the underlying Transformer architecture and GPT series, discusses its limitations, and explores the wide-ranging commercial applications and future prospects of generative AI.

AIGC · Artificial Intelligence · ChatGPT

0 likes · 34 min read

Python Crawling & Data Mining

Apr 5, 2023 · Artificial Intelligence

Why ChatGPT Works: Inside Transformers, RLHF, and AI’s Latest Breakthroughs

This article explores how ChatGPT’s remarkable abilities stem from the Transformer architecture, reinforcement learning from human feedback, and the insights presented in the fourth edition of "Artificial Intelligence: A Modern Approach," highlighting key AI milestones and technical foundations.

Artificial Intelligence · ChatGPT · Deep Learning

0 likes · 9 min read

DataFunTalk

Mar 18, 2023 · Artificial Intelligence

Review of Deep Learning Model Evolution, Current Limitations, and Future Trends

The article reviews the historical development of deep learning models, highlights scaling limits, universality, interpretability challenges, and hardware constraints, and then outlines future directions such as efficient architectures, self‑supervised training, broader applications, and emerging AI hardware, while also promoting a related ebook.

AI hardware · AI trends · Transformer

0 likes · 6 min read

360 Quality & Efficiency

Mar 10, 2023 · Artificial Intelligence

What Is ChatGPT? Overview, Performance, and Underlying Technologies

This article explains what ChatGPT is, its impressive conversational performance across tasks such as daily dialogue, document writing, math solving, and coding, and details the underlying Transformer architecture, massive data training, and reinforcement learning from human feedback that make the model so powerful.

Artificial Intelligence · ChatGPT · RLHF

0 likes · 9 min read

IT Services Circle

Mar 2, 2023 · Artificial Intelligence

Understanding GPT: Word Vectors, Transformers, and Model Architectures (GPT‑2, GPT‑3)

This article provides a concise technical overview of GPT, explaining how word vectors are constructed, how the Transformer architecture with self‑attention and feed‑forward layers processes these vectors, and how GPT‑2 and GPT‑3 extend the model with decoder‑only and large‑scale designs.

GPT · Self-Attention · Transformer

0 likes · 8 min read

Top Architect

Mar 1, 2023 · Artificial Intelligence

Understanding the Internals of ChatGPT: Neural Networks, Embeddings, and Training Techniques

This article provides a comprehensive overview of how ChatGPT works, covering its probabilistic text generation, transformer architecture, embedding representations, neural network training processes, and the underlying principles that enable large language models to produce coherent and meaningful human-like language.

ChatGPT · Language Model · Neural Networks

0 likes · 80 min read

DataFunTalk

Feb 25, 2023 · Artificial Intelligence

Review of Deep Learning Model Evolution and Future Trends

The article reviews the historical development of deep learning models, highlights current limitations such as scaling inefficiencies, interpretability, and planning, and outlines future directions including efficient architectures, self‑supervised training, cross‑modal transformers, and the impact of AI on fields like life sciences and finance.

AI trends · Future AI · Transformer

0 likes · 6 min read

DataFunTalk

Feb 20, 2023 · Artificial Intelligence

Review of Deep Learning Model Evolution and Future Trends

The article reviews the historical development of deep learning models, highlighting patterns such as scaling limits, increasing generality, interpretability challenges, planning deficiencies, and hardware constraints, and then outlines future directions including efficient architectures, enhanced capabilities, interdisciplinary applications, virtual agents, and novel AI hardware.

AI trends · Transformer · self-supervised learning

0 likes · 6 min read

DataFunSummit

Feb 16, 2023 · Artificial Intelligence

Understanding the Transformer Model and Self‑Attention Mechanism with a Complete PyTorch Implementation

This article introduces the Transformer architecture, explains the self‑attention mechanism with visual illustrations, and provides a full, runnable PyTorch code example that implements the encoder‑decoder structure for sequence‑to‑sequence tasks.

NLP · PyTorch · Self-Attention

0 likes · 11 min read

Tencent Cloud Developer

Feb 14, 2023 · Artificial Intelligence

ChatGPT: Technology, Impact, and Future Perspectives

Since its November 2022 launch, OpenAI’s ChatGPT, built on Transformer‑based generative AI, has surged to over 100 million users and demonstrated capabilities ranging from MBA exams to software‑engineer interviews. It has sparked a multibillion‑dollar market with paid subscriptions and Microsoft investment, spurred rival models like Claude, and is reshaping human‑computer interaction while raising ethical concerns and promising multimodal, industry‑specific future applications.

ChatGPT · Transformer · generative AI

0 likes · 15 min read

Architect's Guide

Feb 9, 2023 · Artificial Intelligence

Why ChatGPT Is So Powerful: A Technical Overview of NLP Model Evolution

This article explains why ChatGPT performs so well by tracing the evolution of natural‑language processing from rule‑based grammars through statistical n‑gram models to neural architectures like RNNs, LSTMs, attention mechanisms, Transformers, and the massive data and training methods that power modern large language models.

ChatGPT · NLP · Transformer

0 likes · 14 min read

21CTO

Feb 6, 2023 · Artificial Intelligence

Understanding the Transformer: How Attention Powers ChatGPT and Modern AI

This article breaks down the Transformer architecture behind ChatGPT, explaining its attention mechanism, embedding, positional encoding, and multi‑head self‑attention, while highlighting the model's impact on AI research, data requirements, and future innovations.

Artificial Intelligence · Attention Mechanism · ChatGPT

0 likes · 18 min read

IT Architects Alliance

Feb 6, 2023 · Artificial Intelligence

Understanding the Transformer Model: A Deep Dive into “Attention Is All You Need”

This article provides a comprehensive, plain‑language walkthrough of the 2017 “Attention Is All You Need” paper, explaining the Transformer’s architecture, core mechanisms such as embedding, positional encoding and self‑attention, and discussing its broader impact on AI research and applications.

Attention Mechanism · Transformer · ai

0 likes · 17 min read
