Tag

Transformer


Tencent Technical Engineering
May 28, 2025 · Artificial Intelligence

A Beginner-friendly Overview of LLMs, Transformers, Prompts, Function Calling, MCP and Agents

This article provides a concise, easy-to-understand introduction to large language models, the transformer architecture, prompt engineering, temperature settings, function calling, the Model Context Protocol (MCP), agent communication (A2A), and future AI programming trends, using simple analogies and illustrative examples.

AI · Function Calling · LLM
0 likes · 11 min read
Didi Tech
Apr 24, 2025 · Artificial Intelligence

Algorithmic Foundations and Evolution of Natural Language Processing

This article, part of the Algorithmic Foundations of Engineering R&D series, traces NLP’s evolution from rule‑based systems to today’s multimodal large‑model era, reviewing core machine‑learning and deep‑learning techniques, Transformer breakthroughs, representation learning, optimization methods, and emerging research such as retrieval‑augmented generation and AI agents.

AI · NLP · Transformer
0 likes · 43 min read
Tencent Technical Engineering
Apr 16, 2025 · Artificial Intelligence

Understanding Transformer Architecture for Chinese‑English Translation: A Practical Guide

This practical guide walks through the full Transformer architecture for Chinese‑to‑English translation, detailing encoder‑decoder structure, tokenization and embeddings, batch handling with padding and masks, positional encodings, parallel teacher‑forcing, self‑ and multi‑head attention, and the complete forward and back‑propagation training steps.

Embedding · Positional Encoding · PyTorch
0 likes · 26 min read
Architects' Tech Alliance
Mar 31, 2025 · Artificial Intelligence

A Comprehensive History of Large Language Models from the Transformer Era (2017) to DeepSeek‑R1 (2025)

This article reviews the evolution of large language models from the 2017 Transformer breakthrough through BERT, GPT series, alignment techniques, multimodal extensions, open‑weight releases, and the cost‑efficient DeepSeek‑R1 in 2025, highlighting key technical advances, scaling trends, and their societal impact.

AI alignment · LLM evolution · Reasoning Models
0 likes · 26 min read
AntTech
Mar 26, 2025 · Artificial Intelligence

BodyGen: A Bio‑Inspired Embodied Co‑Design Framework for Autonomous Robot Evolution

BodyGen, a new embodied co‑design framework presented at ICLR 2025, enables robots to autonomously evolve their morphology and control policies using reinforcement learning and transformer‑based networks, achieving up to 60% performance gains with a lightweight 1.43M‑parameter model, and its code is publicly released.

Co-design · Robotics · Transformer
0 likes · 10 min read
IT Services Circle
Mar 19, 2025 · Artificial Intelligence

ByteDance’s AI Video Generation Model Goku, Streamer‑Sales Live‑Selling Model, and MimicTalk 3D Talking‑Head Project

ByteDance and partners open‑source three AI projects—Goku for high‑quality text‑to‑video generation, Streamer‑Sales for multimodal live‑selling LLMs, and MimicTalk for rapid 3D talking‑head creation—detailing their core features, underlying transformer‑based architectures, training pipelines, and public repositories.

AI video generation · Transformer · Virtual digital human
0 likes · 5 min read
Cognitive Technology Team
Mar 10, 2025 · Artificial Intelligence

Understanding Transformers: From NLP Challenges to Architecture and Core Mechanisms

This article explains the evolution of natural language processing, the limitations of rule‑based, statistical, and recurrent neural network models, and then introduces the Transformer architecture—covering word and position embeddings, self‑attention, multi‑head attention, Add & Norm, feed‑forward layers, and encoder‑decoder design—to help beginners grasp why Transformers solve key NLP problems.

AI · NLP · Self‑Attention
0 likes · 15 min read
Cognitive Technology Team
Mar 7, 2025 · Artificial Intelligence

From Word Embeddings to Large Language Models: A Comprehensive Overview of AI Model Evolution

This article traces the development of AI models—from early word embeddings like Word2Vec and ELMo, through transformer‑based encoders such as BERT and decoder‑only models like GPT‑1/2/3, to recent multimodal systems and scaling laws—explaining their architectures, training methods, and impact on modern AI applications.

AI · Transformer · Embedding
0 likes · 22 min read
Cognitive Technology Team
Mar 6, 2025 · Artificial Intelligence

From Traditional Machine Learning to Deep Learning: A Comprehensive Guide to Algorithms, Feature Engineering, and Model Training

This article provides a step‑by‑step tutorial that walks readers through the fundamentals of traditional machine‑learning algorithms, feature‑engineering techniques, model training pipelines, evaluation metrics, and then advances to deep‑learning concepts such as MLPs, activation functions, transformers, and modern recommendation‑system models.

Recommendation systems · Transformer · Deep learning
0 likes · 63 min read
IT Architects Alliance
Feb 26, 2025 · Artificial Intelligence

DeepSeek Large Model: Core Architecture, Key Technologies, and Training Strategies

The article provides an in‑depth overview of DeepSeek’s large language model, detailing its mixture‑of‑experts and Transformer foundations, novel attention mechanisms, load‑balancing, multi‑token prediction, FP8 mixed‑precision training, and various training regimes such as knowledge distillation and reinforcement learning.

DeepSeek · FP8 · MLA
0 likes · 18 min read
IT Architects Alliance
Feb 15, 2025 · Artificial Intelligence

DeepSeek: Architecture, Core Technologies, Training Strategies, and Comparative Analysis

The article provides an in‑depth overview of DeepSeek's transformer‑based foundation, Mixture‑of‑Experts architecture, novel attention mechanisms, multi‑token prediction, FP8 mixed‑precision training, knowledge distillation, reinforcement‑learning approaches, and compares its performance and cost advantages against leading models such as GPT and Gemini.

AI Model Architecture · DeepSeek · FP8 Training
0 likes · 29 min read
vivo Internet Technology
Feb 12, 2025 · Artificial Intelligence

Bidirectional Optimization of NLLB-200 and ChatGPT for Low-Resource Language Translation

The paper proposes a bidirectional optimization framework that fine‑tunes the low‑resource NLLB‑200 translation model with LoRA using data generated by ChatGPT, while also translating low‑resource prompts with NLLB before feeding them to LLMs, thereby improving multilingual translation quality yet requiring careful validation of noisy synthetic data.

Fine-tuning · LLM · LoRA
0 likes · 28 min read
Cognitive Technology Team
Feb 9, 2025 · Artificial Intelligence

A Beginner’s Guide to the History and Key Concepts of Deep Learning

From the perceptron’s inception in 1958 to modern Transformer-based models like GPT, this article traces the evolution of deep learning, explaining foundational architectures such as DNNs, CNNs, RNNs, LSTMs, attention mechanisms, and recent innovations like DeepSeek’s MLA, highlighting their principles and impact.

GPT · History · MLA
0 likes · 19 min read
DaTaobao Tech
Jan 22, 2025 · Artificial Intelligence

AI Trends 2025: Paths to AGI, Scaling Law Evolution, and Industry Impact

The article surveys the AI revolution driven by foundation models and an evolving Scaling Law, outlining four AGI pathways—large models, intelligent robots, brain‑computer interfaces, and digital life—while highlighting transformer‑based convergence, generative‑first‑principle breakthroughs like DeepSeek‑V3, and transformative industry impacts ranging from consumer robots to Medical 2.0, personalized education, and digital‑simulation platforms such as NVIDIA’s Omniverse.

AGI · AI · AI Industry
0 likes · 23 min read
DevOps
Dec 19, 2024 · Artificial Intelligence

Yann LeCun Discusses AI, Self‑Supervised Learning, and the Future of AGI

Yann LeCun, in a half‑hour interview with Indian entrepreneur Nikhil Kamath, explains the fundamentals of artificial intelligence, critiques current transformer models, describes self‑supervised learning, outlines his joint‑embedding predictive architecture, and shares his vision for AGI, open‑source ecosystems, and the role of PhDs for AI entrepreneurs.

AGI · Artificial Intelligence · PhD
0 likes · 16 min read
AntTech
Dec 6, 2024 · Artificial Intelligence

Nimbus: Secure and Efficient Two‑Party Inference for Transformers

The paper introduces Nimbus, a two‑party privacy‑preserving inference framework for Transformer models that leverages a client‑side outer‑product linear‑layer protocol and distribution‑aware polynomial approximations for non‑linear layers, achieving up to five‑fold speedups with negligible accuracy loss.

Transformer · Homomorphic encryption · Machine learning
0 likes · 15 min read
DataFunSummit
Nov 24, 2024 · Artificial Intelligence

AI-Driven Forecasting in Modern Supply Chains: Methods, Models, and Practical Guidance

The article explains how modern supply chain forecasting has shifted from qualitative expert judgment to quantitative AI-driven methods such as DeepAR, ensemble learning, and Transformers, and outlines the skills needed for practitioners to build effective predictive models.

AI · DeepAR · Transformer
0 likes · 10 min read
Cognitive Technology Team
Nov 20, 2024 · Artificial Intelligence

Fundamentals and Implementation of Neural Networks and Transformers with PyTorch Examples

This article provides a comprehensive overview of neural network fundamentals, loss functions, activation functions, embedding techniques, attention mechanisms, multi‑head attention, residual networks, and the full Transformer encoder‑decoder architecture, illustrated with detailed PyTorch code and a practical MiniRBT fine‑tuning case for Chinese text classification.

AI · PyTorch · Transformer
0 likes · 49 min read
AntTech
Oct 29, 2024 · Artificial Intelligence

Three Ant Group Papers Featured at EMNLP 2024: Dynamic Transformers, Plug‑and‑Play Visual Reasoner, and Efficient Fine‑Tuning of Large Language Models

This announcement introduces three Ant Group papers accepted at EMNLP 2024—Mixture‑of‑Modules for dynamic Transformer assembly, a plug‑and‑play visual reasoning framework built via data synthesis, and a layer‑wise importance‑aware efficient fine‑tuning method for large language models—highlighting their innovations and upcoming live presentations.

EMNLP 2024 · Parameter-efficient Fine-tuning · Transformer
0 likes · 6 min read