Tagged articles

Transformer architecture

3 articles · Page 1 of 1
21CTO
21CTO
Jun 27, 2026 · Artificial Intelligence

Large vs Small Language Models: An Apple‑Centric Technical Comparison

The article analyses how deployment targets, inference economics, and training budgets drive divergent design choices for large (LLM) and small (SLM) Transformer‑based language models, covering architecture tweaks, data‑centric training methods, quantisation, KV‑cache management, and hybrid routing strategies for production systems.

Inference OptimizationQuantizationTransformer architecture
0 likes · 16 min read
Large vs Small Language Models: An Apple‑Centric Technical Comparison
Architects' Tech Alliance
Architects' Tech Alliance
Jun 11, 2025 · Artificial Intelligence

From Transformers to DeepSeek‑R1: The 2017‑2025 Evolution of Large Language Models

This article chronicles the rapid development of large language models from the 2017 Transformer breakthrough through the rise of BERT, GPT‑3, ChatGPT, multimodal systems like GPT‑4V/o, and the recent cost‑efficient DeepSeek‑R1, highlighting key architectural innovations, scaling trends, alignment techniques, and their transformative impact on AI research and industry.

AI alignmentBERTCost‑Efficient Inference
0 likes · 26 min read
From Transformers to DeepSeek‑R1: The 2017‑2025 Evolution of Large Language Models
AntTech
AntTech
May 30, 2025 · Artificial Intelligence

Insights from Ant Group’s 10th Technical Open Day: Multimodal, Embodied, and Future Model Architectures for AGI

The Ant Group’s 10th Technical Open Day gathered leading AI experts who examined the current state and future directions of multimodal large models, embodied AI, world models, transformer architectures, and vertical applications, offering a comprehensive view of the challenges and opportunities on the path toward AGI.

AGIAI safetyEmbodied AI
0 likes · 16 min read
Insights from Ant Group’s 10th Technical Open Day: Multimodal, Embodied, and Future Model Architectures for AGI