Tag

Model Scaling

DataFunSummit
Feb 5, 2025 · Artificial Intelligence

Exploration and Practice of Large‑Model Data Construction

This presentation details engineering‑focused approaches to building, mixing, and filtering data for large language models, covering data preparation, pre‑training mix strategies such as DoReMi, DoGE and online sampling, post‑training data quality selection methods, and practical Q&A on scaling laws and PDF processing.

AI · Data Engineering · Large Language Models
15 min read
Architect
May 5, 2024 · Artificial Intelligence

The Rise of Small Language Models (SLM) and Their Impact on AI Development

Amidst a growing trend that narrows performance gaps between large and small language models, researchers highlight the efficiency, adaptability, and specialized advantages of small language models (SLM), while also discussing the high costs, hallucinations, and security concerns that still challenge large‑scale LLMs.

AI Efficiency · Edge Computing · LLM
9 min read
DaTaobao Tech
Sep 11, 2023 · Artificial Intelligence

Large Language Model Upgrade Paths and Architecture Selection

This article analyzes upgrade paths of major LLMs—ChatGLM, LLaMA, Baichuan—detailing performance, context length, and architectural changes, then examines essential capabilities, data cleaning, tokenizer and attention design, and offers practical guidance for balanced scaling and efficient model construction.

Baichuan · ChatGLM · LLM architecture
32 min read
Rare Earth Juejin Tech Community
Jul 24, 2023 · Artificial Intelligence

Comprehensive Survey of Large Language Models: History, Key Technologies, Resources, and Future Directions

This article provides a detailed overview of large language models (LLMs), tracing their evolution from statistical and neural language models to modern pre‑trained transformers, discussing scaling, training, adaptation, utilization, evaluation methods, available resources, and outlining current challenges and future research directions.

AI research · Large Language Models · Model Scaling
26 min read
DataFunTalk
May 31, 2023 · Artificial Intelligence

Why GPT Can Exhibit Intelligence Through Next‑Token Prediction: A Comprehensive Exploration of Compression, Knowledge Circuits, and Model Scaling

This article examines the debate over whether large language models truly possess intelligence, arguing that next‑token prediction functions as a form of lossless data compression whose efficiency reflects intelligence. It also surveys research on knowledge extraction, neuron semantics, circuit competition, scaling effects, and the broader philosophical implications of GPT as a mirror of the world’s parameters.

Artificial Intelligence · GPT · Large Language Models
59 min read
Architect
Apr 19, 2023 · Artificial Intelligence

Emergence in Large Language Models: Phenomena, Explanations, and Implications

This article reviews the emergence phenomena observed in large language models, explains how model scale, in‑context learning and chain‑of‑thought prompting contribute to sudden performance gains, discusses small‑model alternatives, and explores the relationship between emergence and the training‑time Grokking effect.

AI research · Chain-of-Thought · Emergence
13 min read
Architect
Apr 14, 2023 · Artificial Intelligence

Overview of Prominent Large Language Models and Instruction Fine‑Tuning Techniques

The article surveys major large language models—including GPT‑3, T5, LaMDA, Jurassic‑1, MT‑NLG, Gopher, Chinchilla, PaLM, U‑PaLM, OPT, LLaMA, BLOOM, GLM‑130B, and ERNIE 3.0 Titan—explains their architectures, scaling trade‑offs, and then details instruction‑fine‑tuned variants such as T0, FLAN, GPT‑3.5, ChatGPT, GPT‑4, Alpaca and ChatGLM, providing references for further study.

AI · ChatGPT · GPT-3
27 min read
DataFunTalk
Mar 18, 2023 · Artificial Intelligence

Review of Deep Learning Model Evolution, Current Limitations, and Future Trends

The article reviews the historical development of deep learning models, highlights scaling limits, universality, interpretability challenges, and hardware constraints, and then outlines future directions such as efficient architectures, self‑supervised training, broader applications, and emerging AI hardware, while also promoting a related ebook.

AI Trends · AI hardware · Model Scaling
6 min read
DataFunTalk
Mar 16, 2023 · Artificial Intelligence

Review of Deep Learning Model Evolution and Future Trends

The article reviews the past six years of deep learning model development, highlighting scaling limits, universality of Transformers, challenges in interpretability and control, and predicts future trends such as efficient architectures, multimodal capabilities, reinforcement learning in virtual worlds, and novel AI hardware, while also promoting a new deep‑learning practice ebook.

AI Trends · AI hardware · Model Scaling
6 min read
DataFunTalk
Mar 14, 2023 · Artificial Intelligence

Review of Deep Learning Model Evolution and Future Trends

The article reviews the past six years of deep‑learning model development, highlighting patterns such as increasing scale, growing universality, limited interpretability, and challenges in efficiency. It forecasts future directions, including more efficient architectures, enhanced perception, multimodal capabilities, integration with the life sciences, and the emergence of general‑purpose intelligent agents, and concludes with a promotion for a deep‑learning practice ebook.

AI Trends · Efficiency · Interpretability
6 min read
DataFunTalk
Feb 25, 2023 · Artificial Intelligence

Review of Deep Learning Model Evolution and Future Trends

The article reviews the historical development of deep learning models, highlights current limitations such as scaling inefficiencies, interpretability, and planning, and outlines future directions including efficient architectures, self‑supervised training, cross‑modal transformers, and the impact of AI on fields like life sciences and finance.

AI Trends · Model Scaling · Transformer
6 min read
DataFunTalk
Feb 20, 2023 · Artificial Intelligence

Review of Deep Learning Model Evolution and Future Trends

The article reviews the historical development of deep learning models, highlighting patterns such as scaling limits, increasing generality, interpretability challenges, planning deficiencies, and hardware constraints, and then outlines future directions including efficient architectures, enhanced capabilities, interdisciplinary applications, virtual agents, and novel AI hardware.

AI Trends · Model Scaling · Transformer
6 min read
Architect
Feb 18, 2023 · Artificial Intelligence

Paradigm Shifts in Large Language Models: From Pre‑training to AGI and Future Research Directions

The article reviews the evolution of large language models, highlighting two major paradigm shifts after GPT‑3, the role of scaling laws, knowledge acquisition, prompting techniques, reasoning abilities, and outlines future research priorities for building more capable and efficient AI systems.

AI reasoning · In-Context Learning · Large Language Models
71 min read
DataFunTalk
Feb 10, 2023 · Artificial Intelligence

ChatGPT: A Revolutionary Breakthrough, Its Core Capabilities, and Impact on Investment Research

This article analyzes why ChatGPT represents a revolutionary advance in AI, explores its emergent abilities and code‑training advantages, evaluates its practical value for investment research through real‑world comparisons with experts, and discusses future trends and challenges for large language models.

AI · ChatGPT · Code Training
16 min read
DataFunTalk
Nov 22, 2022 · Artificial Intelligence

NVIDIA's Advances in Multi‑Role Generative Dialogue Modeling and Synthetic Data‑Driven QA

This article reviews NVIDIA's recent work on multi‑role generative dialogue modeling using GPT‑2‑based architectures and on enhancing question‑answering systems with synthetic data pipelines. It covers model design, data preparation from Reddit, extensive experiments, scaling effects, and practical Q&A insights.

GPT-2 · Generative Dialogue · Model Scaling
17 min read