Tagged articles
6 articles
MoonWebTeam
Oct 1, 2025 · Artificial Intelligence

Unlocking ChatGPT: A Deep Dive into Transformers, Tokenization, and Self‑Attention

This tutorial walks through the fundamentals of ChatGPT by explaining language modeling, character‑level tokenization, data preprocessing pipelines, the evolution from simple bigram models to scaled dot‑product self‑attention, multi‑head mechanisms, full Transformer blocks, and how to train and generate Shakespeare‑style text with a GPT model.

ChatGPT · GPT · Language Modeling
50 min read
AIWalker
Feb 20, 2025 · Artificial Intelligence

Transfusion: A Single Model for Unified Image Generation and Understanding

Transfusion is a 7B‑parameter transformer that jointly trains language modeling and diffusion losses on mixed text‑image data, enabling seamless text generation, image generation, and image understanding within one model and outperforming prior multimodal approaches such as Chameleon across multiple benchmarks.

AI research · Image Generation · Language Modeling
20 min read
Baobao Algorithm Notes
Mar 29, 2024 · Artificial Intelligence

Can Data Mixing Laws Predict LLM Performance? A Deep Dive into Scaling Laws

This article reviews the paper “Data Mixing Laws: Optimizing Data Mixture by Predicting Language Modeling Performance”, explaining how the authors quantify the impact of data mixture ratios on LLM loss, propose a simple predictive model, validate it on RedPajama and multi‑domain mixes, and outline a scaling‑law procedure for continual pre‑training.

Data Mixing · Data Scheduling · LLM
9 min read
DataFunTalk
Jun 21, 2023 · Artificial Intelligence

Low‑Resource NLP Pretraining: Methodology, Experiments, and Zero‑Shot Applications

This article presents a low‑resource NLP pretraining approach that combines transformer‑based language modeling with contrastive vector learning, details the unsupervised sample‑pair construction, introduces a camel‑shaped masking distribution, and demonstrates through extensive experiments that the resulting model achieves strong zero‑shot NLU, NLG, and retrieval performance while requiring minimal compute and data.

Language Modeling · Low-Resource · contrastive learning
10 min read
phodal
Nov 23, 2020 · Fundamentals

Can a Universal Language Model Translate Any Code to Any Other Language?

The article chronicles a multi‑year effort to build a universal language model that can convert any source programming language into any target language, detailing experiments with Go‑ANTLR, Kotlin‑ANTLR, regex‑based parsing, DSL design, and the emerging Charj language and its tooling.

ANTLR · Compiler design · DSL
11 min read