Unlocking ChatGPT: A Deep Dive into Transformers, Tokenization, and Self‑Attention

This tutorial walks through the fundamentals of ChatGPT by explaining language modeling, character‑level tokenization, data preprocessing pipelines, the evolution from simple bigram models to scaled dot‑product self‑attention, multi‑head mechanisms, full Transformer blocks, and how to train and generate Shakespeare‑style text with a GPT model.

ChatGPT, GPT, Language Modeling

50 min read

AIWalker

Feb 20, 2025 · Artificial Intelligence

Transfusion: A Single Model for Unified Image Generation and Understanding

Transfusion is a 7B‑parameter transformer that jointly trains language modeling and diffusion losses on mixed text‑image data, enabling seamless text generation, image generation, and image understanding within one model and outperforming prior multimodal approaches such as Chameleon across multiple benchmarks.

AI research, Image Generation, Language Modeling

20 min read

Baobao Algorithm Notes

Mar 29, 2024 · Artificial Intelligence

Can Data Mixing Laws Predict LLM Performance? A Deep Dive into Scaling Laws

This article reviews the paper “Data Mixing Laws: Optimizing Data Mixture by Predicting Language Modeling Performance”, explaining how the authors quantify the impact of data mixture ratios on LLM loss, propose a simple predictive model, validate it on RedPajama and multi‑domain mixes, and outline a scaling‑law procedure for continual pre‑training.

Data Mixing, Data Scheduling, LLM

9 min read

Network Intelligence Research Center (NIRC)

Aug 22, 2023 · Artificial Intelligence

LONGNET: Extending Transformers to Over 1 Billion Tokens

LONGNET introduces dilated attention, which lets Transformers process sequences exceeding one billion tokens at a computational cost that grows linearly with sequence length. It preserves performance on shorter inputs and demonstrates strong results on long‑sequence modeling and standard language tasks.

Dilated Attention, LONGNET, Language Modeling

6 min read

DataFunTalk

Jun 21, 2023 · Artificial Intelligence

Low‑Resource NLP Pretraining: Methodology, Experiments, and Zero‑Shot Applications

This article presents a low‑resource NLP pretraining approach that combines transformer‑based language modeling with contrastive vector learning, details the unsupervised sample‑pair construction, introduces a camel‑shaped masking distribution, and demonstrates through extensive experiments that the resulting model achieves strong zero‑shot NLU, NLG, and retrieval performance while requiring minimal compute and data.

Language Modeling, Low-Resource, Contrastive Learning

10 min read

phodal

Nov 23, 2020 · Fundamentals

Can a Universal Language Model Translate Any Code to Any Other Language?

The article chronicles a multi‑year effort to build a universal language model that can convert any source programming language into any target language, detailing experiments with Go‑ANTLR, Kotlin‑ANTLR, regex‑based parsing, DSL design, and the emerging Charj language and its tooling.

ANTLR, Compiler Design, DSL

11 min read
