Machine Learning Algorithms & Natural Language Processing
Mar 9, 2026 · Artificial Intelligence

Instant LoRA Generation and Long‑Document Internalization: Cost‑Amortized Model Updates via 0.1‑Second Forward Pass

This article analyzes two cost barriers for Transformers: the quadratic attention and KV-cache bottlenecks on ultra-long inputs, and the heavy compute cost of traditional supervised fine-tuning. It then presents Sakana AI's cost-amortization framework (Doc-to-LoRA and Text-to-LoRA), which shifts weight updates to a meta-trained hypernetwork, achieving under 50 MB of memory for 128K-token inference, under 1 GB of update memory for long-document QA, and zero-shot task adaptation with sub-second latency.

Cost Amortization · Cross-modal · LoRA
13 min read