What 2023 Taught Us About LLMs and AI‑Guided Optimization

The author reviews a year of rapid progress in large language models, highlighting breakthrough papers such as Positional Interpolation, StreamingLLM, Deja Vu, and RLCD, and discusses how AI‑guided optimization techniques like SurCo, LANCER, and GenCo are reshaping research and industry applications.

NewBeeNLP

Large Language Models (LLMs)

2023 saw several LLM papers attract wide community attention. Positional Interpolation demonstrated that a one‑line code change to RoPE, combined with modest fine‑tuning, can extend the context window far beyond its pre‑training length, sparking a surge of open‑source long‑context models.
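The core idea can be sketched in a few lines: rather than extrapolating RoPE to unseen positions, Positional Interpolation rescales position indices back into the trained range. The function below is a minimal NumPy sketch (the name `rope_angles` and the shapes are illustrative, not the paper's code):

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    """Rotary-embedding angles. Positional Interpolation is the single
    rescaling of `positions` below -- the 'one-line change'."""
    positions = np.asarray(positions, dtype=np.float64) / scale  # <- the one-line change
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)        # per-pair frequencies
    return np.outer(positions, inv_freq)                         # shape: (n_positions, dim // 2)

# Extending a 2048-token model to 8192 tokens uses scale = 8192 / 2048 = 4,
# so position 8191 is mapped back inside the range seen during pre-training:
angles = rope_angles([8191], dim=64, scale=4.0)
```

With `scale=1.0` this reduces to standard RoPE, which is why only light fine-tuning is needed after the change.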

StreamingLLM (Attention Sink) showed that preserving the KV cache of just the first four tokens, alongside a sliding window of recent tokens, removes the context‑window limit during inference, enabling "infinite chat" behavior. The method quickly spread to Intel Extension for Transformers, HuggingFace Transformers, and the mobile offline LLM MLC Chat.
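The eviction policy itself is simple enough to sketch in a few lines. Below is a toy illustration of the idea (keep the first `n_sink` "attention sink" entries plus a recent window, drop the middle); the function name and parameters are illustrative, not the paper's implementation:

```python
def evict_kv(cache, n_sink=4, window=1020):
    """StreamingLLM-style KV-cache eviction: keep the first n_sink
    'attention sink' entries plus the most recent `window` entries,
    discarding everything in between."""
    if len(cache) <= n_sink + window:
        return cache                       # still fits, nothing to evict
    return cache[:n_sink] + cache[-window:]

# Toy example: positions 0..1999 are in the cache; after eviction the
# cache holds positions 0-3 plus the most recent 1020 positions.
kept = evict_kv(list(range(2000)))
```

The surprising part is not the eviction rule but that quality survives it: without the four sink tokens, the same sliding window collapses.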

The idea behind StreamingLLM originated from the H2O paper, which observed that discarding 80% of the KV‑cache does not harm next‑token perplexity, prompting investigation of the remaining influential tokens.

Deja Vu (ICML'23 oral) introduced sparsity‑based inference acceleration: it predicts which neurons and attention heads will be active in upcoming layers and loads only those weights into GPU cache, drastically reducing memory I/O. Subsequent work such as Shanghai Jiao Tong University's PowerInfer combined this idea with CPU‑GPU joint inference.
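The mechanism can be illustrated with a toy MLP layer: a cheap predictor scores neurons, and only the top‑k predicted‑active rows of the weight matrices are used. This is a minimal NumPy sketch under assumed shapes; the predictor `P` and the top‑k rule stand in for Deja Vu's learned low‑cost predictors:

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden, k = 64, 256, 32

W1 = rng.normal(size=(hidden, d))        # up-projection
W2 = rng.normal(size=(d, hidden))        # down-projection
P = rng.normal(size=(hidden, d)) * 0.1   # cheap sparsity predictor (hypothetical)

def sparse_mlp(x):
    """Compute the MLP using only the k neurons the predictor expects to fire,
    so only those rows/columns of W1/W2 need to be resident on the GPU."""
    scores = P @ x
    idx = np.argsort(scores)[-k:]        # top-k predicted-active neurons
    h = np.maximum(W1[idx] @ x, 0.0)     # ReLU on the selected rows only
    return W2[:, idx] @ h                # down-project from the sparse activations

y = sparse_mlp(rng.normal(size=d))
```

Only k/hidden (here 32/256) of the layer's weights are touched per token, which is where the memory‑I/O savings come from.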

The author also proposed RLCD, a method that generates automatically labeled preference pairs from positive and negative prompt variants, eliminating manual annotation; the pairs can be used for fine‑tuning or reward‑model training while avoiding common RLAIF pitfalls.
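The data-generation step can be sketched as follows. The pair is labeled by construction (the sample from the positive prompt is "chosen"), so no annotator or judge model is needed; the prompt suffixes and the `model` callable here are illustrative placeholders, not RLCD's actual prompts:

```python
def rlcd_pairs(base_prompts, model):
    """RLCD-style data generation: sample one completion from a positive
    prompt variant and one from a negative variant; the preference label
    comes for free from which prompt produced which completion."""
    pairs = []
    for p in base_prompts:
        chosen = model(p + " (give a helpful, harmless answer)")     # positive variant
        rejected = model(p + " (give an unhelpful, harmful answer)")  # negative variant
        pairs.append({"prompt": p, "chosen": chosen, "rejected": rejected})
    return pairs

# Stand-in for an LLM call, just to show the data shape:
fake_model = lambda prompt: f"response to: {prompt}"
data = rlcd_pairs(["How do I stay safe online?"], fake_model)
```

The resulting `(prompt, chosen, rejected)` triples are exactly the format expected by standard preference-tuning and reward-model pipelines.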

Two theoretical papers were released on Transformer dynamics: Scan&Snap (NeurIPS'23) analyzed a single‑layer linear MLP + attention, while JoMA extended the analysis to multi‑layer nonlinear MLP + attention, revealing that attention becomes sparse during training before partially densifying, and explaining why Transformers can learn high‑level concepts.

AI‑Guided Optimization

The SurCo framework (ICML'23) learns a linear surrogate cost to feed into traditional combinatorial solvers, enabling indirect solutions to nonlinear combinatorial problems such as table sharding, optical device design, and nonlinear shortest‑path problems. SurCo won the best paper award at the ICML'23 SODS workshop.
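The division of labor is: the linear solver handles feasibility, while the surrogate cost vector is tuned so that the solver's output minimizes the true nonlinear objective. The toy below (enumeration as the "solver", random search as the "tuning", a made-up objective `f`) only illustrates that structure, not SurCo's gradient-based training:

```python
import numpy as np

def linear_solver(c, feasible):
    """Stand-in combinatorial solver: returns the feasible 0/1 vector
    minimizing the linear cost c.x (here by brute-force enumeration)."""
    return min(feasible, key=lambda x: c @ x)

def f(x):
    """Hypothetical *nonlinear* objective we actually want to minimize."""
    return (x.sum() - 1.5) ** 2 + x[0] * x[1]

feasible = [np.array(b) for b in [(0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 1)]]

# SurCo's idea: search over linear surrogate costs c; every candidate the
# solver returns is feasible by construction, and c is tuned (here by
# random search for simplicity) to make the solver's output minimize f.
rng = np.random.default_rng(0)
best = min(f(linear_solver(rng.normal(size=3), feasible)) for _ in range(50))
```

Because the solver only ever sees a linear cost, decades of mature MILP/shortest-path machinery remain usable even though the true objective is nonlinear.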

Subsequent work includes LANCER (NeurIPS'23), which reduces the number of calls to the combinatorial optimizer, improving efficiency for portfolio optimization, and GenCo, which generates diverse feasible solutions for nonlinear problems and applies them to game level design and optical device design.

Contrastive‑learning approaches like CL‑LNS (ICML'23) and its successor ConPAS accelerate Large Neighborhood Search by learning its destroy heuristics. The overall conclusion is that fully replacing decades‑old combinatorial methods with ML remains difficult; practical solutions use high‑level ML policies that invoke existing solvers when needed.
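To make the "high-level ML policy over an existing solver" pattern concrete, here is a generic Large Neighborhood Search loop on a toy problem. The learned component in CL-LNS/ConPAS would replace the random `destroy` step (choosing which variables to unassign); everything here is an illustrative sketch, not their algorithm:

```python
import random

def lns(solution, cost, destroy, repair, steps=100, seed=0):
    """Generic LNS: repeatedly destroy part of the incumbent, repair it
    (e.g. with an exact solver on the small subproblem), keep improvements."""
    random.seed(seed)
    best = solution
    for _ in range(steps):
        cand = repair(destroy(best))
        if cost(cand) < cost(best):
            best = cand
    return best

# Toy problem: choose 5 ints to minimize sum of squared distance to 3.
cost = lambda x: sum((v - 3) ** 2 for v in x)

def destroy(x):  # unassign one random position (learned in CL-LNS/ConPAS)
    i = random.randrange(len(x))
    return x[:i] + [None] + x[i + 1:]

def repair(x):   # naive repair: fill unassigned slots randomly
    return [random.randrange(10) if v is None else v for v in x]

best = lns([9, 9, 9, 9, 9], cost, destroy, repair, steps=500)
```

The point of CL-LNS is that *which* part to destroy matters far more than the repair step, which is why it is the piece worth learning.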

Reflections on the LLM Era

The pace of research has become extremely fast, with major conferences serving more as social gatherings than sources of cutting‑edge results. Real‑time discussions now happen on Discord, X (Twitter), HuggingFace repos, and GitHub issues.

Instances of parallel discovery (e.g., StreamingLLM vs. LM‑Infinite) caused frustration, but the author persisted, re‑framed the problem, and produced additional experiments (Table 2) that confirmed the existence of Attention Sink. Similar parallel work appeared in ViT analysis.

Rapid iteration rewards hands‑on code exploration; those who write and test code quickly gain deeper understanding and can outpace even venture‑capital‑backed teams. The LLM wave also reshapes research thinking, making tasks that once seemed impossible—such as self‑reflection or executing novel textual instructions—achievable with a few well‑crafted prompts.

Looking ahead, the author expects continued acceleration driven by better hardware, open‑source ecosystems, and evolving researcher mindsets, potentially empowering individuals and small teams to make unique contributions.

References

Positional Interpolation: https://arxiv.org/abs/2306.15595

StreamingLLM: https://arxiv.org/abs/2309.17453

Blog: https://huggingface.co/blog/tomaarsen/attention-sinks

Video: https://www.youtube.com/watch?v=409tNlaByds

Media coverage: https://venturebeat.com/ai/streamingllm-shows-how-one-token-can-keep-ai-models-running-smoothly-indefinitely/

Discussion: https://news.ycombinator.com/item?id=37740932

Intel Extension for Transformers: https://twitter.com/HaihaoShen/status/1715335763032780853

HuggingFace Transformers PR: https://github.com/huggingface/transformers/pull/26681

MLC Chat: https://twitter.com/davidpissarra/status/1735761373261427189

H2O: https://arxiv.org/abs/2306.14048

RLCD: https://arxiv.org/abs/2307.12950

Scan&Snap: https://arxiv.org/abs/2305.16380

JoMA: https://arxiv.org/abs/2310.00535

Hong Kong University talk: https://twitter.com/hkudatascience/status/1706967154887962986

RIKEN talk: https://youtu.be/u05Z74dF0Gg

Remote talk: https://www.youtube.com/watch?v=eXPhvQgAT_I

SurCo: https://arxiv.org/abs/2210.12547

SODS workshop: https://sods-icml2023.github.io/

LANCER: https://arxiv.org/abs/2307.08964

GenCo: https://arxiv.org/abs/2310.02442v1

CL‑LNS: https://arxiv.org/abs/2302.01578

LM‑Infinite: https://arxiv.org/abs/2308.16137

ViT analysis: https://arxiv.org/abs/2309.16588

Online discussion of Positional Interpolation: https://kaiokendev.github.io/til#extending-context-to-8k

Improved RoPE scaling: https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/

Twitter question: https://twitter.com/MarkwardtAdam/status/1674425742615269385

Twitter reply: https://twitter.com/tydsh/status/1674436093356421120

GPT‑4 speculation (part 2): https://zhuanlan.zhihu.com/p/622518320

Tags: LLM, large language models, Transformers, AI optimization, research trends