Weekly AI Paper Digest: Open-Source LLMs, Agent Systems, and Long-Context Reasoning
This week’s AI paper roundup reviews six recent research works—RecGPT‑V2, Nemotron 3 Nano, the FrontierScience benchmark, AutoGLM, Deeper‑GXX, and QwenLong‑L1.5—highlighting advances in LLM‑driven recommendation, Mixture‑of‑Experts models, expert‑level scientific reasoning, GUI‑based foundation agents, deeper graph neural networks, and ultra‑long‑context reasoning.
RecGPT‑V2 addresses four limitations of RecGPT‑V1—high computational cost, limited template diversity, restricted supervised generalization, and result‑only evaluation—by introducing (1) a hierarchical multi‑agent system, (2) a meta‑prompting framework, (3) constrained reinforcement learning, and (4) an agent‑as‑judge evaluation module. The authors demonstrate that intent reasoning with LLMs is both technically feasible and commercially viable at industrial scale.
Nemotron 3 Nano is a 30B‑A3B Mixture‑of‑Experts Mamba‑Transformer model pretrained on 250 trillion tokens (including over 30 trillion new tokens compared with Nemotron 2). After supervised fine‑tuning and large‑scale reinforcement learning, the model shows notable gains in agent behavior, reasoning ability, and dialogue interaction, and supports context lengths up to one million tokens.
FrontierScience proposes a benchmark for assessing AI’s expert‑level scientific reasoning. It comprises two tracks: an Olympiad track covering International Physics, Chemistry, and Biology Olympiad‑level problems, and a Research track featuring doctoral‑level, open‑ended scientific questions.
AutoGLM introduces a new series of foundation agents built on the ChatGLM architecture, designed to autonomously control graphical user interfaces (GUIs). The system is demonstrated on web browsers and mobile devices, showcasing practical autonomous interaction with real‑world GUIs.
Deeper‑GXX presents a technique for deepening arbitrary graph neural networks (GNNs), enabling more expressive representations without altering the original architecture.
QwenLong‑L1.5 builds on the Qwen3‑30B‑A3B‑Thinking architecture and applies systematic post‑training innovations to achieve superior long‑context reasoning. On a long‑context benchmark, it approaches GPT‑5 and Gemini‑2.5‑Pro performance, improving the baseline by 9.90 points. Its memory‑agent framework yields an additional 9.48‑point gain on ultra‑long tasks ranging from one to four million tokens.
The full list of papers and their links is available on the HyperAI "Latest Papers" page, and the authors invite further high‑quality submissions.
