PaperAgent
Author

Daily updates, analyzing cutting-edge AI research papers

216 articles · 1 like · 414 views · 0 comments
Recent Articles

Latest from PaperAgent

PaperAgent
May 3, 2026 · Artificial Intelligence

Skill Graphs Reveal Why Training Diversity Beats Quantity for Terminal Agents

The paper shows that, instead of increasing the number of training tasks, controlling the diversity of scene‑skill combinations via a large‑scale Skill Graph dramatically improves terminal‑agent performance, with Qwen3‑32B surpassing a 480B model on the Terminal‑Bench 2.0 benchmark.

LLM · Qwen3 · Skill Graphs
0 likes · 9 min read
PaperAgent
May 2, 2026 · Artificial Intelligence

Can Harnesses Self‑Evolve? Fudan & Peking University’s Agentic Harness Engineering Breakthrough

The paper introduces Agentic Harness Engineering (AHE), showing that a 10‑round evolution improves Coding Agent pass@1 from 69.7% to 77.0% on Terminal‑Bench 2—outperforming Codex‑CLI—and that the evolved harness transfers zero‑shot to SWE‑bench and multiple model families, thanks to three observability pillars.

Ablation Study · Coding Agent · Harness Engineering
0 likes · 11 min read
PaperAgent
Apr 30, 2026 · Artificial Intelligence

DeepSeek Unveils Open‑Source Multimodal Model: “Thinking with Visual Primitives”

DeepSeek releases an open‑source multimodal LLM that introduces a visual‑primitive framework—elevating bounding boxes and points to token level—to close the reference gap, achieve extreme KV‑cache compression, and outperform GPT‑5.4, Claude‑Sonnet‑4.6 and Gemini‑3‑Flash on counting, spatial reasoning, maze navigation and path‑tracing benchmarks.

DeepSeek · LLM · Visual Primitives
0 likes · 13 min read
PaperAgent
Apr 30, 2026 · Artificial Intelligence

How Agentic AI is Redefining World Modeling

The article reviews the paper "Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond", introducing a two‑axis framework (capability levels L1‑L3 and law domains) to map diverse world‑modeling systems, highlighting that most current systems stall at L1, that explicit law encoding is crucial for long‑term stability, and that L3 represents the ultimate, self‑evolving model.

AI agents · AI research · agentic AI
1 like · 6 min read
PaperAgent
Apr 29, 2026 · Artificial Intelligence

Skill‑Driven Reasoning Cuts Tokens by Up to 59% While Boosting Accuracy

The article introduces the TRS (Thinking with Reasoning Skills) framework, which distills historical LLM reasoning traces into reusable skill cards, enabling offline skill‑base construction and online retrieval that dramatically reduces token consumption (6‑59%) and often improves accuracy on math and coding tasks.

Code Generation · Reasoning Skills · TRS
0 likes · 13 min read
PaperAgent
Apr 28, 2026 · Artificial Intelligence

MiniCPM‑o 4.5 Achieves Full‑Duplex Multimodal AI That DeepSeek V4 Missed

MiniCPM‑o 4.5 introduces the world’s first end‑to‑end full‑duplex multimodal 9‑billion‑parameter model, powered by the Omni‑Flow framework, running on a single consumer‑grade GPU with 12 GB memory, and delivers benchmark results that match or surpass Gemini 2.5 Flash while offering open‑source demos, APIs, and a Windows/macOS installer.

AI · MiniCPM-o · benchmark
0 likes · 13 min read
PaperAgent
Apr 27, 2026 · Artificial Intelligence

A Comprehensive Review of Modern LLM Agent Memory Frameworks

The article surveys recent LLM‑based agent memory research, presenting a unified framework that breaks memory systems into four components, detailing their design choices, experimental evaluation on LOCOMO and LONGMEMEVAL, key findings, and a new low‑token SOTA architecture.

Agent Memory · Evaluation · Information Retrieval
0 likes · 8 min read
PaperAgent
Apr 26, 2026 · Artificial Intelligence

ICLR 2026 Outstanding Papers Reveal the Real Test for LLMs

The ICLR 2026 Outstanding Paper awards spotlight two studies—one proving Transformers are mathematically succinct and another showing that all major LLMs lose about 39% performance in multi‑turn conversations, exposing a reliability gap missed by single‑turn benchmarks.

AI benchmarks · ICLR 2026 · LLM evaluation
0 likes · 7 min read
PaperAgent
Apr 25, 2026 · Artificial Intelligence

86K‑Star Repo Turns Karpathy’s Coding Wisdom into Practical AI‑Coding Rules

The article shares four concrete principles distilled from Andrej Karpathy’s experience—captured in the 86.1k‑star "andrej‑karpathy‑skills" repository—to help developers steer large language models toward reliable, concise, and goal‑driven code changes, with installation tips for Claude Code and other AI assistants.

AI coding · Claude Code · Karpathy
0 likes · 7 min read