Author

PaperAgent

Daily updates, analyzing cutting-edge AI research papers

219

Articles

Likes

433

Views

Comments

Latest from PaperAgent

100 recent articles max

PaperAgent

Mar 22, 2026 · Artificial Intelligence

How AI Agents Like OpenClaw Turn LLMs into Autonomous Assistants

This article explains what AI agents are and how they differ from ordinary language‑model interfaces, then walks through OpenClaw’s workflow, tool usage, security challenges, memory handling, and advanced features such as sub‑agents and context compaction, offering practical insights for building safe autonomous AI systems.

AI Agent · OpenClaw · Security

0 likes · 27 min read


PaperAgent

Mar 22, 2026 · Artificial Intelligence

Can LLM Agents Self‑Evolve Without Retraining? Inside Memento‑Skills

The article analyzes the Memento‑Skills framework, which treats external memory as executable skills to enable deployment‑time continual learning for frozen LLM agents, detailing its read‑write reflective loop, skill‑as‑memory design, behavior‑trained skill router, experimental validation on GAIA and HLE benchmarks, and theoretical guarantees without gradient updates.

AI · Continual learning · LLM

0 likes · 9 min read


PaperAgent

Mar 21, 2026 · Artificial Intelligence

How Cursor’s Composer 2 Leverages Self‑Summarization and RL for Long‑Horizon Tasks

The article examines Cursor’s Composer 2 model, detailing its self‑summarization reinforcement‑learning workflow, the limitations of traditional compression methods, token‑efficient results on the CursorBench benchmark, and a challenging Terminal‑Bench case study that demonstrates dramatically reduced token usage while improving performance.

Composer 2 · Cursor · Self‑Summarization

0 likes · 9 min read


PaperAgent

Mar 21, 2026 · Artificial Intelligence

Can AI Truly Be Creative? Inside the CreativeBench Benchmark

This article examines the CreativeBench benchmark, which redefines machine creativity by measuring both the quality and novelty of generated solutions, explains its combinatorial and exploratory task designs, details the self‑evolving task construction process, and discusses key findings and the EvoRePE enhancement method.

AI benchmark · EvoRePE · large language models

0 likes · 18 min read


PaperAgent

Mar 21, 2026 · Artificial Intelligence

Can Peer Review Boost Large Language Model Ensembles? Introducing LLM‑PeerReview

This article analyzes the unsupervised LLM‑PeerReview framework, which combines multiple large language models through a peer‑review‑inspired pipeline of scoring, reasoning, and selection, including a novel flipped‑triple scoring trick, and achieves significant performance gains over existing ensemble and collaboration baselines.

Artificial Intelligence · Flipped Triple Scoring · LLM Ensemble

0 likes · 11 min read


PaperAgent

Mar 19, 2026 · Artificial Intelligence

How Scale‑SWE’s Real‑World Software Engineering Dataset Supercharges AI Models

The Scale‑SWE project releases a 100k‑task real software‑engineering dataset built with a sandboxed multi‑agent workflow, demonstrating that models fine‑tuned on this data achieve 64% on SWE‑bench‑Verified and surpass leading industrial baselines, highlighting the critical value of authentic SWE data.

AI agents · Qwen3-30A3B-Instruct · Scale-SWE

0 likes · 7 min read


PaperAgent

Mar 19, 2026 · Artificial Intelligence

How MDER‑DR Boosts Multi‑Hop KG QA with Entity‑Centric Summaries

The article presents the MDER‑DR two‑stage framework that tackles semantic loss in knowledge‑graph triple indexing by generating context‑aware entity summaries and using an LLM‑driven decompose‑parse retrieval loop, achieving up to 66% performance gains on multi‑hop question answering benchmarks.

Entity Summarization · KG QA · LLM

0 likes · 5 min read


PaperAgent

Mar 17, 2026 · Artificial Intelligence

Can Attention Replace Fixed Residuals? Inside the ‘Attention Residuals’ Breakthrough

This article analyzes the newly released Attention Residuals paper, explaining how learnable attention weighting replaces fixed residual addition to mitigate information dilution in deep LLMs, detailing the proposed Block AttnRes design, engineering trade‑offs, experimental results, and its significance for foundational model architecture.

Block Attention · LLM · Residual Connections

0 likes · 9 min read


PaperAgent

Mar 16, 2026 · Artificial Intelligence

How GLM-5-Turbo Turns an AI Research Lab into a 24‑Hour Autonomous Writer

The article details how the newly released GLM-5-Turbo "lobster" model powers an AI research lab that automatically generates a complete OpenClaw survey paper within an hour, covering topic brainstorming, literature mining, outline drafting, manuscript writing, and AAAI‑style submission, and showcases benchmark results, prompt templates, and practical skill installations.

AI research automation · AutoClaw · GLM-5-Turbo

0 likes · 10 min read


PaperAgent

Mar 15, 2026 · Artificial Intelligence

Why LLM Tool‑Calling Benchmarks Miss Real Users: Introducing WildToolBench

WildToolBench reveals that existing LLM tool‑calling benchmarks overlook real‑world user behavior, and a comprehensive evaluation of 58 models shows even the strongest agents achieve less than 15% session accuracy, highlighting a huge gap between reported performance and practical usability.

LLM · agentic AI · benchmark

0 likes · 10 min read
