PaperAgent
Author


Daily updates, analyzing cutting-edge AI research papers

216 Articles · 1 Like · 415 Views · 0 Comments
Recent Articles

Latest from PaperAgent

PaperAgent
Mar 21, 2026 · Artificial Intelligence

Can AI Truly Be Creative? Inside the CreativeBench Benchmark

This article examines the CreativeBench benchmark, which redefines machine creativity by measuring both the quality and novelty of generated solutions, explains its combinatorial and exploratory task designs, details the self‑evolving task construction process, and discusses key findings and the EvoRePE enhancement method.

AI benchmark · EvoRePE · large language models
0 likes · 18 min read
PaperAgent
Mar 21, 2026 · Artificial Intelligence

Can Peer Review Boost Large Language Model Ensembles? Introducing LLM‑PeerReview

This article analyzes LLM‑PeerReview, an unsupervised framework that combines multiple large language models through a peer‑review‑inspired pipeline of scoring, reasoning, and selection, including a novel flipped‑triple scoring trick, and achieves significant performance gains over existing ensemble and collaboration baselines.

Artificial Intelligence · Flipped Triple Scoring · LLM Ensemble
0 likes · 11 min read
PaperAgent
Mar 19, 2026 · Artificial Intelligence

How Scale‑SWE’s Real‑World Software Engineering Dataset Supercharges AI Models

The Scale‑SWE project releases a 100k‑task real software‑engineering dataset built with a sandboxed multi‑agent workflow, demonstrating that models fine‑tuned on this data achieve 64% on SWE‑bench‑Verified and surpass leading industrial baselines, highlighting the critical value of authentic SWE data.

AI agents · Model Evaluation · Qwen3-30A3B-Instruct
0 likes · 7 min read
PaperAgent
Mar 19, 2026 · Artificial Intelligence

How MDER‑DR Boosts Multi‑Hop KG QA with Entity‑Centric Summaries

The article presents the MDER‑DR two‑stage framework that tackles semantic loss in knowledge‑graph triple indexing by generating context‑aware entity summaries and using an LLM‑driven decompose‑parse retrieval loop, achieving up to 66% performance gains on multi‑hop question answering benchmarks.

Entity Summarization · KG QA · Knowledge Graph
0 likes · 5 min read
PaperAgent
Mar 17, 2026 · Artificial Intelligence

Can Attention Replace Fixed Residuals? Inside the ‘Attention Residuals’ Breakthrough

This article analyzes the newly released Attention Residuals paper, explaining how learnable attention weighting replaces fixed residual addition to mitigate information dilution in deep LLMs, detailing the proposed Block AttnRes design, engineering trade‑offs, experimental results, and its significance for foundational model architecture.

Block Attention · Deep Learning · LLM
0 likes · 9 min read
PaperAgent
Mar 16, 2026 · Artificial Intelligence

How GLM-5-Turbo Turns an AI Research Lab into a 24‑Hour Autonomous Writer

The article details how the newly released GLM-5-Turbo "lobster" model powers an AI research Lab that automatically generates a complete OpenClaw survey paper—from topic brainstorming and literature mining to outline drafting, manuscript writing, and AAAI‑style submission—within an hour, showcasing benchmark results, prompt templates, and practical skill installations.

AI research automation · AutoClaw · GLM-5-Turbo
0 likes · 10 min read
PaperAgent
Mar 15, 2026 · Artificial Intelligence

Why LLM Tool‑Calling Benchmarks Miss Real Users: Introducing WildToolBench

WildToolBench reveals that existing LLM tool‑calling benchmarks overlook real‑world user behavior, and a comprehensive evaluation of 58 models shows even the strongest agents achieve less than 15% session accuracy, highlighting a huge gap between reported performance and practical usability.

Evaluation · LLM · agentic AI
0 likes · 10 min read
PaperAgent
Mar 11, 2026 · Artificial Intelligence

Can Full‑Modal AI Agents Master Vision, Audio, and Tools? Meet OmniGAIA & OmniAtlas

This article introduces OmniGAIA, a challenging full‑modal benchmark with 360 real‑world tasks, and OmniAtlas, a training framework that equips multimodal agents with active perception and tool‑integrated reasoning, showing substantial performance gains over existing open‑source models through extensive experiments and analysis.

Agent · OmniAtlas · OmniGAIA
0 likes · 16 min read
PaperAgent
Mar 10, 2026 · Information Security

How Token‑Draining Attacks and Formal Defenses Threaten OpenClaw’s Skill Ecosystem

The article analyzes recent security research on OpenClaw, exposing large‑scale malicious Skill injections, a novel token‑exhaustion attack called Clawdrain, and the SkillFortify formal framework that achieves near‑perfect detection of malicious Skills while highlighting the limitations of heuristic scanners.

OpenClaw · Token Exhaustion · formal verification
0 likes · 11 min read
PaperAgent
Mar 10, 2026 · Artificial Intelligence

How MemSifter Delivers High‑Precision, Low‑Cost Long‑Term Memory for LLMs

MemSifter introduces a lightweight agent that outsources memory retrieval for large language models, using a Think‑and‑Rank pipeline and a task‑result‑oriented reinforcement‑learning training paradigm to achieve superior retrieval accuracy and efficiency across eight benchmark tasks while keeping inference overhead minimal.

Agent · LLM · Reinforcement Learning
0 likes · 13 min read