Tagged articles

16 articles

May 18, 2026 · Artificial Intelligence

ICML 2026: From Single‑Threaded Thinking to Native Parallel Reasoning in Agents

The paper introduces Native Parallel Reasoner (NPR), a framework that lets language agents generate and maintain multiple reasoning paths in parallel. NPR is trained with a three‑stage paradigm that combines self‑distillation and parallel reinforcement learning, and it achieves up to a 4.6× speedup along with significant accuracy gains across eight reasoning benchmarks.

AI reasoning · Large Language Models · Native Parallel Reasoner

18 min read

PaperAgent

May 13, 2026 · Artificial Intelligence

One-for-All Multi-Agent Collaboration: Adaptive Cross-Task Topology Design

The paper introduces OFA-MAS, a one‑for‑all multi‑agent system that learns a universal topology designer using task‑aware graph encoding and a Mixture‑of‑Experts generator. Across six major benchmarks, it reports superior performance, out‑of‑distribution (OOD) generalization, robustness, and efficiency.

LLM · Mixture of Experts · Task-Aware Graph Encoder

14 min read

Machine Heart

May 12, 2026 · Artificial Intelligence

DECS Cuts Overthinking in Models: Halving Inference Tokens While Raising Accuracy

DECS, a training framework introduced by researchers from Fudan University, Shanghai Jiao Tong University, and the Shanghai AI Lab, shows theoretically why length‑penalty rewards are flawed. Using token‑level reward decoupling and dynamic batch scheduling, it reduces inference token counts by over 50% while improving accuracy across multiple benchmarks.

DECS · Large Language Models · benchmark evaluation

9 min read

Machine Heart

May 5, 2026 · Artificial Intelligence

Agent-World: Scaling Real-World Environments for Co‑Evolving Agents and Their Worlds

Agent-World introduces a universal training arena that automatically mines real‑world data from the internet to build over 1,900 diverse environments and 19,800 tools, then synthesizes long‑horizon tasks through graph‑based and programmatic generation. This creates a self‑evolving loop in which agents are evaluated and diagnosed and the environments are refined in turn; the approach achieves state‑of‑the‑art results on 23 benchmarks.

AI agents · Agent-World · Large-Scale Training

14 min read

Bighead's Algorithm Notes

Apr 14, 2026 · Artificial Intelligence

How Self‑Supervised HINTS Extracts Human Insights from Time Series to Boost Forecast Accuracy

The paper introduces HINTS, a two‑stage self‑supervised framework that leverages Friedkin‑Johnsen opinion dynamics to mine latent human‑driven factors from time‑series residuals, integrates them via attention into state‑of‑the‑art predictors, and demonstrates consistent accuracy gains and interpretability across nine benchmark and real‑world datasets.

Attention Mechanism · Friedkin-Johnsen model · benchmark evaluation

17 min read

SuanNi

Apr 3, 2026 · Artificial Intelligence

How GEMS Lets a 6B Open‑Source Model Beat Top Closed‑Source Image Generators

The article presents the GEMS (Agent‑Native Multimodal Generation with Memory and Skills) framework, detailing its multi‑agent loop, hierarchical memory compression, on‑demand skill modules, and extensive benchmark results that show a lightweight 6B model surpassing larger proprietary systems on complex image‑generation tasks.

GEMS · Image Generation · Multimodal AI

14 min read

SuanNi

Mar 20, 2026 · Artificial Intelligence

How XSKILL Lets Multimodal AI Agents Learn Without Updating Parameters

XSKILL introduces a dual‑stream framework that separates task‑level skills (stored as Markdown) from action‑level experiences (stored as JSON). By extracting, summarizing, and reusing knowledge from past trajectories, it lets multimodal large language model agents improve continuously without modifying model parameters, achieving significant gains across visual‑tool, multimodal‑search, and integrated benchmarks.

Agent Framework · Multimodal AI · benchmark evaluation

12 min read

Instant Consumer Technology Team

Dec 18, 2025 · Artificial Intelligence

How a Multi‑Agent Framework Boosts Graph Chain‑of‑Thought Reasoning Efficiency

The paper introduces GLM, a multi‑agent Graph‑CoT framework paired with an optimized LLM serving architecture. In extensive GRBench evaluations across diverse domains, it improves accuracy while reducing token consumption and latency and increasing throughput.

LLM optimization · Multi-Agent · Token efficiency

10 min read

AntTech

Oct 14, 2025 · Artificial Intelligence

How Ring-1T Achieves Trillion-Scale Deep Thinking and Competitive Benchmark Results

Ring-1T, an open-source trillion-parameter model, uses advanced reinforcement learning techniques and custom training frameworks to deliver balanced performance across math, code, reasoning, and creative tasks. The article presents extensive benchmark evaluations and discusses the model's current limitations and future development plans.

AI model · Reinforcement Learning · benchmark evaluation

8 min read

DataFunTalk

Jun 17, 2025 · Artificial Intelligence

MiniMax M1: Open‑Source LLM That Rivals Gemini 2.5 Pro in Long‑Context Benchmarks

MiniMax’s newly released open‑source M1 model, built on the Lightning Attention‑enhanced MiniMax‑01 base, supports a context window of up to 1 million tokens, achieves near‑state‑of‑the‑art performance on MRCR and other long‑context benchmarks, and shows strong results in multilingual translation, code completion, and creative applications.

Lightning Attention · MiniMax · benchmark evaluation

11 min read

Tencent Technical Engineering

Jun 5, 2025 · Artificial Intelligence

How AI Agents Turn 0‑Day Vulnerability Hunting into an Automated Production Line

This article explores how a multi‑agent AI system dramatically improves 0‑day vulnerability detection by automating code audit, reducing false positives, and outperforming traditional static analysis tools in large‑scale real‑world benchmarks.

0day vulnerability · AI Agent · Automated Security Testing

9 min read

AI Frontier Lectures

Apr 6, 2025 · Artificial Intelligence

Can Multi‑Round Thinking Boost LLM Accuracy Without Extra Training?

A new study from the a‑m‑team introduces “Think Twice”, a test‑time multi‑round reasoning technique that, without additional training or model changes, repeatedly prompts large language models to self‑correct, yielding notable accuracy gains across benchmarks such as AIME, MATH‑500, GPQA‑Diamond and LiveCodeBench, while also producing shorter, more confident answers.

Artificial Intelligence · LLM · Multi-round reasoning

6 min read

Alibaba Cloud Big Data AI Platform

Mar 29, 2025 · Artificial Intelligence

How DistilQwen2.5‑R1 Boosts Small‑Model Reasoning with Innovative Knowledge Distillation

The article introduces the DistilQwen2.5‑R1 series, which uses a novel knowledge‑distillation pipeline (covering CoT data evaluation, improvement, and validation) to transfer deep reasoning abilities from large models such as DeepSeek‑R1 to compact models. The resulting models achieve superior performance across math, code, and scientific benchmarks, and the article provides open‑source checkpoints and deployment guides for practical use.

AI inference · Large Language Models · benchmark evaluation

17 min read

Baobao Algorithm Notes

Oct 29, 2024 · Artificial Intelligence

Reproducing OpenAI o1: Steiner Model’s Reasoning, Training, and Evaluation

This report details the design, data synthesis, three‑stage training pipeline, and benchmark evaluation of the open‑source Steiner reasoning model, which aims to emulate OpenAI o1’s inference‑time scaling while highlighting current performance gaps and future research challenges.

Inference Scaling · LLM · Reasoning Models

14 min read

Baobao Algorithm Notes

Jun 28, 2024 · Artificial Intelligence

What Makes Gemma 2 a Competitive Open‑Source LLM? Architecture, Training, and Evaluation Insights

The article provides a detailed technical overview of Gemma 2, covering its decoder‑only transformer design, novel attention mechanisms, logit soft‑capping, RMSNorm, knowledge‑distillation training on trillions of tokens, extensive pre‑training infrastructure, and benchmark evaluations that demonstrate its competitiveness against larger proprietary models.

AI · Gemma 2 · Model architecture

14 min read

AntTech

Apr 17, 2024 · Artificial Intelligence

LLMRG: Improving Recommendations through Large Language Model Reasoning Graphs

LLMRG introduces a novel framework that uses large language models to construct personalized reasoning graphs, combining chain reasoning, self‑verification, divergent extension, and knowledge‑base self‑improvement. It improves recommendation accuracy and interpretability across multiple benchmark datasets without requiring additional user or item information.

AI · Interpretability · Large Language Models

9 min read
