Tagged articles

self-play

11 articles · Page 1 of 1

Machine Learning Algorithms & Natural Language Processing

Jun 19, 2026 · Artificial Intelligence

AutoResearch SKILL Open‑Source: Framework for Long‑Horizon Autonomous Research

The Deli AutoResearch SKILL, now open‑sourced, presents a three‑layer framework that tackles cognitive loops, stalling, and runtime fragility in long‑horizon tasks by persisting state, detecting stalls, and using a heartbeat watchdog. It also includes a paper‑writing skill with self‑play experiments that achieve self‑rated scores up to 8.6.

Autonomous Agents · RL experiments · State Management

0 likes · 17 min read


AI Frontier Lectures

Feb 10, 2026 · Artificial Intelligence

How SE‑Bench Uncovers the Hidden Challenges of Knowledge Internalization in Self‑Evolving AI

The paper introduces SE‑Bench, a code‑based benchmark that isolates knowledge internalization by obfuscating NumPy APIs, and uses it to reveal the Open‑Book paradox, the RL gap, and the potential of self‑play for true self‑evolution in large language models.

.ai · SE-Bench · knowledge internalization

0 likes · 17 min read


Amap Tech

Jan 14, 2026 · Artificial Intelligence

How ArenaRL Enables Open‑World Travel Agents to Learn via Comparative Reinforcement Learning

Gaode Maps and Tongyi DeepResearch unveil ArenaRL, an open‑domain reinforcement‑learning framework that replaces absolute scoring with relative ranking, uses self‑play and a linear‑complexity tournament, and demonstrates measurable gains on POI ranking and complex travel‑planning tasks.

ArenaRL · Ranking · open-domain

0 likes · 8 min read


Data Party THU

Oct 10, 2025 · Artificial Intelligence

Can Language Models Self‑Train Without Data? Inside the Language Self‑Play Framework

This article examines the Language Self‑Play (LSP) approach for data‑free training of large language models, detailing its challenger‑solver game formulation, advantage calculations, loss functions, self‑reward extension, experimental setup on AlpacaEval, and results that show LSP can match or surpass data‑driven baselines.

LLM · Large Language Models · data-free training

0 likes · 14 min read


Fighter's World

Sep 30, 2024 · Artificial Intelligence

Exploring Google NotebookLM: Use Cases, Interaction Experience, and Key Insights

The author reviews Google NotebookLM, describing how it aids deep paper reading, encourages continued conversation through guided prompts, and keeps conversations coherent through self‑play insights. The review also highlights the audio‑overview feature and reflects on AI concepts such as the "bitter lesson" and the limits of self‑play in open scenarios.

AI research · Google · LLM

0 likes · 22 min read


Architect

Sep 28, 2024 · Artificial Intelligence

How Does OpenAI’s o1 Model Leverage Self‑Play RL and New Scaling Laws?

The article provides an in‑depth technical analysis of OpenAI’s multimodal o1 model, explaining its self‑play reinforcement‑learning pipeline, the novel train‑time and test‑time compute scaling laws, its long‑think reasoning abilities demonstrated through a cipher example, and speculative architectures for generator‑verifier systems.

Large Language Models · OpenAI · RL scaling

0 likes · 35 min read


DataFunSummit

Sep 22, 2023 · Artificial Intelligence

Exploring Game AI Agents: Review, LLM‑Driven Exploration, and Future Directions

This article reviews the evolution of game AI agents, examines how large language models (LLMs) can drive new AI behaviors in games, and discusses practical case studies across genres such as Werewolf‑style, war‑SLG, and MOBA games, concluding with challenges and future research directions.

AI agents · Game Development · LLM

0 likes · 31 min read


Code DAO

Dec 14, 2021 · Artificial Intelligence

Building a Chess AI from Scratch: Combining AlphaZero and Transformers (Part 2)

This article walks through constructing a learnable chess AI by integrating AlphaZero‑style Monte Carlo Tree Search with a decoder‑only Transformer, detailing the game tree logic, model architecture, input and output encodings, self‑play training loop, and code implementation in PyTorch.

AlphaZero · MonteCarloTreeSearch · PyTorch

0 likes · 23 min read


Kuaishou Tech

Jun 18, 2021 · Artificial Intelligence

DouZero: A Simple Monte‑Carlo Based AI Achieving Human‑Level Performance in Dou Dizhu

The paper presents DouZero, a reinforcement‑learning AI for the Chinese card game Dou Dizhu that combines a Monte‑Carlo method with a value network, uses binary matrix encodings for states and actions, and achieves human‑level play and state‑of‑the‑art results on modest GPU hardware.

.ai · DouZero · Monte Carlo

0 likes · 15 min read


Programmer DD

Jan 3, 2021 · Artificial Intelligence

How Self‑Play and GAIL Powered the WeKick AI to Win the First Google Football Kaggle Championship

After a nostalgic gaming session, the author recounts how Tencent’s upgraded AI, WeKick, leveraged self‑play reinforcement learning, GAIL‑based adversarial simulation, and a multi‑style League framework to dominate the inaugural Google Football Kaggle competition, illustrating the escalating complexity of multi‑agent AI in real‑time strategy games.

GAIL · Kaggle competition · Multi-Agent Systems

0 likes · 8 min read


DataFunTalk

Mar 20, 2019 · Artificial Intelligence

Addressing Sparse Reward Problems in Model-Free Reinforcement Learning

This article reviews the challenges of model‑free reinforcement learning, especially sparse reward issues exemplified by Montezuma’s Revenge, and surveys recent approaches such as expert demonstrations, curriculum learning, self‑play, hierarchical reinforcement learning, and count‑based exploration to mitigate these problems.

Model-free · curriculum-learning · exploration

0 likes · 12 min read
