Tagged articles

curriculum-learning

18 articles · Page 1 of 1

Jul 4, 2026 · Artificial Intelligence

When Swapping Two Images Breaks VLMs: EgoTSR Enables Robots to Judge Real Task Progress

The paper shows that vision‑language models often rely on chronological bias, mistaking later frames for progress. It introduces EgoTSR, a 46‑million‑sample ego‑centric dataset with a three‑stage curriculum that teaches models to assess task state; evaluated with forward‑reverse tests, the resulting models exceed 92% accuracy on long‑term robotic tasks.

chronological-bias · curriculum-learning · ego-centric reasoning

0 likes · 11 min read

Machine Heart

Jun 21, 2026 · Artificial Intelligence

From Detection to Repair: Closing the Loop in AI‑Generated Image Forensics

GenShield, a unified autoregressive framework introduced by researchers from Peking University, combines explainable AI‑generated image detection with controllable artifact correction, leveraging a two‑stage Visual Chain‑of‑Thought curriculum and the newly built GenShield‑Set dataset to achieve state‑of‑the‑art performance on both detection and repair benchmarks.

AI-generated image detection · GenShield · Visual Chain-of-Thought

0 likes · 8 min read

Alimama Tech

Jun 4, 2026 · Artificial Intelligence

ICML 2026 Highlights: Five Taotian Group Papers Pushing Multimodal AI Boundaries

The article showcases five ICML 2026 papers from the Taotian Group that tackle core multimodal AI challenges—interactive video try‑on, high‑resolution vision, e‑commerce video reasoning, sparse‑reward reinforcement learning, and curriculum learning for large language models—detailing their problem statements, novel solutions, and strong experimental results.

Benchmark · ICML 2026 · Large Language Models

0 likes · 15 min read

Machine Learning Algorithms & Natural Language Processing

May 21, 2026 · Artificial Intelligence

Breaking the UED Bottleneck: PACE Locates the Reinforcement‑Learning Zone of Proximal Development

The paper introduces PACE, a Parameter‑Change based Unsupervised Environment Design method that evaluates training levels by the magnitude of induced policy‑parameter updates, offering a low‑variance, computationally cheap signal that consistently outperforms prior UED approaches on MiniGrid and Craftax benchmarks.

Craftax · ICML 2026 · MiniGrid

0 likes · 11 min read

Machine Heart

May 21, 2026 · Artificial Intelligence

Breaking the Traditional UED Bottleneck: Using RL to Precisely Locate the Zone of Proximal Development

The paper introduces PACE, a Parameter Change Environment Design method that evaluates training levels by measuring induced policy parameter updates, offering a low‑variance learning‑progress signal that outperforms prior UED approaches on MiniGrid and Craftax benchmarks, achieving higher success rates and more stable generalization.

Craftax · ICML 2026 · MiniGrid

0 likes · 10 min read

Bighead's Algorithm Notes

Feb 1, 2026 · Artificial Intelligence

Beyond Historical Data: Adaptive Synthesis for Financial Time Series

This article reviews a recent paper that proposes a drift‑aware data‑stream system integrating machine‑learning‑based adaptive control into financial data management, introducing a parametric data‑operation module, a gradient‑based bi‑level optimizer, and a curriculum planner to improve model robustness and risk‑adjusted returns in non‑stationary markets.

adaptive data synthesis · concept drift · curriculum-learning

0 likes · 18 min read

Tencent Advertising Technology

Dec 25, 2025 · Artificial Intelligence

How RAVEN Leverages Reinforcement Reasoning for Precise Ad Video Violation Grounding

RAVEN is a reinforcement‑reasoning framework that combines curriculum learning with hierarchical rewards to enable multimodal large language models to accurately locate and classify violation segments in advertisement videos, even under noisy, large‑scale industrial data.

Advertising · curriculum-learning · multimodal LLM

0 likes · 17 min read

Amap Tech

Nov 4, 2025 · Artificial Intelligence

Spacetime‑GR: AI‑Powered Spatiotemporal Model Transforming POI Recommendations

This article introduces Spacetime‑GR, a large‑scale generative recommendation model that integrates hierarchical geographic POI indexing and spatiotemporal token encoding to enhance POI prediction for Amap, detailing its pre‑training pipeline, data cleaning, curriculum learning strategy, experimental results, scaling law observations, and the resulting improvements in hit rate and discovery rate.

Amap · Generative AI · POI recommendation

0 likes · 14 min read

Network Intelligence Research Center (NIRC)

Nov 4, 2025 · Artificial Intelligence

SEAgent: A Self‑Evolving Computer Agent that Learns Software Use Autonomously

SEAgent introduces a self‑evolving framework that enables a GUI agent to master unfamiliar software through autonomous exploration and experience learning, leveraging a curriculum generator, a world‑state model, and GRPO‑based reinforcement with adversarial imitation, achieving state‑of‑the‑art performance on OSWorld.

GUI automation · SEAgent · autonomous learning

0 likes · 6 min read

Architect

Mar 9, 2025 · Artificial Intelligence

Experiments with Reinforcement Learning Fine‑Tuning of a 0.5B Qwen Model on the KK Dataset

The author reports a series of reinforcement‑learning fine‑tuning experiments on a 0.5‑billion‑parameter Qwen instruct model using the KK dataset, detailing reward design adjustments, curriculum‑style data scaling, observed convergence issues, and hypotheses about why small models fail to develop long reasoning chains.

LLM fine-tuning · curriculum-learning · reinforcement learning

0 likes · 11 min read

Baobao Algorithm Notes

Mar 5, 2025 · Artificial Intelligence

Why My 0.5B LLM’s Reasoning Collapsed During RLHF on Logic Puzzles

The author experiments with reinforcement‑learning‑from‑human‑feedback on a 0.5B Qwen instruct model using Logic‑RL and Open‑R1, discovers that reward mis‑design and curriculum learning cause the model to produce overly short or incorrect reasoning chains on knight‑and‑knave puzzles, and analyses the underlying causes.

Artificial Intelligence · Logic Reasoning · RLHF

0 likes · 11 min read

Alibaba Cloud Big Data AI Platform

Nov 8, 2024 · Artificial Intelligence

How TAPIR Boosts Small LLMs with Task‑Aware Curriculum Planning

The paper introduces TAPIR, a task‑aware curriculum planning framework that distills instruction‑following abilities from black‑box LLM teachers into smaller student models by filtering difficult prompts, resampling tasks, enhancing response styles, and iteratively optimizing across multiple training rounds, achieving superior performance on benchmark evaluations.

Instruction Tuning · Knowledge Distillation · LLM distillation

0 likes · 10 min read

Baobao Algorithm Notes

Sep 24, 2024 · Artificial Intelligence

From Zero to One: A Practical Guide to Pretraining Large Language Models

This comprehensive guide walks you through every stage of LLM pretraining—from data sourcing, cleaning, and deduplication to tokenizer design, model architecture choices, training framework selection, optimization tricks, and evaluation methods—highlighting common pitfalls and practical solutions for building robust models.

LLM pretraining · Training Framework · curriculum-learning

0 likes · 34 min read

DataFunTalk

Aug 24, 2023 · Artificial Intelligence

Multi-Agent Decision Large Models: Challenges, Action Semantic Networks, Permutation Invariance/Equivariance, and Automated Curriculum Learning

This talk outlines the fundamental challenges of multi‑agent decision large models, introduces three core design priors (action semantic networks, permutation invariance/equivariance, and cross‑task automated curriculum learning), and demonstrates how these concepts improve performance across diverse environments such as StarCraft, Neural‑MMO, and SMAC.

.ai · Multi-Agent Reinforcement Learning · action semantics

0 likes · 12 min read

Alimama Tech

Sep 7, 2022 · Artificial Intelligence

Curriculum-Guided Bayesian Reinforcement Learning for ROI-Constrained Real-Time Bidding

The paper presents a Curriculum‑Guided Bayesian Reinforcement Learning (CBRL) framework that models ROI‑constrained real‑time bidding as a partially observable constrained MDP, using hard‑margin indicator rewards and a curriculum of relaxed proxy problems to achieve fast, constraint‑satisfying, Bayes‑optimal policies that outperform existing methods on large‑scale industrial data.

Bayesian RL · MDP · ROI constraint

0 likes · 15 min read

Baobao Algorithm Notes

Mar 3, 2022 · Artificial Intelligence

How Hierarchical Curriculum Learning Improves Dialogue Response Selection

This article explains how treating negative response candidates with varying difficulty through a hierarchical curriculum learning framework—combining corpus‑level and instance‑level curricula—enhances dialogue response selection models, backed by experiments on Douban, Ubuntu, and E‑Commerce datasets.

curriculum-learning · dialogue response selection · hierarchical learning

0 likes · 8 min read

Youku Technology

Dec 2, 2021 · Artificial Intelligence

Hybrid Curriculum Learning for Emotion Recognition in Conversation

The paper introduces a hybrid curriculum learning framework that tackles emotion shift and confusing labels in emotion recognition in conversation (ERC). It applies nested curriculum stages at both the conversation and utterance levels, giving a progressive easy‑to‑hard training schedule that markedly boosts classic ERC models across four public datasets. The framework is already deployed in the script health‑check service of Alibaba’s entertainment AI brain.

Emotion Recognition · conversation analysis · curriculum-learning

0 likes · 2 min read

DataFunTalk

Mar 20, 2019 · Artificial Intelligence

Addressing Sparse Reward Problems in Model-Free Reinforcement Learning

This article reviews the challenges of model‑free reinforcement learning, especially sparse reward issues exemplified by Montezuma’s Revenge, and surveys recent approaches such as expert demonstrations, curriculum learning, self‑play, hierarchical reinforcement learning, and count‑based exploration to mitigate these problems.

Model-free · curriculum-learning · exploration

0 likes · 12 min read
