Tagged articles
13 articles
Bighead's Algorithm Notes
Feb 1, 2026 · Artificial Intelligence

Beyond Historical Data: Adaptive Synthesis for Financial Time Series

This article reviews a recent paper that proposes a drift‑aware data‑stream system integrating machine‑learning‑based adaptive control into financial data management, introducing a parametric data‑operation module, a gradient‑based bi‑level optimizer, and a curriculum planner to improve model robustness and risk‑adjusted returns in non‑stationary markets.

Quantitative Finance · adaptive data synthesis · concept drift
0 likes · 18 min read
Tencent Advertising Technology
Dec 25, 2025 · Artificial Intelligence

How RAVEN Leverages Reinforcement Reasoning for Precise Ad Video Violation Grounding

RAVEN is a reinforcement‑reasoning framework that combines curriculum learning with hierarchical rewards to enable multimodal large language models to accurately locate and classify violation segments in advertisement videos, even under noisy, large‑scale industrial data.

Advertising · Reinforcement Learning · curriculum learning
0 likes · 17 min read
Amap Tech
Nov 4, 2025 · Artificial Intelligence

Spacetime‑GR: AI‑Powered Spatiotemporal Model Transforming POI Recommendations

This article introduces Spacetime‑GR, a large‑scale generative recommendation model that integrates hierarchical geographic POI indexing and spatiotemporal token encoding to enhance POI prediction for Amap, detailing its pre‑training pipeline, data cleaning, curriculum learning strategy, experimental results, scaling law observations, and the resulting improvements in hit rate and discovery rate.

Amap · POI recommendation · curriculum learning
0 likes · 14 min read
Network Intelligence Research Center (NIRC)
Nov 4, 2025 · Artificial Intelligence

SEAgent: A Self‑Evolving Computer Agent that Learns Software Use Autonomously

SEAgent introduces a self‑evolving framework that enables a GUI agent to master unfamiliar software through autonomous exploration and experience learning, leveraging a curriculum generator, a world‑state model, and GRPO‑based reinforcement with adversarial imitation, achieving state‑of‑the‑art performance on OSWorld.

GUI automation · Reinforcement Learning · SEAgent
0 likes · 6 min read
Architect
Mar 9, 2025 · Artificial Intelligence

Experiments with Reinforcement Learning Fine‑Tuning of a 0.5B Qwen Model on the KK Dataset

The author reports a series of reinforcement‑learning‑based fine‑tuning experiments on a 0.5‑billion‑parameter Qwen instruct model using the KK dataset, detailing reward design adjustments, curriculum‑style data scaling, observed convergence issues, and hypotheses about why small models fail to develop long reasoning chains.

LLM fine-tuning · Reinforcement Learning · curriculum learning
0 likes · 11 min read
Baobao Algorithm Notes
Mar 5, 2025 · Artificial Intelligence

Why My 0.5B LLM’s Reasoning Collapsed During RLHF on Logic Puzzles

The author experiments with reinforcement‑learning‑from‑human‑feedback on a 0.5B Qwen instruct model using Logic‑RL and Open‑R1, discovers that reward mis‑design and curriculum learning cause the model to produce overly short or incorrect reasoning chains on knight‑and‑knave puzzles, and analyses the underlying causes.

Artificial Intelligence · Logic Reasoning · RLHF
0 likes · 11 min read
Alibaba Cloud Big Data AI Platform
Nov 8, 2024 · Artificial Intelligence

How TAPIR Boosts Small LLMs with Task‑Aware Curriculum Planning

The paper introduces TAPIR, a task‑aware curriculum planning framework that distills instruction‑following abilities from black‑box LLM teachers into smaller student models by filtering difficult prompts, resampling tasks, enhancing response styles, and iteratively optimizing across multiple training rounds, achieving superior performance on benchmark evaluations.

Instruction Tuning · LLM distillation · TAPIR
0 likes · 10 min read
Baobao Algorithm Notes
Sep 24, 2024 · Artificial Intelligence

From Zero to One: A Practical Guide to Pretraining Large Language Models

This comprehensive guide walks through every stage of LLM pretraining, from data sourcing, cleaning, and deduplication to tokenizer design, model architecture choices, training framework selection, optimization tricks, and evaluation methods, highlighting common pitfalls and practical solutions for building robust models.

LLM Pretraining · Tokenizer · Training Framework
0 likes · 34 min read
DataFunTalk
Aug 24, 2023 · Artificial Intelligence

Multi-Agent Decision Large Models: Challenges, Action Semantic Networks, Permutation Invariance/Equivariance, and Automated Curriculum Learning

This talk outlines the fundamental challenges of large models for multi‑agent decision‑making, introduces three core design priors (action semantic networks, permutation invariance/equivariance, and cross‑task automated curriculum learning), and demonstrates how these priors improve performance across diverse environments such as StarCraft, Neural‑MMO, and SMAC.

action semantics · AI · curriculum learning
0 likes · 12 min read
Alimama Tech
Sep 7, 2022 · Artificial Intelligence

Curriculum-Guided Bayesian Reinforcement Learning for ROI-Constrained Real-Time Bidding

The paper presents a Curriculum‑Guided Bayesian Reinforcement Learning (CBRL) framework that models ROI‑constrained real‑time bidding as a partially observable constrained MDP, using hard‑margin indicator rewards and a curriculum of relaxed proxy problems to achieve fast, constraint‑satisfying, Bayes‑optimal policies that outperform existing methods on large‑scale industrial data.

Bayesian RL · MDP · ROI constraint
0 likes · 15 min read
Baobao Algorithm Notes
Mar 3, 2022 · Artificial Intelligence

How Hierarchical Curriculum Learning Improves Dialogue Response Selection

This article explains how a hierarchical curriculum learning framework, which treats negative response candidates as varying in difficulty and combines corpus‑level and instance‑level curricula, enhances dialogue response selection models, backed by experiments on the Douban, Ubuntu, and E‑Commerce datasets.

curriculum learning · dialogue response selection · hierarchical learning
0 likes · 8 min read
Youku Technology
Dec 2, 2021 · Artificial Intelligence

Hybrid Curriculum Learning for Emotion Recognition in Conversation

The paper introduces a hybrid curriculum learning framework that tackles emotion shift and confusing labels in emotion recognition in conversation (ERC) by applying nested curricula at both the conversation and utterance levels. This progressive easy‑to‑hard training markedly boosts the performance of classic ERC models across four public datasets, and the method is already deployed in the script health‑check service of Alibaba’s entertainment AI brain.

Emotion Recognition · conversation analysis · curriculum learning
0 likes · 2 min read
DataFunTalk
Mar 20, 2019 · Artificial Intelligence

Addressing Sparse Reward Problems in Model-Free Reinforcement Learning

This article reviews the challenges of model‑free reinforcement learning, especially sparse reward issues exemplified by Montezuma’s Revenge, and surveys recent approaches such as expert demonstrations, curriculum learning, self‑play, hierarchical reinforcement learning, and count‑based exploration to mitigate these problems.

Model-free · curriculum learning · exploration
0 likes · 12 min read