Tagged articles
2 articles
Machine Heart
May 17, 2026 · Artificial Intelligence

How CASCADE Enables LLM Agents to Learn from Experience During Live Deployment

The paper introduces CASCADE, a deployment‑time learning framework that lets LLM agents continuously select and reuse past cases via a contextual‑bandit approach, achieving higher long‑term success rates across diverse online tasks without updating the base model.

CASCADE · Case-Based Reasoning · Contextual Bandit
58 Tech
Dec 28, 2021 · Artificial Intelligence

Reinforcement Learning for Cold‑Start Job Recommendation in 58.com

This talk explains how 58.com tackles the cold‑start and interest‑divergence problems of its large blue‑collar job recruitment platform by modeling recommendation as a reinforcement‑learning task. It covers the multi‑armed bandit, contextual bandit, and LinUCB (linear UCB) algorithms used, the offline evaluation pipeline, online deployment, and the observed performance gains.

Contextual Bandit · Reinforcement Learning · Cold Start