
Myths and Misconceptions in Reinforcement Learning – Summary of Csaba Szepesvári’s KDD 2020 Deep Learning Day Talk

This article summarizes Csaba Szepesvári’s 2020 KDD Deep Learning Day presentation on common myths and misconceptions in reinforcement learning, covering the scope of RL, safety concerns, generalization challenges, causal reasoning, and broader meta‑considerations for the field.


Csaba Szepesvári, a professor at the University of Alberta and former lead of the Foundations team at DeepMind, delivered a talk titled “Myths and Misconceptions in Reinforcement Learning” at the KDD 2020 Deep Learning Day. The author of this article compiled the video’s content, adding Chinese commentary and the original slide images for reference.

The talk is divided into two parts. First, it provides a high‑level overview of reinforcement learning (RL), asking whether one should study RL, what problems RL faces, and how RL relates to neighboring fields such as deep learning, evolutionary search, and supervised learning. It emphasizes that RL is a family of problem settings—not a single algorithmic recipe—and distinguishes three basic RL categories: online RL, batch RL, and planning/simulation optimization.

Next, the speaker addresses several “myths” and “fallacies.” Key points include: (1) RL is often mistakenly reduced to a handful of algorithms (TD, DQN, PPO) while ignoring broader problem formulations; (2) concerns about RL’s safety stem from its integration into feedback loops, which also affect supervised learning systems; (3) safety guarantees are conditional and must be validated beyond simulators; (4) claims that RL has already achieved superhuman performance are overstated, as performance depends on the comparison baseline; (5) apparent failures of RL in practice often arise from using a narrow set of methods rather than the full RL toolbox.

The presentation also discusses meta‑considerations such as the difficulty of testing on training data, the role of generalization in online, batch, and planning settings, and the importance of causal reasoning. Szepesvári argues that batch RL inherently involves causal inference, and that POMDPs can express many causal problems.

Additional topics include the relationship between self‑supervised learning and RL for handling sparse rewards, the need for robust performance metrics, and the pitfalls of over‑optimizing on benchmark rankings (Goodhart’s law). The speaker concludes that while many myths exist, RL remains a vibrant research area with open challenges, and that blind faith in authority hinders progress.

References cited include Henderson et al. (2018) on deep RL relevance, the “Deep Reinforcement Learning: An Overview” arXiv paper, and several works on causality and RL by Judea Pearl, Bernhard Schölkopf, and Elias Bareinboim.

Tags: Safety, Reinforcement learning, Meta-learning, Causality, Misconceptions, Generalization, Myths
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
