Deep Reinforcement Learning: Concepts, Black‑Box Optimization, and the Cross‑Entropy Method

This article introduces reinforcement learning and deep reinforcement learning, explains key algorithms such as DQN and policy gradients, discusses black‑box optimization and the cross‑entropy method, and provides resources and code examples for further study.

Dada Group Technology

Jun 9, 2017

Reinforcement learning (RL) trains an agent to make decisions through repeated interaction with an environment: at each step the agent receives an observation, takes an action, and obtains a reward. The article illustrates this loop with a ping‑pong example.
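The observation, action, reward loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: `ToyPaddleEnv`, `follow_ball_policy`, and `run_episode` are hypothetical names invented here, and the toy environment (a paddle chasing a stationary ball on a line) stands in for the article's ping‑pong example rather than reproducing it.

```python
import random


class ToyPaddleEnv:
    """Hypothetical 1-D toy: move a paddle onto a stationary ball."""

    def __init__(self, size=5, horizon=10, seed=0):
        self.size = size          # number of positions on the line
        self.horizon = horizon    # steps per episode
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.ball = self.rng.randrange(self.size)
        self.paddle = self.rng.randrange(self.size)
        return (self.ball, self.paddle)  # the observation

    def step(self, action):
        # action: -1 (left), 0 (stay), +1 (right); clipped to the line
        self.paddle = min(self.size - 1, max(0, self.paddle + action))
        self.t += 1
        reward = 1.0 if self.paddle == self.ball else 0.0
        done = self.t >= self.horizon
        return (self.ball, self.paddle), reward, done


def follow_ball_policy(observation):
    """Hand-written policy: step toward the ball."""
    ball, paddle = observation
    return (ball > paddle) - (ball < paddle)


def run_episode(env, policy):
    """One episode of the agent-environment loop; returns total reward."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)                   # agent acts
        observation, reward, done = env.step(action)   # environment responds
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    print(run_episode(ToyPaddleEnv(), follow_ball_policy))
```

In RL the policy is learned rather than hand‑written; the fixed policy here only serves to show where observations, actions, and rewards flow.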

Deep reinforcement learning (Deep RL) combines RL with deep neural networks to enable end‑to‑end learning from high‑dimensional inputs, highlighted by breakthroughs such as DeepMind’s Atari DQN and AlphaGo.

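The piece explains that DQN combines Q‑learning with deep neural networks, using the network to approximate the action‑value function so that learning stays tractable in state spaces too large for a lookup table (the curse of dimensionality), while policy‑gradient methods represent the policy‑based branch of RL, which optimizes the policy directly.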
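The Q‑learning part of this can be made concrete with the standard tabular update, Q(s, a) ← Q(s, a) + α [r + γ max over a' of Q(s', a') − Q(s, a)]. The sketch below is a minimal illustration, not DQN itself: DQN replaces the table with a neural network and adds machinery such as experience replay and a target network, none of which is shown. The function name and the default values of `alpha` and `gamma` are assumptions chosen for the example.

```python
import numpy as np


def q_learning_update(q_table, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning step in place and return the TD error."""
    # Bootstrapped target: immediate reward plus the discounted value of the
    # best next action (no bootstrap when the episode has ended).
    if done:
        target = reward
    else:
        target = reward + gamma * np.max(q_table[next_state])
    td_error = target - q_table[state, action]
    q_table[state, action] += alpha * td_error
    return td_error


if __name__ == "__main__":
    q = np.zeros((2, 2))          # 2 states x 2 actions
    q[1] = [1.0, 2.0]
    q_learning_update(q, state=0, action=1, reward=1.0, next_state=1,
                      done=False, alpha=0.5, gamma=0.9)
    print(q[0, 1])
```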

Black‑box optimization treats the decision (policy) network as an opaque function from parameters to reward. It maximizes reward by updating the parameters using only the rewards observed for candidate parameter vectors, without computing gradients through the network, in the manner of derivative‑free approaches such as evolutionary algorithms.
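To show what "opaque function" means in practice, here is a minimal sketch of one of the simplest derivative‑free schemes, random‑perturbation hill climbing. It is a swapped‑in illustration, not a method named by the article. The optimizer only ever calls `objective(theta)` and never looks inside it. In an RL setting `objective` would run one or more episodes with policy parameters `theta` and return the total reward; the quadratic used here is a stand‑in assumption so the example runs on its own, and all names are hypothetical.

```python
import numpy as np


def random_search(objective, theta0, n_iters=200, noise_std=0.5, seed=0):
    """Hill climbing by random perturbation; uses objective values only."""
    rng = np.random.default_rng(seed)
    best_theta = np.asarray(theta0, dtype=float)
    best_score = objective(best_theta)
    for _ in range(n_iters):
        # Propose a nearby parameter vector and keep it only if it scores higher.
        candidate = best_theta + noise_std * rng.standard_normal(best_theta.shape)
        score = objective(candidate)
        if score > best_score:
            best_theta, best_score = candidate, score
    return best_theta, best_score


if __name__ == "__main__":
    target = np.array([1.0, -2.0])

    def toy_objective(theta):
        # Stand-in for "episode reward of the policy with parameters theta".
        return -np.sum((theta - target) ** 2)

    theta, score = random_search(toy_objective, np.zeros(2))
    print(theta, score)
```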

The cross‑entropy method (CEM) is presented as a black‑box optimization technique: it samples policy parameters from a probability distribution, evaluates the reward of each sample, keeps the top‑performing samples, and refits the distribution to them, repeating until the distribution concentrates on good parameters. In this respect it resembles evolution strategies such as CMA‑ES.
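The sample, evaluate, select, refit cycle can be sketched as follows. This is a minimal version under stated assumptions, not the GitHub implementation the article links to: it assumes a Gaussian with diagonal covariance as the sampling distribution, an elite fraction of 20%, and a small floor added to the standard deviation to avoid premature collapse. The function name, parameter names, and the toy quadratic objective are all hypothetical; for RL, `objective` would return the episode reward of the policy with the given parameters.

```python
import numpy as np


def cross_entropy_method(objective, dim, n_iters=30, batch_size=50,
                         elite_frac=0.2, init_std=1.0, seed=0):
    """Maximize objective(theta) with a diagonal-Gaussian CEM."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.full(dim, init_std)
    n_elite = max(1, int(batch_size * elite_frac))
    for _ in range(n_iters):
        # 1. Sample a batch of parameter vectors from the current distribution.
        samples = mean + std * rng.standard_normal((batch_size, dim))
        # 2. Evaluate each sample (black box: only the score is used).
        scores = np.array([objective(theta) for theta in samples])
        # 3. Keep the highest-scoring samples (the elite set).
        elite = samples[np.argsort(scores)[-n_elite:]]
        # 4. Refit the distribution to the elite set.
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-3  # assumed floor against collapse
    return mean, std


if __name__ == "__main__":
    target = np.array([1.0, -2.0])
    mean, std = cross_entropy_method(
        lambda theta: -np.sum((theta - target) ** 2), dim=2)
    print(mean, std)
```

Fitting only a mean and per‑dimension standard deviation keeps the sketch short; CMA‑ES, by contrast, adapts a full covariance matrix.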

Practical resources include a GitHub implementation of CEM, links to John Schulman’s MLSS‑2016 slides, UC Berkeley’s CS‑294 deep RL course, OpenAI Gym, and classic literature such as Sutton and Barto’s reinforcement learning book.

Additionally, the article mentions the New Dada algorithm team, their logistics‑crowdsourcing challenges, and a job posting, while providing references and a disclaimer about content sourced from John Schulman’s presentation.

