
Deep Reinforcement Learning: Concepts, Black‑Box Optimization, and the Cross‑Entropy Method

This article introduces reinforcement learning and deep reinforcement learning, explains key algorithms such as DQN and policy gradients, discusses black‑box optimization and the cross‑entropy method, and provides resources and code examples for further study.

Dada Group Technology

Reinforcement learning (RL) trains an agent to make decisions by interacting with an environment, receiving observations, taking actions, and obtaining rewards; the article illustrates this with a ping‑pong example.
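The interaction loop described above can be sketched in a few lines. This is a minimal illustration, not the article's own code: `CoinGuessEnv` and `run_episode` are hypothetical names, and the toy environment (guess a bit that is shown in the observation) stands in for a real task such as the ping‑pong example.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment: the observation is a bit the agent should echo."""

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0
        self.hidden = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.steps = 0
        self.hidden = self.rng.randint(0, 1)
        return self.hidden

    def step(self, action):
        # Reward 1.0 when the action matches the bit that was just observed.
        reward = 1.0 if action == self.hidden else 0.0
        self.steps += 1
        done = self.steps >= self.episode_length
        self.hidden = self.rng.randint(0, 1)
        return self.hidden, reward, done


def run_episode(env, policy):
    """Run one episode: observe, act, collect reward, repeat until done."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward


# A policy that echoes the observation earns the maximum reward per step.
echo_return = run_episode(CoinGuessEnv(episode_length=10), lambda obs: obs)
```

The `reset`/`step` shape loosely mirrors the interface popularized by OpenAI Gym, but the signatures here are simplified assumptions rather than that library's API.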

Deep reinforcement learning (Deep RL) combines RL with deep neural networks to enable end‑to‑end learning from high‑dimensional inputs, highlighted by breakthroughs such as DeepMind’s Atari DQN and AlphaGo.

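The piece explains that DQN combines Q‑learning with deep neural networks, using the network to approximate the action‑value function so that the curse of dimensionality in large state spaces becomes manageable, while policy‑gradient methods represent the policy‑iteration branch of RL.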
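To make the Q‑learning half of DQN concrete, here is a minimal sketch of the tabular update that DQN builds on. It is an illustration under stated assumptions: the function name `q_learning_update`, the learning rate `alpha`, and the discount `gamma` are chosen here, and a real DQN replaces the table with a neural network trained toward the same kind of target.

```python
def q_learning_update(q_table, state, action, reward, next_state, done,
                      alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Bootstrapped target: reward plus the discounted best value of the next state.
    # At the end of an episode there is no next state to bootstrap from.
    best_next = 0.0 if done else max(q_table[next_state])
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


# Two states, two actions; values are illustrative only.
q_table = [[0.0, 0.0], [0.0, 2.0]]
new_value = q_learning_update(q_table, state=0, action=1, reward=1.0,
                              next_state=1, done=False)
```

In this example the target is 1.0 + 0.9 × 2.0 = 2.8, so the estimate for state 0, action 1 moves halfway from 0.0 to 2.8.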

Black‑box optimization treats the mapping from the decision network's parameters to the reward it earns as an opaque function: it updates the parameters to maximize reward using only the observed rewards, typically with derivative‑free approaches similar to evolutionary algorithms, in contrast to gradient‑based methods.

The cross‑entropy method (CEM) is presented as a black‑box optimization technique that samples policy parameters from a probability distribution, evaluates the reward of each sample, selects the top performers, and refits the distribution to them, repeating until the distribution concentrates on good parameters; it is akin to evolution strategies such as CMA‑ES.
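The sample, evaluate, select, and update cycle can be sketched as follows. This is a minimal illustration, not the GitHub implementation the article links to: it assumes a diagonal Gaussian over the parameters, and `toy_return` is a hypothetical stand‑in for the total reward of an episode run with the given policy parameters.

```python
import random
import statistics


def cross_entropy_method(evaluate, dim, iterations=30, population=50,
                         elite_fraction=0.2, initial_std=5.0, min_std=1e-3,
                         seed=0):
    """Maximize `evaluate` over `dim` parameters with a diagonal-Gaussian CEM."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [initial_std] * dim
    n_elite = max(2, int(population * elite_fraction))
    for _ in range(iterations):
        # 1. Sample candidate parameter vectors from the current distribution.
        samples = [[rng.gauss(mean[i], std[i]) for i in range(dim)]
                   for _ in range(population)]
        # 2. Evaluate each candidate and keep the top performers (the elites).
        samples.sort(key=evaluate, reverse=True)
        elites = samples[:n_elite]
        # 3. Refit the distribution to the elites, one dimension at a time.
        for i in range(dim):
            column = [elite[i] for elite in elites]
            mean[i] = statistics.fmean(column)
            # Floor the spread so sampling never becomes fully degenerate.
            std[i] = max(statistics.pstdev(column), min_std)
    return mean


def toy_return(params):
    """Hypothetical black-box reward, highest at params == [3.0, -2.0]."""
    target = [3.0, -2.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


best_params = cross_entropy_method(toy_return, dim=2)
```

Note that the optimizer only ever calls `evaluate` and never inspects how the reward is computed, which is what makes the approach black‑box. In an RL setting, `evaluate` would run one or more episodes with the sampled policy parameters and return the total reward.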

Practical resources include a GitHub implementation of CEM, John Schulman’s MLSS‑2016 slides, UC Berkeley’s CS‑294 deep RL course, OpenAI Gym, and classic literature such as Sutton and Barto’s Reinforcement Learning: An Introduction.

Additionally, the article mentions the New Dada algorithm team, their logistics‑crowdsourcing challenges, and a job posting, while providing references and a disclaimer about content sourced from John Schulman’s presentation.

Artificial Intelligence, deep reinforcement learning, DQN, black-box optimization, cross-entropy method
