Efficient Reasoning with Reward Shaping: Compressing Qwen 30B‑Series Chains by 20‑40%
This article examines how reward-shaping techniques can shorten the chain-of-thought outputs of Qwen 30B-series models by 20-40% while preserving, or slightly improving, accuracy on AIME-25 and out-of-distribution benchmarks. It covers the experimental design, strategic considerations, and practical insights behind this efficient-reasoning approach.
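As a rough illustration of the idea, reward shaping for chain compression typically combines a correctness signal with a length penalty. The sketch below is a hypothetical, minimal form (the function name, `target_len` budget, and linear-penalty shape are assumptions, not the article's exact formulation):

```python
def shaped_reward(correct: bool, length: int, target_len: int, alpha: float = 0.5) -> float:
    """Hypothetical shaped reward: correctness minus a normalized length penalty.

    The penalty grows linearly once the chain exceeds `target_len` tokens,
    nudging the policy toward shorter chains without punishing correct
    answers that already fit within the budget.
    """
    base = 1.0 if correct else 0.0
    overflow = max(0, length - target_len)          # tokens beyond the budget
    penalty = alpha * overflow / target_len         # scaled, normalized penalty
    return base - penalty
```

Under a scheme like this, a correct 1,000-token chain against a 2,000-token budget keeps its full reward, while a correct 3,000-token chain is docked for the 1,000-token overrun, creating pressure toward shorter reasoning.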
