Reinforcement Learning Tutorial 8: Building State Feature Representations for Objective Optimization
This tutorial explains how to construct state feature vectors for reinforcement‑learning value‑function approximation, covering linear, polynomial, Fourier, and radial‑basis representations, as well as state aggregation techniques such as coarse coding and tile coding, and discusses non‑parametric approaches like kernel methods.
Linear Feature Construction
Let the state be represented by a feature vector x(s). A linear value estimator has the form V̂(s)=wᵀx(s). To capture interactions between base features without leaving the linear‑optimization setting, new features can be created as functions of the original components. For example, adding the product x₁(s)·x₂(s) as a new feature x₃(s) allows the weight vector to assign a strong negative contribution when both x₁ and x₂ are large. In a maze scenario, x₁ might measure distance from the exit and x₂ the density of nearby traps; the quadratic feature penalises states that are simultaneously far from the goal and surrounded by traps.
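The idea above can be sketched in a few lines of NumPy. The feature names and weight values are illustrative, not taken from any particular benchmark; the negative weight on the interaction term is what penalises states where both base features are large.

```python
import numpy as np

def features(x1, x2):
    # Base features plus a hand-crafted interaction term x3 = x1 * x2.
    return np.array([x1, x2, x1 * x2])

def v_hat(w, x1, x2):
    # Linear value estimate V_hat(s) = w^T x(s); still linear in w even
    # though x3 is a nonlinear function of the state.
    return w @ features(x1, x2)

# Illustrative weights: a strong negative weight on the interaction term
# penalises states that are far from the exit AND surrounded by traps.
w = np.array([-0.5, -0.5, -2.0])
```

Note that the estimator remains linear in the weights, so all the convergence guarantees of linear function approximation still apply.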
Polynomial Basis
Polynomial features are generated by taking products of powers of the base features. With two base features x₁ and x₂, allowing each exponent to be 0 or 1 yields the four-dimensional vector [1, x₁, x₂, x₁·x₂], whose cross term is second order. Raising the maximum degree causes the number of features to grow exponentially with the number of base features, so in practice a subset of useful monomials is selected to keep the optimization tractable.
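A minimal sketch of this construction, assuming the common convention in which each base feature may appear with exponent 0 through n (so k base features produce (n + 1)^k monomials):

```python
import itertools
import numpy as np

def polynomial_basis(s, n):
    # All monomials prod_i s_i**c_i with each exponent c_i in {0, ..., n}.
    # For k base features this yields (n + 1) ** k terms, which is why
    # the feature count explodes for high degrees.
    k = len(s)
    return np.array([np.prod([s[i] ** c[i] for i in range(k)])
                     for c in itertools.product(range(n + 1), repeat=k)])
```

For s = [x₁, x₂] and n = 1 this produces exactly the four features [1, x₂, x₁, x₁·x₂] discussed above.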
Fourier Basis
Fourier bases approximate (periodic) functions using cosine terms. For a state interval normalised to [0, 1], set the period T = 2 so that the interval spans half a period. The one-dimensional basis of order n consists of n + 1 features:

φ₀(s) = 1,
φ_k(s) = cos(π·k·s) for k = 1, …, n.

When n = 5 the basis contains six features. Using only cosine terms (the function is treated as even over the doubled interval) reduces computation while preserving expressive power. In many RL problems Fourier bases outperform polynomial bases because they capture smooth variation with fewer features.
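The one-dimensional basis above is a one-liner; note that φ₀(s) = cos(0) = 1, so the constant feature comes out of the same formula:

```python
import numpy as np

def fourier_basis(s, n):
    # One-dimensional Fourier cosine basis of order n on s in [0, 1]:
    # phi_k(s) = cos(pi * k * s) for k = 0, ..., n  (phi_0 is constantly 1).
    return np.cos(np.pi * np.arange(n + 1) * s)
```

For order n = 5 this returns the six-feature vector mentioned in the text.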
State Aggregation
State aggregation groups similar states so that all members of a group share a single value estimate. Updates to any state in the group modify the shared estimate, effectively reducing the number of distinct parameters.
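A tiny sketch of aggregation over a one-dimensional chain of states, assuming contiguous equal-size groups and a simple incremental update rule (both choices are illustrative):

```python
import numpy as np

GROUP_SIZE = 10          # 100 states share just 10 parameters
w = np.zeros(10)         # one shared value estimate per group

def group_of(state):
    # States are binned into contiguous groups of GROUP_SIZE states each.
    return state // GROUP_SIZE

def update(state, target, alpha=0.1):
    # Updating any state moves the shared group value, so every other
    # state in the same group sees the change immediately.
    g = group_of(state)
    w[g] += alpha * (target - w[g])
```

After `update(23, 1.0)`, states 20 through 29 all report the same new estimate, which is exactly the parameter sharing the paragraph describes.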
Coarse Coding
Coarse coding covers the state space with overlapping regions; each region corresponds to a binary feature that is 1 if the state lies inside the region and 0 otherwise. Because regions overlap, a state can activate several features at once, providing richer information while keeping the representation sparse. For example, with an arrangement of 18 circular regions, a state X that falls inside regions 8, 12, and 13 is encoded as a binary vector with ones at those positions and zeros elsewhere.
Tile Coding
Tile coding overlays several offset copies (tilings) of a grid on the state space. Each tiling is divided into equal subtiles, and a state activates exactly one subtile per tiling. The resulting binary vector therefore always contains a fixed number of ones equal to the number of tilings, which makes it easy to choose stable learning rates. In a two-dimensional example, three 4×4 tilings with different offsets yield 48 binary features; any state is represented by exactly three ones, one per tiling.
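A sketch of this 48-feature example on the unit square. The uniform diagonal offsets used here are an assumption for illustration; production implementations (hashing, asymmetric offsets) differ in the details but preserve the one-active-tile-per-tiling property:

```python
import numpy as np

def tile_code(state, tiles_per_dim=4, num_tilings=3):
    # Three 4x4 tilings over the unit square, each shifted by a small
    # diagonal offset (an illustrative offset scheme).
    state = np.asarray(state, dtype=float)
    offsets = [t / (num_tilings * tiles_per_dim) for t in range(num_tilings)]
    features = np.zeros(num_tilings * tiles_per_dim ** 2, dtype=int)
    for t, off in enumerate(offsets):
        # Exactly one subtile per tiling is active for any state.
        ix = min(int((state[0] + off) * tiles_per_dim), tiles_per_dim - 1)
        iy = min(int((state[1] + off) * tiles_per_dim), tiles_per_dim - 1)
        features[t * tiles_per_dim ** 2 + ix * tiles_per_dim + iy] = 1
    return features
```

The fixed number of active features (here always three) is what lets the step size be set as a simple fraction of 1/number-of-tilings.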
Radial‑Basis Functions (RBF)
RBFs extend binary coding by allowing continuous feature values. A typical Gaussian RBF is φ_i(s) = exp(−‖s − c_i‖² / (2σ²)), where c_i is the centre of the i-th basis function and σ controls its width. The feature vector consists of the values of all such Gaussians. An illustration uses nine prototype points placed uniformly in a unit square; the Gaussian response to the distance between the state and each prototype forms the nine-dimensional feature vector.
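A sketch of the nine-prototype illustration, assuming a 3×3 grid of centres and an illustrative width σ = 0.25:

```python
import numpy as np

def rbf_features(state, centers, sigma=0.25):
    # Gaussian RBF features: phi_i(s) = exp(-||s - c_i||^2 / (2 * sigma^2)).
    # Values are continuous in (0, 1], peaking at 1 when s sits on a centre.
    state = np.asarray(state, dtype=float)
    d2 = np.sum((np.asarray(centers) - state) ** 2, axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Nine prototypes on a uniform 3x3 grid in the unit square.
centers = [(x, y) for x in (0.0, 0.5, 1.0) for y in (0.0, 0.5, 1.0)]
```

Unlike the binary codes above, every feature is active to some degree; σ trades off how sharply each prototype responds to nearby states.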
Non‑Parametric Function Approximation
Non‑parametric methods store a set of training samples and evaluate a new state by consulting these samples. The simplest approach is k‑nearest neighbours (kNN), which assigns a value based on the average of the k closest stored states. Because kNN requires a linear scan of the dataset, its inference cost grows with the number of samples. Research on similarity search (e.g., locality‑sensitive hashing, tree‑based indexes) aims to accelerate such queries.
Kernel‑based approximations weight stored samples by a similarity function (the kernel) and can be learned with gradient or semi‑gradient methods.
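As a concrete instance, a Nadaraya-Watson-style estimate with a Gaussian kernel (one common choice among many) weights every stored sample by its similarity to the query state:

```python
import numpy as np

def kernel_value(query, states, values, sigma=1.0):
    # Kernel-weighted average: every stored sample contributes, weighted
    # by a Gaussian kernel on its distance to the query state.
    d2 = np.sum((np.asarray(states, dtype=float)
                 - np.asarray(query, dtype=float)) ** 2, axis=1)
    k = np.exp(-d2 / (2 * sigma ** 2))
    return float(k @ np.asarray(values, dtype=float) / k.sum())
```

Because the weights vary smoothly with distance, the estimate interpolates between samples rather than switching abruptly at neighbourhood boundaries as kNN does.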
Good feature engineering can be the difference between a successful RL solution and a failed one, especially when the problem requires thousands of features (Sutton & Barto, 2020, Chapter 9).
In summary, linear value‑function approximation guarantees convergence but may lack expressive power. Enriching the feature space with interaction terms, polynomial or Fourier bases, and state‑aggregation schemes such as coarse coding or tile coding retains linear optimisation while dramatically improving approximation quality. When the underlying value function cannot be captured by any fixed basis, non‑parametric approaches (memory‑based or kernel‑based) provide additional flexibility without imposing a predetermined functional form.
Reference: Sutton, R. S., & Barto, A. G. (2020). Reinforcement Learning: An Introduction (2nd ed.). http://incompleteideas.net/book/RLbook2020.pdf
AI Algorithm Path
A public account focused on deep learning, computer vision, and autonomous driving perception algorithms, covering visual CV, neural networks, pattern recognition, related hardware and software configurations, and open-source projects.