Logic-Q: Program Sketch Optimization Boosts Deep Reinforcement Learning for Quantitative Trading

Logic-Q introduces a program‑sketch paradigm that injects lightweight, plug‑and‑play market‑trend logic into deep reinforcement learning agents, dramatically improving trend detection, reducing drawdowns, and outperforming state‑of‑the‑art DRL strategies on multiple quantitative‑trading benchmarks.

Bighead's Algorithm Notes

Background

Deep reinforcement learning (DRL) has achieved notable success in quantitative‑trading tasks such as stock trading, portfolio allocation, and order execution, largely because it can learn directly from market micro‑structure without extensive expert feature engineering. However, prior work shows that DRL policies often over‑fit noisy historical data, struggle to identify market trends, and suffer large losses during market crashes. Embedding human expert knowledge about market trends is a natural remedy, yet such knowledge is abstract and difficult to quantify.

Problem Definition

The paper identifies three key shortcomings of current DRL trading agents:

Inaccurate market‑trend recognition, leading to missed opportunities or severe losses during crashes.

Over‑fitting to spurious noise, causing poor performance under extreme conditions.

Abstract expert trend knowledge, which is difficult to encode numerically.

Method

3.1 Overall Framework

Logic‑Q is a generic logic‑guided DRL framework that adopts a program‑synthesis sketch paradigm. A lightweight market‑trend perception sketch encodes abstract expert logic while leaving numeric parameters as placeholders. Bayesian optimization parameterizes the sketch; the instantiated sketch consumes market features I_{mar} and outputs a conditional adjustment parameter \phi_{\tau} that modifies a pre‑trained DRL policy \pi_{\theta} into an adjusted policy \pi.

3.2 Market‑Trend Perception Sketch

The sketch consists of multiple conditional statements, each describing one of five trend types (steady decline, steady rise, rapid decline, rapid rise, oscillation). It takes three market indicators as input: volatility vol_g(t), downside risk dr_g(t), and growth rate gr_g(t). The formulas for these indicators are explicitly given in the paper (e.g., volatility is the standard deviation of recent closing prices, downside risk is the variance of negative returns, growth rate measures price increase relative to the start of a window).

When a condition is satisfied, the sketch returns the corresponding adjustment parameter \phi_{\tau}.
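The indicator formulas and the branch-per-trend structure can be sketched as follows. This is an illustrative reading of the verbal descriptions above, not the paper's code: the rolling window, the exact indicator normalizations, and the numeric cutoffs (the sketch's placeholder parameters) are all assumptions here.

```python
import numpy as np

def indicators(prices, window=20):
    """Rolling market indicators over the last `window` closing prices.

    Follows the verbal definitions in the text: volatility is the std of
    recent closes, downside risk the variance of negative returns, and
    growth rate the price change relative to the window start. The exact
    window and normalization in the paper may differ.
    """
    p = np.asarray(prices[-window:], dtype=float)
    rets = np.diff(p) / p[:-1]
    vol = p.std()                                            # vol_g(t)
    dr = rets[rets < 0].var() if (rets < 0).any() else 0.0   # dr_g(t)
    gr = (p[-1] - p[0]) / p[0]                               # gr_g(t)
    return vol, dr, gr

def trend_sketch(vol, dr, gr, phi):
    """Market-trend perception sketch: one conditional per trend type.

    `phi` maps each trend name to its adjustment parameter phi_tau; the
    threshold constants below are illustrative placeholders -- in Logic-Q
    they are the numeric holes filled in by Bayesian optimization.
    """
    if gr < -0.05 and vol > 1.0:
        return phi["rapid_decline"]
    if gr > 0.05 and vol > 1.0:
        return phi["rapid_rise"]
    if gr < -0.01:
        return phi["steady_decline"]
    if gr > 0.01:
        return phi["steady_rise"]
    return phi["oscillation"]
```

A steadily rising price series with nonzero dispersion would, under these placeholder thresholds, fall into one of the "rise" branches and return the corresponding adjustment parameter.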

3.3 Strategy Adjustment Based on the Sketch

For a single‑model RL setting, the adjustment parameter is used as a Softmax temperature to scale logits, effectively reshaping the action‑probability distribution at each timestep. In an ensemble RL setting, the parameters act as weight tensors that combine the predictions of multiple sub‑policies via a bagging‑style weighted average.
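Both adjustment modes reduce to a few lines of array arithmetic. A minimal sketch, assuming logits and sub-policy predictions arrive as NumPy arrays (the paper's exact scaling and weighting conventions may differ):

```python
import numpy as np

def adjust_single(logits, phi):
    """Single-model case: phi_tau acts as a softmax temperature that
    reshapes the action-probability distribution at each timestep."""
    z = np.asarray(logits, dtype=float) / phi
    z -= z.max()                # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def adjust_ensemble(sub_preds, phi_weights):
    """Ensemble case: the phi parameters act as weights in a
    bagging-style weighted average of sub-policy predictions."""
    w = np.asarray(phi_weights, dtype=float)
    w = w / w.sum()             # normalize weights to sum to 1
    return np.tensordot(w, np.asarray(sub_preds, dtype=float), axes=1)
```

A large temperature flattens the distribution toward uniform (more cautious actions), while a small one sharpens it toward the policy's top choice, which is how a single scalar from the sketch can modulate a pre-trained policy without retraining.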

3.4 Sketch Optimization

Because the sketch is symbolic, the authors employ Bayesian optimization to tune its parameters on a small validation set, maximizing a task‑specific objective J(\phi). For order‑execution tasks, the objective is expected cumulative discounted reward; for stock‑trading tasks, the Sharpe ratio on validation data is optimized.
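The optimization loop itself is simple: propose a candidate parameterization phi, evaluate J(phi) on the validation set, keep the best. The paper uses Bayesian optimization; the sketch below substitutes library-free random search as a stand-in with the same propose/evaluate/keep interface, and all function and parameter names are illustrative.

```python
import numpy as np

def validation_sharpe(returns, eps=1e-8):
    """Example objective J(phi) for stock trading: the Sharpe ratio of
    validation-period returns produced by the phi-adjusted policy."""
    r = np.asarray(returns, dtype=float)
    return r.mean() / (r.std() + eps)

def optimize_sketch(objective, bounds, n_trials=200, seed=0):
    """Tune the sketch's numeric placeholders by maximizing J(phi).

    Random search is used here in place of Bayesian optimization purely
    to keep the example dependency-free; a BO library would replace the
    uniform sampling with a surrogate-model-guided proposal.
    """
    rng = np.random.default_rng(seed)
    best_phi, best_j = None, -np.inf
    for _ in range(n_trials):
        phi = {k: rng.uniform(lo, hi) for k, (lo, hi) in bounds.items()}
        j = objective(phi)
        if j > best_j:
            best_phi, best_j = phi, j
    return best_phi, best_j
```

For order execution the `objective` would instead return expected cumulative discounted reward, as stated above; only the scoring function changes, not the loop.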

Experiments

4.1 Experimental Setup

Order‑execution task: Historical minute‑level data from China A‑share market (CSI 800 constituents) are used. Baselines include traditional TWAP/VWAP and two DRL methods (PPO, OPD). Evaluation metrics are price advantage (PA), excess annualized return (ARR), gain‑loss ratio (GLR), and the proportion of positive PA (POS).

Stock‑trading task: Datasets span US equities, Hong Kong equities, and cryptocurrency markets, collected from Yahoo Finance. Baselines comprise rule‑based strategies, equal‑weight buy‑and‑hold (BAH), and several state‑of‑the‑art DRL agents (DDPG, PPO, Sharpe‑Ens, AlphaMix). Metrics include annualized return (AR), cumulative return (CR), annualized volatility (AV), maximum drawdown (MD), and Sharpe ratio (SR).
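Two of the stock-trading metrics above are worth making concrete, since they carry the results discussion that follows. A minimal sketch of maximum drawdown (MD) and annualized Sharpe ratio (SR), assuming daily data and a zero risk-free rate (a common simplification; the paper's exact conventions are not specified here):

```python
import numpy as np

def max_drawdown(equity):
    """MD: largest peak-to-trough decline of an equity curve,
    expressed as a fraction of the running peak."""
    eq = np.asarray(equity, dtype=float)
    peak = np.maximum.accumulate(eq)     # running maximum so far
    return ((peak - eq) / peak).max()

def sharpe_ratio(daily_returns, periods=252):
    """SR: mean over std of daily returns, annualized by sqrt of the
    number of trading periods per year; risk-free rate assumed zero."""
    r = np.asarray(daily_returns, dtype=float)
    return np.sqrt(periods) * r.mean() / r.std()
```

For example, an equity curve that rises from 100 to 120 and then falls to 60 has a maximum drawdown of 50%, regardless of any later recovery.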

4.2 Results

Single‑model RL improvement: On the order‑execution benchmark, Logic‑Q significantly raises total return, reduces maximum drawdown, and keeps volatility within acceptable limits. An ablation without the program sketch (Logic‑Q w/o sketch) degrades performance sharply, confirming the sketch’s effectiveness. Adding market features alone (OPD (Aug)) does not yield comparable gains.

Ensemble RL improvement: On the stock‑trading benchmark, Logic‑Q outperforms the strongest baselines (Sharpe‑Ens, AlphaMix) in return while achieving lower drawdown and the highest Sharpe ratio. Removing the sketch (Logic‑Q w/o PS) again harms performance, and merely augmenting market information does not help. During market crashes, Logic‑Q’s drawdown is markedly lower than other methods.

Interpretability: Analysis of the March 9–23, 2020 market-crash case shows that the optimized sketch anticipates the downturn, aligning closely with human expert judgment and reducing the Logic‑Q agent's drawdown.

Overall, the experiments demonstrate that embedding lightweight, logic‑guided program sketches into DRL agents provides both performance gains and better interpretability for quantitative‑trading applications.

Tags: Deep Reinforcement Learning, Bayesian Optimization, Quantitative Trading, Logic-Q, Market Trend Detection, Program Sketch
Written by

Bighead's Algorithm Notes

Focused on AI applications in the fintech sector
