LLM‑SM Hybrid Strategies: Boosting Decision Optimization and Store Design

Recent advances in large language models (LLMs) have sparked interest in their decision‑making capabilities, yet challenges remain. This article reviews the classic prediction‑optimization pipeline, introduces the emerging LLM‑as‑Predictor/Ranker/Optimizer paradigms, and presents practical case studies on delivery‑price optimization and intelligent store‑decoration recommendation built on LLM‑SM hybrid systems.

Ele.me Technology

Background

In recent years, the rapid development of large language models (LLMs) in semantic understanding, logical reasoning, and generation has drawn attention to their potential in complex decision tasks. However, LLMs still face high inference cost, uncontrollable outputs, hallucinations, and difficulty handling high‑precision calculations in intricate business scenarios. Small models (SMs) such as DNNs, BERT, and uplift models are stable and efficient for specific prediction tasks but lack generalization, strategy exploration, and semantic modeling capabilities. Combining LLMs with SMs has therefore become a recent research hotspot.

Classic Decision Paradigm and LLM+SMs

Cooperative Paradigm

The typical solving framework follows a two‑stage "prediction + optimization" logic. The first stage builds a prediction model to estimate key metrics (e.g., CTR, CVR) under a given strategy. The second stage performs constrained optimization to maximize overall business objectives while respecting constraints such as budget or minimum margin.
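As a minimal sketch of this two‑stage logic (the CTR curve, the two‑slot setup, and all function names here are hypothetical stand‑ins, not the production system):

```python
import itertools

# Stage 1 stand-in: a "trained" CTR predictor. In practice this would be a
# DNN or GBDT fit on historical logs; here a toy diminishing-returns curve.
def predict_ctr(bid: float) -> float:
    return 0.05 * bid / (1.0 + bid)

def optimize(bids, budget):
    """Stage 2: pick one bid per slot to maximize predicted clicks
    subject to a total-budget constraint (brute-force enumeration)."""
    best, best_clicks = None, -1.0
    for combo in itertools.product(bids, repeat=2):  # two ad slots for illustration
        if sum(combo) > budget:
            continue  # respect the budget constraint
        clicks = sum(predict_ctr(b) for b in combo)
        if clicks > best_clicks:
            best, best_clicks = combo, clicks
    return best, best_clicks
```

Note that the optimizer trusts the predictor's outputs completely, which is exactly how stage‑1 errors propagate into stage‑2 decisions.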

Although this framework is clear and interpretable, it suffers from error propagation, mismatched objectives between prediction loss and business goals, difficulty expressing complex multi‑objective constraints, and computational inefficiency when the decision space grows exponentially.

Emerging Paradigms with LLMs (LLM‑as‑X)

To overcome these limitations, researchers have introduced LLM‑as‑Predictor, LLM‑as‑Ranker, LLM‑as‑Judge, and LLM‑as‑Optimizer architectures, leveraging the strong reasoning ability of LLMs to complement small models. The core benefits are capability complementarity, cost control, and closed‑loop optimization.

LLM‑as‑Predictor

The LLM, often fine‑tuned or trained with reinforcement learning for the domain, directly generates strategy outputs. Example: REC‑R1 (Amazon, 2025) uses reinforcement learning to connect a generative LLM with a recommendation system, optimizing the LLM via feedback from a fixed black‑box recommender.

Advantages: no need for complex small‑model design, suitable for low‑resource multi‑task scenarios; can incorporate external knowledge to alleviate data sparsity.

Limitations: higher inference cost and latency (3‑5× slower than small models); unstable RL training; lower interpretability for business attribution.

LLM‑as‑Ranker

Small models (e.g., dual‑tower DNN, BERT‑base) perform fast recall, while a fine‑tuned or prompt‑optimized LLM re‑ranks the top‑K candidates with refined scoring.

Advantages: small models ensure low latency and high throughput; LLM improves ranking quality, especially in semantic matching and context understanding.

Applicable scenarios: selecting the optimal strategy from independent candidates, where LLM excels in ranking precision.

Representative works include "Large Language Models are Zero‑Shot Rankers for Recommender Systems" and "Make Large Language Model a Better Ranker".
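A minimal recall‑then‑rerank sketch of this pattern, where a keyword‑overlap heuristic stands in for the LLM call (all names, the item schema, and the scoring interface are illustrative assumptions):

```python
def recall_topk(query_vec, items, k=3):
    """Stage 1: a small dual-tower-style model scores every item cheaply
    via a dot product between precomputed embeddings."""
    def score(item):
        return sum(q * v for q, v in zip(query_vec, item["vec"]))
    return sorted(items, key=score, reverse=True)[:k]

def llm_rerank(query_text, candidates):
    """Stage 2: a mocked LLM re-scores only the top-K candidates.
    Keyword overlap stands in for the real LLM's semantic judgment."""
    def llm_score(c):
        return len(set(query_text.split()) & set(c["title"].split()))
    return sorted(candidates, key=llm_score, reverse=True)
```

Because the expensive reranker only sees K candidates, overall latency stays close to the small model's, while the final ordering benefits from the stronger model.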

LLM‑as‑Optimizer (OPRO)

OPRO (DeepMind, 2024) treats the LLM as an optimizer that reasons and plans. The LLM receives a natural‑language description of the optimization problem, generates candidate solutions, receives external evaluation feedback, and iterates until convergence. Experiments show OPRO matches traditional algorithms on simple combinatorial problems and outperforms them on complex business scenarios.

Advantages: grounds the LLM's estimates in precise external evaluation feedback, leverages historical successes and failures for iterative improvement, removes the need for an exact closed‑form objective, and incorporates broader world knowledge.

Limitation: performance is sensitive to the initial prompt and requires careful exploration‑exploitation design to avoid local optima.
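The OPRO loop can be sketched as follows, with a seeded random local proposer standing in for the LLM (in the real method, the scored solution history is serialized into a natural‑language prompt; the toy objective and all names here are assumptions):

```python
import random

def evaluate(x):
    """External evaluator: a toy objective with its maximum at x = 3."""
    return -(x - 3.0) ** 2

def propose(history, rng):
    """Stand-in for the LLM optimizer: reads the scored history and
    proposes a new candidate near the best solution found so far."""
    best_x, _ = max(history, key=lambda h: h[1])
    return best_x + rng.uniform(-1.0, 1.0)

def opro_loop(steps=50, seed=0):
    """Iterate propose -> evaluate -> append-to-history, then return
    the best (solution, score) pair, mirroring OPRO's outer loop."""
    rng = random.Random(seed)
    history = [(0.0, evaluate(0.0))]
    for _ in range(steps):
        x = propose(history, rng)
        history.append((x, evaluate(x)))
    return max(history, key=lambda h: h[1])
```

The exploration radius in `propose` is where the exploration‑exploitation trade‑off mentioned below lives: too small and the loop stalls in local optima, too large and feedback stops being informative.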

Other LLM‑SM Synergies

LLM as Data Generator: creates high‑fidelity synthetic data to improve SM training.

LLM as Student/Teacher: SM guides LLM token probabilities; LLM distills knowledge to SM.

Hybrid Fusion: combines models of different scales via voting, weighting, or gating.
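The gating variant, for example, can be as simple as a confidence‑weighted blend of the two models' scores (an illustrative sketch, not a description of any specific production system):

```python
def gated_fusion(sm_score, llm_score, sm_confidence):
    """Blend a small-model score with an LLM score via a confidence gate:
    the more confident the small model is, the more weight it receives."""
    g = max(0.0, min(1.0, sm_confidence))  # clamp gate to [0, 1]
    return g * sm_score + (1.0 - g) * llm_score
```

In learned variants the gate itself is a small network conditioned on the input, rather than a fixed confidence scalar.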

Case Study 1: Delivery‑Price Optimization

Ele.me allows merchants to set tiered delivery‑price thresholds by time and distance, which heavily influences order conversion, average order value, and profit. The combinatorial size of the configuration space makes manual optimization infeasible.

Problem formulation: maximize expected net revenue across time‑distance segments, subject to constraints on minimum net‑rate, reasonable price ranges, adjacent segment price differences, and adjustment limits. Variables include distance segment d, time segment t, delivery price p_{d,t}, estimated order volume q_{d,t}, gross unit price g_{d,t}, subsidy s_{d,t}, platform commission c_{d,t}, and a revenue function R(p_{d,t}).
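One plausible way to write this formulation from the variables above (the exact form of the revenue function is not given in the source; here we assume per‑order net revenue R(p_{d,t}) = g_{d,t} + p_{d,t} − s_{d,t} − c_{d,t}, and the symbols r_min, p_min, p_max, and Δ for the net‑rate floor, price bounds, and adjacent‑segment gap are our own notation):

```latex
\max_{\{p_{d,t}\}} \; \sum_{d}\sum_{t} q_{d,t}(p_{d,t}) \cdot R(p_{d,t})
\quad \text{s.t.} \quad
\frac{\sum_{d,t} q_{d,t}\, R(p_{d,t})}{\sum_{d,t} q_{d,t}\, g_{d,t}} \ge r_{\min},
\qquad
p_{\min} \le p_{d,t} \le p_{\max},
\qquad
\lvert p_{d+1,t} - p_{d,t} \rvert \le \Delta .
```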

The classic "prediction + optimization" pipeline suffers from error propagation and inability to capture long‑term merchant growth. To address this, a hybrid solution combines a small‑model scoring engine (predicting order volume and net‑rate for any price configuration) with an LLM decision generator that iteratively refines price proposals using a ReAct loop.

Small model focuses on "what": predicts business metric changes for a given price.

LLM focuses on "why" and "how": reasons over predictions, business knowledge, and strategic goals to propose optimal prices.

ReAct Optimization Process

Initial price generation: enumerate feasible price combinations respecting segment rules.

Scoring: feed each candidate to the uplift model to obtain expected order gain and net‑rate change.

LLM reasoning: analyze candidate strengths/weaknesses, assess risks (e.g., conversion drop), and propose next‑round candidates.

Iteration: repeat scoring and LLM reasoning until a stopping criterion (e.g., score threshold or max iterations) is met.
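The loop above can be sketched as follows; the linear demand and net‑rate curves are toy stand‑ins for the uplift model, and a rule‑based refiner stands in for the LLM (all function names and numbers are illustrative assumptions):

```python
def uplift_score(price):
    """Stand-in for the small-model scoring engine: returns the expected
    (order gain, net-rate change) for one candidate delivery price."""
    order_gain = 10.0 - 1.5 * price   # fewer orders at higher prices
    net_rate = 0.02 * price - 0.01    # higher margin at higher prices
    return order_gain, net_rate

def llm_refine(scored):
    """Stand-in for the LLM reasoning step: drop candidates below the
    net-rate floor, then explore around the best surviving price."""
    feasible = [(p, gain) for p, (gain, rate) in scored.items() if rate >= 0.0]
    best_p = max(feasible, key=lambda x: x[1])[0]
    return [round(best_p + d, 1) for d in (-0.5, 0.0, 0.5)]

def react_optimize(initial, rounds=3):
    """ReAct-style loop: score candidates, let the 'LLM' propose the next
    round, stop after a fixed iteration budget."""
    candidates = initial
    for _ in range(rounds):
        scored = {p: uplift_score(p) for p in candidates}
        candidates = llm_refine(scored)
    scored = {p: uplift_score(p) for p in candidates}
    return max((p for p in scored if scored[p][1] >= 0.0),
               key=lambda p: scored[p][0])
```

In the real system the refiner also produces natural‑language reasoning about risks (e.g., conversion drop), which is what makes the final recommendation interpretable.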

This hybrid approach yields higher revenue, better alignment with long‑term merchant objectives, and more interpretable recommendations compared to pure operations research methods.

Case Study 2: Intelligent Store Decoration Recommendation

Store decoration (visual atmosphere and product placement) significantly impacts user experience and merchant performance. Merchants often lack design skills, face overwhelming material choices, and rely on intuition for product selection.

The solution expands the "five‑piece set" (top banner, special poster, interior poster, shop avatar, dish background) and builds a two‑stage pipeline: a large‑model style generator followed by a small‑model scoring engine.

LLM style recommendation: given an explicit merchant intent (e.g., a desired color or theme), or triggered automatically based on the season, the LLM retrieves up to 20 candidate material sets and selects the top three.

Scoring model: a multimodal uplift model predicts order volume and net‑rate for each decoration configuration, incorporating image, text, and numeric features.

Model architecture:

Uplift modeling: uses propensity‑score weighting and adversarial training to mitigate confounding bias between price choices and merchant characteristics.

Monotonicity modeling: uses monotonic neural networks to enforce that predicted order volume does not increase, and net‑rate does not decrease, as the delivery price rises.

Context‑aware embedding: captures district‑level supply‑demand features.
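A common way to enforce such a monotonicity constraint is to restrict weight signs, as in this sketch (the production architecture is not specified in the source; here predicted order volume is made non‑increasing in price, matching the usual demand curve, and the weights are arbitrary illustrative values):

```python
import math

class MonotoneNet:
    """Tiny one-hidden-layer net whose output is non-increasing in the
    price input, enforced by construction rather than by training."""

    def __init__(self, ws, bs):
        self.ws, self.bs = ws, bs  # per-unit weights and biases

    def predict_orders(self, price):
        # Each hidden unit computes softplus(-|w| * price + b):
        # the argument is non-increasing in price and softplus is
        # monotone increasing, so each unit is non-increasing in price.
        h = [math.log1p(math.exp(-abs(w) * price + b))
             for w, b in zip(self.ws, self.bs)]
        return sum(h)  # a sum of non-increasing functions is non-increasing
```

The same trick with the opposite sign on the price weight yields the non‑decreasing constraint for net‑rate.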

Multimodal small model design includes:

IFP (Item Feature Processor) for single‑item image, text, and numeric features.

MIP (Multi‑Item Processor) for variable‑length item lists (e.g., window displays) using attention mechanisms.

PIMIP (Poster Image Multi‑Item Processor) for poster components, combining cross‑attention between poster image and associated items.
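Attention pooling over a variable‑length item list, as in MIP, can be sketched with plain dot‑product attention (real processors use learned multi‑head attention; the vectors here are illustrative):

```python
import math

def attention_pool(query, items):
    """Pool a variable-length list of item vectors into one fixed-size
    vector: score each item against the query, softmax the scores,
    and return the weighted sum of item vectors."""
    scores = [sum(q * v for q, v in zip(query, it)) for it in items]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(items[0])
    return [sum(w * it[i] for w, it in zip(weights, items)) for i in range(dim)]
```

Because the softmax normalizes over however many items are present, the same module handles a two‑item window display and a ten‑item one.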

The overall pipeline follows ReAct: the LLM generates a full‑store decoration plan, the scoring model evaluates it, and the LLM iteratively refines the plan until the score exceeds a predefined threshold.

Evaluation

Model evaluation combines human expert assessment, qualitative scoring of strategy rationale, and online A/B experiments comparing LLM‑driven recommendations against baseline non‑strategic suggestions. Results show notable improvements in order uplift, gross margin, and conversion stability.

Tags: LLM, ReAct, Uplift Modeling, Decision Optimization, Hybrid Modeling