Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding
This paper introduces RLLR, a reinforcement learning method with a label-sensitive reward that improves performance on natural language understanding (NLU) tasks by aligning the training objective with label accuracy. Evaluated on eight public NLU datasets and in real-world advertising feature evaluation, it outperforms standard RLHF and SFT baselines.