From Sub-Ability Diagnosis to Human-Aligned Generation: Bridging the Gap for Text Length Control via MARKERGEN
MARKERGEN is a plug‑and‑play framework that decomposes length‑controllable text generation into four sub‑abilities (identifying, counting, planning, and aligning), integrates external tokenizers and dynamically inserted length markers, and achieves significantly lower length errors and higher generation quality across diverse models, tasks, and languages.
Length‑controllable text generation (LCTG) remains a bottleneck for large language models (LLMs). Existing end‑to‑end methods lack fine‑grained supervision of the sub‑abilities required for precise length control, leading to poor generalization across tasks, scales, and languages.
The authors propose a bottom‑up decomposition of LCTG into four sub‑abilities: Identifying, Counting, Planning, and Aligning. Detailed error analyses reveal that Identifying and Counting errors dominate overall performance.
Based on this analysis, the MARKERGEN framework is introduced. It augments LLMs with external tokenizers and counters to compensate for deficiencies in basic length modeling, and dynamically inserts explicit length markers during generation. A three‑stage decoupled generation paradigm—Planning, Semantic Focusing, and Length Alignment—separates semantic generation from length constraints.
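The marker-insertion idea can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: a plain Python list of tokens stands in for the external tokenizer's output, and the marker format and decaying-interval schedule are assumptions chosen to show the mechanism of inserting explicit remaining-length counts that grow denser as the budget runs out.

```python
def insert_length_markers(tokens, target_len, start_interval=8, min_interval=1):
    """Interleave explicit remaining-length markers into a token stream.

    Markers appear more frequently as the remaining budget shrinks (a
    decaying-interval schedule), so the model always sees a fresh count
    near the end of generation. Marker text is an illustrative format.
    """
    out = []
    since_marker = 0
    for i, tok in enumerate(tokens):
        out.append(tok)
        since_marker += 1
        remaining = target_len - (i + 1)
        # Shrink the interval as the budget runs low (decaying schedule).
        interval = max(min_interval, min(start_interval, remaining // 2 or min_interval))
        if since_marker >= interval and remaining > 0:
            out.append(f"[{remaining} tokens left]")
            since_marker = 0
    return out

tokens = "one two three four five six seven eight nine ten".split()
marked = insert_length_markers(tokens, target_len=10)
```

Running this on a 10-token stream produces markers spaced several tokens apart at first and after every token near the end, mirroring the intuition that precise counting matters most close to the target.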
Extensive experiments across multiple LLMs (e.g., Qwen2.5, Llama‑3.1) and tasks (summarization, storytelling, QA) show that MARKERGEN reduces average absolute length error from 18.32% to 5.57% (a 12.75‑point absolute reduction) while improving quality scores and consuming only ~64% of the token budget.
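The reported metric can be reproduced in a few lines. This sketch assumes the common definition of average absolute length error, the mean of |generated − target| / target over samples as a percentage; the paper's exact formulation may differ, and the sample lengths below are made up for illustration.

```python
def avg_abs_length_error(pairs):
    """Mean relative length error (%) over (generated_len, target_len) pairs."""
    errs = [abs(gen - tgt) / tgt for gen, tgt in pairs]
    return 100.0 * sum(errs) / len(errs)

# Illustrative samples: 5% over, 10% under, ~11% over the target.
samples = [(105, 100), (90, 100), (200, 180)]
err = avg_abs_length_error(samples)  # ≈ 8.70%
```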
Further evaluations demonstrate strong generalization across models, tasks, length scales (18–1450 tokens), constraint granularities (exact vs. interval), and languages (including a Chinese GAOKAO benchmark), consistently keeping violation rates below 3%.
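The two constraint granularities and the violation-rate metric mentioned above can be made concrete with a small check. The tolerance handling for exact targets is an assumption (the source does not specify one), so treat this as a sketch of the evaluation logic rather than the paper's scorer.

```python
def violates(generated_len, constraint, tol=0.0):
    """True if a length breaks the constraint.

    constraint: an int for an exact target (with optional relative
    tolerance tol), or a (lo, hi) tuple for an interval constraint.
    """
    if isinstance(constraint, tuple):
        lo, hi = constraint
        return not (lo <= generated_len <= hi)
    return abs(generated_len - constraint) > tol * constraint

def violation_rate(lengths, constraint, tol=0.0):
    """Percentage of generations that violate the constraint."""
    return 100.0 * sum(violates(n, constraint, tol) for n in lengths) / len(lengths)
```

For example, lengths of 95, 100, and 105 all satisfy an interval constraint of (90, 110), giving a 0% violation rate.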
Ablation studies on TruthfulQA with Qwen2.5‑32B‑Instruct confirm the importance of external tool calls, the decaying‑interval marker insertion strategy, and the three‑stage decoupled generation, each contributing to lower error and higher quality.
Attention‑map analysis on Llama‑3.1‑8B‑Instruct shows that shallow layers focus on the inserted length markers for explicit length modeling, while deeper layers shift attention to semantic content, illustrating the two‑phase workflow of MARKERGEN.
Overall, MARKERGEN provides a plug‑and‑play, efficient solution that bridges the gap between sub‑ability diagnosis and human‑aligned generation, setting a new benchmark for industrial‑grade length‑controllable text generation.
Xiaohongshu Tech REDtech
Official account of the Xiaohongshu tech team, sharing tech innovations and problem insights, advancing together.