
From Sub-Ability Diagnosis to Human-Aligned Generation: Bridging the Gap for Text Length Control via MARKERGEN

MARKERGEN is a plug‑and‑play framework that decomposes length‑controllable text generation into four sub‑abilities—identifying, counting, planning, and aligning—integrates external tokenizers with dynamically inserted length markers, and achieves substantially lower length error and higher output quality across diverse models, tasks, and languages.

Xiaohongshu Tech REDtech

Length‑controllable text generation (LCTG) remains a bottleneck for large language models (LLMs). Existing end‑to‑end methods lack fine‑grained supervision of the sub‑abilities required for precise length control, leading to poor generalization across tasks, scales, and languages.

The authors propose a bottom‑up decomposition of LCTG into four sub‑abilities: Identifying, Counting, Planning, and Aligning. Detailed error analyses reveal that Identifying and Counting errors dominate overall performance.
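To make the Counting diagnosis concrete, here is a minimal sketch of how a counting‑error probe might look: the model's self‑reported length is compared against an external count. The whitespace tokenizer and the `counting_error` helper are illustrative assumptions, not the paper's actual diagnostic code.

```python
# Illustrative probe for the "Counting" sub-ability: compare a model's
# self-reported word count against an external counter's count.
# The helper name and whitespace tokenizer are assumptions for illustration.

def counting_error(text: str, model_reported: int) -> float:
    """Relative error between the model's claimed length and the true length."""
    actual = len(text.split())  # external counter: simple whitespace tokenizer
    return abs(model_reported - actual) / max(actual, 1)

sample = "MarkerGen separates semantic generation from length control"
print(counting_error(sample, model_reported=10))  # 8 claimed vs. 7 actual words: ~0.429
```

A probe like this, run over many samples, is the kind of measurement that lets identifying and counting errors be separated from planning and aligning errors.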

Based on this analysis, the MARKERGEN framework is introduced. It augments LLMs with external tokenizers and counters to compensate for deficiencies in basic length modeling, and dynamically inserts explicit length markers during generation. A three‑stage decoupled generation paradigm—Planning, Semantic Focusing, and Length Alignment—separates semantic generation from length constraints.
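The core mechanism—interleaving explicit remaining‑budget markers into the output stream—can be sketched as follows. The marker format, the fixed interval, and the whitespace tokenizer are illustrative assumptions, not MARKERGEN's exact scheme.

```python
# Minimal sketch of explicit length-marker insertion, assuming a whitespace
# tokenizer as the external counter and a fixed insertion interval.
# Marker format "[N words left]" is an illustrative choice.

def insert_markers(words: list[str], target: int, interval: int = 10) -> str:
    out = []
    for i, w in enumerate(words, start=1):
        out.append(w)
        if i % interval == 0 and i < target:
            out.append(f"[{target - i} words left]")  # explicit remaining budget
    return " ".join(out)

text = "one two three four five six seven eight nine ten eleven twelve".split()
print(insert_markers(text, target=12, interval=5))
```

In the actual framework the counting is delegated to an external tool rather than the model, which is what compensates for the counting deficiency diagnosed above.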

Extensive experiments across multiple LLMs (e.g., Qwen2.5, Llama‑3.1) and tasks (summarization, storytelling, QA) show that MARKERGEN reduces average absolute length error from 18.32% to 5.57% (a 12.75‑point absolute reduction) while improving quality scores and consuming only ~64% of the token budget.
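The error figures above are percentages, so the underlying metric is presumably the mean relative deviation from the target length; the exact definition below is our assumption, sketched for clarity.

```python
# Assumed definition of average absolute length error, as a percentage:
# mean over samples of |actual - target| / target.

def avg_abs_length_error(pairs: list[tuple[int, int]]) -> float:
    """pairs: (actual_length, target_length) for each generated sample."""
    return 100 * sum(abs(a - t) / t for a, t in pairs) / len(pairs)

# e.g. three samples against 100-token targets
print(avg_abs_length_error([(95, 100), (110, 100), (100, 100)]))  # 5.0
```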

Further evaluations demonstrate strong generalization across models, tasks, length scales (18–1450 tokens), constraint granularities (exact vs. interval), and languages (including a Chinese GAOKAO benchmark), consistently keeping violation rates below 3%.

Ablation studies on TruthfulQA with Qwen2.5‑32B‑Instruct confirm the importance of external tool calls, the decaying‑interval marker insertion strategy, and the three‑stage decoupled generation, each contributing to lower error and higher quality.

Attention‑map analysis on Llama‑3.1‑8B‑Instruct shows that shallow layers focus on the inserted length markers for explicit length modeling, while deeper layers shift attention to semantic content, illustrating the two‑phase workflow of MARKERGEN.

Overall, MARKERGEN provides a plug‑and‑play, efficient solution that bridges the gap between sub‑ability diagnosis and human‑aligned generation, setting a new benchmark for industrial‑grade length‑controllable text generation.

Tags: LLM · text generation · Length-Controlled Generation · MarkerGen · Sub-Ability Diagnosis
Written by

Xiaohongshu Tech REDtech

Official account of the Xiaohongshu tech team, sharing technical innovations and engineering insights.
