
Instruction Embedding: Latent Representations of Instructions for Task Identification

The paper introduces Instruction Embedding—a task‑focused text representation learned on the new Instruction Embedding Benchmark—and shows that Prompt‑based Instruction Embedding (PIE) outperforms standard embeddings in clustering, similarity, and downstream tasks such as data selection, in‑context example retrieval, test‑set compression, and task‑correlation analysis.

Xiaohongshu Tech REDtech

Instruction fine‑tuning is crucial for improving the instruction‑following ability of large language models (LLMs). While traditional text embeddings capture overall semantic information, instruction data require representations that highlight the underlying task rather than sentence‑level meaning.

At NeurIPS 2024, the Xiaohongshu search team introduced the concept of Instruction Embedding—a specialized subset of text embeddings that focuses on task identification. To support this research, they built the Instruction Embedding Benchmark (IEB), a dataset designed for training and evaluating instruction embeddings.

The IEB dataset was constructed by extracting instructions from three widely used fine‑tuning corpora (Databricks‑Dolly, Alpaca, Self‑instruct) and parsing them with the Berkeley Neural Parser to obtain syntactic structures. Instructions were categorized into four syntactic groups (VP, SBARQ, SQ, Others) and further refined into task‑type subclasses (e.g., verb‑driven knowledge questions, noun‑only queries). Complex sentences were augmented using GPT‑4 prompts to generate challenging samples.
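The bucketing step can be illustrated with a minimal sketch. The actual pipeline runs the Berkeley Neural Parser over each instruction; here we assume the parser has already produced a root constituent label, and the mapping function below is a hypothetical simplification of the paper's categorization rules.

```python
# Hypothetical sketch of the syntactic bucketing step. The paper parses
# instructions with the Berkeley Neural Parser; this function only maps
# an already-obtained root constituent label to IEB's four groups.

def bucket_by_root_label(root_label: str) -> str:
    """Map a constituency-parse root label to one of IEB's four groups."""
    if root_label == "VP":      # imperative, verb-driven instructions
        return "VP"
    if root_label == "SBARQ":   # wh-questions ("What is ...?")
        return "SBARQ"
    if root_label == "SQ":      # yes/no questions ("Is it ...?")
        return "SQ"
    return "Others"             # noun-only queries, fragments, etc.

labels = ["VP", "SBARQ", "SQ", "NP", "FRAG"]
print([bucket_by_root_label(label) for label in labels])
```

Groups falling outside the three explicit syntactic categories collapse into "Others", which is then refined into finer task-type subclasses.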

Quality control involved automatic filtering of low‑frequency classes, GPT‑4‑based verification, synonym‑based class merging via WordNet, and manual evaluation of 100 sampled categories, achieving a 93% accuracy rate.

For modeling, the authors proposed a Prompt‑based Instruction Embedding (PIE) method inspired by PromptBERT. PIE uses carefully crafted prompts to steer the model toward extracting task‑related information, with the final token’s hidden state serving as the instruction embedding. The model was fine‑tuned on the EFT subset of IEB using a SimCSE‑style contrastive loss, where positive pairs are instructions sharing the same verb‑noun task label and hard negatives are selected by keeping the verb constant while varying the noun.
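The SimCSE‑style objective is a standard in‑batch InfoNCE loss: each instruction embedding should score its paired positive higher than every other positive in the batch. The NumPy sketch below is a generic illustration of that loss, not the authors' implementation; batch construction (same verb‑noun label for positives, shared verb with different noun for hard negatives) is assumed to happen upstream.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.05):
    """SimCSE-style contrastive loss: each row of `anchors` should match
    the same row of `positives` against all other rows (in-batch negatives)."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sim = (a @ p.T) / temperature        # (B, B) scaled cosine similarities
    # cross-entropy with the diagonal (true pairs) as the gold labels
    logits = sim - sim.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

With correctly paired batches the diagonal dominates and the loss approaches zero; shuffling the positives against the anchors raises it, which is what drives same‑task instructions together in embedding space.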

Evaluation comprised two intrinsic tasks: Instruction Clustering (ICT) using k‑means and Instruction Intention Similarity (IIS) aligned with STS benchmarks. PIE‑enhanced models consistently outperformed baselines (BERT, Llama‑2, Vicuna) on both ICT (higher ARI, purity, homogeneity) and IIS.
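Of the clustering metrics, purity is the simplest to state: assign each predicted cluster its majority gold task label and count the fraction of instructions covered. The sketch below uses the standard definition of purity; it is illustrative, and ARI/homogeneity are typically computed with a library such as scikit-learn.

```python
from collections import Counter

def purity(pred_clusters, gold_labels):
    """Cluster purity: each predicted cluster is assigned its majority
    gold label; return the fraction of points covered by those majorities."""
    by_cluster = {}
    for cluster, gold in zip(pred_clusters, gold_labels):
        by_cluster.setdefault(cluster, []).append(gold)
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in by_cluster.values()
    )
    return majority_total / len(gold_labels)

pred = [0, 0, 0, 1, 1, 1]
gold = ["a", "a", "b", "b", "b", "c"]
print(purity(pred, gold))  # 4 of 6 points match their cluster's majority label
```

Higher purity means k‑means groups on instruction embeddings align more closely with the gold task labels, which is the behavior PIE improves over the baselines.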

Four downstream tasks were also examined:

Instruction data selection via clustering, showing improved diversity and performance on IFT test sets and AlpacaEval.

In‑Context Learning (ICL) example selection, where cosine similarity‑based retrieval of the two most relevant instructions boosted LLM performance across multiple models.

Test‑set compression, demonstrating that selecting representative samples with instruction embeddings reduces estimation error for small benchmark subsets.

Task‑correlation analysis across open‑source datasets, revealing clear separation between mathematical and code‑related instruction clusters.
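The ICL example-selection step above reduces to nearest‑neighbor retrieval in embedding space. The sketch below shows generic cosine‑similarity top‑k retrieval over a pool of instruction embeddings; the function name and toy vectors are illustrative, not from the paper.

```python
import numpy as np

def top_k_similar(query_emb, pool_embs, k=2):
    """Indices of the k pool instructions whose embeddings have the
    highest cosine similarity to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    p = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    sims = p @ q                       # cosine similarity to each pool item
    return np.argsort(-sims)[:k].tolist()

# Toy pool of three 2-d "embeddings"; the query is closest to items 0 and 2.
pool = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
print(top_k_similar(np.array([1.0, 0.0]), pool, k=2))
```

Because PIE embeddings encode the task rather than surface semantics, the retrieved neighbors tend to be same‑task demonstrations, which is what makes them useful as ICL examples.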

Overall, the study demonstrates that instruction embeddings, trained on the IEB benchmark and guided by task‑oriented prompts, provide superior representations for instruction‑centric downstream tasks compared to conventional text embeddings.

Tags: contrastive learning, fine-tuning, large language models, instruction embedding, task identification, text embedding
Written by

Xiaohongshu Tech REDtech

Official account of the Xiaohongshu tech team, sharing tech innovations and problem insights, advancing together.
