Unlocking Text Classification with Qwen2: Experiments, Tips, and LoRA Fine‑Tuning

This article shares practical experiments and insights on using Qwen2ForSequenceClassification for short‑ and long‑text sentiment tasks, compares it with BERT, outlines improvement strategies such as generative fine‑tuning and LoRA, and provides end‑to‑end code, training details, and evaluation results.


Experiment Results and Conclusions

Key observations from several months of large‑model classification experiments:

Short text (query sentiment): Qwen2 underperforms BERT on single-sentence queries.

Long text (ASR-transcribed documents): BERT truncated at 512 tokens loses accuracy compared with LLMs that can process the full sequence. Sliding-window BERT was not evaluated.

Base vs. Instruct models: with limited data (<10K samples), Instruct-tuned models consistently beat plain Base models.

SFT vs. LoRA: for datasets under 10K samples, LoRA fine-tuning yields higher accuracy and lower tuning cost than full-parameter SFT.

Improvement Strategies

Generative fine‑tuning with domain‑similar data can improve performance when the mixed data have comparable length distributions (e.g., 1.2K vs. 5K tokens) and a mixing ratio around 2:1.

Prompt engineering: add concise label descriptions to the prompt; for short texts, try few-shot prompting (a prompt sketch appears at the end of this section).

Classification‑head fine‑tuning + generative fine‑tuning: for >10K samples fine‑tune the Base model, use pseudo‑labeling on unlabeled data, experiment with LoRA on the embedding layer, model distillation, and simple hyper‑parameter search (learning rate, epochs, LoRA rank).

Heavy‑weight approach: pre‑train on domain data then instruction‑fine‑tune (still unverified for Qwen2‑7B‑Instruct).

Practical tips: use smaller learning rates for larger models, clean noisy or mislabeled samples after error analysis, and define comprehensive labeling rules for complex business scenarios.

Future work includes dynamic padding and multi‑label classification‑head validation.
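
As a concrete illustration of the few-shot prompting strategy above, here is a minimal prompt sketch; the label set and example texts are illustrative assumptions rather than details taken from the experiments.

# A minimal few-shot prompt sketch for short-text sentiment classification.
# Label names and examples are assumptions for illustration, not from the source.
FEW_SHOT_TEMPLATE = """You are a sentiment classifier. Reply with exactly one label: positive, negative, or neutral.

Text: The fund rebounded nicely today, very satisfied.
Label: positive

Text: Refund me, this fund is garbage.
Label: negative

Text: {query}
Label:"""

def build_prompt(query: str) -> str:
    """Insert the user query into the few-shot template."""
    return FEW_SHOT_TEMPLATE.format(query=query)

print(build_prompt("The fund manager changed again, not sure how to feel."))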

Text Classification – From BERT to LLM

Sequence classification can be performed with Hugging Face's AutoModelForSequenceClassification, which dispatches to model-specific classes such as:

Qwen2ForSequenceClassification

LlamaForSequenceClassification

BertForSequenceClassification

All three share a backbone model plus a linear classification head.
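
A minimal sketch of the shared entry point; the checkpoint names and label count below are illustrative assumptions. The auto class reads the checkpoint's config and instantiates the matching *ForSequenceClassification class with a freshly initialized classification head.

from transformers import AutoModelForSequenceClassification

# Illustrative checkpoints; the classification-head weights are newly initialized.
bert_clf = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=3)         # resolves to BertForSequenceClassification
qwen_clf = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2-0.5B-Instruct", num_labels=3)  # resolves to Qwen2ForSequenceClassification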

BertForSequenceClassification

The class inherits from BertPreTrainedModel and consists of a BertModel, a Dropout, and a linear classifier. The number of labels is supplied via num_labels.

class BertForSequenceClassification(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.bert = BertModel(config)                                      # backbone encoder
        self.dropout = nn.Dropout(config.hidden_dropout_prob)             # dropout on the pooled [CLS] output
        self.classifier = nn.Linear(config.hidden_size, self.num_labels)  # linear classification head
        self.init_weights()

The forward() method runs the encoder, applies dropout and the classifier to the pooled [CLS] output, and, when labels are provided, automatically computes CrossEntropyLoss for classification (or MSELoss when num_labels == 1, i.e., regression). It returns (loss, logits, hidden_states, attentions).
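
A minimal usage sketch, assuming an English BERT checkpoint and three labels; passing labels makes forward() return the loss alongside the logits.

import torch
from transformers import AutoTokenizer, BertForSequenceClassification

# Illustrative checkpoint; the head is freshly initialized, so the logits are untrained.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

inputs = tokenizer(["great fund, steady returns"], return_tensors="pt")
labels = torch.tensor([2])                        # class index for single-label classification

outputs = model(**inputs, labels=labels)          # CrossEntropyLoss computed internally
print(outputs.loss.item(), outputs.logits.shape)  # scalar loss, logits of shape (1, 3)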

Qwen2ForSequenceClassification

Qwen2 follows the same pattern but supports three problem types:

single_label_classification: uses CrossEntropyLoss.

multi_label_classification: uses BCEWithLogitsLoss with multi-hot labels.

regression: uses MSELoss (single-dimensional output).

class Qwen2ForSequenceClassification(Qwen2PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.model = Qwen2Model(config)                                          # decoder-only backbone
        self.score = nn.Linear(config.hidden_size, self.num_labels, bias=False)  # classification head
        self.post_init()
    ...
    def forward(self, input_ids=None, attention_mask=None, labels=None, ...):
        hidden_states = self.model(...).last_hidden_state
        logits = self.score(hidden_states)          # (batch, seq_len, num_labels)
        # Pool the logits at the last non-padding token of each sequence;
        # config.pad_token_id is used to locate the sequence ends.
        pooled_logits = logits[torch.arange(batch_size), sequence_lengths]
        loss = None
        if labels is not None:
            if self.config.problem_type == "regression":
                loss_fct = MSELoss()
                loss = loss_fct(pooled_logits.squeeze(), labels.squeeze())
            elif self.config.problem_type == "single_label_classification":
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(pooled_logits.view(-1, self.num_labels), labels.view(-1))
            else:  # multi_label_classification
                loss_fct = BCEWithLogitsLoss()
                loss = loss_fct(pooled_logits, labels)
        return SequenceClassifierOutputWithPast(loss=loss, logits=pooled_logits, ...)
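
Because the pooled logit is taken at the last non-padding token, config.pad_token_id must be set before running padded batches. Below is a minimal sketch of selecting the multi-label branch; the small Qwen2 checkpoint and four-label setup are assumptions for illustration.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative checkpoint and label count.
model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2-0.5B-Instruct",
    num_labels=4,
    problem_type="multi_label_classification")   # forward() will pick BCEWithLogitsLoss
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
model.config.pad_token_id = tokenizer.pad_token_id or tokenizer.eos_token_id  # for last-token pooling

inputs = tokenizer(["good service but very slow delivery"], return_tensors="pt")
labels = torch.tensor([[1.0, 0.0, 1.0, 0.0]])    # multi-hot float targets
print(model(**inputs, labels=labels).loss)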

LoRA Fine‑Tuning Qwen2ForSequenceClassification

Steps performed in a ModelScope environment:

Load model qwen/Qwen2.5-3B-Instruct and tokenizer.

Create a synthetic dataset of 20 samples (10 positive, 10 negative) via prompt generation and split into train/validation/test JSON files.

Load datasets with datasets.Dataset, tokenize (max length 24, left padding), and build a DataCollatorWithPadding (a sketch of this step follows the first code snippet below).

Define evaluation metrics (accuracy, precision, recall, F1) using sklearn (a compute_metrics sketch appears before the Trainer snippet below).

Configure Trainer with LoRA settings:

target modules:

q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

rank = 64, lora_alpha = 128, dropout = 0.1

training arguments: 3 epochs, learning rate 5e‑5, batch sizes 8 (train) / 4 (eval)

Train, evaluate on validation and test sets, then merge LoRA weights and save the final model.

Key code snippets:

from modelscope import AutoModelForCausalLM, AutoTokenizer
model_name = "qwen/Qwen2.5-3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
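
The dataset loading and tokenization step (max length 24, left padding) and the data collator are not shown in the source; here is a minimal sketch, assuming JSON files with text and label fields.

from datasets import load_dataset
from transformers import DataCollatorWithPadding

# Assumed file names and field names ("text", "label"), for illustration only.
raw = load_dataset("json", data_files={
    "train": "train.json", "validation": "valid.json", "test": "test.json"})

tokenizer.padding_side = "left"              # decoder-only models are usually left-padded
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def tokenize_fn(batch):
    # Truncate to the short max length used in the experiment (24 tokens).
    return tokenizer(batch["text"], max_length=24, truncation=True)

tokenized_train = raw["train"].map(tokenize_fn, batched=True)
tokenized_valid = raw["validation"].map(tokenize_fn, batched=True)
tokenized_test  = raw["test"].map(tokenize_fn, batched=True)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)   # pads each batch dynamically

DataCollatorWithPadding also renames the label column to labels, which is what the model's forward() expects.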

import torch
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType

peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    r=64, lora_alpha=128, lora_dropout=0.1)

model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2, id2label={0:"正向",1:"负向"},  # 0: positive, 1: negative
    torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True,
    attn_implementation="flash_attention_2")
model.config.pad_token_id = tokenizer.pad_token_id  # let the model locate the last non-padding token
model = get_peft_model(model, peft_config)

training_args = TrainingArguments(
    output_dir="./output/seq_cls",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    num_train_epochs=3,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True)
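
The compute_metrics helper from the steps above is not shown in the source; a minimal sklearn-based sketch (macro averaging is an assumption):

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair provided by the Trainer.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="macro", zero_division=0)
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}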

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_valid,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics)
trainer.train()

# Merge LoRA weights and save
saved_model = model.merge_and_unload()
saved_model.save_pretrained("/model/qwen2-3b/seqcls")

For reference, the PEFT-wrapped model before merging prints as (abbreviated):

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): Qwen2ForSequenceClassification(
      (model): Qwen2Model(...)
      (score): Linear(in_features=2048, out_features=2, bias=False)
    )
  )
)

Prediction example:

txt = "退钱,什么辣鸡基金"   # "Refund me, what a garbage fund"
model_inputs = tokenizer([txt], max_length=24, truncation=True, return_tensors="pt").to(saved_model.device)
logits = saved_model(**model_inputs).logits
pred = int(torch.argmax(logits, dim=1).cpu())
print({0:"正向",1:"负向"}[pred])  # output: 负向 (negative)
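
Because the merged checkpoint is a plain Qwen2ForSequenceClassification, it can later be reloaded without PEFT. A minimal sketch, assuming the save path above and that the tokenizer was saved to the same directory:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

reload_dir = "/model/qwen2-3b/seqcls"              # path used in save_pretrained above
clf = AutoModelForSequenceClassification.from_pretrained(
    reload_dir, torch_dtype="auto", device_map="auto")
tok = AutoTokenizer.from_pretrained(reload_dir)    # assumes tokenizer.save_pretrained(reload_dir) was run

inputs = tok(["这只基金收益不错"], max_length=24, truncation=True, return_tensors="pt").to(clf.device)  # "this fund's returns are decent"
label_id = int(clf(**inputs).logits.argmax(dim=-1))
print(clf.config.id2label[label_id])               # id2label was stored in the config at training time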

Self‑Test Results

Short‑Text Sentiment (3‑class, ~6K samples, max 128 chars)

Accuracy: 0.9334
Micro Precision/Recall/F1: 0.9334
Macro Precision: 0.9293, Macro Recall: 0.9551, Macro F1: 0.9388
Weighted F1: 0.9338

LLMs (7B/3B/1.5B/0.5B) did not surpass large BERT models unless few‑shot prompting or pseudo‑label generation with very large models (e.g., 72B) was applied.

Long‑Text Classification (ASR‑transcribed, avg 740 chars, max 4631)

LoRA fine‑tuning (rank=96, alpha=192) results:

Epoch 1 – Accuracy: 0.8416, Macro F1: 0.7726
Epoch 3 – Accuracy: 0.8848, Macro F1: 0.8528

Long‑text tasks benefit from a few epochs and appropriate LoRA rank selection.

