Unlocking Text Classification with Qwen2: Experiments, Tips, and LoRA Fine‑Tuning
This article shares practical experiments and insights on using Qwen2ForSequenceClassification for short‑ and long‑text sentiment tasks, compares it with BERT, outlines improvement strategies such as generative fine‑tuning and LoRA, and provides end‑to‑end code, training details, and evaluation results.
Experiment Results and Conclusions
Key observations from several months of large‑model classification experiments:
Short text (query sentiment): Qwen2 underperforms BERT on single‑sentence queries.
Long text (ASR‑transcribed documents): BERT truncated at 512 tokens loses accuracy compared with LLMs that can process the full sequence. Sliding‑window BERT was not evaluated.
Base vs. Instruct models: with limited data (<10K samples), Instruct‑tuned models consistently beat plain Base models.
SFT vs. LoRA: for datasets under 10K samples, LoRA fine‑tuning yields higher accuracy and lower tuning cost than full‑parameter SFT.
Improvement Strategies
Generative fine‑tuning with domain‑similar data can improve performance when the mixed data have comparable length distributions (e.g., 1.2K vs. 5K tokens) and a mixing ratio around 2:1.
Prompt engineering: add concise label descriptions; for short texts try few‑shot prompting.
Classification‑head fine‑tuning + generative fine‑tuning: for >10K samples fine‑tune the Base model, use pseudo‑labeling on unlabeled data, experiment with LoRA on the embedding layer, model distillation, and simple hyper‑parameter search (learning rate, epochs, LoRA rank).
Heavy‑weight approach: pre‑train on domain data then instruction‑fine‑tune (still unverified for Qwen2‑7B‑Instruct).
Practical tips: use smaller learning rates for larger models, clean noisy or mislabeled samples after error analysis, and define comprehensive labeling rules for complex business scenarios.
Future work includes dynamic padding and multi‑label classification‑head validation.
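As a concrete illustration of the prompt‑engineering point above, here is a minimal sketch of few‑shot prompt construction; the label descriptions and example queries are invented for illustration and are not from the original experiments.

```python
# Sketch of few-shot prompting for short-text sentiment.
# Labels and examples below are assumptions, not the original dataset.
LABEL_DESCRIPTIONS = {
    "正向": "the user expresses satisfaction or approval",
    "负向": "the user expresses complaints, anger, or disappointment",
    "中性": "the user states facts without clear emotion",
}

FEW_SHOT_EXAMPLES = [
    ("基金涨得不错，继续持有", "正向"),
    ("退钱，什么辣鸡基金", "负向"),
]

def build_prompt(query: str) -> str:
    """Assemble a classification prompt: task line, concise label
    descriptions, a few in-context examples, then the target query."""
    lines = ["Classify the sentiment of the query into exactly one label."]
    for label, desc in LABEL_DESCRIPTIONS.items():
        lines.append(f"- {label}: {desc}")
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Query: {text}\nLabel: {label}")
    lines.append(f"Query: {query}\nLabel:")
    return "\n".join(lines)

print(build_prompt("这个产品还可以"))
```

The model is then expected to emit one of the listed labels after the final `Label:` marker.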
Text Classification – From BERT to LLM
Sequence classification can be performed with AutoModelForSequenceClassification from HuggingFace for the following model classes:
Qwen2ForSequenceClassification
LlamaForSequenceClassification
BertForSequenceClassification
All three share a backbone model plus a linear classification head.
BertForSequenceClassification
The class inherits from BertPreTrainedModel and consists of a BertModel, a Dropout, and a linear classifier. The number of labels is supplied via num_labels.
class BertForSequenceClassification(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.bert = BertModel(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, self.num_labels)
        self.init_weights()

The forward() method automatically computes CrossEntropyLoss for classification or MSELoss for regression and returns (loss, logits, hidden_states, attentions).
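The head itself is nothing more than dropout plus a linear projection; a self‑contained PyTorch sketch (the hidden size, label count, and input tensors here are illustrative):

```python
import torch
import torch.nn as nn

# Stand-in for the classification head: dropout + linear layer mapping
# a pooled sentence representation to num_labels logits.
hidden_size, num_labels = 768, 3
dropout = nn.Dropout(0.1)
classifier = nn.Linear(hidden_size, num_labels)

pooled = torch.randn(2, hidden_size)      # batch of 2 pooled [CLS] vectors
logits = classifier(dropout(pooled))      # shape: (2, 3)

# Cross-entropy against integer class labels, as in the forward() above
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 2]))
print(logits.shape, loss.item())
```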
Qwen2ForSequenceClassification
Qwen2 follows the same pattern but supports three problem types:
single_label_classification: uses CrossEntropyLoss.
multi_label_classification: uses BCEWithLogitsLoss with multi‑hot labels.
regression: uses MSELoss (single‑dimensional output).
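A quick self‑contained demonstration of the three loss functions these problem types map to (the tensor values are made up; the real forward() applies them to pooled logits as shown below):

```python
import torch
from torch.nn import MSELoss, CrossEntropyLoss, BCEWithLogitsLoss

logits_reg = torch.tensor([[0.7], [1.2]])   # regression: 1-dim output
logits_cls = torch.randn(2, 3)              # 3-class logits
logits_multi = torch.randn(2, 3)            # multi-label logits

# regression: MSE against float targets
mse = MSELoss()(logits_reg.squeeze(), torch.tensor([0.5, 1.0]))
# single_label_classification: cross-entropy against class indices
ce = CrossEntropyLoss()(logits_cls, torch.tensor([0, 2]))
# multi_label_classification: BCE-with-logits against multi-hot targets
bce = BCEWithLogitsLoss()(logits_multi,
                          torch.tensor([[1., 0., 1.], [0., 1., 0.]]))
```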
class Qwen2ForSequenceClassification(Qwen2PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.model = Qwen2Model(config)
        self.score = nn.Linear(config.hidden_size, self.num_labels, bias=False)
        self.post_init()
    ...
    def forward(self, input_ids=None, attention_mask=None, labels=None, ...):
        hidden_states = self.model(...).last_hidden_state
        logits = self.score(hidden_states)
        loss = None
        if labels is not None:
            if self.config.problem_type == "regression":
                loss_fct = MSELoss()
                loss = loss_fct(logits.squeeze(), labels.squeeze())
            elif self.config.problem_type == "single_label_classification":
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(logits, labels)
            else:
                loss_fct = BCEWithLogitsLoss()
                loss = loss_fct(logits, labels)
        return SequenceClassifierOutputWithPast(loss=loss, logits=logits, ...)

LoRA Fine‑Tuning Qwen2ForSequenceClassification
Steps performed in a ModelScope environment:
Load model qwen/Qwen2.5-3B-Instruct and tokenizer.
Create a synthetic dataset of 20 samples (10 positive, 10 negative) via prompt generation and split into train/validation/test JSON files.
Load datasets with datasets.Dataset, tokenize (max length 24, left padding), and build a DataCollatorWithPadding.
Define evaluation metrics (accuracy, precision, recall, F1) using sklearn.
Configure Trainer with LoRA settings:
target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
rank = 64, lora_alpha = 128, dropout = 0.1
training arguments: 3 epochs, learning rate 5e‑5, batch sizes 8 (train) / 4 (eval)
Train, evaluate on validation and test sets, then merge LoRA weights and save the final model.
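The metrics step above might look like the following sketch; the exact averaging used in the original run is assumed here to be macro.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Metrics callback for Trainer: argmax over logits, then
    accuracy plus macro precision/recall/F1 from sklearn."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="macro", zero_division=0)
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}

# Quick self-check with fake logits: predictions are [0, 1, 0]
m = compute_metrics((np.array([[2., 0.], [0., 3.], [1., 0.]]),
                     np.array([0, 1, 1])))
```

Pass this function as `compute_metrics=compute_metrics` when constructing the Trainer.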
Key code snippets:
from modelscope import AutoModelForCausalLM, AutoTokenizer
model_name = "qwen/Qwen2.5-3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
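The left‑padding choice in the tokenization step can be sketched without the tokenizer itself: padding goes on the left so the real last token sits at the end of the sequence. The pad id 151643 is Qwen2's <|endoftext|> id, used here purely for illustration.

```python
def left_pad(batch_ids, max_length=24, pad_id=0):
    """Truncate each sequence to max_length, then pad on the LEFT to the
    batch's max width, returning input_ids and an attention mask."""
    batch_ids = [ids[:max_length] for ids in batch_ids]
    width = max(len(ids) for ids in batch_ids)
    input_ids = [[pad_id] * (width - len(ids)) + ids for ids in batch_ids]
    attention_mask = [[0] * (width - len(ids)) + [1] * len(ids)
                      for ids in batch_ids]
    return input_ids, attention_mask

ids, mask = left_pad([[5, 6, 7], [8, 9]], pad_id=151643)
# ids  → [[5, 6, 7], [151643, 8, 9]]
# mask → [[1, 1, 1], [0, 1, 1]]
```

In practice the same effect comes from setting `tokenizer.padding_side = "left"` and letting DataCollatorWithPadding do the work.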
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"],
    r=64, lora_alpha=128, lora_dropout=0.1)
import torch

model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2, id2label={0:"正向",1:"负向"},
    torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True,
    attn_implementation="flash_attention_2")
model.config.pad_token_id = tokenizer.pad_token_id  # required for batched classification
model = get_peft_model(model, peft_config)
training_args = TrainingArguments(
    output_dir="./output/seq_cls",
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    num_train_epochs=3,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_valid,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics)
trainer.train()
# Merge LoRA weights and save
saved_model = model.merge_and_unload()
saved_model.save_pretrained("/model/qwen2-3b/seqcls")

Before merging, the PEFT‑wrapped model architecture looks like:
PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): Qwen2ForSequenceClassification(
      (model): Qwen2Model(...)
      (score): Linear(in_features=2048, out_features=2, bias=False)
    )
  )
)

Prediction example:
txt = "退钱,什么辣鸡基金"
model_inputs = tokenizer([txt], max_length=24, truncation=True, return_tensors="pt").to(saved_model.device)
logits = saved_model(**model_inputs).logits
pred = int(torch.argmax(logits, dim=1).cpu())
print({0:"正向",1:"负向"}[pred])  # prints: 负向 (negative)

Self‑Test Results
Short‑Text Sentiment (3‑class, ~6K samples, max 128 chars)
Accuracy: 0.9334
Micro Precision/Recall/F1: 0.9334
Macro Precision: 0.9293, Macro Recall: 0.9551, Macro F1: 0.9388
Weighted F1: 0.9338

LLMs (7B/3B/1.5B/0.5B) did not surpass large BERT models unless few‑shot prompting or pseudo‑label generation with very large models (e.g., 72B) was applied.
Long‑Text Classification (ASR‑transcribed, avg 740 chars, max 4631)
LoRA fine‑tuning (rank=96, alpha=192) results:
Epoch 1 – Accuracy: 0.8416, Macro F1: 0.7726
Epoch 3 – Accuracy: 0.8848, Macro F1: 0.8528

Long‑text tasks benefit from a few training epochs and an appropriately chosen LoRA rank.
References
Qiu Zhenyu, "Large Model Usage in Traditional NLP Tasks", https://zhuanlan.zhihu.com/p/704983302
SegmentFault comparison of LoRA fine‑tuning on disaster tweet classification, https://segmentfault.com/a/1190000044485544
GitHub repository for SFT classification head, https://github.com/muyaostudio/qwen2_seq_cls
Hao Keke, "Building Text Classification with LlamaForSequenceClassification", https://zhuanlan.zhihu.com/p/691459595
Baobao Algorithm Notes
Author of the BaiMian large model, offering technology and industry insights.
