Boost Creative Writing with Zhi-Create-Qwen3-32B: Training, Eval & Deployment
This article introduces the open‑source Zhi‑Create‑Qwen3‑32B model, detailing its fine‑tuned training on creative‑writing data, the multi‑domain dataset strategy, curriculum‑learning based SFT, evaluation on WritingBench, and practical deployment options across various hardware and inference frameworks.
Introduction
Zhihu recently open‑sourced the Zhi‑Create‑Qwen3‑32B model, a large language model specifically fine‑tuned for creative‑writing tasks based on the Qwen3‑32B base. Using a refined training strategy and the WritingBench benchmark, the model achieves an 82.08 score, surpassing the base model’s 78.97.
Training Process
2.1 Dataset
The training corpus consists of three categories: carefully filtered open‑source datasets, synthetic chain‑of‑thought (CoT) data, and high‑quality Zhihu Q&A content. The data mix includes:
Open‑source datasets: Dolphin‑r1, Congliu/Chinese‑DeepSeek‑R1‑Distill‑data‑110k, a‑m‑team/AM‑DeepSeek‑R1‑0528‑Distilled, etc.
Professional content: curated Zhihu Q&A.
Synthetic data: CoT reasoning corpora generated by models such as deepseek‑ai/DeepSeek‑R1‑0528.
All data pass a Reward Model filter. Creative‑writing data accounts for 23% of the corpus, while the remaining 77% balances mathematics, code, and general knowledge to preserve overall capabilities.
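The reward-model filtering step can be pictured as a simple threshold pass. The sketch below is an illustration, not Zhihu's actual pipeline; the scores are stand-ins for a real reward model's outputs, and `filter_by_reward` and its threshold are hypothetical names:

```python
# Hypothetical sketch of reward-model filtering: keep only samples whose
# reward score clears a quality threshold.

def filter_by_reward(samples, scores, threshold=0.7):
    """Keep samples whose reward-model score is at or above threshold."""
    return [s for s, r in zip(samples, scores) if r >= threshold]

samples = ["essay A", "essay B", "essay C"]
scores = [0.91, 0.42, 0.78]  # pretend reward-model outputs in [0, 1]

kept = filter_by_reward(samples, scores)
print(kept)  # → ['essay A', 'essay C']
```

In practice the scoring model, threshold, and per-domain quotas (23% creative writing vs. 77% other domains) would all be tuned jointly.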
Data distribution is illustrated below:
2.2 Training Methods
Supervised Fine‑Tuning (SFT): curriculum learning progressively trains on samples of increasing difficulty, mitigating catastrophic forgetting while enhancing creative‑writing ability.
Samples are stratified by inference complexity and context length.
Samples that performed poorly in early rounds are prioritized for further optimization.
A difficulty‑increasing training schedule ensures steady performance gains.
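The staged, difficulty-increasing schedule above can be sketched in a few lines. This is an illustration under assumptions, not Zhihu's actual pipeline: `curriculum_stages` is a hypothetical helper, and difficulty here is proxied by context length, one of the two stratification criteria the article names.

```python
# Minimal curriculum-learning sketch: samples are sorted by a difficulty
# proxy, then split into stages that are trained in order, easiest first.

def curriculum_stages(samples, difficulty, n_stages=3):
    """Split samples into n_stages buckets of non-decreasing difficulty."""
    ordered = sorted(samples, key=difficulty)
    size = -(-len(ordered) // n_stages)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

# Difficulty stand-in: raw sample length as a proxy for context length.
data = ["short", "a medium sample", "a much longer training sample here"]
stages = curriculum_stages(data, difficulty=len, n_stages=3)
for i, stage in enumerate(stages, 1):
    print(f"stage {i}: {stage}")
```

A real pipeline would combine several difficulty signals (the article mentions inference complexity as well) and feed poorly performing samples from early rounds back into later stages.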
Reinforcement‑style Fine‑Tuning: a combination of RAFT (Reward rAnked FineTuning) and DPO (Direct Preference Optimization) builds a mixed evaluation system, addressing issues such as mixed Chinese‑English output and repetitive generations while improving logical reasoning.
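For the DPO side of this combination, the objective on a single preference pair can be written out numerically. The sketch below is a hedged illustration (the RAFT ranking step is omitted): it assumes whole-response log-probabilities from the policy and a frozen reference model, with `dpo_loss` and its inputs being hypothetical names rather than anything from Zhihu's code.

```python
# Pure-Python sketch of the DPO loss for one (chosen, rejected) pair.
# All inputs are log-probabilities of whole responses; no real model involved.
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss: -log(sigmoid(beta * (policy margin - reference margin)))."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy prefers the chosen response more than the reference does → modest loss.
print(round(dpo_loss(-10.0, -14.0, -12.0, -13.0), 4))  # → 0.5544
```

Training minimizes this loss over many ranked pairs, pushing the policy to widen the gap between preferred and dispreferred completions relative to the reference model.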
Evaluation Results
WritingBench, a benchmark designed for comprehensive creative‑writing assessment, serves as the primary evaluation suite, with Claude 3.7 Sonnet as the judge model. Zhi‑Create‑Qwen3‑32B scores 82.08, a 3.11‑point improvement over the base Qwen3‑32B’s 78.97, confirming the effectiveness of the fine‑tuning.
Performance across six WritingBench domains is shown below:
Local Deployment
The model can be deployed on a single H20/A800/H800 GPU or a dual‑RTX 4090 setup. Quantized versions are provided: FP8 (Zhi‑Create‑Qwen3‑32B‑FP8) for the dual‑RTX 4090 setup, and Q4_K_M for a single RTX 4090.
from modelscope import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "zhihu/Zhi-Create-Qwen3-32B"

# Load the tokenizer and model; device_map="auto" spreads the weights
# across all available GPUs.
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    trust_remote_code=True,
).eval()

# Recommended sampling settings (see "Usage Recommendations" below).
generate_configs = {
    "temperature": 0.6,
    "do_sample": True,
    "top_p": 0.95,
    "max_new_tokens": 4096,
}

# "Write an article introducing West Lake vinegar fish in Lu Xun's voice"
prompt = "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章"
messages = [{"role": "user", "content": prompt}]

# Apply the chat template, generate, and decode the completion.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, **generate_configs)
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Deployment via ZhiLight, vLLM, SGLang, and Ollama is also supported, with example commands provided for each framework.
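As one illustration of what such a launch could look like, here is a hypothetical vLLM command; the model ID, tensor-parallel size, and context length are assumptions for a dual-GPU setup, not the article's exact commands:

```shell
# Illustrative vLLM launch for Zhi-Create-Qwen3-32B across two GPUs.
vllm serve zhihu/Zhi-Create-Qwen3-32B \
    --tensor-parallel-size 2 \
    --max-model-len 32768
```

This starts an OpenAI-compatible server; the FP8 or Q4_K_M variants mentioned above would substitute their own model paths and, for Ollama, the GGUF workflow instead.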
Usage Recommendations
For optimal performance, set temperature to 0.5‑0.7 (recommended 0.6) and top‑p to 0.95 to balance creativity and coherence while avoiding repetitive or incoherent output.
Open‑Source Information
The model weights are released on Hugging Face (https://huggingface.co/Zhihu-ai) and ModelScope (https://modelscope.cn/organization/zhihu). Users are encouraged to download and experiment with the model.
Contact
For questions, reach out to the Zhihu AI team at [email protected].
Zhihu Tech Column
Sharing Zhihu tech posts and exploring community technology innovations.