Boost Creative Writing with Zhi-Create-Qwen3-32B: Training, Eval & Deployment
This article introduces the open‑source Zhi‑Create‑Qwen3‑32B model, detailing its fine‑tuned training on creative‑writing data, the multi‑domain dataset strategy, curriculum‑learning based SFT, evaluation on WritingBench, and practical deployment options across various hardware and inference frameworks.
Introduction
Zhihu recently open‑sourced the Zhi‑Create‑Qwen3‑32B model, a large language model specifically fine‑tuned for creative‑writing tasks based on the Qwen3‑32B base. Using a refined training strategy and the WritingBench benchmark, the model achieves an 82.08 score, surpassing the base model’s 78.97.
Training Process
2.1 Dataset
The training corpus consists of three categories: carefully filtered open‑source datasets, synthetic chain‑of‑thought (CoT) data, and high‑quality Zhihu Q&A content. The data mix includes:
Open‑source datasets: Dolphin‑r1, Congliu/Chinese‑DeepSeek‑R1‑Distill‑data‑110k, a‑m‑team/AM‑DeepSeek‑R1‑0528‑Distilled, etc.
Professional content: curated Zhihu Q&A.
Synthetic data: CoT reasoning corpora generated by models such as deepseek‑ai/DeepSeek‑R1‑0528.
All data pass a Reward Model filter. Creative‑writing data accounts for 23% of the corpus, while the remaining 77% balances mathematics, code, and general knowledge to preserve overall capabilities.
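The reward-model filtering step can be pictured as a simple threshold pass. The sketch below is an illustration, not Zhihu's actual pipeline; the scores are stand-ins for a real reward model's outputs, and `filter_by_reward` and its threshold are hypothetical names:

```python
# Hypothetical sketch of reward-model filtering: keep only samples whose
# reward score clears a quality threshold.

def filter_by_reward(samples, scores, threshold=0.7):
    """Keep samples whose reward-model score is at or above threshold."""
    return [s for s, r in zip(samples, scores) if r >= threshold]

samples = ["essay A", "essay B", "essay C"]
scores = [0.91, 0.42, 0.78]  # pretend reward-model outputs in [0, 1]

kept = filter_by_reward(samples, scores)
print(kept)  # → ['essay A', 'essay C']
```

In practice the scoring model, threshold, and per-domain quotas (23% creative writing vs. 77% other domains) would all be tuned jointly.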
Data distribution is illustrated below:
2.2 Training Methods
Supervised Fine‑Tuning (SFT): curriculum learning progressively trains on samples of increasing difficulty, mitigating catastrophic forgetting while enhancing creative‑writing ability.
Samples are stratified by inference complexity and context length.
Samples that performed poorly in early rounds are prioritized for further optimization.
A difficulty‑increasing training schedule ensures steady performance gains.
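The staged, difficulty-increasing schedule above can be sketched in a few lines. This is an illustration under assumptions, not Zhihu's actual pipeline: `curriculum_stages` is a hypothetical helper, and difficulty here is proxied by context length, one of the two stratification criteria the article names.

```python
# Minimal curriculum-learning sketch: samples are sorted by a difficulty
# proxy, then split into stages that are trained in order, easiest first.

def curriculum_stages(samples, difficulty, n_stages=3):
    """Split samples into n_stages buckets of non-decreasing difficulty."""
    ordered = sorted(samples, key=difficulty)
    size = -(-len(ordered) // n_stages)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

# Difficulty stand-in: raw sample length as a proxy for context length.
data = ["short", "a medium sample", "a much longer training sample here"]
stages = curriculum_stages(data, difficulty=len, n_stages=3)
for i, stage in enumerate(stages, 1):
    print(f"stage {i}: {stage}")
```

A real pipeline would combine several difficulty signals (the article mentions inference complexity as well) and feed poorly performing samples from early rounds back into later stages.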
Reinforcement‑style Fine‑Tuning: a combination of RAFT (Reward rAnked FineTuning) and DPO (Direct Preference Optimization) builds a mixed evaluation system, addressing issues such as mixed Chinese‑English output and repetitive generations while improving logical reasoning.
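For the DPO side of this combination, the objective on a single preference pair can be written out numerically. The sketch below is a hedged illustration (the RAFT ranking step is omitted): it assumes whole-response log-probabilities from the policy and a frozen reference model, with `dpo_loss` and its inputs being hypothetical names rather than anything from Zhihu's code.

```python
# Pure-Python sketch of the DPO loss for one (chosen, rejected) pair.
# All inputs are log-probabilities of whole responses; no real model involved.
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss: -log(sigmoid(beta * (policy margin - reference margin)))."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy prefers the chosen response more than the reference does → modest loss.
print(round(dpo_loss(-10.0, -14.0, -12.0, -13.0), 4))  # → 0.5544
```

Training minimizes this loss over many ranked pairs, pushing the policy to widen the gap between preferred and dispreferred completions relative to the reference model.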
Evaluation Results
WritingBench, a benchmark designed for comprehensive creative‑writing assessment, serves as the primary evaluation suite, with Claude 3.7 Sonnet as the judge model. Zhi‑Create‑Qwen3‑32B scores 82.08, a 3.11‑point improvement over the base Qwen3‑32B’s 78.97, confirming the effectiveness of the fine‑tuning.
Performance across six WritingBench domains is shown below:
Local Deployment
The model can be deployed on a single H20/A800/H800 GPU or a dual‑RTX 4090 setup. Quantized versions are provided: FP8 (Zhi‑Create‑Qwen3‑32B‑FP8) for the dual‑RTX 4090 setup, and Q4_K_M for a single RTX 4090.
from modelscope import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "zhihu/Zhi-Create-Qwen3-32B"

# Load the tokenizer and model; device_map="auto" spreads the weights
# across all available GPUs.
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    trust_remote_code=True,
).eval()

# Recommended sampling settings (see "Usage Recommendations" below).
generate_configs = {
    "temperature": 0.6,
    "do_sample": True,
    "top_p": 0.95,
    "max_new_tokens": 4096,
}

# "Write an article introducing West Lake vinegar fish in Lu Xun's voice"
prompt = "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章"
messages = [{"role": "user", "content": prompt}]

# Apply the chat template, generate, and decode the completion.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, **generate_configs)
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Deployment via ZhiLight, vLLM, SGLang, and Ollama is also supported, with example commands provided for each framework.
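As one illustration of what such a launch could look like, here is a hypothetical vLLM command; the model ID, tensor-parallel size, and context length are assumptions for a dual-GPU setup, not the article's exact commands:

```shell
# Illustrative vLLM launch for Zhi-Create-Qwen3-32B across two GPUs.
vllm serve zhihu/Zhi-Create-Qwen3-32B \
    --tensor-parallel-size 2 \
    --max-model-len 32768
```

This starts an OpenAI-compatible server; the FP8 or Q4_K_M variants mentioned above would substitute their own model paths and, for Ollama, the GGUF workflow instead.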
Usage Recommendations
For optimal performance, set temperature to 0.5‑0.7 (recommended 0.6) and top‑p to 0.95 to balance creativity and coherence while avoiding repetitive or incoherent output.
Open‑Source Information
The model weights are released on Hugging Face (https://huggingface.co/Zhihu-ai) and ModelScope (https://modelscope.cn/organization/zhihu). Users are encouraged to download and experiment with the model.
Contact
For questions, reach out to the Zhihu AI team at [email protected].
Zhihu Tech Column
Sharing Zhihu tech posts and exploring community technology innovations.