Unlock Data+AI Fusion: Fine‑Tune Multimodal Models on DataWorks with GPU‑Ready Notebooks

This tutorial shows how to use Alibaba Cloud DataWorks' serverless GPU resource groups together with the open‑source LLaMA‑Factory framework to fine‑tune the Qwen2‑VL‑2B multimodal model for tourism‑domain Q&A, covering environment setup, dataset preparation, parameter configuration, training, and interactive inference.

Alibaba Cloud Big Data AI Platform

Feb 24, 2025

Breaking the Data+AI Integration Bottleneck: DataWorks Supports GPU Resources

In the era of rapid AI advancement, combining massive data with powerful compute is essential. DataWorks, a one‑stop intelligent data development and governance platform, now offers serverless GPU resource groups, enabling on‑demand, elastic, and cost‑effective AI workloads.

Seamless GPU‑Powered Notebook in DataWorks

Developers can select GPU‑type resources when creating personal notebook environments, allowing end‑to‑end data cleaning, feature engineering, model training, and inference on a single platform without data migration.

Prerequisite Resources

Enable the DataWorks product (link: https://x.sm.cn/5rJd28D).

Create a workspace via DataWorks console > Workspace.

Create a Serverless resource group and bind it to the workspace (link: https://x.sm.cn/7M9T68p). Free trial or discount packages are available.

Bind GPU Instance

Recommended GPU: 24 GB A10 (ecs.gn7i-c8g1.2xlarge) or higher.

Image: modelscope:1.18.0‑pytorch2.3.0‑gpu‑py310‑cu121‑ubuntu22.04.
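
Before installing anything, it can help to confirm which GPU the notebook kernel actually sees. The following is a minimal sketch rather than part of the original walkthrough: it assumes nvidia-smi is available on the PATH inside the image, and parse_gpu_csv and list_gpus are hypothetical helper names.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse `name, memory.total` CSV rows into (name, MiB) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so GPU names containing commas survive.
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def list_gpus():
    """Return visible GPUs, or an empty list when nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


for name, mib in list_gpus():
    print(f"{name}: {mib} MiB total memory")
```

If nothing is printed, the notebook is probably not attached to a GPU instance yet.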

Step 1: Install LLaMA‑Factory

!git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
%cd LLaMA-Factory
!pip uninstall -y accelerate vllm matplotlib
!pip install llamafactory==0.9.0
!llamafactory-cli version
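
Because the commands above uninstall accelerate, vllm, and matplotlib before pinning llamafactory, it may be worth checking which versions ended up in the environment. This is an illustrative sketch using only the standard library; installed_version is a hypothetical helper, and the expectation that accelerate and matplotlib come back as dependencies is an assumption to verify, not a guarantee.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# llamafactory is pinned above; accelerate and matplotlib were removed first and
# are assumed (not guaranteed) to be reinstalled as its dependencies.
for dist in ("llamafactory", "accelerate", "matplotlib", "torch"):
    print(f"{dist}: {installed_version(dist) or 'not installed'}")
```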

Step 2: Download Dataset

!wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/llama_factory/Qwen2-VL-History.zip
!mv data rawdata && unzip Qwen2-VL-History.zip -d data

The provided dataset contains 261 single‑turn dialogues, each with a system prompt, a user instruction (including an <image> placeholder), and a model response that mimics a tour guide.

[
  {
    "conversations": [
      {"from": "system", "value": "You are a tour guide, answer visitors vividly."},
      {"from": "human", "value": "Tell me about this <image>"},
      {"from": "gpt", "value": "...response..."}
    ],
    "images": ["images/instance_1579398113581395972.jpg"]
  }
]
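
When adapting this format to your own data, a quick structural check can catch mismatches before training starts. The sketch below assumes the ShareGPT-style layout shown above and checks two things: that each record has a human and a gpt turn, and that the number of <image> placeholders equals the number of image paths. check_record is a hypothetical helper, not part of LLaMA‑Factory.

```python
def check_record(record):
    """Return a list of problems found in one ShareGPT-style record."""
    problems = []
    turns = record.get("conversations", [])
    roles = [turn.get("from") for turn in turns]
    if "human" not in roles or "gpt" not in roles:
        problems.append("missing human or gpt turn")
    # Each <image> placeholder should correspond to one entry in "images".
    placeholders = sum(turn.get("value", "").count("<image>") for turn in turns)
    images = len(record.get("images", []))
    if placeholders != images:
        problems.append(
            f"{placeholders} <image> placeholder(s) but {images} image path(s)"
        )
    return problems


sample = {
    "conversations": [
        {"from": "system", "value": "You are a tour guide, answer visitors vividly."},
        {"from": "human", "value": "Tell me about this <image>"},
        {"from": "gpt", "value": "...response..."},
    ],
    "images": ["images/instance_1579398113581395972.jpg"],
}
print(check_record(sample) or "ok")
```

To check a whole file, load it with json.load and call check_record on each element of the resulting list.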

Step 3: Model Fine‑Tuning

3.1 Launch Web UI

!USE_MODELSCOPE_HUB=1 llamafactory-cli webui

Setting USE_MODELSCOPE_HUB=1 makes LLaMA‑Factory download the model from ModelScope instead of Hugging Face.

3.2 Configure Parameters

Select the Qwen2VL‑2B‑Chat model and choose the full‑parameter fine‑tuning method. Set the learning rate to 1e‑4, the number of epochs to 10, the compute type to pure_bf16, and gradient accumulation steps to 2. Set the save interval to 1000 steps so that fewer checkpoints are written, which saves disk space.
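
For readers who prefer a reproducible configuration over Web UI clicks, the same settings can be approximated as a config file. This is a sketch under stated assumptions, not the configuration the Web UI generates. The learning rate, epochs, pure_bf16, gradient accumulation, save interval, and full fine‑tuning come from the text; the training stage, model ID, template name, dataset name, and output path are assumptions that may need adjusting for your environment.

```python
import json

# Keys follow LLaMA-Factory / Hugging Face training argument names.
# Entries marked "assumption" are not stated in the walkthrough.
train_args = {
    "stage": "sft",                                     # assumption: supervised fine-tuning
    "do_train": True,
    "model_name_or_path": "qwen/Qwen2-VL-2B-Instruct",  # assumption: ModelScope model ID
    "template": "qwen2_vl",                             # assumption: chat template name
    "dataset": "train",                                 # assumption: name in data/dataset_info.json
    "finetuning_type": "full",
    "learning_rate": 1e-4,
    "num_train_epochs": 10,
    "pure_bf16": True,
    "gradient_accumulation_steps": 2,
    "save_steps": 1000,
    "output_dir": "train_qwen2vl",                      # assumption: the Web UI may nest this path
}

# Hypothetical file name; any path works.
with open("qwen2vl_full_sft.json", "w") as f:
    json.dump(train_args, f, indent=2)
```

A file like this could then be passed to llamafactory-cli train in place of the Web UI, assuming the installed version accepts JSON config files.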

3.3 Start Fine‑Tuning

Set the output directory to train_qwen2vl and click “Start”. The training process takes about 14 minutes and finishes with a “Training completed” message.
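
A rough step count shows why a save interval of 1000 keeps disk usage low for this dataset. The estimate below assumes a single GPU, a per‑device batch size of 2 (an assumption, since the text does not state it), and no samples held out for validation. estimate_optimizer_steps is a hypothetical helper that approximates how a typical trainer counts steps.

```python
import math


def estimate_optimizer_steps(num_samples, per_device_batch, grad_accum,
                             epochs, num_gpus=1):
    """Rough optimizer-step count: (batches per epoch // accumulation) * epochs."""
    batches_per_epoch = math.ceil(num_samples / (per_device_batch * num_gpus))
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return steps_per_epoch * epochs


# 261 dialogues, 10 epochs, gradient accumulation 2 (from the text);
# per_device_batch=2 is an assumption.
total = estimate_optimizer_steps(
    num_samples=261, per_device_batch=2, grad_accum=2, epochs=10
)
print(total, "optimizer steps; intermediate checkpoint expected:", total >= 1000)
```

Under these assumptions the run has about 650 optimizer steps, which is below the 1000‑step save interval, so no intermediate checkpoints would be written and only the final model is saved.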

Step 4: Model Chat

Load the checkpoint from train_qwen2vl, upload a test image, set the system prompt to “You are a tour guide, answer visitors vividly,” and interact via the Web UI. The fine‑tuned model generates responses that correctly reference the uploaded image and tourism knowledge.

Conclusion

This tutorial demonstrates how to leverage DataWorks’ serverless GPU resources together with LLaMA‑Factory to fully fine‑tune the Qwen2‑VL‑2B multimodal model for tourism‑domain question answering, and suggests extending the workflow to custom business datasets for building domain‑specific multimodal AI solutions.

