How a 9B‑parameter Qwen3.5 model achieves full‑auto data analysis on a consumer GPU

The open‑source CoPaw‑Flash‑9B‑DataAnalyst‑LoRA model, a LoRA fine‑tune of CoPaw‑Flash‑9B, can autonomously load, explore, statistically analyze, and visualize CSV/Excel/JSON datasets and generate structured reports, reaching a 90% success rate at an average of 26 iteration rounds. It runs on a single consumer‑grade GPU via vLLM and the Data Analyst framework.

Old Zhang's AI Learning

Model Overview

CoPaw‑Flash‑9B‑DataAnalyst‑LoRA is a LoRA‑fine‑tuned version of Alibaba’s open‑source CoPaw‑Flash‑9B (Qwen3.5‑9B architecture). The LoRA adapter is hosted at huggingface.co/jason1966/CoPaw-Flash-9B-DataAnalyst-LoRA. After fine‑tuning, the model can autonomously load CSV/Excel/JSON datasets, perform statistical analysis, generate visualizations, write and execute Python scripts, and produce structured analysis reports without human clicks.

Performance Evaluation

Evaluation used 29 real Kaggle datasets with the Data Analyst framework (max 50 rounds, 128 K context). Each result below compares the base CoPaw‑Flash‑9B (left of the arrow) against the LoRA fine‑tuned model (right):

Average iteration rounds: 1.2 → 26.0 (≈21.7× increase)

Generated Python files: 0 → >100

Generated charts: 0 → >290

Total token consumption: ~5 K → 18.5 M (≈3700×)

Natural completion rate: 0 % → 89.7 %

Usable outputs: 0/29 (0 %) → 26/29 (90 %)

Human intervention: required at every step → fully autonomous
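The multipliers quoted above follow directly from the before/after numbers; a quick arithmetic sanity check:

```python
# Recompute the reported before -> after ratios from the raw figures above.
base_rounds, tuned_rounds = 1.2, 26.0
base_tokens, tuned_tokens = 5_000, 18_500_000
completed, total = 26, 29

print(round(tuned_rounds / base_rounds, 1))   # iteration-round increase, ~21.7x
print(round(tuned_tokens / base_tokens))      # token-consumption increase, ~3700x
print(round(100 * completed / total, 1))      # usable-output rate, ~89.7%
```

Note that 26/29 is 89.7% exactly, which the article rounds up to 90%.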

Demonstration

The agent autonomously analyzes a CSV, writes Python code, executes it, and produces box plots, scatter plots, bar charts, heatmaps, and a final report covering data overview, key findings, dimensional analysis, and conclusions. Sample visualizations (e.g., Toyota used‑car dataset) are shown in the original article.

Deployment Guide

Step 1 – Serve the model with vLLM

export HF_TOKEN=your_huggingface_token

CUDA_VISIBLE_DEVICES=0,1 vllm serve agentscope-ai/CoPaw-Flash-9B \
  --enable-lora \
  --lora-modules agent-lora=jason1966/CoPaw-Flash-9B-DataAnalyst-LoRA \
  --max-lora-rank 64 \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.85 \
  --max-model-len 131072 \
  --gdn-prefill-backend triton \
  --trust-remote-code \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_xml \
  --port 8000

Key flags:

--enable-lora + --lora-modules: load the LoRA adapter (core)

--max-lora-rank 64: must match the adapter's rank

--reasoning-parser qwen3: expose the model’s reasoning process

--enable-auto-tool-choice: automatic tool selection for agent scenarios
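Once vLLM is serving, the adapter is addressed by its registered LoRA name (agent-lora) through the OpenAI‑compatible API. A minimal sketch of building a chat‑completion request against the local endpoint (the prompt text and temperature are illustrative):

```python
import json
import urllib.request

# vLLM exposes an OpenAI-compatible endpoint; the LoRA adapter is selected
# by passing its registered name ("agent-lora") as the model.
def build_request(prompt: str, base_url: str = "http://localhost:8000/v1"):
    payload = {
        "model": "agent-lora",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,  # illustrative value, not from the article
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer unused"},
    )

req = build_request("Summarize the columns of data.csv")
# urllib.request.urlopen(req) would send it once the server is up.
print(json.loads(req.data)["model"])
```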

Hardware Requirements

Dual‑GPU (bf16, TP=2): ≈11 GB per GPU

Single‑GPU (bf16): ≈22 GB

8‑bit quantization: ≈12 GB

4‑bit quantization: ≈6 GB (consumer‑grade GPU sufficient)

Official test environment: 2 × NVIDIA H200 GPUs with vLLM 0.19.1.
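The per‑GPU figures above are roughly the weight footprint divided by the tensor‑parallel degree, plus runtime overhead for the KV cache and activations. A back‑of‑the‑envelope sketch (the overhead on top of raw weights is workload‑dependent and not computed here):

```python
# Rough weight-memory estimate: params * bytes-per-param / TP degree.
# KV cache and activation overhead come on top and depend on context length.
def weight_gb(params: float, bytes_per_param: float, tp: int = 1) -> float:
    return params * bytes_per_param / tp / 1e9

print(weight_gb(9e9, 2.0))        # bf16, single GPU: 18 GB of weights (~22 GB total)
print(weight_gb(9e9, 2.0, tp=2))  # bf16, TP=2: 9 GB per GPU (~11 GB total)
print(weight_gb(9e9, 0.5))        # 4-bit: 4.5 GB of weights (~6 GB total)
```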

Step 2 – Install the Data Analyst framework

git clone https://github.com/IIIIQIIII/data-analyst.git
cd data-analyst
bun install

Configure .env:

CLAUDE_CODE_USE_OPENAI=1
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=unused
OPENAI_MODEL=agent-lora
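The framework reads these variables at startup; the one that ties the two steps together is OPENAI_MODEL, which must match the LoRA name registered with --lora-modules. A tiny sketch of how the .env lines map to key–value pairs (the parser here is illustrative, not the framework's own):

```python
env_text = """\
CLAUDE_CODE_USE_OPENAI=1
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=unused
OPENAI_MODEL=agent-lora
"""

# Minimal .env-style parse: one KEY=VALUE per line, split on the first '='.
config = dict(line.split("=", 1) for line in env_text.strip().splitlines())

# OPENAI_MODEL must equal the name given to vLLM's --lora-modules flag.
print(config["OPENAI_MODEL"])
```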

Step 3 – Run analysis

bun run start

Issue a natural‑language request, e.g.:

Analyze the CSV file in the current directory and find sales trends

The agent loads the data, writes and runs Python code, creates visualizations, and generates a full report automatically.
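For a request like the one above, the scripts the agent writes center on per‑period aggregation. A stdlib‑only sketch of that core step (the sales data here is synthetic, standing in for the user's CSV file):

```python
import csv
import io
from collections import defaultdict

# Synthetic stand-in for the user's CSV; real runs read a file from disk.
raw = io.StringIO(
    "month,sales\n2024-01,100\n2024-01,50\n2024-02,170\n2024-03,200\n"
)

# Aggregate sales per month: the core of a "find sales trends" request.
totals = defaultdict(float)
for row in csv.DictReader(raw):
    totals[row["month"]] += float(row["sales"])

trend = sorted(totals.items())
print(trend)  # month-by-month totals, rising in this synthetic example
```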

Model–Framework Relationship

The model acts as the “brain”; the Data Analyst framework provides six tools that translate model intentions into file I/O and code execution. Without the framework the model’s analysis ability cannot be exercised; without LoRA fine‑tuning the original Qwen3.5‑9B model stalls after each tool call, producing no useful output.
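This division of labor can be sketched as a plain tool‑dispatch loop: the model proposes a tool call, the framework executes it and feeds the observation back, up to the 50‑round cap used in evaluation. The tool names and the fake two‑step "model" below are hypothetical, for illustration only:

```python
# Hypothetical tool table: the framework maps model intentions to real effects.
TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",
    "run_python": lambda code: f"<stdout of a {len(code)}-char script>",
}

def agent_loop(model_step, max_rounds: int = 50):
    """Feed each tool result back to the model until it declares completion."""
    observation = None
    for _ in range(max_rounds):
        action = model_step(observation)   # model proposes the next step
        if action["tool"] == "finish":
            return action["report"]        # natural completion
        observation = TOOLS[action["tool"]](action["arg"])
    return None  # hit the round limit without finishing

# Fake two-step "model" standing in for the fine-tuned CoPaw-Flash-9B.
steps = iter([
    {"tool": "read_file", "arg": "data.csv"},
    {"tool": "finish", "report": "done"},
])
print(agent_loop(lambda obs: next(steps)))
```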

Key Takeaways

True autonomy: the agent runs fully automatically, not a step‑by‑step “press‑continue” pseudo‑agent.

9 B parameters are sufficient: consumer‑grade hardware can handle the workload.

All components (model, framework, evaluation data) are released under Apache 2.0.

Empirical results: 90 % success on 29 real datasets demonstrate practical viability.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: LoRA, vLLM, Agent, open-source, GPU, Data Analyst, Qwen3.5
Written by

Old Zhang's AI Learning

AI practitioner specializing in large-model evaluation and on-premise deployment, agents, AI programming, Vibe Coding, general AI, and broader tech trends, with daily original technical articles.
