Twinkle – A Lightweight, Fully Chinese Large‑Model Training Framework from ModelScope
Twinkle is a lightweight client‑server training framework open‑sourced by ModelScope. It abstracts away Ray clusters and data/model parallelism, offers three run modes (torchrun, Ray, HTTP) and multi‑tenant LoRA training, supports dual back‑ends (Transformers and Megatron), and provides a serverless Training‑as‑a‑Service gateway for enterprise and individual developers.
Twinkle is a lightweight client‑server training framework released by the ModelScope team (the team behind ms‑swift) to simplify large‑model training infrastructure. It wraps training logic into a standardized API, allowing the same code to run locally with torchrun, on a Ray cluster, or via HTTP without modification.
Core Highlights
Decoupled architecture: client and server are separated behind a Tinker‑compatible API.
Three execution modes: torchrun, Ray, and HTTP for local debugging, cluster training, or remote API service (see the sketch after this list).
Multi‑backend support: both Transformers and Megatron back‑ends, handling dense and MoE models.
Multi‑tenant LoRA: concurrent LoRA training tasks on a single base model, isolated per tenant.
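To make the "same code, three modes" point concrete, here is a minimal sketch of how only the initialization call changes. Only mode='ray' appears verbatim in the usage example later in this article; the 'torchrun' and 'http' mode strings are assumptions, and the actual identifiers may differ.

import twinkle
from twinkle import DeviceGroup, DeviceMesh

# One 8-GPU group, sharded 4-way with FSDP and replicated 2-way with data parallelism
device_group = [DeviceGroup(name='default', ranks=8, device_type='cuda')]
device_mesh = DeviceMesh.from_sizes(fsdp_size=4, dp_size=2)

# Cluster training on Ray (as in the usage example below)
twinkle.initialize(mode='ray', groups=device_group, global_device_mesh=device_mesh)

# Local debugging under torchrun (assumed mode string):
# twinkle.initialize(mode='torchrun', groups=device_group, global_device_mesh=device_mesh)

# Remote HTTP service (assumed mode string); the training code itself is unchanged:
# twinkle.initialize(mode='http', groups=device_group, global_device_mesh=device_mesh)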
Multi‑tenant Example
In a scenario with an A100 cluster shared by four teams, Twinkle lets each team train its own LoRA on the same base model simultaneously:
Tenant A : private dataset, LoRA rank=8, SFT.
Tenant B : open‑source dataset from Hub, LoRA rank=32, pre‑training.
Tenant C : reinforcement learning with GRPO loss.
Tenant D : inference only, computing log‑probabilities.
All four tasks run concurrently because Twinkle treats the model and sampler as task‑independent components, and checkpoints are automatically pushed to ModelScope or HuggingFace repositories.
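A rough sketch of what this looks like in code, assuming the first argument of add_adapter_to_model names the adapter so that each tenant gets its own isolated LoRA on the shared base model; the adapter names below are hypothetical and the exact multi‑tenant API may differ from this illustration.

from peft import LoraConfig
from twinkle.model import TransformersModel

# One base model shared by all tenants on the cluster
model = TransformersModel(model_id='ms://Qwen/Qwen3.5-4B', remote_group='default')

# Tenant A: SFT on a private dataset, LoRA rank 8
model.add_adapter_to_model('tenant_a', LoraConfig(r=8, lora_alpha=32, target_modules='all-linear'))

# Tenant B: pre-training on a Hub dataset, LoRA rank 32
model.add_adapter_to_model('tenant_b', LoraConfig(r=32, lora_alpha=64, target_modules='all-linear'))

# Each tenant then drives its own loop (forward_backward / clip_grad_and_step) against
# its own adapter, while an inference-only tenant can query log-probabilities from the same model.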
Supported Models
Twinkle currently supports a wide range of mainstream large models, including the full Qwen3 series (0.6B‑32B), Qwen3 MoE (30B‑235B), Qwen3.5 MoE (35B‑122B), Qwen3.5 Dense (2B‑27B), Qwen2/2.5 (0.5B‑72B), DeepSeek V2 series, DeepSeek R1 series, and others. Megatron support is available for the Qwen and DeepSeek MoE models, while GLM and InternLM are limited to the Transformers back‑end.
Installation
Install with a single pip command:
pip install 'twinkle-kit'
For source installation:
git clone https://github.com/modelscope/twinkle.git
cd twinkle
pip install -e .
Requirements: Python ≥ 3.11, PyTorch ≥ 2.0. To use the Megatron back‑end, install Megatron‑LM via the provided INSTALL_MEGATRON.sh script.
Usage Example (Ray LoRA Training)
from peft import LoraConfig
import twinkle
from twinkle import DeviceMesh, DeviceGroup
from twinkle.dataloader import DataLoader
from twinkle.dataset import Dataset, DatasetMeta
from twinkle.model import TransformersModel
from twinkle.preprocessor import SelfCognitionProcessor
# Define device group and mesh
device_group = [DeviceGroup(name='default', ranks=8, device_type='cuda')]
device_mesh = DeviceMesh.from_sizes(fsdp_size=4, dp_size=2)
twinkle.initialize(mode='ray', groups=device_group, global_device_mesh=device_mesh)
def train():
    base_model = 'ms://Qwen/Qwen3.5-4B'
    # Load a 1,000-sample slice of the self-cognition dataset and encode it with the model's template
    dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
    dataset.set_template('Template', model_id=base_model)
    dataset.map(SelfCognitionProcessor('twinkle LLM', 'ModelScope Community'))
    dataset.encode()
    dataloader = DataLoader(dataset=dataset, batch_size=8, min_batch_size=8)
    # Wrap the base model with the Transformers back-end and attach a LoRA adapter
    model = TransformersModel(model_id=base_model, remote_group='default')
    lora_config = LoraConfig(r=8, lora_alpha=32, target_modules='all-linear')
    model.add_adapter_to_model('default', lora_config, gradient_accumulation_steps=2)
    model.set_optimizer(optimizer_cls='AdamW', lr=1e-4)
    model.set_lr_scheduler(scheduler_cls='CosineWarmupScheduler', num_warmup_steps=5, num_training_steps=len(dataloader))
    # Training loop: forward/backward, gradient clipping, optimizer step
    for step, batch in enumerate(dataloader):
        model.forward_backward(inputs=batch)
        model.clip_grad_and_step()
        if step % 20 == 0:
            metric = model.calculate_metric(is_training=True)
            print(f'Step {step}/{len(dataloader)}, metric: {metric}')
    model.save('last-checkpoint')

if __name__ == '__main__':
    train()

The API is Pythonic: loading data, setting templates, defining the model, configuring LoRA, and running the training loop are all straightforward. The ms:// and hf:// prefixes enable seamless switching between ModelScope and HuggingFace model sources.
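For example, switching the model source only requires changing the prefix; this minimal illustration reuses the model id from the example above and assumes the same repository exists on both hubs.

# Pull the same model from ModelScope or HuggingFace by switching the URI prefix
base_model = 'ms://Qwen/Qwen3.5-4B'    # ModelScope Hub
# base_model = 'hf://Qwen/Qwen3.5-4B'  # HuggingFace Hub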
Training‑as‑a‑Service (TaaS)
Twinkle also provides a serverless training service on ModelScope (Beta). After joining the Twinkle‑Explorers organization, you can invoke training via the Tinker‑compatible API:
from tinker import ServiceClient, types
from twinkle import init_tinker_client

base_url = 'https://www.modelscope.cn/twinkle'
api_key = 'your-api-key'

# Initialize Twinkle's Tinker-compatible client layer, then connect to the service
init_tinker_client()
service_client = ServiceClient(base_url=base_url, api_key=api_key)
training_client = service_client.create_lora_training_client(base_model='Qwen/Qwen3-30B-A3B-Instruct-2507', rank=16)

for epoch in range(3):
    # dataloader and input_feature_to_datum are user-provided: the former yields batches,
    # the latter converts each feature into a Datum accepted by the service
    for step, batch in enumerate(dataloader):
        input_datum = [input_feature_to_datum(feat) for feat in batch]
        fwdbwd_future = training_client.forward_backward(input_datum, "cross_entropy")
        optim_future = training_client.optim_step(types.AdamParams(learning_rate=1e-4))
        fwdbwd_future.result()
        optim_future.result()
    training_client.save_state(f"twinkle-lora-{epoch}").result()

This enables developers without GPUs to train a LoRA for a 30B MoE model remotely, which is especially valuable for individual developers and small teams.
Modular Ecosystem
Twinkle’s design comprises 20 standard modules grouped into four layers:
Data layer : Dataset, Template, DataLoader, Preprocessor, InputProcessor – handling data loading, encoding, distribution, and ETL.
Model layer : Model, Sampler, Loss, Metric, Reward, Advantage – covering large‑model inference, sampling, loss computation, and evaluation.
Engineering layer : CheckpointEngine, Patch, Module, Kernel – responsible for weight synchronization, model repair, and component integration.
Service layer : Server, Client, Infra, Plugin, Hub – abstracting cluster startup, client interaction, infrastructure plugins, and hub integration.
Each module is highly cohesive and can be replaced or extended independently, e.g., implementing a custom loss by adhering to the Loss interface.
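For example, a custom loss could be sketched roughly as follows; the exact shape of the Loss interface (base class and method signature) is not shown in this article, so the callable form below is an assumption rather than the actual twinkle API.

import torch
import torch.nn.functional as F

class LabelSmoothedCrossEntropy:
    """Hypothetical custom loss to be registered through the Loss interface."""

    def __init__(self, smoothing: float = 0.1):
        self.smoothing = smoothing

    def __call__(self, logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Shift so that position i predicts token i+1, then flatten for cross-entropy
        logits = logits[..., :-1, :].contiguous().view(-1, logits.size(-1))
        labels = labels[..., 1:].contiguous().view(-1)
        return F.cross_entropy(logits, labels, ignore_index=-100, label_smoothing=self.smoothing)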
Rich Cookbook
Twinkle ships with ready‑to‑run scripts covering various training scenarios:
FSDP fine‑tuning (Transformers back‑end) – full‑parameter fine‑tuning.
FSDP MoE fine‑tuning – specialized for MoE architectures.
Expert Parallelism + FSDP – expert parallelism combined with data parallelism.
Sequence Parallelism + FSDP – enables ultra‑long context training.
Tensor‑Parallel (TP) training – Megatron back‑end for tensor parallelism.
TP MoE training – MoE with tensor parallelism.
Tinker/Twinkle client training – remote API‑style training supported by both frameworks.
These scripts demonstrate that Twinkle provides a ready‑made solution whether you use the Transformers or Megatron back‑end, dense or MoE models, and local or remote execution.
Pros and Cons
Pros:
Elegant, highly modular architecture with strong extensibility.
Multi‑tenant LoRA training is a distinctive advantage.
Supports both Transformers and Megatron back‑ends.
Serverless TaaS lets developers without GPUs train large models.
Interoperability with the ms‑swift ecosystem.
Clear, Pythonic API design.
Cons:
The project is newly open‑sourced (Feb 2026) and the ecosystem is still maturing.
Multi‑tenant concurrency is currently limited to LoRA optimization.
Model coverage, while broad, is not as exhaustive as ms‑swift's.
Huawei Ascend NPU support is still under development.
Official Links
GitHub: https://github.com/modelscope/twinkle
Documentation (Chinese): https://twinkle-kit.readthedocs.io/zh-cn/latest/
PyPI: https://pypi.org/project/twinkle-kit/
Serverless training service: join the Twinkle‑Explorers organization to try it.