Twinkle – A Lightweight, Fully Chinese Large‑Model Training Framework from ModelScope
Twinkle is a lightweight client‑server training framework open‑sourced by ModelScope. It abstracts away Ray clusters and data/model parallelism, offers three run modes (torchrun, Ray, HTTP) and multi‑tenant LoRA training, supports dual back‑ends (Transformers and Megatron), and provides a serverless Training‑as‑a‑Service gateway for enterprise and individual developers.
Twinkle is a lightweight client‑server training framework released by the ModelScope team (the team behind ms‑swift) to simplify large‑model training infrastructure. It wraps training logic into a standardized API, allowing the same code to run locally with torchrun, on a Ray cluster, or via HTTP without modification.
Core Highlights
Decoupled architecture: client and server are separated behind a Tinker‑compatible API.
Three execution modes: torchrun, Ray, and HTTP for local debugging, cluster training, or remote API service (see the sketch after this list).
Multi‑backend support: both Transformers and Megatron back‑ends, handling dense and MoE models.
Multi‑tenant LoRA: concurrent LoRA training tasks on a single base model, isolated per tenant.
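To make the "same code, three modes" point concrete, here is a minimal sketch of how only the initialization call changes. Only mode='ray' appears verbatim in the usage example later in this article; the 'torchrun' and 'http' mode strings are assumptions, and the actual identifiers may differ.

import twinkle
from twinkle import DeviceGroup, DeviceMesh

# One 8-GPU group, sharded 4-way with FSDP and replicated 2-way with data parallelism
device_group = [DeviceGroup(name='default', ranks=8, device_type='cuda')]
device_mesh = DeviceMesh.from_sizes(fsdp_size=4, dp_size=2)

# Cluster training on Ray (as in the usage example below)
twinkle.initialize(mode='ray', groups=device_group, global_device_mesh=device_mesh)

# Local debugging under torchrun (assumed mode string):
# twinkle.initialize(mode='torchrun', groups=device_group, global_device_mesh=device_mesh)

# Remote HTTP service (assumed mode string); the training code itself is unchanged:
# twinkle.initialize(mode='http', groups=device_group, global_device_mesh=device_mesh)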
Multi‑tenant Example
In a scenario with an A100 cluster shared by four teams, Twinkle lets each team train its own LoRA on the same base model simultaneously:
Tenant A : private dataset, LoRA rank=8, SFT.
Tenant B : open‑source dataset from Hub, LoRA rank=32, pre‑training.
Tenant C : reinforcement learning with GRPO loss.
Tenant D : inference only, computing log‑probabilities.
All four tasks run concurrently because Twinkle treats the model and sampler as task‑independent components, and checkpoints are automatically pushed to ModelScope or HuggingFace repositories.
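A rough sketch of what this looks like in code, assuming the first argument of add_adapter_to_model names the adapter so that each tenant gets its own isolated LoRA on the shared base model; the adapter names below are hypothetical and the exact multi‑tenant API may differ from this illustration.

from peft import LoraConfig
from twinkle.model import TransformersModel

# One base model shared by all tenants on the cluster
model = TransformersModel(model_id='ms://Qwen/Qwen3.5-4B', remote_group='default')

# Tenant A: SFT on a private dataset, LoRA rank 8
model.add_adapter_to_model('tenant_a', LoraConfig(r=8, lora_alpha=32, target_modules='all-linear'))

# Tenant B: pre-training on a Hub dataset, LoRA rank 32
model.add_adapter_to_model('tenant_b', LoraConfig(r=32, lora_alpha=64, target_modules='all-linear'))

# Each tenant then drives its own loop (forward_backward / clip_grad_and_step) against
# its own adapter, while an inference-only tenant can query log-probabilities from the same model.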
Supported Models
Twinkle currently supports a wide range of mainstream large models, including the full Qwen3 series (0.6B‑32B), Qwen3 MoE (30B‑235B), Qwen3.5 MoE (35B‑122B), Qwen3.5 Dense (2B‑27B), Qwen2/2.5 (0.5B‑72B), DeepSeek V2 series, DeepSeek R1 series, and others. Megatron support is available for the Qwen and DeepSeek MoE models, while GLM and InternLM are limited to the Transformers back‑end.
Installation
Install with a single pip command:
pip install 'twinkle-kit'
For source installation:
git clone https://github.com/modelscope/twinkle.git
cd twinkle
pip install -e .
Requirements: Python ≥ 3.11, PyTorch ≥ 2.0. To use the Megatron back‑end, install Megatron‑LM via the provided INSTALL_MEGATRON.sh script.
Usage Example (Ray LoRA Training)
from peft import LoraConfig
import twinkle
from twinkle import DeviceMesh, DeviceGroup
from twinkle.dataloader import DataLoader
from twinkle.dataset import Dataset, DatasetMeta
from twinkle.model import TransformersModel
from twinkle.preprocessor import SelfCognitionProcessor
# Define device group and mesh
device_group = [DeviceGroup(name='default', ranks=8, device_type='cuda')]
device_mesh = DeviceMesh.from_sizes(fsdp_size=4, dp_size=2)
twinkle.initialize(mode='ray', groups=device_group, global_device_mesh=device_mesh)
def train():
    base_model = 'ms://Qwen/Qwen3.5-4B'
    # Load a 1,000-sample slice of the self-cognition dataset and encode it with the model's template
    dataset = Dataset(dataset_meta=DatasetMeta('ms://swift/self-cognition', data_slice=range(1000)))
    dataset.set_template('Template', model_id=base_model)
    dataset.map(SelfCognitionProcessor('twinkle LLM', 'ModelScope Community'))
    dataset.encode()
    dataloader = DataLoader(dataset=dataset, batch_size=8, min_batch_size=8)
    # Wrap the base model with the Transformers back-end and attach a LoRA adapter
    model = TransformersModel(model_id=base_model, remote_group='default')
    lora_config = LoraConfig(r=8, lora_alpha=32, target_modules='all-linear')
    model.add_adapter_to_model('default', lora_config, gradient_accumulation_steps=2)
    model.set_optimizer(optimizer_cls='AdamW', lr=1e-4)
    model.set_lr_scheduler(scheduler_cls='CosineWarmupScheduler', num_warmup_steps=5, num_training_steps=len(dataloader))
    # Training loop: forward/backward, gradient clipping, optimizer step
    for step, batch in enumerate(dataloader):
        model.forward_backward(inputs=batch)
        model.clip_grad_and_step()
        if step % 20 == 0:
            metric = model.calculate_metric(is_training=True)
            print(f'Step {step}/{len(dataloader)}, metric: {metric}')
    model.save('last-checkpoint')

if __name__ == '__main__':
    train()

The API is Pythonic: loading data, setting templates, defining the model, configuring LoRA, and running the training loop are all straightforward. The ms:// and hf:// prefixes enable seamless switching between ModelScope and HuggingFace model sources.
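For example, switching the model source only requires changing the prefix; this minimal illustration reuses the model id from the example above and assumes the same repository exists on both hubs.

# Pull the same model from ModelScope or HuggingFace by switching the URI prefix
base_model = 'ms://Qwen/Qwen3.5-4B'    # ModelScope Hub
# base_model = 'hf://Qwen/Qwen3.5-4B'  # HuggingFace Hub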
Training‑as‑a‑Service (TaaS)
Twinkle also provides a serverless training service on ModelScope (Beta). After joining the Twinkle‑Explorers organization, you can invoke training via the Tinker‑compatible API:
from tinker import ServiceClient, types
from twinkle import init_tinker_client

base_url = 'https://www.modelscope.cn/twinkle'
api_key = 'your-api-key'

# Initialize Twinkle's Tinker-compatible client layer, then connect to the service
init_tinker_client()
service_client = ServiceClient(base_url=base_url, api_key=api_key)
training_client = service_client.create_lora_training_client(base_model='Qwen/Qwen3-30B-A3B-Instruct-2507', rank=16)

for epoch in range(3):
    # dataloader and input_feature_to_datum are user-provided: the former yields batches,
    # the latter converts each feature into a Datum accepted by the service
    for step, batch in enumerate(dataloader):
        input_datum = [input_feature_to_datum(feat) for feat in batch]
        fwdbwd_future = training_client.forward_backward(input_datum, "cross_entropy")
        optim_future = training_client.optim_step(types.AdamParams(learning_rate=1e-4))
        fwdbwd_future.result()
        optim_future.result()
    training_client.save_state(f"twinkle-lora-{epoch}").result()

This enables developers without GPUs to train a LoRA for a 30B MoE model remotely, which is especially valuable for individual developers and small teams.
Modular Ecosystem
Twinkle’s design comprises 20 standard modules grouped into four layers:
Data layer : Dataset, Template, DataLoader, Preprocessor, InputProcessor – handling data loading, encoding, distribution, and ETL.
Model layer : Model, Sampler, Loss, Metric, Reward, Advantage – covering large‑model inference, sampling, loss computation, and evaluation.
Engineering layer : CheckpointEngine, Patch, Module, Kernel – responsible for weight synchronization, model repair, and component integration.
Service layer : Server, Client, Infra, Plugin, Hub – abstracting cluster startup, client interaction, infrastructure plugins, and hub integration.
Each module is highly cohesive and can be replaced or extended independently, e.g., implementing a custom loss by adhering to the Loss interface.
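For example, a custom loss could be sketched roughly as follows; the exact shape of the Loss interface (base class and method signature) is not shown in this article, so the callable form below is an assumption rather than the actual twinkle API.

import torch
import torch.nn.functional as F

class LabelSmoothedCrossEntropy:
    """Hypothetical custom loss to be registered through the Loss interface."""

    def __init__(self, smoothing: float = 0.1):
        self.smoothing = smoothing

    def __call__(self, logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Shift so that position i predicts token i+1, then flatten for cross-entropy
        logits = logits[..., :-1, :].contiguous().view(-1, logits.size(-1))
        labels = labels[..., 1:].contiguous().view(-1)
        return F.cross_entropy(logits, labels, ignore_index=-100, label_smoothing=self.smoothing)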
Rich Cookbook
Twinkle ships with ready‑to‑run scripts covering various training scenarios:
FSDP fine‑tuning (Transformers back‑end) – full‑parameter fine‑tuning.
FSDP MoE fine‑tuning – specialized for MoE architectures.
Expert Parallelism + FSDP – expert parallelism combined with data parallelism.
Sequence Parallelism + FSDP – enables ultra‑long context training.
Tensor‑Parallel (TP) training – Megatron back‑end for tensor parallelism.
TP MoE training – MoE with tensor parallelism.
Tinker/Twinkle client training – remote API‑style training supported by both frameworks.
These scripts demonstrate that Twinkle provides a ready‑made solution whether you use the Transformers or Megatron back‑end, dense or MoE models, and local or remote execution.
Pros and Cons
Pros:
Elegant, highly modular architecture with strong extensibility.
Multi‑tenant LoRA training is a distinctive advantage.
Supports both Transformers and Megatron back‑ends.
Serverless TaaS lets developers without GPUs train large models.
Interoperability with the ms‑swift ecosystem.
Clear, Pythonic API design.
Cons:
The project is newly open‑sourced (Feb 2026) and the ecosystem is still maturing.
Multi‑tenant concurrency is currently limited to LoRA optimization.
Model coverage, while broad, is not as exhaustive as ms‑swift's.
Huawei Ascend NPU support is still under development.
Official Links
GitHub: https://github.com/modelscope/twinkle
Documentation (Chinese): https://twinkle-kit.readthedocs.io/zh-cn/latest/
PyPI: https://pypi.org/project/twinkle-kit/
Serverless training service: join the Twinkle‑Explorers organization to try it.