How Alibaba’s Low‑Carbon M6 Model Trains a Trillion‑Parameter AI with 80% Less Energy
Alibaba’s DAMO Academy has unveiled a low‑carbon version of its M6 multimodal model: a trillion‑parameter AI trained on just 480 V100 GPUs, achieving an energy reduction of over 80% and an 11‑fold speedup over prior trillion‑parameter efforts, and already powering e‑commerce and manufacturing design tools.
Low‑Carbon Training of the M6 Trillion‑Parameter Multimodal Model
On 25 June 2021, Alibaba DAMO Academy released a “low‑carbon” version of its M6 giant model. The model has about 1 trillion parameters (roughly 10× the number of neurons in the human brain) and supports multimodal tasks such as image generation, text generation, and visual‑language understanding.
Hardware and Compute Efficiency
Training used 480 NVIDIA V100 32 GB GPUs in Alibaba's EFLOPS cluster.
Energy consumption was reduced by more than 80% compared with prior trillion‑parameter training runs, which required 3,072 NVIDIA A100 GPUs or 2,048 Google TPU v3 cores.
The effective training speedup was roughly 11× relative to those baselines.
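As a rough sanity check on the reported figures — assuming energy scales with accelerator count at comparable wall‑clock time, which ignores per‑card power differences between V100, A100, and TPU v3 — the reduction in accelerator count alone is consistent with the >80% claim:

```python
# Back-of-envelope check of the reported figures, assuming energy scales
# with accelerator count at comparable wall-clock time (a simplification;
# per-card power of V100 vs. A100/TPU v3 differs).
gpus_m6, gpus_baseline = 480, 3072
reduction = 1 - gpus_m6 / gpus_baseline
print(f"{reduction:.1%}")  # 84.4% fewer accelerators, consistent with >80%
```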
Algorithmic Optimizations
The efficiency gains stem from three main improvements to the Mixture‑of‑Experts (MoE) framework:
Expert‑parallel strategy: routes tokens to multiple expert sub‑networks in parallel, increasing model capacity without a proportional increase in compute.
Accelerated linear‑algebra kernels: custom kernels for dense and sparse matrix multiplication that exploit GPU tensor cores.
Mixed‑precision training and half‑precision communication: uses FP16/BF16 for the forward and backward passes and reduces communication bandwidth, while preserving model quality (accuracy loss below 0.1% on standard benchmarks).
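The expert‑parallel routing described above can be illustrated with a minimal top‑1 gating sketch in plain NumPy. The names and shapes here are illustrative assumptions — this is not DAMO's actual M6 implementation:

```python
import numpy as np

def moe_forward(tokens, gate_w, expert_ws):
    """Minimal Mixture-of-Experts sketch with top-1 gating.
    tokens: (n, d), gate_w: (d, E), expert_ws: list of E (d, d) matrices.
    Illustrative only -- not M6's actual routing code."""
    logits = tokens @ gate_w                       # (n, E) gating scores
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)      # softmax over experts
    choice = probs.argmax(axis=1)                  # top-1 expert per token
    out = np.empty_like(tokens)
    for e, w in enumerate(expert_ws):
        mask = choice == e
        # Each expert processes only its own tokens, so compute grows with
        # the token count, not with the number of experts (model capacity).
        out[mask] = (tokens[mask] @ w) * probs[mask, e:e + 1]
    return out, choice

rng = np.random.default_rng(0)
d, E, n = 8, 4, 16
x = rng.normal(size=(n, d))
y, choice = moe_forward(x, rng.normal(size=(d, E)),
                        [rng.normal(size=(d, d)) for _ in range(E)])
```

Because each token activates one expert, adding experts enlarges the parameter count while per‑token FLOPs stay roughly constant — the core of the capacity‑without‑compute trade‑off mentioned above.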
Training Procedure
Key hyper‑parameters (as reported):
model_size = 1e12 # parameters
batch_size = 2048
learning_rate = 1e-4
precision = "fp16"
optimizer = "AdamW"
num_epochs = 30

Training was performed on the Alibaba Cloud PAI platform with the EFLOPS cluster, combining distributed data parallelism with the expert‑parallel MoE routing.
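The half‑precision communication mentioned among the optimizations can be sketched as casting gradients to FP16 before the all‑reduce and back to FP32 afterward. The helper names are hypothetical — the actual PAI/EFLOPS communication stack is not public:

```python
import numpy as np

def compress_for_allreduce(grad):
    """Cast a gradient tensor to float16 before communication, halving
    the bytes on the wire (hypothetical helper; illustrative only)."""
    return grad.astype(np.float16)

def decompress_after_allreduce(grad_fp16):
    """Cast back to float32 for the optimizer update."""
    return grad_fp16.astype(np.float32)

grad = np.random.default_rng(1).normal(size=(1024,)).astype(np.float32)
wire = compress_for_allreduce(grad)
restored = decompress_after_allreduce(wire)

ratio = grad.nbytes / wire.nbytes   # FP16 halves the communication payload
err = np.abs(restored - grad).max() # small rounding error from the cast
```

The trade‑off is the small FP16 rounding error, which is why such schemes are typically paired with FP32 master weights in the optimizer.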
Performance and Applications
Benchmarks show comparable or slightly better accuracy on multimodal tasks (e.g., VQAv2, COCO captioning) relative to models trained at larger scale.
Deployed as an AI‑assistant designer on the “Rhino Manufacturing” platform for rapid fashion design and virtual try‑on, reducing design cycle time.
Integrated into Alipay and Taobao for cross‑modal search, copywriting, and image generation.
Future Directions
DAMO Academy plans to further lower carbon footprints, expand real‑world deployments, and investigate theoretical aspects of general‑purpose large models.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.