How Alibaba’s Mars Engine Brings Tensor‑Based Scientific Computing to Distributed Scale

Alibaba’s open‑source Mars engine extends NumPy‑style tensor operations to distributed environments, leveraging GPU acceleration, sparse matrices, and flexible scheduling to dramatically boost scientific and AI workloads beyond single‑machine limits.

Alibaba Cloud Developer

Overview

Alibaba recently open‑sourced the distributed scientific computing engine Mars on GitHub, enabling developers to access its code and contribute.

Unlike traditional big-data engines, which are built around relational algebra, Mars brings distributed execution to scientific and numerical computing, greatly expanding the scale and efficiency such workloads can reach. It already runs in production at Alibaba and for its cloud customers.

Scientific Computing Background

Scientific (numerical) computing underpins fields such as image processing, machine learning, and deep learning. NumPy, with its concise syntax and strong performance, has become the foundation of a large ecosystem, but it remains limited to single‑machine execution.

Current distributed engines are not designed for scientific workloads, leading to mismatched interfaces and sub‑optimal performance.

Mars Design Goals

Developed by Alibaba’s MaxCompute team, Mars bridges the gap between big‑data and scientific computing, offering a tensor‑based unified distributed framework.

With Mars, large-scale scientific tasks that would otherwise take thousands of lines can be expressed in a few lines of code, with significant performance gains. The current release supports about 70% of common NumPy interfaces, and upcoming versions are planned to fully support pandas.

Core Capabilities

NumPy-compatible Interface: Mars’s tensor module mirrors the NumPy API, so for the supported functions, changing the import is enough to run existing code on distributed resources, at orders-of-magnitude larger scale and with speedups of tens of times.

GPU Acceleration: Passing gpu=True when creating a tensor runs its computations on GPUs, building on existing GPU-based scientific libraries.

Sparse Matrices: Mars supports 2-D sparse matrices; passing sparse=True to a creation routine (for example, when building an identity matrix) yields a memory-efficient sparse structure.
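As a rough illustration of the three capabilities above, the sketch below writes one function against the NumPy-style API and runs it with Mars when the package is installed. The helper name scaled_sum is hypothetical, and the value returned by execute() can differ between Mars versions, so the example only prints it. The GPU line is left commented out because it needs a CUDA-capable GPU.

```python
import importlib.util


def scaled_sum(xp, n):
    """Sum of (a + 1) * 2 over an n x n random array, using only NumPy-style calls."""
    a = xp.random.rand(n, n)
    return ((a + 1) * 2).sum()


# Run the Mars part only when the package is actually installed.
if importlib.util.find_spec("mars") is not None:
    import mars.tensor as mt  # used in place of: import numpy as np

    lazy_total = scaled_sum(mt, 1000)  # builds a graph; nothing is computed yet
    print(lazy_total.execute())        # triggers the computation

    # Sparse 2-D identity matrix: only the non-zero entries are stored.
    identity = mt.eye(1000, sparse=True)
    print(identity.execute())

    # GPU variant (assumes a CUDA-capable GPU is available):
    # gpu_tensor = mt.random.rand(1000, 1000, gpu=True)
```

Because scaled_sum takes the array module as a parameter, the same function body also runs unchanged with plain NumPy, which is the point of the compatible interface.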

System Design

Mars automatically tiles tensors into smaller chunks, enabling parallel execution across various scheduling modes.

Tile (Divide-and-Conquer): Tensors are split along dimensions into chunks; operators automatically handle chunk-level parallelism.
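The idea of tiling can be sketched in plain Python. This is a toy model, not Mars's implementation: the functions chunk_bounds, tile, and chunked_sum are hypothetical names, and a nested list stands in for a tensor. It shows how a 2-D shape splits into chunks and how a reduction can be computed chunk by chunk.

```python
from itertools import product


def chunk_bounds(length, chunk_size):
    """Split range(length) into consecutive (start, stop) pieces."""
    return [(start, min(start + chunk_size, length))
            for start in range(0, length, chunk_size)]


def tile(shape, chunk_size):
    """Return one tuple of (start, stop) pairs per chunk of an n-D tensor."""
    per_axis = [chunk_bounds(dim, chunk_size) for dim in shape]
    return list(product(*per_axis))


def chunked_sum(matrix, chunk_size):
    """Sum a 2-D list by summing each chunk independently, then combining."""
    rows, cols = len(matrix), len(matrix[0])
    partials = []
    for (r0, r1), (c0, c1) in tile((rows, cols), chunk_size):
        # Each partial sum touches only its own chunk, so these could run in parallel.
        partials.append(sum(matrix[r][c]
                            for r in range(r0, r1)
                            for c in range(c0, c1)))
    return sum(partials)


matrix = [[r * 5 + c for c in range(5)] for r in range(5)]  # values 0..24
chunks = tile((5, 5), chunk_size=2)
print(len(chunks))             # 9 chunks: 3 along each axis
print(chunked_sum(matrix, 2))  # 300, same as summing the whole matrix
```

The edge chunks are smaller than the rest (here 1 wide instead of 2), which is the usual consequence of a shape that is not a multiple of the chunk size.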

Lazy Execution & Fusion: Code is executed only after an explicit execute call, allowing the engine to fuse multiple operations into a single optimized kernel.
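A minimal sketch of deferred execution with fusion, under the assumption that only element-wise operations are involved. The LazyTensor class is a hypothetical toy and does not reflect how Mars builds or optimizes its graphs; it only shows why delaying work until execute lets several operations be applied in one pass.

```python
class LazyTensor:
    """Records element-wise operations instead of running them immediately."""

    def __init__(self, data, ops=()):
        self.data = data
        self.ops = tuple(ops)  # pending element-wise functions, in order

    def _defer(self, fn):
        return LazyTensor(self.data, self.ops + (fn,))

    def __add__(self, k):
        return self._defer(lambda x: x + k)

    def __mul__(self, k):
        return self._defer(lambda x: x * k)

    def execute(self):
        """Apply every recorded operation in a single pass over the data."""
        def fused(x):
            for fn in self.ops:
                x = fn(x)
            return x
        return [fused(x) for x in self.data]


t = (LazyTensor([1, 2, 3]) + 1) * 10  # nothing computed yet, two ops recorded
print(len(t.ops))   # 2
print(t.execute())  # [20, 30, 40]
```

Without fusion, the addition would produce a full intermediate list before the multiplication starts; here each element goes through both steps in one traversal.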

Scheduling Modes:

Multi‑threaded local execution for single‑node acceleration.

Single‑node cluster mode using multiple processes for development and debugging.

Distributed mode with multiple schedulers and workers forming a consistent‑hash ring.

The distributed architecture launches several schedulers and workers; a client session creates a graph that is tiled into chunks, which are then assigned to workers for execution.
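For readers unfamiliar with consistent hashing, the following is a generic sketch of a hash ring that maps keys to nodes. It is not taken from Mars: the HashRing class, the replicas parameter, and the node and key names are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, replicas=64):
        # Each node is placed at several virtual positions to even out the load.
        self._ring = sorted((_hash(f"{node}#{i}"), node)
                            for node in nodes for i in range(replicas))
        self._positions = [pos for pos, _ in self._ring]

    def lookup(self, key):
        """Return the node owning the first ring position at or after hash(key)."""
        idx = bisect.bisect_left(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"chunk-{i}" for i in range(100)]
before = HashRing(["node-0", "node-1", "node-2"])
after = HashRing(["node-0", "node-1", "node-2", "node-3"])

# Keys whose owner changed after adding a node.
moved = [k for k in keys if before.lookup(k) != after.lookup(k)]
print(len(moved), "of", len(keys), "keys moved")
print({after.lookup(k) for k in moved})  # moved keys land only on the new node
```

The property worth noting is that adding a node reassigns only the keys that now fall to that node; every other key keeps its owner, which keeps lookups stable as a cluster grows.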

Scaling is seamless: the same Mars code can scale down to a single multi-core machine or scale out to thousands of workers.

Benchmark Results

In a real-world scenario, Mars multiplied two 2.25 TB matrices (each with billions of elements) using only five lines of code on 1,600 compute units (200 workers × 8 cores) in 2.5 hours, whereas a comparable MapReduce solution required 9,000 units and 10 hours.
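To put those figures side by side, the short calculation below derives the wall-clock speedup and the ratio of compute-unit hours from the numbers quoted above. It assumes every unit was busy for the full run, which the source does not state.

```python
mars_units = 200 * 8  # 200 workers x 8 cores = 1,600 compute units
mars_hours = 2.5
mapreduce_units = 9000
mapreduce_hours = 10

mars_unit_hours = mars_units * mars_hours                 # 4000.0
mapreduce_unit_hours = mapreduce_units * mapreduce_hours  # 90000

print(mapreduce_hours / mars_hours)            # 4.0  (wall-clock speedup)
print(mapreduce_unit_hours / mars_unit_hours)  # 22.5 (ratio of compute-unit hours)
```

Under that assumption, Mars finished four times sooner while consuming roughly 22.5 times fewer compute-unit hours.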

Additional tests show Mars running several times faster than NumPy on a single machine and achieving near-linear speedup as workers are added, while handling datasets of up to 115 GB that NumPy cannot process.

Getting Started

The Mars project is available on GitHub at https://github.com/mars-project/mars. The team welcomes contributions and plans to continue development openly.
