How AI Platforms Turn Dreams into Reality: Scaling, Efficiency, and Usability

In this talk from the 2022 Yunqi Conference, Jia Yangqing explains how Alibaba's AI platform addresses efficiency, scale, and usability challenges by moving the Damo Academy to the cloud, open‑sourcing ModelScope, and delivering large‑model training, deployment, and inference services at massive scale.

Alibaba Cloud Big Data AI Platform

Nov 4, 2022

Speaker: Jia Yangqing

Topic: Artificial Intelligence – When Dreams Become Reality

Event: 2022 Yunqi Conference – Alibaba Lingjie AI Forum

At the main technical forum this year, Alibaba released the open‑source model sharing community ModelScope and gave a brief overview of AI platform work that connects creativity and productivity. In this sub‑forum, the discussion covers the journey from laboratory research to industrial deployment from technical, engineering, and product perspectives.

The most popular concept last year was "large models"; this year the focus has shifted to AIGC. Behind these emerging dreams and possibilities lie three key challenges: efficiency, scale, and usability.

Efficiency: Modern large models such as GPT‑3 and M6 reach hundreds of billions to trillions of parameters. The extra parameters improve algorithmic performance, but costs rise faster than the gains, so the value delivered per unit of computation declines.

Scale: To achieve better algorithms, data volumes keep growing. Applications like autonomous driving, search advertising, and AI for Science generate massive datasets that require larger compute resources and mature data platforms.

Usability: Deployment is critical, and experts from different fields need intuitive access to models. Platforms must make algorithms easy to use and applications easy to build.
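To give the efficiency and scale points a sense of proportion, here is a rough back-of-the-envelope sketch of how much memory the weights alone occupy at these sizes. It assumes GPT‑3's publicly reported 175 billion parameters and the 10‑trillion‑parameter figure quoted later in the talk; the function name `weight_memory_gb` is illustrative only, and the estimate ignores optimizer state, gradients, and activations, which add substantially more during training.

```python
# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Memory needed just to hold the weights, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for label, params in [("175B parameters", 175e9), ("10T parameters", 10e12)]:
        print(f"{label}: {weight_memory_gb(params, 'fp16'):,.0f} GB in fp16")
    # 175B parameters: 350 GB in fp16
    # 10T parameters: 20,000 GB in fp16
```

Even before any training cost is counted, weights at this scale no longer fit on a single accelerator, which is one reason shared, distributed infrastructure becomes necessary.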

The speaker illustrates these points with the story of Damo Academy moving to the cloud. Initially, each team built its own tools and infrastructure, similar to many research labs. As AI scale grew, standardization and division of labor became essential to improve development efficiency.

The AI development workflow consists of three stages: development, training, and deployment.

Development: Previously, each team managed a small cluster, custom images, and Jupyter notebooks. Now, the entire academy uses PAI's Notebook service (DSW), which is fully compatible with Jupyter, automates repetitive tasks, integrates storage (NAS, OSS), provides multi‑tenant environments, and shares underlying compute resources.

Training: Collaborating teams submit many training jobs and often struggle to find free resources, even while overall utilization stays low. PAI's cloud‑native training service (DLC) offers a large shared resource pool, supports distributed training, handles job priority and fairness, and significantly improves resource availability.
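The idea of balancing priority and fairness on a shared pool can be illustrated with a small toy scheduler. This is only a sketch under simple assumptions, not how DLC actually schedules jobs: the `Job` fields, the `schedule` function, and the team names are all hypothetical. Jobs are admitted greedily, highest priority first, and ties are broken in favor of the team currently holding the fewest GPUs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    team: str
    priority: int  # higher value is admitted first
    gpus: int      # GPUs requested from the shared pool


def schedule(jobs, total_gpus):
    """Greedily admit jobs that fit; return (admitted names, waiting names)."""
    used = defaultdict(int)  # GPUs currently held by each team
    free = total_gpus
    pending = list(jobs)
    admitted = []
    while pending:
        fitting = [j for j in pending if j.gpus <= free]
        if not fitting:
            break
        # Priority first, then the team using the least, then name for determinism.
        best = min(fitting, key=lambda j: (-j.priority, used[j.team], j.name))
        admitted.append(best.name)
        used[best.team] += best.gpus
        free -= best.gpus
        pending.remove(best)
    return admitted, [j.name for j in pending]


if __name__ == "__main__":
    jobs = [
        Job("nlp-a", "nlp", 1, 4),
        Job("nlp-b", "nlp", 1, 4),
        Job("vision-a", "vision", 1, 4),
        Job("speech-urgent", "speech", 5, 2),
    ]
    print(schedule(jobs, total_gpus=10))
    # (['speech-urgent', 'nlp-a', 'vision-a'], ['nlp-b'])
```

In this run the urgent job goes first, and the second NLP job waits behind the vision team's job because the NLP team already holds GPUs. A production scheduler would also need preemption, quotas, and gang scheduling for distributed jobs.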

Deployment: A simple model deployment can be as little as launching a web server with Flask or a similar framework. Production services additionally require inference optimization, blue‑green deployments, model mixing, version control, and choosing the hardware and software combination with the best cost‑performance. PAI's EAS service provides these capabilities.
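Of the production requirements listed above, blue‑green deployment is easy to show in miniature. The sketch below is a generic illustration of the pattern and does not represent the EAS API: the `BlueGreenRouter` class and the two stand‑in model functions are hypothetical. A new version is staged in the idle slot, traffic is switched over in one step, and the previous version stays loaded so a rollback is just another switch.

```python
class BlueGreenRouter:
    """Keeps two model slots; all requests go to whichever slot is live."""

    def __init__(self, initial_model):
        self.slots = {"blue": initial_model, "green": None}
        self.live = "blue"

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def stage(self, model):
        """Load a new version into the idle slot without touching live traffic."""
        self.slots[self.idle] = model

    def switch(self):
        """Cut traffic over to the idle slot; calling it again rolls back."""
        if self.slots[self.idle] is None:
            raise RuntimeError("no model staged in the idle slot")
        self.live = self.idle

    def predict(self, x):
        return self.slots[self.live](x)


# Stand-in "models": plain functions that tag their output with a version.
def model_v1(x):
    return {"version": "v1", "score": x * 0.5}


def model_v2(x):
    return {"version": "v2", "score": x * 0.75}


if __name__ == "__main__":
    router = BlueGreenRouter(model_v1)
    router.stage(model_v2)
    print(router.predict(4))  # still served by v1
    router.switch()
    print(router.predict(4))  # now served by v2
    router.switch()           # rollback to v1
    print(router.predict(4))
```

A real service would run health checks against the staged version before switching and would perform the cutover at the load balancer rather than inside one process.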

Today, 93% of Damo Academy's development, training, and deployment runs on the cloud, dramatically boosting efficiency. For example, OCR inference services improved their efficiency by more than 80% through model mixing and optimization on PAI.

From an architectural perspective, the cloud‑native AI platform PAI underpins advances on all three fronts: scale (pre‑training models with 10 trillion parameters), usability (ModelScope open‑sources most of the academy's models and connects seamlessly to deployment on PAI), and efficiency (the OpenMind AI open service handles roughly 10 trillion daily calls).

The key elements of AI (compute, algorithms, and data) are addressed through cloud‑native resource management, notebook‑driven development, and integration of AI workloads with traditional data warehouses, data lakes, real‑time computing, OLAP, and vector search.

Last year at the Yunqi Conference, Alibaba introduced the Lingjie brand for integrated big‑data and AI solutions. This year's advancements include new hardware (Yitian‑710), an integrated big‑data and AI platform, the OpenMind AI open services, and the OpenTrek industrial intelligence engine, all aimed at solving the last mile of AI deployment.

Through this AI engineering ecosystem, Alibaba serves a wide range of cloud customers—from city brain initiatives to autonomous driving, scientific research, and content recommendation.

The forum concludes with a live demo showing how a single engineer can explore, fine‑tune, and publish a custom model as a production service on the PAI platform. The demo highlights how the platform takes care of scheduling and collaboration so that engineers can focus on the modeling work itself.

Attendees are encouraged to explore PAI further and continue the dialogue on unlocking AI's creative and productive potential.


Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
