
Alibaba’s New Multimodal ChatGPT Rival: How the Tongyi Model Achieves Unified AI

Alibaba’s internally tested “DAMO Academy version of ChatGPT” is a multimodal AI that combines text, image, code, and creative generation. It is built on the Tongyi large model’s unified architecture, and it arrives as other Chinese tech giants rush to launch their own ChatGPT‑style products.

Programmer DD

Feb 18, 2023


Built on the Upgraded Tongyi Model

The newly announced product can handle knowledge Q&A, AI drawing, code generation, novel continuation, copywriting, poetry, and more. That covers most of what ChatGPT does and adds AI drawing on top.

This multi‑task, cross‑modal performance stems from Alibaba’s Tongyi large‑model foundation.

The upgraded model handles over 30 cross‑modal tasks, including vision‑language, speech, and action understanding, without adding any task‑specific structures: a single Transformer encoder‑decoder covers pre‑training and fine‑tuning for all of them.

Asked about NBA topics, the model held a fluent conversation and even stated a personal preference for Michael Jordan.

The system also performs well on multimodal tasks such as image description, visual grounding, text‑to‑image, visual entailment, and document summarization. Before the upgrade, a single model covered more than ten single‑ and cross‑modal tasks.

The upgrade raises that number to over 30 cross‑modal tasks, adding speech and action understanding.

The “unified” technology relies on three principles:

Architecture unification: a single Transformer encoder‑decoder architecture for pre‑training and fine‑tuning, eliminating task‑specific layers.

Modality unification: the same framework and training approach for NLP, CV, and multimodal tasks.

Task unification: every task is expressed as seq2seq generation, so inputs and outputs take nearly the same form regardless of the task.
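To make task unification concrete, here is a minimal Python sketch of how different tasks might be rendered into one seq2seq format. It is an illustration only, not Alibaba’s implementation: the `Seq2SeqExample` type, the `to_seq2seq` helper, the task names, and the instruction wording are all hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Seq2SeqExample:
    """One example in a unified (source, target) format."""
    source: str                  # instruction text given to the encoder
    target: str                  # text the decoder is expected to generate
    image: Optional[str] = None  # hypothetical image reference, if the task has one


# Hypothetical instruction templates, one per task (wording is illustrative).
TEMPLATES = {
    "caption": "What does the image describe?",
    "grounding": 'Which region does the text "{text}" describe?',
    "entailment": 'Does the image entail the text "{text}"?',
    "summarization": 'What is the summary of the article "{text}"?',
}


def to_seq2seq(task: str, target: str, text: str = "",
               image: Optional[str] = None) -> Seq2SeqExample:
    """Render any supported task as an instruction/target pair."""
    if task not in TEMPLATES:
        raise KeyError(f"unknown task: {task}")
    source = TEMPLATES[task].format(text=text)
    return Seq2SeqExample(source=source, target=target, image=image)


if __name__ == "__main__":
    # Two very different tasks end up with the same data shape.
    print(to_seq2seq("caption", "a dog on a beach", image="dog.jpg"))
    print(to_seq2seq("summarization", "short summary", text="long article"))
```

Because every task reduces to the same source/target shape, one encoder‑decoder can in principle be trained on all of them without task‑specific output layers, which is the point the three principles make together.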

In practice, the Tongyi model has reportedly improved performance by 2% to 10% across more than 200 scenarios, such as e‑commerce cross‑modal search, AI‑assisted design, legal document analysis, medical text understanding, and open‑domain dialogue.

Alibaba has been researching these technologies since around 2020. In 2021, it released the 27‑billion‑parameter “PLUG” (a Chinese GPT‑3‑like model) for universal writing tasks, and the AliceMind model later topped the CLUE benchmark, surpassing human scores on several language‑understanding tasks.

Domestic Companies Chase ChatGPT

Alibaba’s move reflects a broader trend: Baidu is internally testing “Wenxin Yiyan,” NetEase Youdao plans a ChatGPT‑based product for online education, and JD.com’s vice president sees AIGC and ChatGPT as key to scaling AI applications in China.

“Building an ecosystem for a Chinese ChatGPT is a narrow goal; Alibaba aims to lead the development trend of Chinese large models.”



