Multimodal Large Model Platform: History, Architecture, Practices, and Future Outlook by Jiuzhang Yunji DataCanvas
This article reviews the evolution of multimodal large models and introduces Jiuzhang Yunji DataCanvas' multimodal model platform, covering AI foundation software, model tools, serving, and prompt management. It then shares practical experience with model building, memory-augmented models, ETL pipelines, and knowledge-base applications, and closes with a forward-looking perspective on enterprise data management and intelligent agents.
01 Multimodal Large Model History
The 1956 Dartmouth workshop marked the birth of artificial intelligence. Early research centered on symbolic logic, but its limitations led to repeated AI winters; only recent breakthroughs in large language models have demonstrated the power of neural networks at scale.
Understanding images like the well-known joke photo of Barack Obama pressing his foot on a scale requires both visual perception and logical reasoning, highlighting the need for multimodal models that combine perception and cognition.
Key milestones include the 2020 Vision Transformer (ViT), which brought the Transformer architecture to vision; OpenAI's 2021 CLIP, which enabled cross-modal generalization; and the 2022–2023 surge of multimodal models such as PaLM-E, Whisper, ImageBind, SAM, and Microsoft's Kosmos-2.
What can multimodal large models do?
They enable video summarization, program classification, viewership analytics, and text‑to‑image generation, and when coupled with embodied agents they can plan and adapt paths in novel scenarios.
02 Multimodal Large Model Platform (Jiuzhang Yunji DataCanvas)
AI Foundation Software (AIFS): Provides AI foundation libraries, GPU clusters, high-performance storage/network, and tools for data annotation, model training, and sandbox experimentation, supporting both open-source and proprietary multimodal models.
Model Tool – LMOPS: Offers a full-lifecycle pipeline (data preparation, model development, evaluation, quantization, distillation, and inference) with distributed optimizations (data, tensor, and pipeline parallelism) and visual control.
Large Model Builder (LMB): Handles distributed training optimizations and supports various fine-tuning strategies, including continued training, supervised tuning, RLHF, and automatic Chinese vocabulary expansion.
Large Model Serving (LMS): Optimizes model serving via quantization, knowledge distillation, and pruning (structured and sparse) to reduce compute cost and accelerate Transformers.
Prompt Manager: A toolkit for designing, version-controlling, and deploying prompts, serving both developers and non-technical users with AI model, scene, and template management.
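To make the quantization idea behind LMS concrete, here is a minimal sketch of symmetric int8 post-training quantization in plain Python. This is a generic illustration of the technique, not the platform's actual implementation: float weights are mapped to 8-bit integers with a single per-tensor scale factor, shrinking storage roughly 4x at the cost of small rounding error.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats into [-127, 127] with one scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize_int8(q_values, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q_values]

weights = [0.12, -0.5, 0.33, 1.0, -1.27]
q_values, scale = quantize_int8(weights)
approx = dequantize_int8(q_values, scale)
```

The maximum round-trip error is bounded by half the scale, which is why quantization preserves accuracy well when weight magnitudes are balanced (and why techniques like distillation are used alongside it when they are not).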
03 Practice of Multimodal Large Models
Memory-augmented multimodal models add a memory module that improves reasoning and recall without increasing the model's parameter count.
The platform’s DingoDB multimodal vector database integrates ETL capabilities, offering optimized operators, parallel processing, and caching for unstructured data management.
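The ETL-plus-vector-search pattern described above can be sketched in a few lines. The code below is a generic in-memory stand-in, not DingoDB's actual API, and the character-bucket `embed` function is a toy placeholder for a real encoder: raw documents are extracted, transformed into normalized vectors, loaded into a store, and then searched by cosine similarity.

```python
import math

def embed(text, dim=8):
    """Toy embedding: character-hash buckets (stand-in for a real encoder)."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class VectorStore:
    """Minimal in-memory vector index (illustrative stand-in for DingoDB)."""
    def __init__(self):
        self.rows = []  # (doc_id, vector, payload)

    def upsert(self, doc_id, text):
        self.rows.append((doc_id, embed(text), text))

    def search(self, query, top_k=3):
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, vec)), doc_id, payload)
                  for doc_id, vec, payload in self.rows]
        return sorted(scored, reverse=True)[:top_k]

# ETL: extract raw text, transform into vectors, load into the store.
store = VectorStore()
for i, doc in enumerate(["model training logs", "sales report Q3", "training curriculum"]):
    store.upsert(i, doc)
hits = store.search("training", top_k=2)
```

A production pipeline would replace the toy embedding with a multimodal encoder and the list scan with an approximate-nearest-neighbor index; the optimized operators, parallelism, and caching mentioned above live at exactly those two points.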
Model building follows three stages: (1) freeze language model and modality encoders for alignment; (2) optional multimodal retrieval training; (3) optional instruction fine‑tuning for specific tasks.
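The three stages above amount to a freezing schedule: which parameter groups are updated at each step. The sketch below uses hypothetical component names purely for illustration; it is not the platform's actual configuration.

```python
COMPONENTS = ["vision_encoder", "audio_encoder", "language_model",
              "alignment_projector", "retrieval_head"]

def trainable_components(stage):
    """Return which components are updated in each training stage."""
    if stage == 1:  # alignment: language model and modality encoders stay frozen
        return ["alignment_projector"]
    if stage == 2:  # optional multimodal retrieval training
        return ["alignment_projector", "retrieval_head"]
    if stage == 3:  # optional instruction fine-tuning for specific tasks
        return ["language_model", "alignment_projector"]
    raise ValueError(f"unknown stage: {stage}")

def frozen_components(stage):
    """Everything not trainable in a stage is frozen."""
    return [c for c in COMPONENTS if c not in trainable_components(stage)]
```

Keeping the expensive components frozen in stage 1 is what makes alignment training cheap: only the small projector between modalities and the language model accumulates gradients.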
Knowledge‑base construction leverages the memory architecture to store encoded professional knowledge in DingoDB, enabling efficient multimodal retrieval and reasoning without modifying model weights.
Retrieval uses memory attention mechanisms, improving recall by ~10% and supporting joint text‑image‑table search.
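A minimal sketch of the attention mechanism behind such retrieval (the generic technique, not the platform's exact implementation): similarities between a query vector and stored memory slots are converted into weights with a softmax, and the read-out is the weight-blended memory content.

```python
import math

def attention_weights(query, memories):
    """Softmax over dot-product similarities between a query and memory slots."""
    scores = [sum(q * m for q, m in zip(query, mem)) for mem in memories]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read_memory(query, memories):
    """Blend memory slots by their attention weights."""
    w = attention_weights(query, memories)
    dim = len(memories[0])
    return [sum(w[i] * memories[i][d] for i in range(len(memories)))
            for d in range(dim)]

memories = [[1.0, 0.0], [0.0, 1.0]]
weights = attention_weights([2.0, 0.0], memories)  # favors the first slot
```

Because text, image, and table entries all live in the same vector space, the same scoring loop serves joint text-image-table search.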
04 Future Thoughts and Outlook
Enterprises have ~85% unstructured data; multimodal large models and knowledge bases can dramatically increase its utilization, potentially delivering tenfold value growth.
Knowledge bases serve as foundations for various agents (R&D, customer service, sales, legal, HR, ops), enabling them to fetch multimodal information, make informed decisions, and continuously improve through feedback loops.
Jiuzhang Yunji DataCanvas aims to help enterprises realize intelligent agents and unlock the full potential of multimodal AI.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.