Exploring AIGC in Enterprise: From OT Algorithms to 3D AI Innovations
The A2M Internet Architecture and AI Summit in Shanghai showcased 24 topics and 84 talks, highlighting AIGC applications across enterprise collaboration, large‑scale model integration, 3D visual content, music entertainment, and the broader impact of LLMs on software development and digital transformation.
Conference Overview
A2M, organized by msup, held an Internet Architecture and AI Technology Summit in Shanghai on May 26‑27, focusing on AI implementation in the AIGC era, data intelligence, and infrastructure evolution. The event featured 24 topics and 84 talks covering AIGC, metaverse, data security, computer vision, intelligent voice, full‑chain data governance, Web3, digital transformation, and more.
Highlights
With so many sessions packed into a tight schedule, this article summarizes only a selection of the AIGC talks.
Exploring AIGC in Enterprise Collaboration
Overview
Speaker Li Yisong, head of Alibaba DingTalk document suite backend, presented AIGC applications in text, image, audio, video tools, and intelligent integration systems for enterprise office scenarios. DingTalk implements AIGC on three levels:
Collaborative intelligence for online documents: the editing engine evolved from the original operational transformation (OT) algorithm to GeneralOT (transformation on the client side) and then to Context-based OT (transformation on both client and server), so that larger groups can edit the same document at once.
Transformational intelligence: integration of end-to-end automatic speech recognition (ASR), audio synthesis, video synthesis, and keyword extraction models, enabling text and video flash-note capabilities.
Generative intelligence: building on the above, DingTalk connects to Alibaba's Tongyi large model to provide text‑to‑text, text‑to‑image, text‑to‑table, and text‑to‑code generation.
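The basic idea of operational transformation, on which the OT variants in the first level build, can be sketched as follows. This is a minimal textbook-style example for two concurrent insert operations, not DingTalk's GeneralOT or Context-based OT; the `Insert` class, the `site` tie-breaker, and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index at which the text is inserted
    text: str
    site: int   # hypothetical replica id, used only to break ties


def apply_op(doc: str, op: Insert) -> str:
    """Apply a single insert to a document string."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, other: Insert) -> Insert:
    """Rewrite `op` so it can be applied after the concurrent `other`."""
    shifts = other.pos < op.pos or (other.pos == op.pos and other.site < op.site)
    if shifts:
        # `other` landed at or before our position, so move right by its length.
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op


base = "AIGC"
a = Insert(0, "Hello ", site=1)   # edit made by user A
b = Insert(4, " docs", site=2)    # concurrent edit made by user B

# Each replica applies its own edit first, then the transformed remote edit.
doc_a = apply_op(apply_op(base, a), transform(b, a))
doc_b = apply_op(apply_op(base, b), transform(a, b))
print(doc_a == doc_b, repr(doc_a))
```

Both replicas end up with the same text even though they applied the edits in different orders, which is the convergence property that collaborative editors rely on. A production system also has to handle deletes, longer histories, and many participants.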
Additionally, DingTalk designed stability mechanisms for AIGC calls, preserving client messages during weak network conditions and discarding duplicate or out‑of‑order messages after reconnection.
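One common way to get this behavior is to number messages on the client and have the receiver accept only the next expected number. The sketch below assumes that scheme; the talk summary does not say how DingTalk implements it, and `Outbox`, `Receiver`, and their methods are hypothetical names.

```python
from collections import deque


class Receiver:
    """Accepts messages strictly in sequence order."""

    def __init__(self):
        self.expected = 0
        self.delivered = []

    def receive(self, seq, payload):
        """Return the next expected sequence number, which doubles as an ack."""
        if seq == self.expected:
            self.delivered.append(payload)
            self.expected += 1
        # seq < expected is a duplicate, seq > expected is out of order: both dropped.
        return self.expected


class Outbox:
    """Keeps client messages until the receiver has acknowledged them."""

    def __init__(self):
        self.next_seq = 0
        self.pending = deque()   # (seq, payload) pairs not yet acknowledged

    def queue(self, payload):
        self.pending.append((self.next_seq, payload))
        self.next_seq += 1

    def flush(self, receiver):
        """Resend everything still pending, for example after reconnecting."""
        for seq, payload in list(self.pending):
            acked_up_to = receiver.receive(seq, payload)
            while self.pending and self.pending[0][0] < acked_up_to:
                self.pending.popleft()


server = Receiver()
outbox = Outbox()
outbox.queue("draft a summary")
outbox.queue("make it shorter")

# The first message arrives, but the connection drops before the ack returns.
server.receive(0, "draft a summary")

# After reconnecting, the client resends both; the duplicate is discarded.
outbox.flush(server)
print(server.delivered, len(outbox.pending))
```

The client never loses a message because it only forgets one after an acknowledgement, and the receiver never applies a message twice because stale sequence numbers are ignored.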
Takeaways
This talk is valuable for those interested in online document collaboration, offering insights into OT/GeneralOT/Context‑based OT algorithms and real‑world AIGC deployment scenarios.
TaskMatrix: A New Paradigm for AIGC with Large‑Scale Pre‑Training Models
Overview
Microsoft researcher Wu Chenfei introduced the fundamentals of large language models (LLMs) and visual foundation models (VFMs) such as Stable Diffusion. He explained why an LLM alone cannot generate images directly: it needs an intermediate layer that converts the conversation into prompts and invokes VFMs through APIs. TaskMatrix implements this LLM-to-VFM orchestration, enabling applications such as image recognition, multimodal content generation, office automation, and IoT/robotics. Its limitations include strong dependence on ChatGPT-type models, the need for high-quality prompts, token limits, latency, and content safety concerns.
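The orchestration pattern described here can be illustrated with a small dispatcher. This is a sketch under stated assumptions, not TaskMatrix's actual code or API: `fake_llm` is a stand-in for a real LLM, the JSON call format is invented for the example, and `text_to_image` is a stub rather than a real image model.

```python
import json
from typing import Callable, Dict

# Hypothetical tool registry; a real entry would wrap a call to a model API.
TOOLS: Dict[str, Callable[[str], str]] = {}


def tool(name: str):
    """Decorator that registers a function under a tool name."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("text_to_image")
def text_to_image(prompt: str) -> str:
    # Stub standing in for an image model such as Stable Diffusion.
    return f"<image generated from: {prompt}>"


def fake_llm(user_message: str) -> str:
    """Stand-in LLM: replies with a JSON tool call or with plain text."""
    lowered = user_message.lower()
    if "draw" in lowered:
        subject = lowered.split("draw", 1)[1].strip()
        # Prompt conversion: turn the chat request into an image prompt.
        return json.dumps({"tool": "text_to_image", "input": f"{subject}, high detail"})
    return "I can answer that directly."


def orchestrate(user_message: str) -> str:
    """Route the LLM reply to a tool when it names one, else return the text."""
    reply = fake_llm(user_message)
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return reply
    if not isinstance(call, dict) or call.get("tool") not in TOOLS:
        return reply
    return TOOLS[call["tool"]](call["input"])


print(orchestrate("Please draw a red bicycle"))
print(orchestrate("What is AIGC?"))
```

The LLM never produces pixels itself. It only decides which tool to call and with what prompt, which is why prompt quality and the capability of the underlying LLM show up among the limitations.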
Takeaways
The session highlighted the importance of both LLMs and the orchestration layer for effective AIGC deployment.
Generative AI in 3D Visual Content: Challenges and Opportunities
Overview
LUMOS's Zhang Xinying analyzed the emerging 3D AIGC market, noting a split between native 3D training pipelines and 2D-based upscaling approaches (e.g., Imagen, Stable Diffusion). LUMOS adopts a text-to-image plus 2D upscaling strategy, using an autoregressive transformer to address object deformation, layout issues, and unnatural mixing. Training uses a two-stage pipeline. First, a discrete variational auto-encoder compresses each 256×256 RGB image into a 32×32 grid of image tokens (1024 tokens). Then a transformer models the joint distribution of 256 BPE-encoded text tokens and those 1024 image tokens. Mixed-precision 16-bit training, parameter sharding, and distributed optimization reduced memory consumption, and the speaker reported strong results.
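The token counts quoted for the two-stage pipeline fit together as shown below. The numbers come from the talk summary; the `build_sequence` helper and the dummy token ids are assumptions used only to show how text and image tokens could be laid out in one sequence.

```python
IMAGE_SIDE = 256     # pixels per side of the RGB input
GRID_SIDE = 32       # tokens per side after the discrete VAE
TEXT_TOKENS = 256    # BPE text tokens per caption

downsample_factor = IMAGE_SIDE // GRID_SIDE     # pixels per token along one axis
image_tokens = GRID_SIDE * GRID_SIDE            # tokens per image
sequence_length = TEXT_TOKENS + image_tokens    # tokens the transformer models jointly


def build_sequence(text_ids, image_grid):
    """Concatenate text tokens with the image grid flattened row by row."""
    flat_image = [tok for row in image_grid for tok in row]
    return list(text_ids) + flat_image


# Dummy ids only; real ids would come from a BPE tokenizer and the dVAE encoder.
dummy_text = [0] * TEXT_TOKENS
dummy_grid = [[1] * GRID_SIDE for _ in range(GRID_SIDE)]
sequence = build_sequence(dummy_text, dummy_grid)

print(downsample_factor, image_tokens, sequence_length, len(sequence))
```

Each image token therefore stands for an 8×8 pixel patch, and the transformer works on sequences of 1280 tokens instead of 65,536 pixels, which is what makes autoregressive modeling of the image tractable.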
AIGC Applications in the Music and Entertainment Industry
Overview
Wu Bin from Tencent Music showcased AI-generated content across music, karaoke, live streaming, and virtual avatars, including automatic cover-art generation, lyric insertion, and high-resolution image creation. Prompt fine-tuning and the Muse UI service package the generation model for non-technical staff, establishing a new mode of collaboration. A set of performance optimizations (operator merging, data reshaping, FlashAttention, I/O improvements, custom CUDA kernels, KV-cache, tensor parallelism, and memory reuse) cut Stable Diffusion costs to one tenth, and LLM inference costs to three percent, of the unoptimized baseline.
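Of the optimizations listed, the KV-cache is easy to show in isolation. The sketch below is a generic, pure-Python illustration of the technique for a single attention head, not Tencent Music's implementation; `project` is a toy stand-in for learned projection matrices.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention for one query over all given positions."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)                                  # for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def project(x, shift):
    # Toy stand-in for a learned linear projection.
    return [xi * (shift + 1) + shift for xi in x]


def decode_without_cache(tokens):
    outputs = []
    for t in range(len(tokens)):
        # Keys and values for the whole prefix are recomputed at every step.
        keys = [project(x, 1) for x in tokens[: t + 1]]
        values = [project(x, 2) for x in tokens[: t + 1]]
        outputs.append(attend(project(tokens[t], 0), keys, values))
    return outputs


def decode_with_cache(tokens):
    keys, values, outputs = [], [], []
    for x in tokens:
        # Only the newest token is projected; earlier results are reused.
        keys.append(project(x, 1))
        values.append(project(x, 2))
        outputs.append(attend(project(x, 0), keys, values))
    return outputs


tokens = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.4]]
slow = decode_without_cache(tokens)
fast = decode_with_cache(tokens)
same = all(math.isclose(a, b) for s, f in zip(slow, fast) for a, b in zip(s, f))
print(same)
```

Both functions produce the same outputs, but for a sequence of n tokens the uncached version performs n(n+1)/2 key projections while the cached version performs n. The trade-off is the extra memory needed to hold the cache.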
Tech‑Business Fusion: Harnessing AIGC for Digital Transformation
Overview
He Mian from Youchuan Information discussed LLM impacts on software development, arguing that natural‑language interaction will surpass GUIs. Future software vendors may become API providers integrated into LLMs or platforms aggregating various models. Development will shift to LLM‑driven Q&A with domain knowledge, and delivery will focus on full‑stack, business‑topic‑oriented solutions.
Conclusion
The summit demonstrated that major Chinese tech firms have begun deploying AIGC in text‑to‑text and text‑to‑image scenarios, delivering tangible business value. Coupled with Apple’s MR announcements and advances in 3D AIGC and LLM inference, future human‑machine interaction may resemble NPC dialogue in games or conversations with realistic virtual humans. Researchers should embrace AIGC to boost productivity, focusing not only on technology but also on commercial problem‑solving to unlock business value.
Kuaishou E-commerce Frontend Team
