SenseTime Unveils Multimodal ‘SenseNova’ Large Model System and Its Industry Applications

SenseTime introduced its visual‑centric multimodal large‑model platform SenseNova, detailing model scaling, extensive AI infrastructure, diverse industry deployments such as autonomous driving and generative content, and the challenges of compute efficiency and data acquisition in the race for advanced AI.

DataFunSummit

Recently, SenseTime showcased several of its large models, highlighting a technology route that places vision at the core and integrates language and other modalities to enable multimodal AI capabilities.

At a technical exchange event in Shanghai on April 10, the company announced the SenseNova model system. The name comes from the classical Chinese phrase “日日新” (“renewed day by day”), chosen to emphasize continuous model updates and the ongoing unlocking of AGI possibilities.

Although SenseTime did not disclose the exact technical architecture, co‑founder Chen Yuheng explained that over 80% of the information humans receive is visual. By combining its visual expertise with language and code, he said, the company can train stronger multimodal models, which differentiates it from peers like Baidu and Alibaba.

SenseTime has already developed a 32‑billion‑parameter general‑purpose vision model that achieves state‑of‑the‑art results in object detection, image segmentation, and multi‑object recognition. It has applied large vision models to autonomous driving (BEV perception with high‑precision recognition of 3,000 object classes), digital humans, 2D/3D content generation, and AI‑driven image creation.

The company also demonstrated its Chinese language model “SenseChat” (on the order of 100 billion parameters) and generative AI tools such as the “SenseMirage” text‑to‑image platform, where rapid LoRA fine‑tuning adapts the model to niche styles such as 1980s Hong Kong fashion.
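
The article does not describe how SenseMirage implements LoRA. As a minimal sketch of the general technique only, the example below shows the core idea: the pretrained weight matrix W stays frozen, and training touches only two small matrices A and B whose product is added to W. The dimensions, rank, and scaling value are illustrative assumptions, not SenseTime's settings.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the weight a LoRA-adapted layer applies."""
    r = len(a)                      # rank = number of rows of A
    delta = matmul(b, a)            # low-rank update, same shape as W
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d_out, d_in, r):
    """Number of values LoRA trains for one d_out x d_in layer."""
    return r * (d_in + d_out)


# Illustrative sizes (assumptions); real layers are far larger.
d_out, d_in, r, alpha = 8, 6, 2, 4.0
rng = random.Random(0)

w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable
b = [[0.0] * r for _ in range(d_out)]                               # trainable, zero init

# With B initialised to zero the adapted layer starts out identical to the base layer.
w_eff = lora_effective_weight(w, a, b, alpha)

full_count = d_out * d_in
lora_count = trainable_params(d_out, d_in, r)
print(f"full fine-tuning: {full_count} values, LoRA: {lora_count} values")
```

Because only A and B are trained and stored, a new style needs far fewer trainable values than full fine-tuning, which is what makes quick adaptation to a niche style practical.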

To support industry adoption, SenseTime highlighted a data “flywheel”, a feedback loop that connects user data with model iteration. It also announced more than 20 B2B scenarios and introduced the “Mingmu” automated data‑labeling platform for autonomous driving, smart city, and other domains.

Behind the scenes, SenseTime runs a large AI computing center in Lingang, Shanghai, with 5,000 server racks and 27,000 GPUs. According to the company, the facility can train up to 20 ChatGPT‑scale models simultaneously, and a single training job can run on up to 4,000 GPUs continuously for more than a week.
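
To put those figures in perspective, the short calculation below works out how many GPU-hours one maximum-size job consumes in a week and what share of the facility it occupies. It uses only the numbers quoted above and treats "over a week" as exactly seven days, so the GPU-hour result is a lower bound.

```python
GPUS_TOTAL = 27_000      # GPUs in the Lingang facility (from the article)
GPUS_PER_JOB = 4_000     # largest single-task job (from the article)
MIN_RUN_DAYS = 7         # "over a week", taken as a lower bound

# GPU-hours consumed by one maximum-size job over the minimum run length.
gpu_hours_per_job = GPUS_PER_JOB * MIN_RUN_DAYS * 24

# Fraction of the whole facility that one such job occupies.
job_share = GPUS_PER_JOB / GPUS_TOTAL

print(f"GPU-hours for one job: at least {gpu_hours_per_job:,}")
print(f"Share of facility GPUs: {job_share:.1%}")
```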

Chen also identified two key challenges for large‑model training: using multi‑GPU parallelism efficiently and keeping the system stable over long runs. Beyond these, he pointed to a looming scarcity of high‑quality data, despite the massive visual datasets already available.

Looking forward, SenseTime plans to offer a full toolchain for model fine‑tuning, private‑deployment options, Model‑as‑a‑Service APIs, and knowledge‑distillation techniques to lower inference costs, while continuing to expand its AI infrastructure across major Chinese cities.
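
The article does not say which distillation method SenseTime uses. As a hedged sketch of the common soft-target formulation, the example below trains a small "student" to match the temperature-softened output distribution of a large "teacher"; the logits and the temperature value are made-up illustrations.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)   # teacher soft targets
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits for a 3-class problem.
teacher = [4.0, 1.0, 0.5]
student_far = [0.5, 1.0, 4.0]
student_close = [3.8, 1.1, 0.6]

loss_far = distillation_loss(student_far, teacher)
loss_close = distillation_loss(student_close, teacher)
loss_same = distillation_loss(teacher, teacher)
print(loss_far, loss_close, loss_same)
```

The loss shrinks as the student's outputs approach the teacher's, so a smaller and cheaper model can be trained to approximate the large one at inference time.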

