
Advances in Multimodal Large Models and Document Understanding Presented at the 2024 Global Machine Learning Conference (Beijing)

At the 2024 Global Machine Learning Conference in Beijing, 360 AI Research Institute showcased cutting‑edge multimodal large‑model research, fine‑grained open‑world object detection, and document understanding technologies, highlighting open‑source releases, real‑world deployments, and competitive achievements in AI competitions.

360 Tech Engineering

On November 14‑15, 2024, the Global Machine Learning Conference (Beijing) gathered leading AI experts, including two senior researchers from 360 AI Research Institute, who presented on multimodal large models and document understanding.

Dr. Leng Dawei, Head of Vision at 360 AI Research Institute, presented the latest research on multimodal large models, discussing scaling laws, the IAA architecture, and the 360VL model, and introduced fine‑grained open‑world object detection as a new research direction.

He demonstrated 360VL’s multimodal recognition capabilities, showing its application in smart watches for English learning, video surveillance for open‑world object detection, and SaaS visual cloud platforms serving over fifty thousand enterprises.

Dr. Liu Huanyong, Head of Knowledge Graph and Document Understanding, shared engineering practices in document parsing, layout analysis, table and formula recognition, and retrieval‑augmented generation (RAG) for large models, highlighting the open‑source 360LayoutAnalysis model, which won the ICPR 2024 multi‑line mathematical expression recognition competition.
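The RAG pattern mentioned above pairs document parsing with retrieval: parsed document chunks are ranked against a user query, and the top matches are stuffed into the prompt so the model answers from grounded context. The following is a minimal, self‑contained sketch of that general pattern using toy term‑frequency cosine similarity; it is purely illustrative and does not reflect 360's actual pipeline, which the talk did not detail.

```python
# Minimal RAG sketch: rank parsed document chunks against a query,
# then assemble a grounded prompt. Illustrative only.
from collections import Counter
import math

def tf_vector(text):
    """Bag-of-words term-frequency vector for a text snippet."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    qv = tf_vector(query)
    ranked = sorted(chunks, key=lambda c: cosine(qv, tf_vector(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, chunks):
    """Assemble a prompt that grounds the answer in retrieved context."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )
```

Production systems replace the term‑frequency scoring with dense embeddings and a vector index, but the retrieve‑then‑prompt structure is the same.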

Both presentations underscored 360’s contributions to AI research, open‑source releases on GitHub and Hugging Face, and the deployment of these technologies in enterprise products such as 360 Smart Document Cloud, providing a competitive edge in document understanding and multimodal AI applications.

Model repositories mentioned include the 360VL multimodal model (GitHub: https://github.com/360CVGroup/360VL, Hugging Face: https://huggingface.co/qihoo360/360VL-70B) and the 360Layout‑Analysis layout analysis model (GitHub: https://github.com/360AILAB-NLP/360LayoutAnalysis, Hugging Face: https://huggingface.co/qihoo360/360LayoutAnalysis).

Tags: multimodal AI, open-source, large models, AI research, knowledge graph, document understanding
Written by 360 Tech Engineering

Official tech channel of 360, building the most professional technology aggregation platform for the brand.