Multimodal AI Breakthroughs Unveiled at NLPCC 2020 Workshop

The article recaps the inaugural Multimodal Natural Language Processing workshop at NLPCC 2020, highlighting breakthroughs in multimodal summarization, pre‑training models, AI‑driven art, visual‑language interaction, and multimodal dialogue systems, and showcases research from leading institutions and industry partners.

JD Cloud Developers

Nov 4, 2020

In recent years, artificial intelligence (AI) has achieved major breakthroughs in single-modality tasks such as speech, natural language, and vision, reaching human‑level performance on specific datasets. Researchers now recognize that higher‑level AI tasks often involve processing information across multiple modalities, making multimodal modeling and learning essential.

JD Cloud hosted the first Multimodal Natural Language Processing Workshop at NLPCC 2020. Leading scholars in NLP, multimodal learning, and image processing presented research on cross‑lingual and cross‑modal information processing, multimodal pre‑training, AI and art, visual‑language interaction, and multimodal dialogue systems, sparking lively discussion.

Multimodal content, which combines text with images or video, has become a primary form of news and information exchange. Multimodal automatic summarization aims to compress information from multiple modalities into a concise summary, with applications in news feeds and e‑commerce recommendation.

During the workshop, Zhang Jiajun from the Institute of Automation, Chinese Academy of Sciences, presented his group’s progress on multimodal summarization methods and evaluation. He traced the evolution from extractive to generative approaches, culminating in image‑text attention‑based generation, and described evaluation metrics that jointly consider text and image importance as well as their relevance.
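To make the idea of a joint text-and-image evaluation concrete, here is a minimal sketch that combines three component scores into one number. It is not the metric presented in the talk: the function names, the unigram-overlap text score, the image precision score, and the fixed weights are all assumptions chosen for illustration, and the image‑text relevance value is assumed to come from some external model.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style F1 over whitespace tokens (a simple stand-in for text quality)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def image_precision(selected: set, reference: set) -> float:
    """Fraction of the selected images that also appear in the reference set."""
    if not selected:
        return 0.0
    return len(selected & reference) / len(selected)


def multimodal_score(text_score: float, image_score: float, relevance: float,
                     weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted sum of the three components; the weights are hypothetical."""
    w_text, w_image, w_rel = weights
    return w_text * text_score + w_image * image_score + w_rel * relevance


if __name__ == "__main__":
    text = unigram_f1("floods hit the city", "heavy floods hit the city centre")
    images = image_precision({"img1", "img3"}, {"img1", "img2"})
    # 0.8 is a placeholder for an image-text relevance score from another model.
    print(multimodal_score(text, images, relevance=0.8))
```

A fixed weighted sum is the simplest possible combination; a practical metric might instead fit the weights against human judgments.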

The rise of pre‑trained models has moved NLP away from task‑specific manual tuning toward large‑scale, reproducible industrial deployment, and has let research expand quickly from single‑language settings to multilingual and multimodal ones.

Dr. Duan Nan, senior researcher at Microsoft Research Asia, reviewed typical pre‑training models and introduced three recent ones: Unicoder for cross‑lingual understanding and generation, Unicoder‑VL for vision‑language tasks, and CodeBERT for tasks that pair programming languages with natural language. The talk also covered current challenges and future directions.

AI + art is an interdisciplinary field where artists’ imagination inspires scientists, and AI tools enable new artistic creation. Designers use neural networks as creative assistants.

Postdoctoral researcher Gao Feng from Tsinghua University’s Future Lab presented the “Daozi” intelligent painting system. The system applies style transfer to convert natural images into artistic works, can generate ink‑wash paintings, and has been extended to fashion, industrial, and installation design.
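The article does not describe how “Daozi” works internally. As general background, classic neural style transfer compares the Gram matrices of feature maps from the generated image and the style image. The sketch below shows only that style loss, in plain Python; the function names and the toy feature values are assumptions, and a real system would take the features from a convolutional network.

```python
def gram_matrix(features):
    """Channel-by-channel inner products of a feature map.

    `features` is a list of C channels, each a flat list of N activations.
    Returns a C x C matrix normalized by N.
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
        for fi in features
    ]


def style_loss(generated, style):
    """Mean squared difference between the two Gram matrices."""
    g = gram_matrix(generated)
    s = gram_matrix(style)
    c = len(g)
    return sum((g[i][j] - s[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)


if __name__ == "__main__":
    # Toy feature maps: 2 channels, 4 spatial positions each.
    style_feats = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
    generated_feats = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
    print(style_loss(generated_feats, style_feats))
```

Because the Gram matrix discards spatial positions, the loss captures which feature channels co‑occur (texture and brushwork) rather than where objects sit in the image.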

Multimodal intelligent analysis is an active research topic. Professor Liu Si from Beihang University discussed visual‑language interaction, covering visual relationship detection, human‑object relationship segmentation, and video relationship detection, as well as referring expression comprehension and segmentation. She proposed solutions that integrate language context more effectively.

Dialogue systems are a key research area in natural language understanding, and multimodal dialogue systems represent an important direction.

Dr. Yang Haiqin from Ping An Life Insurance’s AI R&D team presented applications of multimodal dialogue systems in policy revisit services and video teller customer service, detailing core technologies, deployment experience, and benefits such as reduced operational costs and improved user experience.

The workshop attracted active participation and heated discussion from many scholars. JD AI Research Scientist Dr. Wu Youzheng delivered a talk on intelligent human‑machine interaction and its applications. JD AI Research also presented a paper titled “Enhancing Multi‑turn Dialogue Modeling with Intent Information for E‑Commerce Customer Service,” which was accepted for oral presentation at the conference.
