
ACL 2023 Multi‑lingual Document‑grounded Dialogue Competition Overview

The ACL 2023 Multi‑lingual Document‑grounded Dialogue Competition, hosted by Alibaba DAMO Academy and Nanjing University, introduces the first multilingual document‑dialogue dataset, provides a baseline system, offers a $7,000 prize pool, and invites participants to submit papers to the Doc2dial Workshop for Best Paper awards.

DataFunTalk

The 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023) will be held in Toronto, Canada from July 9‑14, 2023.

The ACL 2023 Multi‑lingual Document‑grounded Dialogue Competition, organized by Alibaba DAMO Academy’s Dialogue Intelligence Team and co‑hosted by Nanjing University, opens the first multilingual document‑dialogue dataset, provides a baseline model, and offers a $7,000 prize pool. Winners will submit papers to the ACL 2023 Doc2dial Workshop and compete for Best Paper and Best Student Paper awards.

Competition details can be found at: https://tianchi.aliyun.com/competition/entrance/532063/information

According to a 2020 Gartner report, over 80% of enterprise data is unstructured, with documents (e.g., manuals, specifications, policies, regulations) being the most prevalent form. Enabling dialogue systems to effectively retrieve and use knowledge from such documents is a critical challenge for intelligent information services.

Existing document‑dialogue research has focused mainly on English (EMNLP 2020, 2021) and Chinese (EMNLP 2022), leaving other languages under‑explored. This competition addresses the gap by releasing Vietnamese and French document‑dialogue data (6,954 dialogue turns) and aggregating existing Chinese and English data (32,266 turns), encouraging participants to leverage cross‑lingual similarities.

Dataset and Baseline: The baseline method splits the task into three stages: retrieval, ranking, and generation. The retrieval module selects the top‑N candidate documents based on dialogue history; the ranking module picks the K most relevant documents; the generation module produces the response. Pre‑trained models for each module are provided for four languages (Chinese, English, French, Vietnamese). Evaluation uses the sum of token‑level F1, SacreBLEU, and ROUGE‑L (max 300 points); the baseline scores 156, leaving ample room for improvement.
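For intuition on the scoring, the token‑level F1 component can be sketched as below. This is a generic SQuAD‑style implementation assuming whitespace tokenization; the competition's official scorer may tokenize differently per language (e.g., character‑level for Chinese), and the SacreBLEU and ROUGE‑L components would typically come from the `sacrebleu` and `rouge` libraries rather than hand‑rolled code.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a generated response and the reference.

    Whitespace tokenization is an assumption here; the official scorer
    may tokenize differently for each of the four languages.
    """
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # Overlap as a multiset intersection, so repeated tokens count correctly.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("the cat sat", "the cat sat down")` gives precision 1.0 and recall 0.75, hence F1 ≈ 0.857. Each of the three metrics is averaged over the test set and scaled to a maximum of 100, giving the 300‑point ceiling.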

Prizes:

1st place: $3000

2nd place: $1600

3rd place: $1000

4th place: $800

5th place: $600

The top five teams will submit papers to the Doc2dial Workshop and be eligible for the workshop’s Best Paper and Best Student Paper awards.

Contact:

DingTalk group (QR code provided in original announcement)

WeChat group (QR code provided)

Google Group: https://groups.google.com/g/dialdoc

Workshop website: https://doc2dial.github.io/workshop2023/#shared-task

Organizers:

Yu Haiyang, Algorithm Expert, Alibaba DAMO Academy

Cam‑Tu Nguyen, Associate Professor, Nanjing University

Yu Bowen, Algorithm Expert, Alibaba DAMO Academy

Li Yongbin, Senior Algorithm Expert, Alibaba DAMO Academy

Huang Fei, Researcher, Alibaba DAMO Academy

Sponsor: ModelScope Community (https://modelscope.cn/), the first Chinese AI model open‑source community, jointly launched by DAMO Academy and the CCF Open‑Source Development Committee, offering over 535 state‑of‑the‑art models and datasets for AI research.

Tags: NLP · competition · dataset · multilingual · ACL 2023 · document dialogue
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
