
Constrained Sequence-to-Tree Generation for Hierarchical Text Classification

At SIGIR 2022, the authors present a constrained Seq2Tree model that converts hierarchical label taxonomies into preorder sequences and applies dynamic‑dictionary decoding to keep the predicted labels consistent. The model outperforms existing methods on benchmark hierarchical text classification datasets and has been deployed in Alibaba Entertainment’s AI Brain.

Youku Technology

SIGIR (the International ACM SIGIR Conference on Research and Development in Information Retrieval) is widely regarded as a leading international conference in information retrieval and is rated as a Class A academic conference by the China Computer Federation (CCF). Its scope covers information retrieval, recommender systems, information extraction, knowledge graphs, and many related areas.

SIGIR 2022 is scheduled for July 11‑15, 2022 in Madrid, Spain. The conference received 794 long‑paper submissions and 667 short‑paper submissions, with acceptance rates of 20.4% and 24.6% respectively.

Paper Title: Constrained Sequence-to-Tree Generation for Hierarchical Text Classification

Abstract: Hierarchical Text Classification (HTC) is a common task in text representation and classification. It is a special multi‑label problem in which the category labels of a document are related through a hierarchy. Existing mainstream methods often predict labels that are inconsistent along a single path of the hierarchy, for example a child category without its parent, which limits their applicability in real scenarios. This paper proposes Seq2Tree, a model based on an encoder‑decoder framework, to address this issue. Seq2Tree extends the traditional Seq2Seq architecture with a label serialization module and a constrained decoding strategy. The serialization module uses depth‑first traversal to convert the tree‑structured label taxonomy into a preorder sequence, which guarantees that labels along the same branch are consistent. The constrained decoding strategy uses a dynamic dictionary to restrict the decoding space at each timestep, which further improves performance. Experiments on several mainstream HTC datasets demonstrate the effectiveness of the Seq2Tree approach. The method has already been deployed in multiple business scenarios of Alibaba Entertainment’s AI Brain (Beidouxing).
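The two ideas in the abstract, preorder serialization and dynamic‑dictionary decoding, can be illustrated with a small Python sketch. This is not the authors' implementation: the toy taxonomy, the `<pop>` backtrack token, the `<end>` token, and the `toy_score` function (a stand‑in for a neural decoder) are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy taxonomy: parent label -> ordered child labels.
TAXONOMY: Dict[str, List[str]] = {
    "ROOT": ["News", "Sports"],
    "News": ["Politics", "Economy"],
    "Sports": ["Football", "Tennis"],
}
POP = "<pop>"  # assumed backtrack token: return to the parent node
END = "<end>"  # assumed end-of-sequence token


def serialize(labels: Set[str], node: str = "ROOT") -> List[str]:
    """Depth-first (preorder) serialization of a document's label subtree."""
    seq: List[str] = []
    for child in TAXONOMY.get(node, []):
        if child in labels:
            seq.append(child)                     # visit the node first
            seq.extend(serialize(labels, child))  # then its descendants
            seq.append(POP)                       # then step back up
    return seq


def allowed_tokens(prefix: List[str]) -> List[str]:
    """Dynamic dictionary: tokens that keep the prefix a valid tree walk."""
    stack = ["ROOT"]
    emitted: Set[str] = set()
    for tok in prefix:
        if tok == POP:
            stack.pop()
        else:
            stack.append(tok)
            emitted.add(tok)
    current = stack[-1]
    # Only unvisited children of the current node may be generated next.
    options = [c for c in TAXONOMY.get(current, []) if c not in emitted]
    # At the root the sequence may end; elsewhere it may backtrack.
    options.append(END if current == "ROOT" else POP)
    return options


def constrained_decode(score: Callable[[List[str], str], float],
                       max_len: int = 32) -> List[str]:
    """Greedy decoding restricted to the dynamic dictionary at each step."""
    prefix: List[str] = []
    for _ in range(max_len):
        options = allowed_tokens(prefix)
        best = max(options, key=lambda tok: score(prefix, tok))
        if best == END:
            break
        prefix.append(best)
    return prefix


def toy_score(prefix: List[str], tok: str) -> float:
    """Stand-in for decoder scores: prefers 'Sports' and 'Tennis'."""
    if tok in {"Sports", "Tennis"}:
        return 1.0
    return 0.5 if tok in (POP, END) else 0.0


if __name__ == "__main__":
    print(serialize({"News", "Politics", "Sports", "Tennis"}))
    print(constrained_decode(toy_score))
```

Because `allowed_tokens` only ever offers children of the node currently on top of the stack, a child label such as "Tennis" cannot be generated unless its parent "Sports" is already in the sequence, which is the kind of path consistency the abstract describes.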

Authors: Yu Chao, Shen Yi, Mao Yue (all from Alibaba Entertainment AI Brain – Beidouxing team)

Alibaba Entertainment’s Beidouxing AI Brain leverages big data and AI to mine user needs, establishing capabilities such as structured content acquisition evaluation, adaptive casting, AI‑driven video quality inspection, scheduling, and digital promotion, thereby supporting decision‑making throughout the content lifecycle and achieving cost reduction and efficiency gains for the platform.

See also other top‑conference papers:

[Good News] Alibaba Entertainment Generative Information Extraction Paper Selected by ACL

Technical Announcement! Alibaba Entertainment AI Brain (Beidouxing) Dialogue Emotion Recognition Paper Selected by AAAI

Another Milestone! Alibaba Entertainment AI Brain (Beidouxing) Paper Accepted by NeurIPS 2021
