Three iQIYI AI Papers Break New Ground at ACL 2025 & INTERSPEECH 2025

iQIYI’s AI research team had three papers accepted: one main conference paper and one Findings paper at ACL 2025, plus one paper at INTERSPEECH 2025. The work spans long‑context large language model evaluation, Chinese novel summarization, and efficient Thai speech recognition; links to each paper are provided below.


iQIYI AI Team Papers Accepted at ACL 2025 and INTERSPEECH 2025

Recently, ACL 2025 and INTERSPEECH 2025 announced their accepted papers. iQIYI’s AI team had three papers accepted: two at ACL (one main conference paper and one Findings paper) and one at INTERSPEECH.

Main Conference Paper (ACL)

Title: LIFBench: Evaluating the Instruction Following Performance and Stability of Large Language Models in Long-Context Scenarios

Collaboration: East China Normal University

Abstract: Can large language models accurately follow complex instructions in long texts (inputs over 32K tokens)? Real applications require models to retain more knowledge and respond precisely to intricate prompts. To address this, iQIYI and East China Normal University introduce LIFBench, a benchmark for long‑text scenarios that covers three types of long‑text contexts and eleven tasks, with 2,766 automatically generated instructions varying in length, expression, and injected variables, enabling a comprehensive assessment of model capability and scalability in real applications. The accompanying LIFEval scoring system requires neither human annotation nor judge models, supporting fully automated scoring and multi‑angle analysis of both performance and stability. Experiments on 20 mainstream large models across different text lengths reveal where current models succeed and fail, helping developers understand real performance on long inputs such as lengthy reports or complex dialogues. Because the benchmark simulates realistic complex instructions (e.g., organizing long documents, multi‑step operations), it supports more dependable AI applications in film, literature, entertainment, and education.

Link: https://arxiv.org/abs/2411.07037
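For readers curious what "automated scoring without human annotation or judge models" can look like in practice, here is a minimal rule‑based sketch in the spirit of LIFEval. The instruction schema and check functions are our own illustrative assumptions, not the paper's implementation:

```python
# Minimal rule-based sketch in the spirit of LIFEval's annotation-free scoring.
# The constraint schema and check functions below are illustrative assumptions,
# not the paper's actual implementation.
import re
from typing import Callable

# Each constraint type maps to a deterministic, programmatic check on the output.
CHECKS: dict[str, Callable[[str, dict], bool]] = {
    # At most `max_sentences` sentences in the output.
    "max_sentences": lambda out, c: len(re.split(r"[.!?]+", out.strip())) - 1 <= c["max_sentences"],
    # Output must mention a required keyword (e.g., a variable injected into the prompt).
    "must_mention": lambda out, c: c["keyword"].lower() in out.lower(),
    # Output must start with a required prefix.
    "starts_with": lambda out, c: out.strip().startswith(c["prefix"]),
}

def score_response(output: str, constraints: list[dict]) -> float:
    """Fraction of verifiable constraints satisfied; no human or judge model needed."""
    passed = sum(CHECKS[c["type"]](output, c) for c in constraints)
    return passed / len(constraints)

# Example: one generated instruction with two programmatically checkable constraints.
constraints = [
    {"type": "max_sentences", "max_sentences": 3},
    {"type": "must_mention", "keyword": "LIFBench"},
]
print(score_response("LIFBench tests long-context models. It has 11 tasks.", constraints))  # 1.0
```

Because every check is deterministic, scores are reproducible across runs and cheap to compute at scale, which is what makes evaluating thousands of instruction variants feasible.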

Findings Paper (ACL)

Title: CNNSum: Exploring Long-Context Summarization with Large Language Models in Chinese Novels

Collaboration: East China Normal University

Abstract: Although long‑text tasks for large language models have been widely studied, a high‑quality Chinese dataset dedicated to long‑document summarization has been lacking. CNNSum fills this gap: a benchmark built from Chinese novels, comprising 695 samples ranging from 16,000 to 128,000 characters, each paired with a high‑quality human‑written summary. Key findings: (1) powerful models such as GPT‑4 tend to "add drama," inserting subjective commentary that blurs the focus; (2) smaller models, such as 7‑billion‑parameter ones, offer better cost‑performance; (3) prompt design is critical, as different prompts (e.g., "summarize in three sentences" vs. "list key characters and events") can yield vastly different results, though fine‑tuning mitigates this sensitivity; (4) training tricks matter: models fine‑tuned on short texts with RoPE‑based scaling can improve long‑text summarization, while other optimizations should be applied cautiously. CNNSum reflects what models can really do in "read a novel, digest a long report" scenarios, where the goal is to quickly extract the core content of a lengthy work.

Link: https://arxiv.org/abs/2412.02819
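The "RoPE‑based scaling" trick mentioned in finding (4) is commonly realized by rescaling a model's rotary position embeddings before fine‑tuning. A minimal sketch of that general technique follows, assuming a Hugging Face Llama‑style model; the model name and scaling factor are placeholders, and CNNSum's exact recipe may differ:

```python
# Sketch of RoPE scaling before fine-tuning, one common realization of the
# "RoPE-based scaling" trick. Model name and factor are placeholders, not
# CNNSum's actual configuration.
from transformers import AutoConfig, AutoModelForCausalLM

MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder 7B base model

config = AutoConfig.from_pretrained(MODEL)
# Linear RoPE scaling divides position indices by `factor`, so a model
# pre-trained with a 4K context can attend over roughly 16K tokens.
config.rope_scaling = {"type": "linear", "factor": 4.0}

model = AutoModelForCausalLM.from_pretrained(MODEL, config=config)
# Fine-tune on (short) summarization data as usual; the rescaled positions
# let the model generalize to much longer novels at inference time.
```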

INTERSPEECH 2025 Paper

Title: Weakly Supervised Data Refinement and Flexible Sequence Compression for Efficient Thai ASR

Collaboration: Northwestern Polytechnical University

Abstract: Scarce high‑quality annotated data and limited compute are the main obstacles for Thai speech recognition in low‑resource settings. To tackle both, we propose EThai‑ASR. First, two annotation models iteratively refine transcriptions, easing data scarcity and yielding an optimized Thai speech encoder. Building on that encoder, we construct a recognition architecture comprising the speech encoder, a modality adapter, and a large language model, achieving state‑of‑the‑art Thai ASR performance. Finally, we design a similarity‑based redundancy removal module that shortens the sequence fed to the model, significantly lowering training and deployment costs without sacrificing accuracy.

Link: http://arxiv.org/abs/2505.22063
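One way to picture a similarity‑based redundancy removal module is as merging runs of adjacent encoder frames whose representations are nearly identical, so the LLM attends over a shorter sequence. The sketch below illustrates that general idea under our own assumptions; it is not EThai‑ASR's actual module:

```python
# Sketch of similarity-based sequence compression: merge runs of adjacent
# encoder frames whose cosine similarity exceeds a threshold, so the LLM
# processes a shorter sequence. Illustrative only; EThai-ASR's actual module
# may measure similarity and merge frames differently.
import torch
import torch.nn.functional as F

def compress_frames(feats: torch.Tensor, threshold: float = 0.95) -> torch.Tensor:
    """feats: (T, D) encoder output -> (T', D) with T' <= T."""
    groups = [[feats[0]]]
    for t in range(1, feats.size(0)):
        sim = F.cosine_similarity(feats[t], groups[-1][-1], dim=0)
        if sim > threshold:
            groups[-1].append(feats[t])   # redundant frame: fold into current group
        else:
            groups.append([feats[t]])     # distinct frame: start a new group
    # Represent each group by the mean of its member frames.
    return torch.stack([torch.stack(g).mean(dim=0) for g in groups])

# Demo: 50 distinct frames, each repeated 4x to mimic redundant speech frames.
feats = torch.randn(50, 256).repeat_interleave(4, dim=0)   # (200, 256)
print(compress_frames(feats).shape)                        # torch.Size([50, 256])
```

The threshold trades compression against fidelity: a lower value merges more aggressively and saves more compute, at the risk of collapsing acoustically distinct frames.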

Tags: large language models, long context, AI research, speech recognition, summarization, ACL 2025, INTERSPEECH 2025
Written by

iQIYI Technical Product Team

The technical product team of iQIYI
