DataFun Data Science Summit: Cutting‑Edge Research on Causal Inference, Retrieval‑Augmented Generation, and LLM Content Detection

The DataFun Data Science Summit on May 25 brings together leading experts to present cutting‑edge research on pairwise data causal inference, Retrieval‑Augmented Generation applications, large language model content detection, user growth analytics, and advanced machine‑learning techniques across finance, e‑commerce, and AI domains.


The DataFun Data Science Summit, held on May 25, gathers eight distinguished speakers from academia and industry to share the latest advances in data science, covering topics such as causal inference with pairwise data, Retrieval‑Augmented Generation (RAG), large language model (LLM) content detection, and user growth analytics.

Speaker Highlights:

Li Yilin (Tencent Data Scientist) – Presents a novel framework for A/B experiments using pairwise data, addressing interference in networked settings and providing unbiased estimators under various randomization designs.

Li Yixuan (China Unicom Data Science) – Discusses practical applications of RAG technology, its advantages in improving accuracy and traceability, and challenges encountered when augmenting knowledge with proprietary data.

Han Yunfei (Volcano Engine, A/B Testing Lead) – Explores the truth behind user growth, emphasizing data‑driven strategies, entropy reduction, and experimental evaluation to sustain growth.

Cheng Wei (NEC Labs America) – Reviews the state of LLM‑generated content, the necessity of detection, and outlines current methods, AI‑based techniques, and future research directions.

Sun Yuewen (Postdoctoral Researcher, MBZUAI) – Introduces causal representation learning, its theoretical foundations, and its role in decision‑making in complex environments.

Chen Sirui (Tongji University) – Describes the CaLM benchmark for evaluating causal reasoning in large language models, including dataset construction and empirical findings across 28 models.

Chen Meiqi (Peking University) – Proposes a causal framework to quantify single‑modal bias in multimodal LLMs, presents the MORE dataset, and suggests mitigation strategies.

Zhang Yalin (Ant Group) – Shares weak‑supervision modeling techniques for financial risk control, covering domain adaptation, noisy label handling, and real‑world impact.

The summit also provides QR codes for free registration and live streaming, encouraging participants to engage with the presented research and explore collaborative opportunities.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: machine learning, AI, Retrieval Augmented Generation, causal inference, LLM detection
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
