
Data‑Driven Cross‑Language Program Analysis with Datalog: CodeFuse‑Query and Its ICSE 2025 Publication

The article introduces a data‑driven, Datalog‑based cross‑language program analysis technique presented in an ICSE 2025 paper, describes the open‑source CodeFuse‑Query platform, its technical innovations, and multiple production scenarios such as code evaluation, precise testing, dead‑code detection, and large‑scale code data cleaning.

AntTech

The 2025 IEEE/ACM International Conference on Software Engineering (ICSE) will be held in Ottawa, Canada. It received 662 submissions, with an acceptance rate below 10%. Among the accepted papers is a collaboration between Ant Group’s program analysis team and Nanjing University on data‑driven, cross‑language program analysis.

The paper was motivated by an Alipay app outage caused by inconsistent code changes across heterogeneous micro‑services written in Java, XML, JS/TS, and Objective‑C. The incident exposed a lack of tools that can quickly and precisely locate the impact of a change across languages.

Technically, the proposed approach uses Datalog to represent software services written in different programming and configuration languages in a uniform way, so that existing Datalog solvers can quickly carry out analysis tasks such as determining change impact. The technique is part of Ant Group’s open‑source CodeFuse‑Query project, available on GitHub.
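
To make the idea concrete, here is a minimal sketch of change‑impact analysis written as two Datalog‑style rules and evaluated with a naive fixpoint loop in plain Python. It does not use CodeFuse‑Query or any real Datalog solver; the `depends_on` relation, the file names, and the `impacted` function are all hypothetical and chosen only to mirror the Java, XML, JS/TS, and Objective‑C mix described above.

```python
# Hypothetical facts: depends_on(user, used) means "user depends on used".
# The file names are illustrative only and do not come from the paper.
DEPENDS_ON = {
    ("PayController.java", "PayService.java"),
    ("pay-routes.xml", "PayController.java"),
    ("checkout.ts", "pay-routes.xml"),
    ("PayViewController.m", "pay-routes.xml"),
    ("ProfileService.java", "UserDao.java"),
}


def impacted(changed, depends_on):
    """Evaluate two Datalog-style rules to a fixpoint.

        impacted(X) :- changed(X).
        impacted(X) :- depends_on(X, Y), impacted(Y).
    """
    result = set(changed)  # first rule: every changed entity is impacted
    while True:
        # second rule: anything depending on an impacted entity is impacted
        derived = {user for (user, used) in depends_on if used in result}
        if derived <= result:  # no new facts were derived: fixpoint reached
            return result
        result |= derived


if __name__ == "__main__":
    for name in sorted(impacted({"PayService.java"}, DEPENDS_ON)):
        print(name)
```

Because every language contributes facts to the same relation, a change in a Java file reaches the XML, TypeScript, and Objective‑C entities that depend on it without any language‑specific logic in the rules. A production solver would evaluate such rules far more efficiently than this naive loop.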

CodeFuse‑Query reimagines static code analysis as a data‑centric computation. It supports the analysis of over 100 billion lines of code per day and more than 300 analysis tasks. It employs a two‑layer data model (COREF) and a domain‑specific language (Godel) in which high‑level analysis specifications are written and then translated into Datalog rules.
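
The data‑centric view can be illustrated with a small sketch: source code is first extracted into relational facts, and an analysis then becomes a query over those facts. The example below uses only the Python standard library (`ast` and `sqlite3`). The table names `func_def` and `call_edge` and the helper functions are hypothetical; they are not the COREF schema or the Godel language, whose details are not given here.

```python
import ast
import sqlite3

# A tiny program to analyze (illustrative only).
SOURCE = '''
def load():
    return parse()

def parse():
    return 1

def unused():
    return load()
'''


def extract_facts(source):
    """Turn source text into (functions, calls) fact lists."""
    tree = ast.parse(source)
    functions, calls = [], []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            functions.append((node.name, node.lineno))
            # Record direct calls to plain names inside this function body.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    calls.append((node.name, inner.func.id))
    return functions, calls


def build_db(source):
    """Load the extracted facts into an in-memory relational store."""
    functions, calls = extract_facts(source)
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE func_def(name TEXT, line INTEGER)")
    db.execute("CREATE TABLE call_edge(caller TEXT, callee TEXT)")
    db.executemany("INSERT INTO func_def VALUES (?, ?)", functions)
    db.executemany("INSERT INTO call_edge VALUES (?, ?)", calls)
    return db


def callers_of(db, name):
    """The analysis itself is just a query over the stored facts."""
    rows = db.execute(
        "SELECT caller FROM call_edge WHERE callee = ? ORDER BY caller", (name,)
    )
    return [caller for (caller,) in rows]


if __name__ == "__main__":
    database = build_db(SOURCE)
    print(callers_of(database, "load"))
```

Separating extraction from querying means the facts can be generated once and reused by many analyses, which is the property the data‑centric design relies on.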

The platform addresses five key technical challenges: (1) multi‑language support for nine languages (Java, XML, Swift, JS/TS, C/C++/Obj‑C, Go, Python, SQL, Properties); (2) incremental, unified data‑model generation; (3) simplified definition of analysis tasks via Godel; (4) reusable analysis capabilities; and (5) cross‑language, cross‑repository joint analysis.

In production, CodeFuse‑Query powers several Ant Group scenarios, including code quality evaluation, precise testing (identifying affected methods, entry points, call chains, and database operations), dead‑code detection for reducing app bundle size, and large‑scale code data cleaning for the CodeFuse large‑language model.
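
Dead‑code detection, one of the scenarios above, can be sketched as a reachability query over call facts: anything defined but not reachable from an entry point is a candidate for removal. The code below is a simplified illustration in plain Python under that assumption. The `CALLS`, `DEFINED`, and `ENTRY_POINTS` data and both function names are hypothetical, and a real analysis would also have to account for reflection, configuration‑driven dispatch, and similar indirect uses.

```python
# Hypothetical call facts: (caller, callee).
CALLS = {
    ("main", "render"),
    ("render", "format_price"),
    ("legacy_export", "format_price"),
    ("legacy_export", "old_csv"),
}
DEFINED = {"main", "render", "format_price", "legacy_export", "old_csv"}
ENTRY_POINTS = {"main"}


def reachable(entry_points, calls):
    """Collect every function reachable from the entry points."""
    seen = set(entry_points)
    frontier = list(entry_points)
    while frontier:
        current = frontier.pop()
        for caller, callee in calls:
            if caller == current and callee not in seen:
                seen.add(callee)
                frontier.append(callee)
    return seen


def dead_code(defined, entry_points, calls):
    """Defined functions that no entry point can reach."""
    return defined - reachable(entry_points, calls)


if __name__ == "__main__":
    print(sorted(dead_code(DEFINED, ENTRY_POINTS, CALLS)))
```

Note that `old_csv` has a caller, so a simple "never called" check would miss it; it is reported here because its only caller, `legacy_export`, is itself unreachable.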

The article concludes that cross‑language program analysis, as enabled by CodeFuse‑Query, is essential for ensuring software quality, security, and performance in the era of AI‑driven, heterogeneous systems.

Tags: software engineering, Static Analysis, Datalog, cross-language, CodeFuse-Query, ICSE2025, program analysis