How YASA Enables Scalable Multi‑Language Taint Analysis with a Unified AST
This article introduces YASA, a unified multi‑language static taint analysis framework built on a novel Unified Abstract Syntax Tree (UAST). It covers the design, core components, open‑source releases, and benchmark results, which show better coverage, precision, and performance than existing single‑ and multi‑language tools.
Background and Motivation
Large‑scale applications are frequently polyglot: over 80% of large systems use more than one language, which complicates static application security testing (SAST). Existing taint‑analysis tools are either single‑language (e.g., FlowDroid, PySA), requiring a separate engine per language, or multi‑language (e.g., CodeQL, Joern, WALA), in which case they lose precision or are hard to extend.
YASA Overview
YASA (Yet Another Static Analyzer) is an industrial‑grade framework that provides unified multi‑language static taint analysis. Its core is the Unified Abstract Syntax Tree (UAST), an offline intermediate representation that normalizes syntax across languages while preserving language‑specific semantics.
Key Design Elements
1. UAST Specification
The UAST defines 54 node types grouped into:
General semantic nodes (35): constructs shared by at least two languages (e.g., a unified RangeStatement that covers JavaScript for‑of, Python for‑in, and Go range).
Language‑specific nodes (19): constructs that cannot be losslessly unified (e.g., Python YieldExpression, Go ChanType).
Reducible nodes: syntactic sugar such as list comprehensions or arrow functions, which is desugared into equivalent UAST sequences.
Conversion from each language’s AST to UAST follows three steps: direct mapping, structural transformation, and desugaring.
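The conversion steps can be illustrated with a toy converter. This is a minimal sketch, not YASA's actual parser: the dict-based node shape, the field names (key, value, right, body), the helper node names (VariableDeclaration, ListLiteral, CallExpression), and the functions to_uast and desugar_list_comp are all hypothetical. Only the name RangeStatement comes from the text above. The input field names follow Python's ast.For, ESTree's ForOfStatement, and Go's ast.RangeStmt, with their contents simplified to strings.

```python
# Toy illustration of mapping three language ASTs onto one shared node.
# Node shapes and field names are assumptions, not the real UAST schema.

def to_uast(node: dict, lang: str) -> dict:
    """Map a simplified language-specific loop node to a unified RangeStatement."""
    if lang == "python" and node["type"] == "For":
        # Structural transformation: target/iter are renamed to value/right.
        key, value, right, body = None, node["target"], node["iter"], node["body"]
    elif lang == "javascript" and node["type"] == "ForOfStatement":
        key, value, right, body = None, node["left"], node["right"], node["body"]
    elif lang == "go" and node["type"] == "RangeStmt":
        # Go's range can bind both an index/key and a value.
        key, value, right, body = node["Key"], node["Value"], node["X"], node["Body"]
    else:
        raise ValueError(f"unsupported node {node['type']!r} for {lang}")
    return {"type": "RangeStatement", "key": key, "value": value,
            "right": right, "body": body}


def desugar_list_comp(elt: str, target: str, iterable: str,
                      result: str = "_tmp") -> list:
    """Desugar `[elt for target in iterable]` into a declaration plus a loop."""
    return [
        {"type": "VariableDeclaration", "name": result,
         "init": {"type": "ListLiteral", "elements": []}},
        {"type": "RangeStatement", "key": None, "value": target,
         "right": iterable,
         "body": [{"type": "CallExpression",
                   "callee": f"{result}.append", "arguments": [elt]}]},
    ]
```

With this shape, a Python for‑in loop and a JavaScript for‑of loop over the same variables produce identical unified nodes, which is what lets a single downstream handler serve both.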
2. Points‑to Analyzer
The analyzer implements inter‑procedural pointer analysis that is:
Context‑sensitive: state cloning with a bounded call stack distinguishes different call sites.
Path‑sensitive: the analysis forks at each conditional branch and merges the resulting states once both the true and false paths have been explored.
Field‑sensitive: object fields are tracked individually to avoid over‑approximation.
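The three properties above can be sketched on a tiny abstract state. This is an illustrative model under stated assumptions, not YASA's implementation: the State class, analyze_branch, and push_context are hypothetical names, taint is reduced to a boolean per (variable, field) pair, and the merge uses a union ("tainted on either path"), which is one common choice and not something the article specifies.

```python
class State:
    """Abstract state mapping (variable, field) pairs to a taint flag."""

    def __init__(self, taint=None):
        self.taint = dict(taint or {})

    def fork(self):
        # Clone the state so each branch is analyzed independently.
        return State(self.taint)

    def set(self, var, field, tainted):
        # Field-sensitive: each field of an object has its own entry.
        self.taint[(var, field)] = tainted

    def get(self, var, field):
        return self.taint.get((var, field), False)

    @staticmethod
    def merge(a, b):
        # Assumed join: a location is tainted if it is tainted on either path.
        merged = State()
        for key in a.taint.keys() | b.taint.keys():
            merged.taint[key] = a.taint.get(key, False) or b.taint.get(key, False)
        return merged


def analyze_branch(state, then_effect, else_effect):
    """Fork at a conditional, run both branches, then merge the results."""
    then_state = state.fork()
    then_effect(then_state)
    else_state = state.fork()
    else_effect(else_state)
    return State.merge(then_state, else_state)


def push_context(call_stack, call_site, k=2):
    """Bounded call-stack context: keep only the k most recent call sites."""
    if k <= 0:
        return ()
    return (call_stack + (call_site,))[-k:]
```

Because the fork copies the state, effects applied in one branch never leak into the other or into the pre‑branch state, and bounding the context tuple keeps the number of cloned states finite on deep or recursive call chains.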
The core analysis is language‑agnostic (52 shared semantic handlers, 77.3% of total handlers). Language‑specific extensions handle Python inheritance, JavaScript prototype chains, Java Lombok code generation, and Go interface satisfaction.
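One way to picture the split between shared and language‑specific handlers is a dispatch table keyed by node type, with per‑language overrides consulted first. The sketch below is an assumption about the general pattern, not YASA's code: the decorator names, the MemberAccess node type, and the heap layout (fields and protos dictionaries) are all hypothetical.

```python
SHARED_HANDLERS = {}     # node type -> handler used by every language
LANGUAGE_HANDLERS = {}   # (language, node type) -> overriding handler


def shared(node_type):
    def register(fn):
        SHARED_HANDLERS[node_type] = fn
        return fn
    return register


def language_specific(lang, node_type):
    def register(fn):
        LANGUAGE_HANDLERS[(lang, node_type)] = fn
        return fn
    return register


@shared("MemberAccess")
def read_own_field(node, heap):
    # Default semantics: look only at the object's own fields.
    return heap["fields"].get(node["object"], {}).get(node["property"])


@language_specific("javascript", "MemberAccess")
def read_via_prototype_chain(node, heap):
    # JavaScript extension: fall back to the prototype chain on a miss.
    obj, seen = node["object"], set()
    while obj is not None and obj not in seen:
        seen.add(obj)  # guard against cyclic chains in malformed input
        fields = heap["fields"].get(obj, {})
        if node["property"] in fields:
            return fields[node["property"]]
        obj = heap["protos"].get(obj)
    return None


def dispatch(node, heap, lang):
    """Prefer a language-specific handler, else use the shared one."""
    handler = LANGUAGE_HANDLERS.get((lang, node["type"]),
                                    SHARED_HANDLERS.get(node["type"]))
    if handler is None:
        raise NotImplementedError(node["type"])
    return handler(node, heap)
```

Under this layout, adding a language means registering only the handlers whose semantics differ, while every other node type falls through to the shared table.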
3. Taint Checker
The checker consists of a base taint‑propagation engine plus plug‑ins for frameworks. Base rules cover assignment, container‑field, and function‑call propagation, as well as language‑specific cases such as JavaScript prototype/Promise propagation and Go channel propagation. Plug‑ins are provided for 11 major frameworks (Flask, Django, FastAPI, Express, Egg.js, Node.js, Spring, Gin, gRPC, Beego, Gorilla Mux), modeling routing, middleware, and data binding without modifying the core engine.
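The base‑engine‑plus‑plug‑ins split can be sketched as follows. This is a simplified model, not YASA's plug‑in API: FrameworkPlugin, FlaskLikePlugin, find_flows, and the tuple‑based statement format are hypothetical, and the source and sink names are plain strings chosen for illustration. The engine implements only the three base rules named above (assignment, container field, function call) and learns about sources and sinks solely from the plug‑ins it is given.

```python
class FrameworkPlugin:
    """A plug-in contributes source and sink names; the engine stays unchanged."""
    sources = frozenset()
    sinks = frozenset()


class FlaskLikePlugin(FrameworkPlugin):
    # Illustrative model only: request parameters in, shell command out.
    sources = frozenset({"request.args.get"})
    sinks = frozenset({"os.system"})


def find_flows(statements, plugins):
    """Propagate taint through straight-line statements and report sink hits."""
    sources = set().union(*(p.sources for p in plugins))
    sinks = set().union(*(p.sinks for p in plugins))
    tainted, findings = set(), []

    def update(location, is_tainted):
        if location is None:
            return
        if is_tainted:
            tainted.add(location)
        else:
            tainted.discard(location)  # overwritten with clean data

    for stmt in statements:
        kind = stmt[0]
        if kind == "assign":            # ("assign", dst, src)
            _, dst, src = stmt
            update(dst, src in tainted)
        elif kind == "store":           # ("store", obj, field, src)
            _, obj, field, src = stmt
            update(f"{obj}.{field}", src in tainted)
        elif kind == "call":            # ("call", dst_or_None, callee, args)
            _, dst, callee, args = stmt
            arg_tainted = any(a in tainted for a in args)
            if callee in sinks and arg_tainted:
                findings.append((callee, tuple(args)))
            update(dst, callee in sources or arg_tainted)
    return findings
```

For example, a program that reads a request parameter, stores it in an object field, and passes that field to a sink is reported when the plug‑in is loaded and produces no findings when it is not:

```python
program = [
    ("call", "q", "request.args.get", ["'cmd'"]),
    ("assign", "cmd", "q"),
    ("store", "job", "command", "cmd"),
    ("call", None, "os.system", ["job.command"]),
]
findings = find_flows(program, [FlaskLikePlugin()])
```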
Open‑Source Release
Repositories:
UAST specification and parsers: https://github.com/antgroup/YASA-UAST
Analysis engine: https://github.com/antgroup/YASA-Engine
Evaluation and Results
The evaluation used the xAST industrial benchmark (851 test cases, https://github.com/alipay/ant-application-security-testing-benchmark), which covers Java, JavaScript, Python, and Go. Compared with six single‑language tools and two multi‑language frameworks (CodeQL, Joern), YASA achieved:
4%–22% higher syntax‑feature support pass rate.
14%–26% higher taint‑analysis sensitivity.
77.3% reuse of language‑agnostic semantic rules.
Only 16%–27% incremental effort to add a new language.
Average scanning speed of 31.8 KLOC/min, 3.4× faster than CodeQL and 1.9× faster than Joern.
All discovered vulnerabilities were responsibly reported and fixed.
