
Detecting Apache Commons Text RCE (CVE-2022-42889) with the Doop Static Analysis Framework

The Vivo Internet Security Team demonstrates how to extend the Doop static analysis framework with custom Datalog rules to detect the Apache Commons Text CVE‑2022‑42889 remote code execution vulnerability by tracing taint from StringSubstitutor.replace to ScriptEngine.eval, producing source‑sink CSV reports and showcasing Doop’s extensibility for security research.

vivo Internet Technology

This article, authored by the Vivo Internet Security Team, explains how to extend the Doop static analysis framework to detect the Apache Commons Text remote code execution vulnerability (CVE-2022-42889). It is divided into four parts: an introduction to Doop, a brief overview of the Commons Text vulnerability, the design of custom Doop rules for taint‑flow detection, and a summary.

1. Doop static analysis framework

Doop is an open-source static analysis framework for Java programs, developed by the PLAST Lab at the University of Athens. It combines a Groovy-based front end that drives fact generation with a Datalog engine (originally LogicBlox, now Soufflé) that evaluates the analysis rules. Analyses are selectable by name and range from context-insensitive to 1-call-site-sensitive, with optional heap sensitivity. The architecture consists of a fact generator, Jimple IR generation (via Soot/WALA), and a Datalog rule set that can be extended with add-ons for information flow, Spring, reflection, etc.

2. Commons Text RCE vulnerability

Apache Commons Text versions 1.5 through 1.9 (the article analyzes 1.9) enable a script lookup, ScriptStringLookup, in the default interpolator. It executes code via ScriptEngine.eval when a crafted string such as ${script:javascript:...} is passed to StringSubstitutor.replace. Version 1.10.0 disables this lookup by default. The exploit chain is shallow but involves casting, lexical analysis of the input string, and several string-handling methods, making it a suitable target for static analysis.
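To make the shape of that chain concrete, here is a minimal toy model of prefix-dispatched interpolation. It is a sketch under stated assumptions, not the Commons Text implementation: the class ToyInterpolator, its LOOKUPS table, and the "upper" lookup are hypothetical, and the "script" lookup only records its argument instead of calling ScriptEngine.eval.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model only: NOT the Commons Text implementation.
class ToyInterpolator {
    // Records what reached the stand-in "script" lookup instead of evaluating it.
    static final List<String> reachedScriptSink = new ArrayList<>();

    // Hypothetical lookup table keyed by prefix.
    static final Map<String, Function<String, String>> LOOKUPS = new HashMap<>();
    static {
        LOOKUPS.put("upper", key -> key.toUpperCase());
        LOOKUPS.put("script", key -> {
            // A real script lookup would pass `key` on towards ScriptEngine.eval here.
            reachedScriptSink.add(key);
            return "<script result>";
        });
    }

    // Replaces each ${prefix:key} occurrence by calling the lookup registered for `prefix`.
    static String replace(String input) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int start = input.indexOf("${", pos);
            int end = start < 0 ? -1 : input.indexOf('}', start);
            if (start < 0 || end < 0) {
                out.append(input.substring(pos)); // no further placeholders
                return out.toString();
            }
            out.append(input, pos, start);
            String body = input.substring(start + 2, end);
            String[] parts = body.split(":", 2); // prefix, then the rest as the key
            Function<String, String> lookup =
                parts.length == 2 ? LOOKUPS.get(parts[0]) : null;
            // Unknown prefixes are left in the output untouched.
            out.append(lookup != null ? lookup.apply(parts[1])
                                      : input.substring(start, end + 1));
            pos = end + 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(replace("hello ${upper:world}")); // prints "hello WORLD"
        replace("${script:javascript:attacker-controlled code}");
        System.out.println(reachedScriptSink);
    }
}
```

Even in this toy version, the attacker-controlled text passes through substring and split calls before it reaches the lookup. Those are the kinds of string operations a taint analysis has to model for the flow to survive.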

3. Custom Doop rules for taint‑flow detection

The analysis selects StringSubstitutor.replace as the source and ScriptEngine.eval as the sink. Doop is run in --app-only mode to ignore JDK methods and improve performance. The article first introduces Doop's Datalog syntax (relations, facts, and rules) and then shows how to add missing taint-transfer rules, such as for String.split and custom annotations.

Key command line used:

-a context-insensitive --app-only --information-flow spring --fact-gen-cores 4 -i docs/commons-text.jar --platform java_8 --stats none

Example of adding a new transfer rule:

BaseToRetTaintTransferMethod("<java.lang.String: java.lang.String[] split(java.lang.String,int)>").

Custom entry‑point and mock‑object definitions are added to recognize the new annotations:

EntryPointClass(?type) :- Type_Annotation(?type, "org.apache.commons.text.TestctxTaintedClassAnnotation").

MockObject(?mockObj, ?type) :- Type_Annotation(?type, "org.apache.commons.text.TestctxTaintedClassAnnotation").

The sink rule is defined as:

LeakingSinkMethodArg("default", 0, method) :- isMethod(method), match("<javax.script.ScriptEngine: java.lang.Object eval\(.*\)>", method).
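The regular expression in the sink rule can be sanity-checked outside Doop. The sketch below applies the same pattern to a few method signatures written in the `<class: return-type name(params)>` form used in the rules. It assumes whole-string matching, which is how Java's Matcher.matches behaves. The helper class SinkPatternCheck is hypothetical and not part of Doop.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for checking the sink pattern; not part of Doop.
class SinkPatternCheck {
    // Same regular expression as in the sink rule, written as a Java string literal.
    static final Pattern SINK =
        Pattern.compile("<javax.script.ScriptEngine: java.lang.Object eval\\(.*\\)>");

    // True when the whole signature matches the sink pattern.
    static boolean isSink(String methodSignature) {
        return SINK.matcher(methodSignature).matches();
    }

    public static void main(String[] args) {
        List<String> signatures = List.of(
            "<javax.script.ScriptEngine: java.lang.Object eval(java.lang.String)>",
            "<javax.script.ScriptEngine: java.lang.Object eval(java.io.Reader,javax.script.Bindings)>",
            "<javax.script.ScriptEngine: java.lang.Object get(java.lang.String)>");
        for (String signature : signatures) {
            System.out.println(isSink(signature) + "  " + signature);
        }
    }
}
```

Because the parameter list is matched with `.*`, every eval overload is treated as a sink, and the rule marks argument index 0 of each as the leaking argument.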

Additional rules (OptTaintedtransMethodInvocationBase, MaytaintedInvocationInfo, VarIsTaintedFromVar, LeakingSinkVariable) are introduced to propagate taint through method invocations and to capture the final source-sink pairs.
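Conceptually, rules of this kind compute a fixpoint: start from the source, follow "is tainted from" edges until nothing new is derived, then intersect the result with the sinks. The sketch below imitates that idea in plain Java (Java 9 or later). It is an illustration only, not Doop's rule set or evaluation strategy. The class ToyTaintFlow and the edge names "replace/source", "keys", and "eval/script" are hypothetical, while variableName and buf echo variable names from the article's results.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy fixpoint over taint edges; an illustration, not Doop's implementation.
class ToyTaintFlow {
    // flowsTo.get(v) lists the variables that become tainted once v is tainted.
    static Set<String> propagate(Set<String> sources, Map<String, List<String>> flowsTo) {
        Set<String> tainted = new HashSet<>(sources);
        Deque<String> worklist = new ArrayDeque<>(sources);
        while (!worklist.isEmpty()) {
            String from = worklist.pop();
            for (String to : flowsTo.getOrDefault(from, List.of())) {
                if (tainted.add(to)) {   // newly derived fact
                    worklist.push(to);   // keep propagating from it
                }
            }
        }
        return tainted;
    }

    // Sinks that were reached by tainted data, in sorted order.
    static Set<String> leaks(Set<String> tainted, Set<String> sinks) {
        Set<String> hit = new TreeSet<>(sinks);
        hit.retainAll(tainted);
        return hit;
    }

    public static void main(String[] args) {
        // Hypothetical edges loosely following the flow described in the article.
        Map<String, List<String>> flowsTo = Map.of(
            "replace/source", List.of("buf"),
            "buf", List.of("variableName"),
            "variableName", List.of("keys"),   // e.g. through a split-style transfer
            "keys", List.of("eval/script"),
            "unrelated", List.of("other"));
        Set<String> tainted = propagate(Set.of("replace/source"), flowsTo);
        System.out.println(leaks(tainted, Set.of("eval/script", "other"))); // prints [eval/script]
    }
}
```

If one edge is missing, for example the one modelling a split-style transfer, the sink is never reached. That is why the missing transfer rules had to be added before the flow showed up in the results.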

Running the modified Doop produces two CSV files of interest:

LeakingTaintedInformation.csv – lists source-sink flows, e.g., org.apache.commons.text.StringSubstitutor.replace → javax.script.ScriptEngine.eval.

AppTaintedVar.csv – shows tainted variables inside the application (e.g., variableName, buf, resolver).

These results confirm that Doop can successfully identify the Commons Text RCE taint flow.

4. Summary

Doop is a powerful, extensible static analysis framework capable of precise taint‑flow detection when custom Datalog rules are added. Although it requires substantial resources for large projects and its output is not always user‑friendly, its algorithmic depth and similarity to tools like CodeQL make it suitable for advanced security research and DevSecOps integration.
