What is XML Injection?
XML injection occurs when user-supplied data is embedded in an XML document without proper encoding of XML special characters (<, >, &, ", '). An attacker who can inject these characters can alter the XML structure — adding new elements, overwriting existing data, breaking the document schema, or in some contexts manipulating application logic that parses the XML.
XML injection is distinct from XXE (XML External Entity) injection, which targets the XML parser. XML injection targets the XML document’s data layer — it’s closer in nature to SQL injection but operates on XML structure rather than SQL syntax.
Common targets include SOAP web services, XML-based configuration storage, XML datastores, and logging systems that write data in XML format.
How exploitation works
Consider a user registration system that stores each user record as XML:
<user>
<name>Alice</name>
<role>user</role>
</user>
The application constructs this by concatenating user input:
String xml = "<user><name>" + userName + "</name><role>user</role></user>";
An attacker submits: Alice</name><role>admin</role><name>Alice
The resulting XML becomes:
<user>
<name>Alice</name>
<role>admin</role> <!-- Injected by the attacker -->
<name>Alice</name>
<role>user</role> <!-- The role the application intended -->
</user>
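The document now contains two <role> elements, and which one takes effect depends on the consumer. Code that reads the first match (for example, getElementsByTagName("role").item(0) in the Java DOM API) sees admin, while code that keeps the last match sees user. In both cases the attacker, not the application, has decided the structure of the record.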
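To make the parser dependence concrete, the following minimal Java sketch reproduces the concatenation above, parses the result with the JDK's default DOM parser, and reads the role the way a hypothetical consumer might. The class and method names (RoleInjectionDemo, firstRole, roleCount) are illustrative assumptions, not part of any real application.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Illustrative demo class; all names are hypothetical.
class RoleInjectionDemo {
    // Vulnerable construction: the same string concatenation as the example above.
    static String buildXml(String userName) {
        return "<user><name>" + userName + "</name><role>user</role></user>";
    }

    // Parse with the JDK DOM parser using default settings.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // A hypothetical consumer that trusts the first <role> element in document order.
    static String firstRole(String xml) {
        NodeList roles = parse(xml).getElementsByTagName("role");
        return roles.item(0).getTextContent();
    }

    // How many <role> elements the document actually contains.
    static int roleCount(String xml) {
        return parse(xml).getElementsByTagName("role").getLength();
    }

    public static void main(String[] args) {
        String payload = "Alice</name><role>admin</role><name>Alice";
        String xml = buildXml(payload);
        System.out.println(xml);
        System.out.println("role elements: " + roleCount(xml));
        System.out.println("first role:    " + firstRole(xml));
    }
}
```

With the attacker payload, the document is still well-formed, so the parser accepts it without complaint: roleCount returns 2 and firstRole returns admin.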
Vulnerable code examples
Java — XML construction via concatenation
// VULNERABLE: User input concatenated into XML string
public String buildUserXml(String name, String email) {
return "<user>" +
"<name>" + name + "</name>" + // Injection point
"<email>" + email + "</email>" +
"<role>user</role>" +
"</user>";
}
PHP — SOAP request construction
// VULNERABLE: User input in SOAP XML body without encoding
$xml = "<soapenv:Body>
<searchUser>
<username>" . $_POST['username'] . "</username>
</searchUser>
</soapenv:Body>";
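The concatenation in the PHP example also fails on ordinary, non-malicious input: a single unencoded & or < makes the body not well-formed, and a conforming parser must reject the whole request. The sketch below shows this in Java using the JDK parser. It is an assumption-laden simplification: the envelope and the soapenv: prefix are omitted so the fragment parses on its own, and MalformedBodyDemo and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class mirroring the PHP concatenation pattern.
class MalformedBodyDemo {
    // Same concatenation as the PHP example (envelope and soapenv: prefix omitted).
    static String buildBody(String username) {
        return "<searchUser><username>" + username + "</username></searchUser>";
    }

    // True if the string is well-formed XML according to the JDK parser.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"alice", "Smith & Sons", "a<b"}) {
            System.out.println(name + " -> well-formed: " + isWellFormed(buildBody(name)));
        }
    }
}
```

Only "alice" yields a well-formed body; "Smith & Sons" and "a<b" both produce documents the parser rejects.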
Secure code examples
Java — DOM-based XML construction
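import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// SECURE: Build XML through the DOM API; text values are escaped when the tree is serialized
public String buildUserXml(String name, String email) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
Document doc = factory.newDocumentBuilder().newDocument();
Element root = doc.createElement("user");
doc.appendChild(root);
Element nameEl = doc.createElement("name");
nameEl.setTextContent(name); // Stored as a text node; < and & are escaped on output
root.appendChild(nameEl);
Element emailEl = doc.createElement("email");
emailEl.setTextContent(email);
root.appendChild(emailEl);
Element roleEl = doc.createElement("role");
roleEl.setTextContent("user"); // Fixed value, never taken from user input
root.appendChild(roleEl);
// Serialize to string
TransformerFactory tf = TransformerFactory.newInstance();
StringWriter writer = new StringWriter();
tf.newTransformer().transform(new DOMSource(doc), new StreamResult(writer));
return writer.toString();
}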
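A quick way to check the DOM approach is a round trip: build the document from the attacker payload used earlier, parse it back, and confirm the structure did not change. The sketch below is a trimmed, self-contained variant of buildUserXml above (email dropped, role fixed to user); DomRoundTrip, count, and firstText are hypothetical helper names.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical round-trip check for DOM-based construction.
class DomRoundTrip {
    // DOM construction as in buildUserXml above, trimmed to name plus a fixed role.
    static String buildUserXml(String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("user");
            doc.appendChild(root);
            Element nameEl = doc.createElement("name");
            nameEl.setTextContent(name); // text node: never interpreted as markup
            root.appendChild(nameEl);
            Element roleEl = doc.createElement("role");
            roleEl.setTextContent("user");
            root.appendChild(roleEl);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Parse serialized XML back into a DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Number of elements with the given tag name.
    static int count(String xml, String tag) {
        return parse(xml).getElementsByTagName(tag).getLength();
    }

    // Text content of the first element with the given tag name.
    static String firstText(String xml, String tag) {
        return parse(xml).getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        String payload = "Alice</name><role>admin</role><name>Alice";
        String xml = buildUserXml(payload);
        System.out.println(xml);
        System.out.println("role elements: " + count(xml, "role"));
        System.out.println("role:          " + firstText(xml, "role"));
    }
}
```

With the same payload that produced two roles under concatenation, the document keeps exactly one <role> (user), and the payload comes back unchanged as the literal text of <name>.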
C# — XElement construction
using System.Xml.Linq;

// SECURE: XElement stores values as text nodes and escapes them when serialized
public string BuildUserXml(string name, string email)
{
var doc = new XElement("user",
new XElement("name", name), // Text content; < and & are escaped by ToString()
new XElement("email", email),
new XElement("role", "user")
);
return doc.ToString();
}
What Offensive360 detects
- String concatenation in XML construction: User input concatenated into XML strings rather than set via DOM/API methods
- Missing XML encoding: Absence of StringEscapeUtils.escapeXml(), SecurityElement.Escape(), or an equivalent before embedding input in XML
- SOAP parameter injection: User input embedded in SOAP envelope XML without encoding
- XML template string building: Template literals or format strings constructing XML with tainted data
Remediation guidance
- Use DOM/API-based XML construction: Never build XML by concatenating strings. Use DocumentBuilder, XElement, lxml.etree, or SimpleXMLElement, which handle encoding automatically. With SimpleXMLElement, assign text through a property rather than the value argument of addChild(), which does not escape &.
- Encode XML special characters: If string-based XML construction is unavoidable, encode < as &lt;, > as &gt;, & as &amp;, " as &quot;, and ' as &apos; in all user input. If encoding with a chain of replacements, replace & first so the other entities are not double-encoded.
- Validate against an XML schema: Use XSD schema validation after constructing XML to detect structural anomalies, such as a duplicated <role> element, before processing.
- Use typed SOAP clients: Generate SOAP clients from WSDL definitions (JAX-WS, WCF) so parameter serialization is handled by the framework rather than by manual XML construction.
- Apply input validation: Reject inputs containing <, >, or & at the API boundary when these characters have no legitimate use in the field.
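The encoding, schema validation, and input validation steps above can be sketched together in Java using only JDK classes. This is a minimal illustration under stated assumptions: escapeXml is a hand-written single-pass encoder (a maintained library routine is usually preferable), USER_XSD is a hypothetical schema for the <user> record from the exploitation example, and isAcceptableName encodes an assumed allowlist policy for a name field. XmlRemediation is a hypothetical class name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helpers illustrating encoding, XSD validation, and allowlisting.
class XmlRemediation {
    // Encode the five XML special characters in one pass, so & is never double-encoded.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical schema: one <name>, optional <email>, one <role> from a fixed set.
    static final String USER_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='user'><xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='email' type='xs:string' minOccurs='0'/>"
      + "<xs:element name='role'><xs:simpleType><xs:restriction base='xs:string'>"
      + "<xs:enumeration value='user'/><xs:enumeration value='admin'/>"
      + "</xs:restriction></xs:simpleType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    private static final Schema SCHEMA = compile(USER_XSD);

    // Compile the schema once; a broken schema is a programming error, not bad input.
    private static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("invalid schema", e);
        }
    }

    // True if xml is well-formed and conforms to USER_XSD.
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed, or structure does not match the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumed policy: letters, digits, space, period, apostrophe, hyphen; 1 to 64 chars.
    static boolean isAcceptableName(String s) {
        return s.matches("[\\p{L}\\p{N} .'-]{1,64}");
    }

    public static void main(String[] args) {
        String payload = "Alice</name><role>admin</role><name>Alice";
        String injected = "<user><name>" + payload + "</name><role>user</role></user>";
        String encoded = "<user><name>" + escapeXml(payload) + "</name><role>user</role></user>";
        System.out.println("injected valid: " + isValid(injected));
        System.out.println("encoded valid:  " + isValid(encoded));
        System.out.println("name accepted:  " + isAcceptableName(payload));
    }
}
```

The injected document fails schema validation because a second <name> follows <role>, the encoded document passes, and the allowlist rejects the payload outright. Schema validation is a backstop here rather than a fix: it only catches injections that happen to violate the schema, so encoding or DOM-based construction remains the primary control.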