XML attack and defense

TLDR

In Python, use defusedxml [1] for safe XML processing.

On .NET, use framework 4.5.2 or later where possible [2], [3]. On earlier frameworks, configure the XML parser (XmlReader, XmlResolver, etc.) with these restrictive settings:

  • Turn off DTD on XmlReader
  • Limit the response size and set a timeout for GetEntity(Uri) on XmlResolver
  • No localhost resource lookup on XmlResolver

The incident

So, it is a Saturday night and you’re indulging yourself with a good book (like this one) when suddenly, your phone rings.

Ding! The chime from your phone breaks the serene quietness of your home office.

The server watchdog has sent you an alert! You hurry to your desk. A few keystrokes and an SSH session later, you’re scanning the system logs for spikes in CPU, memory, and network connections.

CPU usage? It’s good, less than 20%. Number of HTTP connections? Only a few.

But server memory usage is off the charts. All of those precious gigabytes of RAM have been filled up. Why? You start to sweat.

A deeper look into the logs shows that, on the backend cluster, the component that parses XML input is choking the server. It is using almost 99% of the memory.

Now you realize you are the victim of an XML DoS attack!

OK, this might not be a true story (who uses XML processing nowadays, right? All aboard the JSON hyper-train!).

But jokes aside, it is an entirely possible scenario on systems that parse XML content. Most object-oriented languages ship with built-in XML parsers: Python has xml.etree, xml.sax, xml.dom, and xmlrpc; .NET has XmlReader.

They are convenient until, well, this scenario happens. In this post, I’ll focus on the XML parsing vulnerabilities of these two platforms and how to defend against them.

Why does it happen?

XML Attack vectors

XML has this cool feature called the inline document type definition (DTD), which lets you define entities inside the XML document itself. Because of that, XML content can be crafted to mount a denial-of-service (DoS) attack on the server side in two ways: by using the DTD to build memory bombs, and by using XML External Entity expansion (XEE, more commonly abbreviated XXE) to stall the response for a long time or to exhaust memory [4].

OWASP has a much more detailed analysis of the possible XML attack vectors. 

Some forms of XML-based DoS attack are:

  • Billion laughs attack and its variation (quadratic blowup): imagine an XML tree whose root has multiple branches, each of which is in turn a root with multiple branches. At the leaf level, each node contains only “lol” (hence the “laughs”). It takes only 9 levels of 10 entities each to create a “billion laughs”. The quadratic blowup variation avoids nesting: it defines one large entity and references it many times.
    Example: this XML chunk expands to 1 billion “lol” strings if the parser is not careful enough:
<!DOCTYPE lolz [
 <!ENTITY lol "lol">
 <!ELEMENT lolz (#PCDATA)>
 <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
 <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
 <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
 <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
 <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
 <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
 <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
 <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
 <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>

More from Wikipedia and OWASP.

  • XML external entity (XEE) attack: imagine the XML content declares an external entity whose URI, configured in the DTD, points to a resource that never finishes responding or that returns a gigantic amount of data. Resolving it can make the response extremely slow or flood the parser with data.
    More from Wikipedia and OWASP.

    Example: this XML chunk makes the parser read a local resource that might never return:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [ 
<!ELEMENT foo ANY> 
<!ENTITY xxe SYSTEM "file:///dev/random">
]>
<foo>&xxe;</foo>
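
To make the growth of the billion laughs pattern concrete, here is a scaled-down sketch in Python. It assumes the standard library xml.etree.ElementTree parser and uses only 3 levels of 10 references each, so it expands to 1,000 copies of "lol" rather than a billion. The helper name build_laughs and the entity names lol0..lolN are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def build_laughs(levels: int, fanout: int = 10) -> str:
    """Build a small billion-laughs-style document (hypothetical helper)."""
    entities = ['<!ENTITY lol0 "lol">']
    for i in range(1, levels + 1):
        # Each level references the previous level `fanout` times.
        refs = f"&lol{i - 1};" * fanout
        entities.append(f'<!ENTITY lol{i} "{refs}">')
    dtd = "\n".join(entities)
    return f"<!DOCTYPE lolz [\n{dtd}\n]>\n<lolz>&lol{levels};</lolz>"


doc = build_laughs(levels=3)
root = ET.fromstring(doc)

# Every level multiplies the payload by `fanout`, so 3 levels of 10
# turn the 3-character string "lol" into 3 * 10**3 characters.
expanded_chars = len(root.text)
```

The same document shape with 9 levels would ask the parser for 3 * 10**9 characters from an input of only a few hundred bytes, which is why the full example above is dangerous to feed to an unprotected parser.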

Language vulnerabilities

On Python

This is a summary of the existing XML libraries in Python and their vulnerability to each attack [1]:

Attack vector              sax         etree       minidom     pulldom     xmlrpc
Billion laughs             Vulnerable  Vulnerable  Vulnerable  Vulnerable  Vulnerable
Quadratic blowup           Vulnerable  Vulnerable  Vulnerable  Vulnerable  Vulnerable
External entity expansion  Safe (d)    Safe (a)    Safe (b)    Safe (d)    Safe (c)
DTD retrieval              Safe (d)    Safe        Safe        Safe (d)    Safe
Decompression bomb         Safe        Safe        Safe        Safe        Vulnerable

  (a) xml.etree.ElementTree doesn’t expand external entities and raises a ParseError when an entity occurs.
  (b) xml.dom.minidom doesn’t expand external entities and simply returns the unexpanded entity verbatim.
  (c) xmlrpclib (xmlrpc.client in Python 3) doesn’t expand external entities and omits them.
  (d) Since Python 3.7.1, external general entities are no longer processed by default.
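
The etree behavior described above (external entities are not expanded and an error is raised instead) can be observed with a few lines. This is a small sketch that assumes a Python 3 standard library; the file URI is a placeholder and is never opened, because the parser rejects the entity reference.

```python
import xml.etree.ElementTree as ET

# Placeholder URI: the parser never tries to read it.
XEE_DOC = """<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "file:///placeholder/never-read">
]>
<foo>&xxe;</foo>"""

try:
    ET.fromstring(XEE_DOC)
    rejected = False
except ET.ParseError as exc:
    # etree does not resolve the external entity and fails the parse instead.
    rejected = True
    message = str(exc)
```

Failing the whole parse is a reasonable default for untrusted input: the caller gets one exception to handle instead of a partially expanded document.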

On .NET

Most .NET Framework versions prior to 4.5.2 are vulnerable to XML attacks by default and need extra handling [5]. For a more detailed analysis, read the OWASP cheat sheet.

Attack vector                    Frameworks prior to 4.5.2                        .NET 4.5.2 and later
Billion laughs                   Vulnerable by default, safe with extra handling  Safe
Quadratic blowup                 Vulnerable by default, safe with extra handling  Safe
External entity expansion (XEE)  Vulnerable by default, safe with extra handling  Safe
DTD retrieval                    Vulnerable by default, safe with extra handling  Safe
Decompression bomb               Vulnerable by default, safe with extra handling  Safe

Other .NET XML parsers

XML parser               Safe by default?
LINQ to XML              Yes
XmlDictionaryReader      Yes
XmlDocument
  …prior to 4.5.2        No
  …in versions 4.5.2+    Yes
XmlNodeReader            Yes
XmlReader                Yes
XmlTextReader
  …prior to 4.5.2        No
  …in versions 4.5.2+    Yes
XPathNavigator
  …prior to 4.5.2        No
  …in versions 4.5.2+    Yes
XslCompiledTransform     Yes

Defense Mechanism

With these possible attacks in mind, it’s always good to follow this rule of thumb for XML processing:

  • Always treat customer free-text input as malicious (sorry but not sorry). Therefore, always treat external XML as untrusted content -> always validate before processing.
  • Should treat internal XML as semi-trusted -> should validate before use.
  • May treat XML created by your own serverside code as safe -> may validate before use.

As a matter of fact, this rule of thumb applies to not just XML parsing but to any kind of textual content parsing: string, HTML, even JSON. If it comes from users, you have to treat it as malicious.

Even names, or else this happens:

[Image: SQL injection hack. Source: xkcd]

So, now that I have to treat XML content as malicious and do validation before parsing it, what’s next?

For Python

This comes straight from the Python documentation [1]: for safe XML parsing, use a hardened library such as defusedxml.
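
defusedxml is a third-party package, so as a rough illustration of the underlying idea, here is a standard-library-only sketch that refuses any document carrying a DOCTYPE before handing it to xml.etree. This is not defusedxml's implementation. The names DTDForbiddenError and parse_without_dtd are made up for this example, and the input is scanned twice to keep the code short.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbiddenError(ValueError):
    """Raised when the input declares a DOCTYPE (hypothetical name)."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise DTDForbiddenError(f"DOCTYPE {name!r} is not allowed")


def parse_without_dtd(data: bytes) -> ET.Element:
    """Pre-scan with expat, then parse with etree only if no DTD is present."""
    scanner = expat.ParserCreate()
    # The handler fires while expat is still inside the DOCTYPE declaration,
    # before any entity reference in the document body is expanded.
    scanner.StartDoctypeDeclHandler = _reject_doctype
    scanner.Parse(data, True)
    return ET.fromstring(data)


bomb = b'<!DOCTYPE lolz [<!ENTITY lol "lol">]><lolz>&lol;</lolz>'
try:
    parse_without_dtd(bomb)
    blocked = False
except DTDForbiddenError:
    blocked = True

# A document without a DTD parses normally.
plain = parse_without_dtd(b"<order><id>42</id></order>")
```

With defusedxml installed, the equivalent call is defusedxml.ElementTree.fromstring(data), which rejects entity declarations by default and can also reject every DTD when called with forbid_dtd=True.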

For .NET

.NET 4.5.2 and later

Most of these vulnerabilities have been patched in .NET 4.5.2 and later, so whenever you can, consider upgrading the runtime to a newer version.

Before .NET 4.5.2

Defense against XML bombs

In .NET Framework 3.5 and earlier, simply disable inline DTDs altogether in your XML parsing objects [2]:

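// Both snippets require: using System.Xml;
// Option 1: XmlTextReader, with DTDs explicitly prohibited.
XmlTextReader reader = new XmlTextReader(stream);
reader.ProhibitDtd = true;

or

// Option 2: XmlReader created from settings that prohibit DTDs.
XmlReaderSettings settings = new XmlReaderSettings();
settings.ProhibitDtd = true;
XmlReader reader = XmlReader.Create(stream, settings);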

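From .NET 4.0 onward, these behaviors (DTD lookup, resolvers) have been disabled by default. Note that ProhibitDtd is marked obsolete from .NET 4.0 on; the DtdProcessing property (for example, DtdProcessing.Prohibit) replaces it.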

Defense against XEE (XML External Entity) Attacks [3]

  • Use a custom XmlResolver and set request timeout for GetEntity() method
  • Limit the response size of XmlResolver
  • Restrict localhost resource retrievals.
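
The three restrictions above can be sketched in Python as follows. This illustrates the checks only and is not the .NET XmlResolver API: the function names, the 64 KB limit, and the 5-second timeout are all assumptions chosen for the example.

```python
import io
import ipaddress
from urllib.parse import urlparse
from urllib.request import urlopen

MAX_ENTITY_BYTES = 64 * 1024   # assumed size limit
FETCH_TIMEOUT_SECONDS = 5      # assumed timeout for each socket operation


def is_allowed_entity_uri(uri: str) -> bool:
    """Allow only http(s) URIs that do not point at the local machine."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https"):
        return False  # blocks file://, ftp://, and friends
    host = parsed.hostname
    if not host or host.lower() == "localhost":
        return False
    try:
        if ipaddress.ip_address(host).is_loopback:
            return False
    except ValueError:
        pass  # not an IP literal; a stricter check would also resolve DNS
    return True


def read_limited(stream, max_bytes: int = MAX_ENTITY_BYTES) -> bytes:
    """Read at most max_bytes, failing if the resource is larger."""
    data = stream.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError("entity exceeds the size limit")
    return data


def fetch_entity(uri: str) -> bytes:
    """Fetch an external entity with all three restrictions applied."""
    if not is_allowed_entity_uri(uri):
        raise ValueError(f"entity URI not allowed: {uri}")
    with urlopen(uri, timeout=FETCH_TIMEOUT_SECONDS) as response:
        return read_limited(response)


# Checks that need no network access:
local_blocked = not is_allowed_entity_uri("file:///dev/random")
small_payload = read_limited(io.BytesIO(b"<!ENTITY a 'b'>"))
```

Reading one byte past the limit is a cheap way to detect an oversized resource without downloading all of it.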

Other languages

The OWASP cheat sheet on XEE prevention [5] is well documented and has recommendations for C/C++, Java, PHP, and iOS.

Conclusion

The XML and XML Schema specifications include multiple security flaws, but they also offer ways to work around them.

In our practice, we write our own wrappers on top of these XML parsers to handle the quirks, and give them distinctive Safe names (XmlReader -> XmlReaderSafe, XmlResolver -> XmlResolverSafe). At pull request time, code reviewers can spot unsafe uses of the default libraries and call them out. At compile time and at runtime, code analysis and scanning tools look for unsafe usages and generate warnings.

These processes and tools work really well when combined with first-principles thinking. And if you are reading this line, you have probably read my rule of thumb for processing XML above.
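
As a rough idea of what such a scanning step could look like, here is a minimal Python sketch that flags direct imports of the standard XML modules so a reviewer can point to the wrapped version instead. It is not the tooling we actually use. The module list and the function name find_unsafe_xml_imports are assumptions for illustration.

```python
import ast

# Standard-library modules the Safe wrappers are meant to replace (assumed list).
UNSAFE_PREFIXES = ("xml.etree", "xml.sax", "xml.dom", "xmlrpc")


def _is_unsafe(name: str) -> bool:
    return any(name == p or name.startswith(p + ".") for p in UNSAFE_PREFIXES)


def find_unsafe_xml_imports(source: str) -> list:
    """Return sorted (line number, imported name) pairs for unsafe imports."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Join "from X import Y" into "X.Y" so both forms are checked alike.
            names = [f"{node.module or ''}.{alias.name}" for alias in node.names]
        else:
            continue
        findings.extend((node.lineno, n) for n in names if _is_unsafe(n))
    return sorted(findings)


SAMPLE = (
    "import json\n"
    "import xml.etree.ElementTree as ET\n"
    "from xml.dom import minidom\n"
    "from defusedxml import ElementTree as SafeET\n"
)
findings = find_unsafe_xml_imports(SAMPLE)
```

A check like this only catches static imports, so it complements code review rather than replacing it.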

And now, back to the JSON hyper-train. Oh wait!

References

[1]: Python 3 XML processing module, https://docs.python.org/3/library/xml.html

[2]: Security Briefs – XML Denial of Service Attacks and Defenses, https://docs.microsoft.com/en-us/archive/msdn-magazine/2009/november/xml-denial-of-service-attacks-and-defenses

[3]: Resolving External Resources, https://docs.microsoft.com/en-us/dotnet/standard/data/xml/resolving-external-resources?view=netframework-4.7

[4]: OWASP – XML-based attacks, https://www.owasp.org/images/5/58/XML_Based_Attacks_-_OWASP.pdf

[5]: OWASP – XML External Entity Prevention Cheat Sheet, https://owasp.org/www-project-cheat-sheets/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet
