Web application Unicode character transforms

Iain Lewis 03 Jan 2014

Who remembers the old Microsoft IIS “Unicode” vulnerability? Getting into IIS servers used to be so easy!

This could be easily checked for by using the following URL:

The MS00-057 “File Permission Canonicalization” vulnerability worked like this: if the “/” character was encoded in Unicode as “%c0%af”, the URL would pass the directory traversal security check, because it did not contain any “../” patterns. The path was only decoded into “../” after the check had already run.
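To make the bypass concrete, here is a minimal Python sketch. The function names (`naive_traversal_check`, `lenient_decode`) and the sample path are hypothetical and do not reproduce IIS's actual code. They only illustrate how an overlong two-byte sequence can slip past a “../” check and still decode to “/” in a decoder that does not reject overlong forms.

```python
from urllib.parse import unquote_to_bytes


def naive_traversal_check(raw_url_path: str) -> bool:
    """Hypothetical security check: allow the path if it has no '../'."""
    return "../" not in raw_url_path


def lenient_decode(data: bytes) -> str:
    """Hypothetical lenient UTF-8 decoder that accepts overlong 2-byte forms.

    A strict decoder must reject the lead bytes 0xC0 and 0xC1; this one
    does not. Only 1- and 2-byte sequences are handled, which is enough
    for the demonstration.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        has_continuation = i + 1 < len(data) and 0x80 <= data[i + 1] <= 0xBF
        if 0xC0 <= b <= 0xDF and has_continuation:
            # Combine 5 payload bits from the lead byte with 6 from the next.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            out.append(chr(b))
            i += 1
    return "".join(out)


path = "/scripts/..%c0%af..%c0%afexample.txt"  # illustrative path only
passes_check = naive_traversal_check(path)      # the check sees no '../'
decoded = lenient_decode(unquote_to_bytes(path))

# A strict decoder, such as Python's own, refuses the overlong sequence.
try:
    unquote_to_bytes("%c0%af").decode("utf-8")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True
```

The point of the sketch is the ordering: the check runs on the encoded form, and the traversal only appears once the lenient decoder has run.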

In 2001, the fixed Unicode decode operation was defeated by the presence of another decoding operation after the security check. So, if you encoded your slash character, and then encoded the encoded version of the slash, you could fool the security check with what looked like harmless characters!
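The double-decode problem can be sketched in a few lines of Python. The `handle_request` function and the payload path are hypothetical and only model the order of operations described above (decode, check, decode again), not the real IIS code path.

```python
from urllib.parse import unquote


def handle_request(raw_path: str) -> str:
    """Hypothetical handler: decode, check, then (wrongly) decode again."""
    once = unquote(raw_path)
    if "../" in once:
        raise ValueError("directory traversal rejected")
    # The flaw: a second decode happens after the security check.
    return unquote(once)


# '%25' is an encoded '%', so '%252f' becomes '%2f' after one decode
# and '/' after two.
payload = "/scripts/..%252f..%252fexample.txt"
result = handle_request(payload)
```

After the first decode the path still contains only “%2f”, so the check passes. The second decode is what turns it into a traversal. Decoding exactly once, and checking the fully decoded form, avoids this particular trap.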

The use of Unicode characters to bypass filtering techniques is not new. However, we have recently seen a rise in its use, specifically for exploiting cross-site scripting and SQL injection vulnerabilities.

Web applications now have to cater for a global audience to open up as many markets as possible. We have seen a number of applications designed and implemented for US keyboard layouts and then rushed into accounting for Unicode characters. This opens up several new security filtering issues, because the Unicode handling may not have been implemented correctly.

The recent Spotify vulnerability is an example of this, and demonstrates a very clever way to gain access to an existing user account. An attacker created an account whose username was a Unicode variant of an existing username and then requested a password reset. Due to a canonicalization issue, the password reset was applied to the existing account rather than the attacker's new one.
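The following Python sketch shows the general shape of this kind of canonicalization bug. It is not Spotify's code: `canonical_username` is a hypothetical function with a deliberately wrong order of operations, chosen so that applying it twice gives a different answer from applying it once.

```python
import unicodedata


def canonical_username(name: str) -> str:
    """Hypothetical canonicalizer with a bug: it lowercases BEFORE
    compatibility normalization, so it is not idempotent."""
    return unicodedata.normalize("NFKC", name.lower())


def is_stable(name: str) -> bool:
    """A canonical form should not change when canonicalized again."""
    once = canonical_username(name)
    return canonical_username(once) == once


existing = "bigbird"
# MODIFIER LETTER CAPITAL B, I, G, B, I, R, D (U+1D2E and friends)
lookalike = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"

first_pass = canonical_username(lookalike)    # e.g. applied at sign-up
second_pass = canonical_username(first_pass)  # e.g. applied again later
```

In this sketch the first pass does not collide with the existing name, so the new account is allowed, but the second pass maps it onto `bigbird`. Rejecting any input for which `is_stable` is false is one possible defence, under the assumption that every code path uses the same canonicalizer.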

The recently released “Unicode Security Guide” by Chris Weber is a good overview of the other potential pitfalls and recommendations.

For the sake of this blog post, I’ll just cite one of his examples: a “best-fit” Unicode conversion XSS issue:

    1. An input validation filter rejects characters such as <, >, ‘, and ” in a Web-application accepting UTF-8 encoded text.
    2. An attacker sends in a U+FF1C FULLWIDTH LESS-THAN SIGN ＜ in place of the ASCII <.
    3. The attacker’s input looks like: ＜script＞
    4. After passing through the XSS filter unchanged, the input moves deeper into the application.
    5. Another API, perhaps at the data access layer, is configured to use a different character set such as windows-1252.
    6. On receiving the input, the data access layer converts the multi-byte UTF-8 text to the single-byte windows-1252 code page, forcing a best-fit conversion to the dangerous characters the original XSS filter was trying to block.
    7. The attacker’s input successfully persists to the database.
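The steps above can be sketched in Python under one loud assumption: Python's own `cp1252` codec does not perform best-fit mapping (it raises an error for U+FF1C), so NFKC normalization is used here as a stand-in for the best-fit conversion. The function names are hypothetical.

```python
import unicodedata

BLOCKED = set("<>'\"")


def xss_filter(text: str) -> bool:
    """Hypothetical input filter: accept text with no blocked ASCII chars."""
    return not any(ch in BLOCKED for ch in text)


def downstream_conversion(text: str) -> str:
    """Stand-in for a best-fit conversion to a single-byte code page.

    ASSUMPTION: NFKC folding only imitates the effect; it is not the
    actual Windows best-fit table.
    """
    return unicodedata.normalize("NFKC", text)


attack = "\uff1cscript\uff1e"  # fullwidth less-than and greater-than signs
accepted = xss_filter(attack)              # the filter sees no ASCII '<'
stored = downstream_conversion(attack)     # the later layer produces '<'

# Folding before filtering makes the filter see what will be stored.
safe_order_accepts = xss_filter(downstream_conversion(attack))
```

The design point is that the filter and the storage layer must agree on the character set. Either convert first and filter afterwards, or reject anything the downstream encoding cannot represent exactly.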

Other potential problem areas include normalization and canonicalization issues, over-consumption and buffer overflows.

Application frameworks can help to a point, but their defences should never be relied upon, particularly as these defences are very much in their infancy. Throw some Unicode characters at your application and see what happens!