Cross-site scripting (XSS) is a common web application vulnerability that allows attackers to inject malicious scripts into web pages. XSS can be used for various purposes, such as stealing cookies, session hijacking, phishing, defacement, and more.
However, many web applications have implemented XSS filters to prevent or mitigate XSS attacks. XSS filters are mechanisms that scan user input and output for suspicious strings or characters that may indicate an XSS attempt. They may block, encode, sanitize, or remove such input or output.
But XSS filters are not perfect. They can be bypassed by using various techniques that exploit their weaknesses or limitations. In this blog post, we will show you a practical example of how to bypass an XSS filter using character encoding tricks.
Character Encoding Tricks
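To bypass filters that rely on scanning text for specific suspicious strings, attackers can encode any number of characters in a variety of ways:
- HTML entity encoding, for example `&lt;` or `&#x3c;` for `<`
- URL (percent) encoding, for example `%3C`
- Hex escapes, for example `\x3c`
- Unicode escapes, for example `\u003c`
These encoding methods can be combined or nested to create more complex encodings that may evade simple filters.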
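To make the encodings concrete, here is a minimal Python sketch that produces each form for a single character. The function name `encodings` and the dictionary keys are our own, chosen for illustration only.

```python
from urllib.parse import quote


def encodings(ch: str) -> dict[str, str]:
    """Return a few common textual encodings of one character.

    Assumption: ch is a single character with a code point <= 0xFF,
    so that the two-digit hex escape form is valid.
    """
    cp = ord(ch)
    return {
        "html_hex": f"&#x{cp:x};",      # hexadecimal HTML entity
        "url": quote(ch, safe=""),      # percent encoding
        "js_hex": f"\\x{cp:02x}",       # backslash hex escape
        "js_unicode": f"\\u{cp:04x}",   # backslash Unicode escape
    }


if __name__ == "__main__":
    for name, value in encodings("<").items():
        print(f"{name}: {value}")
```

Nesting is then just a matter of feeding one result into another encoder, for example percent-encoding the `&` of an HTML entity.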
Example
Suppose we have a web application that allows users to submit comments on blog posts. The application has an XSS filter that scans user input for the following strings:
- `<script>`
- `<img`
- `<div`
- `<iframe`
- `<svg`
If any of these strings are found in the user input, the filter will block the submission and display an error message. We want to inject a simple alert script into our comment:
```html
<script>alert(1)</script>
```
If we submit this comment directly, the filter will catch it and block it. So we need to encode it in a form the filter does not recognize, but that the application decodes again after the check.
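A filter like the one described could look roughly like the following Python sketch. The names `BLOCKED` and `is_blocked` are hypothetical, and the case-insensitive comparison is our assumption; the example application does not specify it.

```python
# Substrings the hypothetical filter rejects, taken from the list above.
BLOCKED = ["<script>", "<img", "<div", "<iframe", "<svg"]


def is_blocked(comment: str) -> bool:
    """Return True if the raw comment contains any blocked substring.

    Assumption: matching is case-insensitive and no decoding is done.
    """
    lowered = comment.lower()
    return any(token in lowered for token in BLOCKED)


if __name__ == "__main__":
    print(is_blocked("<script>alert(1)</script>"))          # blocked
    print(is_blocked("Nice post!"))                         # allowed
    print(is_blocked("%3Cscript%3Ealert(1)%3C/script%3E"))  # allowed: no literal match
```

Because the check runs on the raw text only, any representation that avoids the literal substrings passes through.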
One possible way is to use HTML encoding for some of the characters:
`&lt;script&gt;alert(1)&lt;/script&gt;`
This may work if the filter only looks for exact matches of the strings. However, some filters may decode HTML entities before scanning them. So we need something more sophisticated.
Another possible way is to use URL encoding for some of the characters:
`%3Cscript%3Ealert(1)%3C/script%3E`
This may work if the filter only looks for plain text strings. However, some filters may decode URL-encoded values before scanning them. So we need something more advanced.
A third possible way is to use hex encoding for some of the characters:
`\x3cscript\x3ealert(1)\x3c/script\x3e`
This may work if the filter scans the raw input without interpreting backslash escape sequences. However, some filters may decode hex-encoded values before scanning them. So we need something more clever.
A fourth possible way is to use Unicode encoding for some of the characters:
`\u003cscript\u003ealert(1)\u003c/script\u003e`
This may work if the filter compares the raw characters without resolving Unicode escapes. However, some filters may decode Unicode-encoded values before scanning them. So we need something more tricky.
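The "decode before scanning" behaviour mentioned for each encoding can be sketched as a normalization step in front of the blocklist check. This is only an illustration under our own assumptions: the names `normalize` and `is_blocked`, the decoding order, and the `max_rounds` limit are hypothetical and not taken from any particular filter.

```python
import html
import re
from urllib.parse import unquote

BLOCKED = ["<script>", "<img", "<div", "<iframe", "<svg"]

# Matches \xHH and \uHHHH backslash escapes.
_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})|\\u([0-9a-fA-F]{4})")


def _decode_escapes(text: str) -> str:
    """Replace \\xHH and \\uHHHH escapes with the characters they denote."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


def normalize(text: str, max_rounds: int = 5) -> str:
    """Decode URL, HTML entity, and backslash escapes until nothing changes."""
    for _ in range(max_rounds):
        decoded = _decode_escapes(html.unescape(unquote(text)))
        if decoded == text:
            break
        text = decoded
    return text


def is_blocked(comment: str) -> bool:
    """Scan the normalized comment instead of the raw one."""
    lowered = normalize(comment).lower()
    return any(token in lowered for token in BLOCKED)


if __name__ == "__main__":
    samples = [
        "&lt;script&gt;alert(1)&lt;/script&gt;",
        "%3Cscript%3Ealert(1)%3C/script%3E",
        r"\x3cscript\x3ealert(1)\x3c/script\x3e",
        r"\u003cscript\u003ealert(1)\u003c/script\u003e",
    ]
    for sample in samples:
        print(is_blocked(sample), sample)
```

Decoding in a loop matters because nested encodings only unfold one layer per pass. Even so, a blocklist of this kind remains a weak defense on its own.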
A fifth possible way is to use a combination of different encodings for different parts of our script:
`%26#x6c;t;\x73cript>\u0061lert(1)%26#x6c;t;/\x73cript>`
This comment combines several encoding techniques so that none of the blocked strings appears literally in the input:
- `%26` is the URL encoding of the ampersand `&`.
- `#x6c;` completes the hexadecimal HTML entity `&#x6c;`, which stands for the letter "l". Together with the literal `t;` that follows, the intent is to spell out the entity `&lt;` for `<`.
- `\x73` is a backslash hex escape for "s"; the rest of the word, `cript>`, is left as plain text.
- `\u0061` is a backslash Unicode escape for "a"; the rest of the call, `lert(1)`, is plain text.
- The closing tag repeats the pattern: `%26#x6c;t;` again, followed by a plain `/` and `\x73cript>`.
Note that decoding `%26#x6c;t;` yields `lt;` rather than `&lt;`, because the ampersand is consumed by the `&#x6c;` entity. The payload as written therefore illustrates the layering but would not produce `<` after decoding.
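To check what the combined payload actually turns into, we can decode it one layer at a time. The sketch below assumes the payload uses single backslashes and that the layers are undone in the order URL decoding, HTML entity decoding, backslash escapes; the function name `decode_layers` is ours.

```python
import html
import re
from urllib.parse import unquote

_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})|\\u([0-9a-fA-F]{4})")


def decode_layers(payload: str) -> tuple[str, str, str]:
    """Return the payload after each of three decoding passes."""
    after_url = unquote(payload)            # %26 -> &
    after_html = html.unescape(after_url)   # &#x6c; -> l
    after_escapes = _ESCAPE.sub(            # \x73 -> s, \u0061 -> a
        lambda m: chr(int(m.group(1) or m.group(2), 16)), after_html
    )
    return after_url, after_html, after_escapes


if __name__ == "__main__":
    payload = r"%26#x6c;t;\x73cript>\u0061lert(1)%26#x6c;t;/\x73cript>"
    for stage in decode_layers(payload):
        print(stage)
```

Running this shows the final stage as `lt;script>alert(1)lt;/script>`: the ampersand is used up by the numeric entity, so the result contains `lt;` and not `<`.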
This type of encoding can be used by attackers to bypass security filters that scan for specific strings or characters in user input. However, it is important to note that using these techniques against web applications without authorization is illegal and unethical. To protect against XSS attacks, web developers should not rely on string-matching filters alone and should implement secure coding practices, such as input validation, context-aware output encoding, and a Content Security Policy.
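On the output side, the idea is to encode user text for the context it is placed in, so that it is displayed rather than interpreted. A minimal sketch for the HTML element body context, using Python's standard `html.escape`, follows; `render_comment` and the surrounding `<p>` markup are hypothetical.

```python
import html


def render_comment(comment: str) -> str:
    """Wrap a user comment in a paragraph, escaping HTML metacharacters.

    html.escape replaces &, <, and > (and, with quote=True, both quote
    characters) with entities, so the browser shows the text literally.
    """
    return f"<p>{html.escape(comment, quote=True)}</p>"


if __name__ == "__main__":
    print(render_comment("<script>alert(1)</script>"))
```

This covers only text placed between tags. Attribute values, URLs, and inline JavaScript each need their own encoding rules, which is why templating engines with automatic contextual escaping are usually preferred over hand-written calls.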