Bypass XSS filters using data URIs

The data URI scheme, defined by RFC 2397, is a way of embedding small files inline in HTML documents. Instead of linking to a file stored on the server, the document carries the file within the URL itself, as a base64-encoded string of data preceded by a MIME type. In this article, we discuss how data URIs can be used to perform Cross-Site Scripting (XSS) attacks. The information in this article is not new; it is our attempt to explore the different ways in which a data URI can be used to perform XSS.

Introduction to data URIs

Data URIs are self-contained links: the document data and its metadata are entirely encapsulated in the URI. Because ‘data:’ URIs are completely self-contained, they do not include a filename. When presented with a ‘data:’ URI whose MIME type triggers the save dialog, such as ‘application/octet-stream’, the browser attempts to save the URI content as a file on the local file system.

A data URI has the form data:[<mediatype>][;base64],<data>. The mediatype is a MIME-type string, such as "image/jpeg" for a JPEG image file. If omitted, it defaults to text/plain;charset=US-ASCII. If the data is textual, you can simply embed the text (using the appropriate entities or escapes based on the enclosing document’s type). Otherwise, you can specify base64 to embed base64-encoded binary data.
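
As a minimal sketch of the format described above, the following Python builds and parses a data URI. The function names build_data_uri and parse_data_uri are illustrative only and do not come from any library; the parser handles just the simple cases discussed here.

```python
import base64
from urllib.parse import unquote_to_bytes

# Default media type used when none is given in the URI.
DEFAULT_MEDIA_TYPE = "text/plain;charset=US-ASCII"


def build_data_uri(data: bytes, media_type: str = DEFAULT_MEDIA_TYPE) -> str:
    """Return a base64 data URI carrying `data` (hypothetical helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def parse_data_uri(uri: str):
    """Split a data URI into (media_type, raw_bytes) (hypothetical helper)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, _, body = uri[len("data:"):].partition(",")
    parts = header.split(";")
    is_base64 = parts[-1] == "base64"
    if is_base64:
        parts = parts[:-1]
    media_type = ";".join(parts) or DEFAULT_MEDIA_TYPE
    # Textual data is percent-escaped; binary data is base64-encoded.
    data = base64.b64decode(body) if is_base64 else unquote_to_bytes(body)
    return media_type, data


uri = build_data_uri(b"Hello", "text/plain")
print(uri)
print(parse_data_uri(uri))
```

The same build step applies to an image: read the file's bytes and pass the matching media type, such as "image/gif".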

Images within HTML documents are traditionally linked with a tag such as this one:
<img src="images/myimage.gif">

In this case, the image tag src attribute specifies an external resource. While rendering the page, the browser sends an HTTP request for every external resource. With Data URIs, the image data becomes part of the HTML document itself, as exemplified by the tag below:
<img src=" 

You can see the same image rendered by the browser from the base64-encoded string in the picture shown below.

Limitations of data URI

The "data:" URI scheme is only useful for short values, because applications that handle URLs may impose a length limit. Browsers are not required to support any particular maximum length of data: Mozilla supports data URIs of essentially unlimited length, whereas the Opera browser limits them to around 4100 characters.
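
Base64 turns every 3 bytes of input into 4 characters, so the length of a data URI can be estimated before it is built. The sketch below assumes the limit applies to the whole URI and uses the 4100-character figure quoted above as its default; both helper names are hypothetical.

```python
import math


def data_uri_length(payload_size: int, media_type: str = "text/plain") -> int:
    """Length in characters of a base64 data URI for `payload_size` bytes."""
    header = f"data:{media_type};base64,"
    # Base64 emits 4 characters per 3-byte group, padding the last group.
    return len(header) + 4 * math.ceil(payload_size / 3)


def fits_limit(payload_size: int, limit: int = 4100,
               media_type: str = "text/plain") -> bool:
    """True if the resulting URI stays within `limit` characters."""
    return data_uri_length(payload_size, media_type) <= limit


for size in (1000, 3000, 3100):
    print(size, data_uri_length(size, "image/gif"), fits_limit(size, 4100, "image/gif"))
```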

Web browser support

Data URIs are supported by most browsers, including (but not limited to) Firefox (and other Gecko-based browsers), Apple’s Safari, Konqueror and Opera. Microsoft’s Internet Explorer 7 and below, however, do not support them, and Internet Explorer 8 and above support data URIs only in limited contexts, such as images and CSS.

Why data URIs are a good idea

There are several circumstances where a Data URI may be useful, as opposed to traditional external resource referencing:
  • While browsing a secure HTTPS website, web browsers commonly require that all elements of a web page be downloaded over secure connections; otherwise the user will be notified of reduced security due to a mixture of secure and insecure elements. HTTPS requests have significant overhead compared with plain HTTP requests, so embedding data in data URIs may improve speed in this case.
  • Web browsers are usually configured to make only a certain number of concurrent HTTP connections to a domain; thus, inline data frees up a download connection for other content.
  • When an image is very small, placing it inline within the HTML saves the overhead of a whole HTTP request.
  • Data URIs can be used in environments where access to external resources is troublesome or limited.
  • Data URIs suit images that are dynamically generated by a server-side program on a per-visit basis.
Data URIs also have their disadvantages. Data URIs can be maliciously used to exploit web application vulnerabilities. For example, we can use this feature to carry out Cross-Site Scripting on the web application. This aspect of data URIs is discussed in detail in this article.

XSS using data URI

XSS is a type of security vulnerability found in web applications that enables malicious attackers to inject client-side script into web pages viewed by other users. An exploited Cross-Site Scripting vulnerability can be used by attackers to bypass access controls such as the same origin policy, steal sensitive information, install Trojans, etc.

They can achieve this by crafting malicious web pages containing either HTML or script code that utilizes the ‘data:’ URI scheme. Using a Data URI is an efficient and viable alternative, as we shall see in the example given below.

Let us imagine that we have a user input that gets reflected in the response within a bold tag.
<b>Welcome USER_INPUT</b>

The web application has blacklisted the use of the following keywords in the user input.
  • script, javascript, alert, round brackets, double quotes, colon.
The obvious way to execute JavaScript here is to inject a <script> tag. Since the application validates user input against these keywords, we cannot use the <script> tag directly. Instead, let’s try using a data URI. We can inject the following payload:
<object data="data:text/html;base64,PHNjcmlwdD5hbGVydCgiSGV


The execution of the injected script is shown below.


Here, we used the data URI payload as a value assigned to the ‘data’ attribute of the ‘object’ tag. The <object> tag is used to include objects such as images, audio, videos, Java applets, ActiveX, PDF, and Flash. The ‘data’ attribute of the object tag defines a URL that refers to the object’s data.

Data in the "data:" URI is encoded as a base64 string:
  • Base64-encoded payload: PHNjcmlwdD5hbGVydCgiSGVsbG8iKTs8L3NjcmlwdD4=
  • Base64-decoded payload: <script>alert("Hello");</script>
When the browser loads the object tag, it loads the object assigned to its data attribute, which in our case is an HTML document containing a script. This causes our JavaScript to execute. We were able to bypass the blacklist filter because the payload is base64-encoded, so the blacklisted keywords never appear in it. Can we use the IFRAME tag instead of the object tag?
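
The bypass can be reproduced offline with a small Python sketch. It assumes a naive, case-insensitive substring filter and models only the keyword and bracket entries of the blacklist above (the quote and colon entries are left out, since the object payload as written contains both); BLACKLIST and is_blocked are hypothetical names, not the application's actual filter.

```python
import base64

# Assumed subset of the blacklist: keywords and round brackets only.
BLACKLIST = ["script", "javascript", "alert", "(", ")"]


def is_blocked(user_input: str) -> bool:
    """Naive filter: reject input containing any blacklisted term."""
    lowered = user_input.lower()
    return any(term in lowered for term in BLACKLIST)


script = '<script>alert("Hello");</script>'

# Base64-encode the script and wrap it in an object tag with a data URI.
encoded = base64.b64encode(script.encode("ascii")).decode("ascii")
payload = f'<object data="data:text/html;base64,{encoded}"></object>'

print(is_blocked(script))    # the plain script trips the filter
print(is_blocked(payload))   # the encoded payload does not
print(base64.b64decode(encoded).decode("ascii"))
```

The filter only ever sees the encoded text; the browser decodes it after the filter has run, which is why keyword matching alone does not stop this payload.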

This technique also allows the dynamic creation of files of different MIME types. An attacker can create DOC and PDF files that contain a malicious payload for exploiting various overflow vulnerabilities. An attacker may also create a backdoor, which will either initiate a new connection or listen for one. Generating Netcat this way might be an option. For example, the following link opens a self-contained Word document.


The execution of the script is shown below.

When to use data URI

  • One of the solutions implemented to protect web applications against XSS is keyword blacklisting. Web application developers blacklist special keywords such as javascript, script, alert, round brackets, etc. A data URI allows us to use a base64-encoded string as our injection payload, which helps us to bypass filters based on the blacklist approach.
  • Web applications also verify the user input for various encodings. Data URI allows us to specify the character encoding of the data. We can encode data with the charset for which the web application has not applied any validation. This helps us to bypass the validation of user input applied to specific character encodings only.
    Generate a pop-up using charset UTF-7:
    The execution of the script is shown below.
  • Data URIs generate files on the fly when the link is followed. Because the data content is already in the user’s browser rather than fetched as a separate download, this may also be used to bypass antivirus protection running on the user’s machine.
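
To illustrate the charset point above, the sketch below hand-rolls a UTF-7 escaping step and places the result in a data URI that declares charset=utf-7. It is only an illustration of the encoding: utf7_escape is a hypothetical helper, it escapes every character outside the UTF-7 direct set one at a time, and whether a given browser honours the utf-7 charset is an assumption that has to be checked per browser.

```python
import base64

# Characters UTF-7 allows to appear directly (RFC 2152 "Set D").
SET_D = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789'(),-./:?"
)


def utf7_escape(text: str) -> str:
    """Escape every character outside Set D as a '+<base64>-' block."""
    out = []
    for ch in text:
        if ch in SET_D:
            out.append(ch)
        else:
            # UTF-7 blocks are base64 of the UTF-16BE bytes, without padding.
            block = base64.b64encode(ch.encode("utf-16-be")).decode("ascii")
            out.append("+" + block.rstrip("=") + "-")
    return "".join(out)


script = '<script>alert("Hello");</script>'
escaped = utf7_escape(script)
uri = "data:text/html;charset=utf-7," + escaped

print(uri)
# Python's own UTF-7 decoder recovers the original script.
print(escaped.encode("ascii").decode("utf-7"))
```

After escaping, the angle brackets and quotes no longer appear literally, so a filter that only inspects those characters in their usual encoding would not see them.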

Testing scope for XSS using data URIs

A data URI can be used as a payload with only a few HTML tags. We are limited to those tags that load an external resource by generating a new request.
A few HTML tags that can carry a data URI payload are listed below:
  • Anchor Tag
  • IFRAME Tag
  • Object Tag
  • Image Tag
Although malicious files can be created on the fly, users must save and execute those files. The effect of using long "data" URLs in applications is not yet tested.
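
When testing, it can help to generate one candidate per carrier tag from a single data URI. The sketch below does that for the four tags listed above; carrier_tags is a hypothetical helper, and the attribute chosen for each tag (href, src, data) is the one that normally holds the resource URL. Whether a given browser executes or merely renders the content depends on the tag and the browser, which the sketch does not try to predict.

```python
import base64
from html import escape


def carrier_tags(uri: str) -> dict:
    """Return one HTML snippet per tag, each carrying `uri` (hypothetical helper)."""
    safe = escape(uri, quote=True)  # keep the attribute value well-formed
    return {
        "anchor": f'<a href="{safe}">link</a>',
        "iframe": f'<iframe src="{safe}"></iframe>',
        "object": f'<object data="{safe}"></object>',
        "image": f'<img src="{safe}">',
    }


# A harmless text/plain marker is enough to see which tags load the URI.
marker = base64.b64encode(b"Hello").decode("ascii")
uri = f"data:text/plain;base64,{marker}"

for name, snippet in carrier_tags(uri).items():
    print(name, snippet)
```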


Although a data URI can be used with only a few HTML tags, it helps us to bypass blacklist-based XSS filters. Data URIs also allow the dynamic creation of files of different MIME types. Sites which use firewall proxies to disallow the retrieval of certain media types (such as application script languages or types with known security problems) will find it difficult to screen against the inclusion of such types using the "data" URL scheme.
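
On the defensive side, one possible approach is to screen markup for data URIs and compare their declared media type against an allowlist, rather than matching keywords. The following is a rough sketch using Python's standard html.parser; the allowlist contents and the names DataUriScanner and find_disallowed_data_uris are assumptions for illustration, and a real filter would need to cover more contexts (CSS, for example).

```python
from html.parser import HTMLParser

# Assumed policy: only these media types may appear in data URIs.
ALLOWED_MEDIA_TYPES = {"image/png", "image/gif", "image/jpeg"}


class DataUriScanner(HTMLParser):
    """Collects (tag, attribute, media_type) for disallowed data URIs."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            candidate = value.strip()
            if not candidate.lower().startswith("data:"):
                continue
            # The header is everything between "data:" and the first comma.
            header = candidate[len("data:"):].split(",", 1)[0]
            media_type = header.split(";", 1)[0].strip().lower() or "text/plain"
            if media_type not in ALLOWED_MEDIA_TYPES:
                self.findings.append((tag, name, media_type))


def find_disallowed_data_uris(html: str):
    scanner = DataUriScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.findings


sample = '<b>Welcome <object data="data:text/html;base64,PHNjcmlwdD4="></object></b>'
print(find_disallowed_data_uris(sample))
```

Checking the declared media type sidesteps the encoding problem, because the filter never needs to understand the payload itself.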