Share !

I'm Tav, a 29yr old from London. I enjoy working on large-scale social, economic and technological systems.

Follow
Contact
Sanitising JSONP Callback Identifiers For Security

Code: http://github.com/tav/scripts/blob/master/validate_jsonp.py

Whilst web developers have started to take XSS and CSRF seriously, a tiny detail often goes unnoticed — JSONP callback values.

The idea behind JSONP is to allow for composable client-side apps by allowing for cross-domain data fetching. It's a brilliantly simple idea and the callback parameter is used to specify the name of a callback function.

For example, FriendFeed would return the following when you supply it's APIs with a callback=handleFriendFeedData parameter:

handleFriendFeedData({..JSON-Data...})

Unfortunately, most services will also pass through pretty much anything — allowing attackers to inject malicious Javascript, e.g.

This would allow for arbitrary Javascript to be executed in the context of the caller site. So, what's the big deal you ask? Surely you only make JSONP calls to sites that you trust?

Well, imagine a client-side web application which allows arbitrary data sources to be added. It's not too implausible a future where instead of adding RSS feeds to an RSS Reader, one adds JSON feeds to such an application…

In this context, if the user is tricked into adding a maliciously crafted URL, then everything from their identity to their data is accessible — not to mention being able to abuse the account to spread a worm even!

I believe the possibilities of using JSONP haven't been explored much yet, and for it to go further in the context of interesting mashups/applications, it's important that it not be a security hole.

And, combined with other vulnerabilities, the seemingly innocuous callback value could have devastating effects!

My apologies for picking on FriendFeed — it could've just as well have been a thousand other services. As for the few sites that do restrict the callback value, one is often left wondering if the developers knew what they were doing.

For example, delicious restricts it to the character set 0-9 a-z A-Z ()[],._-+=/|\~?!#$^*: '". But, besides being totally arbitrary and insecure, it disallows perfectly valid characters in Javascript identifiers such as Unicode letters.

So, failing to find any decent JSONP callback sanitisation code out there, I ended up writing one myself:

It accepts callback values of the following formats:

  • A valid Javascript Identifier: <identifier>
  • Dot notation: <identifier>.<identifier>
  • Array index lookup: <identifier>[<integer>]

Here are some examples:

>>> is_valid_jsonp_callback_value('somefunction')
True

>>> is_valid_jsonp_callback_value('someobject.somemethod')
True

>>> is_valid_jsonp_callback_value('somearray[12345]')
True

It accepts all valid Javascript identifiers as defined in the ECMAScript specification:

>>> is_valid_jsonp_callback_value(u'Stra\u00dfe')
True

>>> is_valid_jsonp_callback_value(r'\u0062oo')
True

>>> is_valid_jsonp_callback_value('$42.ajaxHandler')
True

But disallows everything else:

>>> is_valid_jsonp_callback_value('alert(document.cookie)')
False

>>> is_valid_jsonp_callback_value(r'\u0020')
False

Here's the current source code — which, as usual, I've placed into the public domain. Let me know if you find it useful or would like to make any changes. Thanks!

# -*- coding: utf-8 -*-

# Placed into the Public Domain by tav <tav@espians.com>

"""Validate Javascript Identifiers for use as JSON-P callback parameters."""

import re

from unicodedata import category

# ------------------------------------------------------------------------------
# javascript identifier unicode categories and "exceptional" chars
# ------------------------------------------------------------------------------

valid_jsid_categories_start = frozenset([
'Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'Nl'
])

valid_jsid_categories = frozenset([
'Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'Nl', 'Mn', 'Mc', 'Nd', 'Pc'
])

valid_jsid_chars = ('$', '_')

# ------------------------------------------------------------------------------
# regex to find array[index] patterns
# ------------------------------------------------------------------------------

array_index_regex = re.compile(r'\[[0-9]+\]$')

has_valid_array_index = array_index_regex.search
replace_array_index = array_index_regex.sub

# ------------------------------------------------------------------------------
# javascript reserved words -- including keywords and null/boolean literals
# ------------------------------------------------------------------------------

is_reserved_js_word = frozenset([

'abstract', 'boolean', 'break', 'byte', 'case', 'catch', 'char', 'class',
'const', 'continue', 'debugger', 'default', 'delete', 'do', 'double',
'else', 'enum', 'export', 'extends', 'false', 'final', 'finally', 'float',
'for', 'function', 'goto', 'if', 'implements', 'import', 'in', 'instanceof',
'int', 'interface', 'long', 'native', 'new', 'null', 'package', 'private',
'protected', 'public', 'return', 'short', 'static', 'super', 'switch',
'synchronized', 'this', 'throw', 'throws', 'transient', 'true', 'try',
'typeof', 'var', 'void', 'volatile', 'while', 'with',

# potentially reserved in a future version of the ES5 standard
# 'let', 'yield'

]).__contains__

# ------------------------------------------------------------------------------
# the core validation functions
# ------------------------------------------------------------------------------

def is_valid_javascript_identifier(identifier, escape=r'\u', ucd_cat=category):
"""Return whether the given ``id`` is a valid Javascript identifier."""

if not identifier:
return False

if not isinstance(identifier, unicode):
try:
identifier = unicode(identifier, 'utf-8')
except UnicodeDecodeError:
return False

if escape in identifier:

new = []; add_char = new.append
split_id = identifier.split(escape)
add_char(split_id.pop(0))

for segment in split_id:
if len(segment) < 4:
return False
try:
add_char(unichr(int('0x' + segment[:4], 16)))
except Exception:
return False
add_char(segment[4:])

identifier = u''.join(new)

if is_reserved_js_word(identifier):
return False

first_char = identifier[0]

if not ((first_char in valid_jsid_chars) or
(ucd_cat(first_char) in valid_jsid_categories_start)):
return False

for char in identifier[1:]:
if not ((char in valid_jsid_chars) or
(ucd_cat(char) in valid_jsid_categories)):
return False

return True


def is_valid_jsonp_callback_value(value):
"""Return whether the given ``value`` can be used as a JSON-P callback."""

for identifier in value.split(u'.'):
while '[' in identifier:
if not has_valid_array_index(identifier):
return False
identifier = replace_array_index(u'', identifier)
if not is_valid_javascript_identifier(identifier):
return False

return True

# ------------------------------------------------------------------------------
# test
# ------------------------------------------------------------------------------

def test():
"""
The function ``is_valid_javascript_identifier`` validates a given identifier
according to the latest draft of the ECMAScript 5 Specification:

>>> is_valid_javascript_identifier('hello')
True

>>> is_valid_javascript_identifier('alert()')
False

>>> is_valid_javascript_identifier('a-b')
False

>>> is_valid_javascript_identifier('23foo')
False

>>> is_valid_javascript_identifier('foo23')
True

>>> is_valid_javascript_identifier('$210')
True

>>> is_valid_javascript_identifier(u'Stra\u00dfe')
True

>>> is_valid_javascript_identifier(r'\u0062') # u'b'
True

>>> is_valid_javascript_identifier(r'\u62')
False

>>> is_valid_javascript_identifier(r'\u0020')
False

>>> is_valid_javascript_identifier('_bar')
True

>>> is_valid_javascript_identifier('some_var')
True

>>> is_valid_javascript_identifier('$')
True

But ``is_valid_jsonp_callback_value`` is the function you want to use for
validating JSON-P callback parameter values:

>>> is_valid_jsonp_callback_value('somevar')
True

>>> is_valid_jsonp_callback_value('function')
False

>>> is_valid_jsonp_callback_value(' somevar')
False

It supports the possibility of '.' being present in the callback name, e.g.

>>> is_valid_jsonp_callback_value('$.ajaxHandler')
True

>>> is_valid_jsonp_callback_value('$.23')
False

As well as the pattern of providing an array index lookup, e.g.

>>> is_valid_jsonp_callback_value('array_of_functions[42]')
True

>>> is_valid_jsonp_callback_value('array_of_functions[42][1]')
True

>>> is_valid_jsonp_callback_value('$.ajaxHandler[42][1].foo')
True

>>> is_valid_jsonp_callback_value('array_of_functions[42]foo[1]')
False

>>> is_valid_jsonp_callback_value('array_of_functions[]')
False

>>> is_valid_jsonp_callback_value('array_of_functions["key"]')
False

Enjoy!

"""

if __name__ == '__main__':
import doctest
doctest.testmod()

You can clone the latest version from my scripts repository at GitHub:

git clone git://github.com/tav/scripts.git

Feel free to also port it into other programming languages.

Enjoy and let me know what you think, thanks!