Code: http://github.com/tav/scripts/blob/master/validate_jsonp.py
Whilst web developers have started to take XSS and CSRF seriously, a tiny detail often goes unnoticed — JSONP callback values.
The idea behind JSONP is to allow for composable client-side apps by allowing for cross-domain data fetching. It's a brilliantly simple idea and the callback parameter is used to specify the name of a callback function.
For example, FriendFeed would return the following when you supply it's APIs with a callback=handleFriendFeedData parameter:
handleFriendFeedData({..JSON-Data...})
Unfortunately, most services will also pass through pretty much anything — allowing attackers to inject malicious Javascript, e.g.
This would allow for arbitrary Javascript to be executed in the context of the caller site. So, what's the big deal you ask? Surely you only make JSONP calls to sites that you trust?
Well, imagine a client-side web application which allows arbitrary data sources to be added. It's not too implausible a future where instead of adding RSS feeds to an RSS Reader, one adds JSON feeds to such an application…
In this context, if the user is tricked into adding a maliciously crafted URL, then everything from their identity to their data is accessible — not to mention being able to abuse the account to spread a worm even!
I believe the possibilities of using JSONP haven't been explored much yet, and for it to go further in the context of interesting mashups/applications, it's important that it not be a security hole.
And, combined with other vulnerabilities, the seemingly innocuous callback value could have devastating effects!
My apologies for picking on FriendFeed — it could've just as well have been a thousand other services. As for the few sites that do restrict the callback value, one is often left wondering if the developers knew what they were doing.
For example, delicious restricts it to the character set 0-9 a-z A-Z ()[],._-+=/|\~?!#$^*: '". But, besides being totally arbitrary and insecure, it disallows perfectly valid characters in Javascript identifiers such as Unicode letters.
So, failing to find any decent JSONP callback sanitisation code out there, I ended up writing one myself:
It accepts callback values of the following formats:
- A valid Javascript Identifier: <identifier>
- Dot notation: <identifier>.<identifier>
- Array index lookup: <identifier>[<integer>]
Here are some examples:
>>> is_valid_jsonp_callback_value('somefunction')
True
>>> is_valid_jsonp_callback_value('someobject.somemethod')
True
>>> is_valid_jsonp_callback_value('somearray[12345]')
True
It accepts all valid Javascript identifiers as defined in the ECMAScript specification:
>>> is_valid_jsonp_callback_value(u'Stra\u00dfe')
True
>>> is_valid_jsonp_callback_value(r'\u0062oo')
True
>>> is_valid_jsonp_callback_value('$42.ajaxHandler')
True
But disallows everything else:
>>> is_valid_jsonp_callback_value('alert(document.cookie)')
False
>>> is_valid_jsonp_callback_value(r'\u0020')
False
Here's the current source code — which, as usual, I've placed into the public domain. Let me know if you find it useful or would like to make any changes. Thanks!
# -*- coding: utf-8 -*-
# Placed into the Public Domain by tav <tav@espians.com>
"""Validate Javascript Identifiers for use as JSON-P callback parameters."""
import re
from unicodedata import category
# ------------------------------------------------------------------------------
# javascript identifier unicode categories and "exceptional" chars
# ------------------------------------------------------------------------------
valid_jsid_categories_start = frozenset([
'Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'Nl'
])
valid_jsid_categories = frozenset([
'Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'Nl', 'Mn', 'Mc', 'Nd', 'Pc'
])
valid_jsid_chars = ('$', '_')
# ------------------------------------------------------------------------------
# regex to find array[index] patterns
# ------------------------------------------------------------------------------
array_index_regex = re.compile(r'\[[0-9]+\]$')
has_valid_array_index = array_index_regex.search
replace_array_index = array_index_regex.sub
# ------------------------------------------------------------------------------
# javascript reserved words -- including keywords and null/boolean literals
# ------------------------------------------------------------------------------
is_reserved_js_word = frozenset([
'abstract', 'boolean', 'break', 'byte', 'case', 'catch', 'char', 'class',
'const', 'continue', 'debugger', 'default', 'delete', 'do', 'double',
'else', 'enum', 'export', 'extends', 'false', 'final', 'finally', 'float',
'for', 'function', 'goto', 'if', 'implements', 'import', 'in', 'instanceof',
'int', 'interface', 'long', 'native', 'new', 'null', 'package', 'private',
'protected', 'public', 'return', 'short', 'static', 'super', 'switch',
'synchronized', 'this', 'throw', 'throws', 'transient', 'true', 'try',
'typeof', 'var', 'void', 'volatile', 'while', 'with',
# potentially reserved in a future version of the ES5 standard
# 'let', 'yield'
]).__contains__
# ------------------------------------------------------------------------------
# the core validation functions
# ------------------------------------------------------------------------------
def is_valid_javascript_identifier(identifier, escape=r'\u', ucd_cat=category):
"""Return whether the given ``id`` is a valid Javascript identifier."""
if not identifier:
return False
if not isinstance(identifier, unicode):
try:
identifier = unicode(identifier, 'utf-8')
except UnicodeDecodeError:
return False
if escape in identifier:
new = []; add_char = new.append
split_id = identifier.split(escape)
add_char(split_id.pop(0))
for segment in split_id:
if len(segment) < 4:
return False
try:
add_char(unichr(int('0x' + segment[:4], 16)))
except Exception:
return False
add_char(segment[4:])
identifier = u''.join(new)
if is_reserved_js_word(identifier):
return False
first_char = identifier[0]
if not ((first_char in valid_jsid_chars) or
(ucd_cat(first_char) in valid_jsid_categories_start)):
return False
for char in identifier[1:]:
if not ((char in valid_jsid_chars) or
(ucd_cat(char) in valid_jsid_categories)):
return False
return True
def is_valid_jsonp_callback_value(value):
"""Return whether the given ``value`` can be used as a JSON-P callback."""
for identifier in value.split(u'.'):
while '[' in identifier:
if not has_valid_array_index(identifier):
return False
identifier = replace_array_index(u'', identifier)
if not is_valid_javascript_identifier(identifier):
return False
return True
# ------------------------------------------------------------------------------
# test
# ------------------------------------------------------------------------------
def test():
"""
The function ``is_valid_javascript_identifier`` validates a given identifier
according to the latest draft of the ECMAScript 5 Specification:
>>> is_valid_javascript_identifier('hello')
True
>>> is_valid_javascript_identifier('alert()')
False
>>> is_valid_javascript_identifier('a-b')
False
>>> is_valid_javascript_identifier('23foo')
False
>>> is_valid_javascript_identifier('foo23')
True
>>> is_valid_javascript_identifier('$210')
True
>>> is_valid_javascript_identifier(u'Stra\u00dfe')
True
>>> is_valid_javascript_identifier(r'\u0062') # u'b'
True
>>> is_valid_javascript_identifier(r'\u62')
False
>>> is_valid_javascript_identifier(r'\u0020')
False
>>> is_valid_javascript_identifier('_bar')
True
>>> is_valid_javascript_identifier('some_var')
True
>>> is_valid_javascript_identifier('$')
True
But ``is_valid_jsonp_callback_value`` is the function you want to use for
validating JSON-P callback parameter values:
>>> is_valid_jsonp_callback_value('somevar')
True
>>> is_valid_jsonp_callback_value('function')
False
>>> is_valid_jsonp_callback_value(' somevar')
False
It supports the possibility of '.' being present in the callback name, e.g.
>>> is_valid_jsonp_callback_value('$.ajaxHandler')
True
>>> is_valid_jsonp_callback_value('$.23')
False
As well as the pattern of providing an array index lookup, e.g.
>>> is_valid_jsonp_callback_value('array_of_functions[42]')
True
>>> is_valid_jsonp_callback_value('array_of_functions[42][1]')
True
>>> is_valid_jsonp_callback_value('$.ajaxHandler[42][1].foo')
True
>>> is_valid_jsonp_callback_value('array_of_functions[42]foo[1]')
False
>>> is_valid_jsonp_callback_value('array_of_functions[]')
False
>>> is_valid_jsonp_callback_value('array_of_functions["key"]')
False
Enjoy!
"""
if __name__ == '__main__':
import doctest
doctest.testmod()
You can clone the latest version from my scripts repository at GitHub:
git clone git://github.com/tav/scripts.git
Feel free to also port it into other programming languages.
Enjoy and let me know what you think, thanks!