This report is out-of-date.

The state of things has changed dramatically, for the better, since I first wrote this in early 2008. Although my test cases are still quite useful, any information regarding specific Python packages is likely to be inaccurate. I am leaving these pages here primarily for historical interest.

Dealing with whitespace in JSON

The JSON format generally allows any amount of whitespace before or after the basic lexemes (tokens) of the language, where the allowed whitespace consists of sequences of zero or more of the following four characters: horizontal tab (U+0009), line feed (U+000A), carriage return (U+000D), and space (U+0020).

Whitespace is never required in JSON, except what you choose to put inside string literal values. So when generating JSON it is possible to produce a very tightly compacted one-line string (or zero-line if you will, as there is no need for a newline character at the end of the data). This compaction can be useful to save on bandwidth when transmitting JSON data in AJAX applications.
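
As a rough illustration of the saving, the sketch below re-serializes some JSON text without any optional whitespace. It assumes Python's standard-library json module, which is not one of the modules tested in this report, and the helper name minify_json is made up for this example.

```python
import json

def minify_json(text):
    """Parse JSON text and re-emit it with no optional whitespace."""
    # separators=(',', ':') drops the space that json.dumps would
    # otherwise place after each comma and colon.
    return json.dumps(json.loads(text), separators=(',', ':'))

pretty = '''{
    "one": true,
    "two": 19.5
}'''
compact = minify_json(pretty)
print(compact)
print(len(pretty), '->', len(compact), 'characters')
```

Whitespace inside string literals is part of the data, so it survives the round trip unchanged.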

Whitespace in JavaScript: It is worth pointing out that JavaScript allows a much larger variety of whitespace than does JSON. Generally any character is considered to be whitespace if it has the Unicode category of Zs, Zl, or Zp; or it is one of the control characters for horizontal tab, line feed, vertical tab, carriage return, form feed, or next line. The following are all of the whitespace characters:

U+0009 U+0085 U+2002 U+2008 U+205F
U+000A U+00A0 U+2003 U+2009 U+3000
U+000B U+1680 U+2004 U+200A
U+000C U+180E U+2005 U+2028
U+000D U+2000 U+2006 U+2029
U+0020 U+2001 U+2007 U+202F
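
The rule described above can be checked against Python's own Unicode tables. This is only a sketch: the function name is_js_whitespace is hypothetical, and the result depends on the Unicode version bundled with the interpreter, so it may not match the list above exactly (for example, U+180E was later reclassified out of category Zs).

```python
import sys
import unicodedata

# Control characters named in the text: HT, LF, VT, FF, CR and NEL.
CONTROL_WHITESPACE = {'\t', '\n', '\x0b', '\x0c', '\r', '\x85'}

def is_js_whitespace(ch):
    """True if ch counts as whitespace under the rule described above."""
    return (ch in CONTROL_WHITESPACE
            or unicodedata.category(ch) in ('Zs', 'Zl', 'Zp'))

# Scan every code point and keep the ones that qualify.
js_whitespace = [chr(cp) for cp in range(sys.maxunicode + 1)
                 if is_js_whitespace(chr(cp))]
print(' '.join('U+%04X' % ord(ch) for ch in js_whitespace))
```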

End of line: JSON needs no concept of an end-of-line, but JavaScript does, if for no other reason than that comments must be parsed correctly. However, some JSON modules may also wish to detect ends of lines to help create error messages with line numbers. In JavaScript, certain sequences of whitespace characters are treated as end-of-line indicators, with the longest match tried first.
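
For the error-message use mentioned above, a line counter might look like the following minimal sketch. It assumes the end-of-line sequences are CR LF, LF, CR, U+2028 and U+2029, which is my assumption and not necessarily the exact set intended here; the function name line_and_column is also hypothetical.

```python
import re

# Assumed set of line terminators: CR LF as a pair, then the single
# characters LF, CR, U+2028 and U+2029. CR LF is listed first so that
# the longest match wins.
EOL = re.compile('\r\n|[\n\r\u2028\u2029]')

def line_and_column(text, offset):
    """Return the 1-based (line, column) of a character offset in text."""
    line, line_start = 1, 0
    # Only terminators that end before the offset start a new line.
    for match in EOL.finditer(text, 0, offset):
        line += 1
        line_start = match.end()
    return line, offset - line_start + 1

text = '{\r\n  "a": 1,\n  "b": ?\r}'
print(line_and_column(text, text.index('?')))
```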

Converting Python to JSON

When producing JSON, different choices can be made in how whitespace is incorporated. There are two extremes: the first is to omit all whitespace for the most compact representation, and the second is to introduce copious whitespace, primarily for indentation and pretty-printing.

In the examples which follow, we use the following Python input data:

# Some arbitrary python object used in the following examples
pydata = {'one': True,
          'three': ['red', 'yellow',
                    ['blue', 'azure', 'cobalt', 'teal'], 'orange'],
          'two': 19.5}

demjson

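The demjson module can output JSON in two whitespace styles: compact, with no whitespace at all (the default), and pretty-printed with indentation (compactly=False).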

It has no further options or control over the generation of whitespace in the output.

demjson.encode( pydata )
{"one":true,"three":["red","yellow",["blue","azure","cobalt","teal"],"orange"],"two":19.5}

demjson.encode( pydata, compactly=False )
{ "one" : true,
  "three" : [ "red",
      "yellow",
      [ "blue",
        "azure",
        "cobalt",
        "teal"
      ],
      "orange"
    ],
  "two" : 19.5
}

jsonlib

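The jsonlib module can output JSON in two whitespace styles: on a single line with a space after each comma and colon (the default), and pretty-printed using a caller-supplied indent string.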

There is no way to create an optimally compact representation.

jsonlib.write( pydata )
{"three": ["red", "yellow", ["blue", "azure", "cobalt", "teal"], "orange"], "two": 19.5, "one": true}

jsonlib.write( pydata, indent='   ' )
{
   "three": [
      "red",
      "yellow",
      [
         "blue",
         "azure",
         "cobalt",
         "teal"
      ],
      "orange"
   ],
   "two": 19.5,
   "one": true
}

python-cjson

The python-cjson module has no options on how it emits whitespace. It generally adds spaces after punctuation, but does not perform pretty-printing.

cjson.encode( pydata )
{"three": ["red", "yellow", ["blue", "azure", "cobalt", "teal"], "orange"], "two": 19.5, "one": true}

python-json

The python-json module has no options on how it emits whitespace. It creates compact JSON, not adding any whitespace.

json.write( pydata )
{"three":["red","yellow",["blue","azure","cobalt","teal"],"orange"],"two":19.500000,"one":true}

simplejson

The simplejson module provides the most options on how it emits whitespace. It can output JSON compactly or in a pretty-printed, indented mode. The caller can control the amount of indentation and, through the separators argument, the spaces added after punctuation.

simplejson.dumps( pydata )
{"three": ["red", "yellow", ["blue", "azure", "cobalt", "teal"], "orange"], "two": 19.5, "one": true}

simplejson.dumps( pydata, separators=(',',':') )
{"three":["red","yellow",["blue","azure","cobalt","teal"],"orange"],"two":19.5,"one":true}

simplejson.dumps( pydata, indent=4 )
{
    "three": [
        "red", 
        "yellow", 
        [
            "blue", 
            "azure", 
            "cobalt", 
            "teal"
        ], 
        "orange"
    ], 
    "two": 19.5, 
    "one": true
}
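
The indented output above appears to carry a space after each comma at the end of a line. Combining indent with separators is one way to avoid that. The sketch below uses Python's standard-library json module, which offers the same dumps() arguments but was not tested in this report; sort_keys is added only to make the output order predictable.

```python
import json

pydata = {'one': True,
          'three': ['red', 'yellow',
                    ['blue', 'azure', 'cobalt', 'teal'], 'orange'],
          'two': 19.5}

# Item separator ',' (no trailing space) and key separator ': '.
text = json.dumps(pydata, indent=4, separators=(',', ': '), sort_keys=True)
print(text)
```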

Parsing whitespace in JSON input

All the tested modules correctly handle any amount of whitespace at any legal location, including at the beginning or the end of the data.

However, only the demjson module, when operating in strict mode, rejects every whitespace character that is not one of the four JSON whitespace characters, such as a form feed. The other modules accept at least vertical tab and form feed.

Table 1: Parsing whitespace in JSON input
Test# | Type of whitespace | demjson/strict | demjson/loose | jsonlib     | python-cjson | python-json | simplejson
1–1   | ws at start        | yes            | yes           | yes         | yes          | yes         | yes
1–2   | ws at end          | yes            | yes           | yes         | yes          | yes         | yes
1–3   | Tab U+0009         | yes            | yes           | yes         | yes          | yes         | yes
1–4   | Space U+0020       | yes            | yes           | yes         | yes          | yes         | yes
1–5   | LF U+000A          | yes            | yes           | yes         | yes          | yes         | yes
1–6   | CR U+000D          | yes            | yes           | yes         | yes          | yes         | yes
1–7   | VT U+000B          | yes: error     | yes: allows   | yes: allows | yes: allows  | yes: allows | yes: allows
1–8   | FF U+000C          | yes: error     | yes: allows   | yes: allows | yes: allows  | yes: allows | yes: allows
1–9   | NBSP U+00A0        | yes: error     | yes: allows   | yes: allows | yes: error   | yes: error  | yes: error
1–10  | ENSP U+2002        | yes: error     | yes: allows   | yes: allows | yes: error   | yes: error  | yes: error
1–11  | LS U+2028          | yes: error     | no: error     | yes: allows | yes: error   | yes: error  | yes: error
1–12  | PS U+2029          | yes: error     | no: error     | yes: allows | yes: error   | yes: error  | yes: error
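
A table like the one above could be produced with a small harness along these lines. This is a sketch only: it exercises Python's standard-library json module as a stand-in, since the modules tested here may not be installed, and the helper name accepts is hypothetical. Any function that takes a string and raises ValueError on bad input could be passed in place of json.loads.

```python
import json

CANDIDATES = [
    ('Tab', '\t'), ('Space', ' '), ('LF', '\n'), ('CR', '\r'),
    ('VT', '\x0b'), ('FF', '\x0c'), ('NBSP', '\u00a0'),
    ('ENSP', '\u2002'), ('LS', '\u2028'), ('PS', '\u2029'),
]

def accepts(loads, ws):
    """True if loads() parses input with ws before, between and after tokens."""
    try:
        return loads(ws + '[1,' + ws + '2]' + ws) == [1, 2]
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False

results = {name: accepts(json.loads, ws) for name, ws in CANDIDATES}
for name, ws in CANDIDATES:
    verdict = 'allows' if results[name] else 'error'
    print('%-5s U+%04X  %s' % (name, ord(ws), verdict))
```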

JavaScript comments

Although not strict JSON, some modules allow parsing input that has a more JavaScript flavor. Of these, demjson and python-json handle JavaScript comments. demjson must be used in non-strict mode for this.

JavaScript has two kinds of comments, both similar to those of C: block comments enclosed in /* ... */, and line comments that begin with // and run to the end of the line.

Both demjson and python-json appear to handle these comments correctly according to the JavaScript rules, except that only demjson recognizes all Unicode end-of-line characters when parsing a // comment, rather than just line feed or carriage return.

Table 2: JavaScript comments
Test# | JavaScript comment | demjson/strict | demjson/loose | jsonlib    | python-cjson | python-json | simplejson
2–1   | /* ... */          | yes: error     | yes: allows   | yes: error | yes: error   | yes: allows | yes: error
2–2   | // ...             | yes: error     | yes: allows   | yes: error | yes: error   | yes: allows | yes: error
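
For a module that rejects comments, one workaround is to strip them before parsing. The following is a minimal sketch, not how demjson or python-json do it: the name strip_js_comments is hypothetical, the end-of-line set for // comments (LF, CR, U+2028, U+2029) is an assumption, and the standard-library json module stands in for a strict parser.

```python
import json
import re

# Match a JSON string literal first so that comment markers inside
# strings are left alone; otherwise match a /* */ or // comment.
TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'            # string literal
    r'|/\*.*?\*/'                   # block comment
    r'|//[^\n\r\u2028\u2029]*',     # line comment, up to an end-of-line
    re.DOTALL)

def strip_js_comments(text):
    """Replace each JavaScript comment with a single space."""
    def repl(match):
        token = match.group(0)
        return token if token.startswith('"') else ' '
    return TOKEN.sub(repl, text)

source = '{"a": "x // y", /* note */ "b": 1 // end\n}'
print(json.loads(strip_js_comments(source)))
```

Replacing a comment with a space rather than nothing keeps adjacent tokens from running together.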

Format control characters

In addition to whitespace, JavaScript (or more technically ECMAScript) allows any format control character to appear anywhere within the source text with no apparent effect. These special characters can even appear in the middle of a keyword such as true. This does not apply to JSON, in which format control characters should be treated like any other ordinary character, which should result in a parsing error unless they are inside quoted string literals.

Only the demjson module, when operating in non-strict mode, allows any format control character to appear anywhere in the JSON input stream.

Table 3: Format control characters in JavaScript input
Test# | JavaScript input | demjson/strict | demjson/loose | jsonlib    | python-cjson | python-json | simplejson
3–1   | format ctl char  | yes: error     | yes: allows   | yes: error | yes: error   | yes: error  | yes: error

A format control character is any Unicode character in category Cf. There are currently about 138 such characters, including:

U+00AD U+202A U+206F U+E0020 U+E002E U+E003C U+E004A U+E0058 U+E0066 U+E0074
U+0600 U+202B U+FEFF U+E0021 U+E002F U+E003D U+E004B U+E0059 U+E0067 U+E0075
U+0601 U+202C U+FFF9 U+E0022 U+E0030 U+E003E U+E004C U+E005A U+E0068 U+E0076
U+0602 U+202D U+FFFA U+E0023 U+E0031 U+E003F U+E004D U+E005B U+E0069 U+E0077
U+0603 U+202E U+FFFB U+E0024 U+E0032 U+E0040 U+E004E U+E005C U+E006A U+E0078
U+06DD U+2060 U+1D173 U+E0025 U+E0033 U+E0041 U+E004F U+E005D U+E006B U+E0079
U+070F U+2061 U+1D174 U+E0026 U+E0034 U+E0042 U+E0050 U+E005E U+E006C U+E007A
U+17B4 U+2062 U+1D175 U+E0027 U+E0035 U+E0043 U+E0051 U+E005F U+E006D U+E007B
U+17B5 U+2063 U+1D176 U+E0028 U+E0036 U+E0044 U+E0052 U+E0060 U+E006E U+E007C
U+200B U+206A U+1D177 U+E0029 U+E0037 U+E0045 U+E0053 U+E0061 U+E006F U+E007D
U+200C U+206B U+1D178 U+E002A U+E0038 U+E0046 U+E0054 U+E0062 U+E0070 U+E007E
U+200D U+206C U+1D179 U+E002B U+E0039 U+E0047 U+E0055 U+E0063 U+E0071 U+E007F
U+200E U+206D U+1D17A U+E002C U+E003A U+E0048 U+E0056 U+E0064 U+E0072
U+200F U+206E U+E0001 U+E002D U+E003B U+E0049 U+E0057 U+E0065 U+E0073
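
Rather than hard-coding a list like the one above, a parser could ask the Unicode database directly. A minimal sketch, with hypothetical function names; which characters qualify depends on the Unicode version that ships with the Python interpreter.

```python
import unicodedata

def is_format_control(ch):
    """True if ch is in Unicode general category Cf."""
    return unicodedata.category(ch) == 'Cf'

def find_format_controls(text):
    """Return (offset, 'U+XXXX') for each Cf character found in text."""
    return [(i, 'U+%04X' % ord(ch))
            for i, ch in enumerate(text) if is_format_control(ch)]

sample = 'tr\u200bue'   # ZERO WIDTH SPACE inside the keyword true
print(find_format_controls(sample))
```

A strict JSON parser could use such a check to report the offending character in its error message, while a lenient one could use it to skip the character.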
