Abstract

This specification defines a new form encoding algorithm that enables the transmission of form data as JSON. Instead of capturing form data as what is essentially an array of key-value pairs, as existing form encodings do, it relies on a simple name attribute syntax that makes it possible to capture rich data structures directly as JSON.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

Beware. This specification is no longer in active maintenance and the HTML Working Group does not intend to maintain it further.

This specification is an extension specification to HTML.

This document was published by the HTML Working Group as a Working Group Note. If you wish to make comments regarding this document, please send them to public-html@w3.org (subscribe, archives). All comments are welcome.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 September 2015 W3C Process Document.

1. Introduction

This section is non-normative.

JSON is commonly used as an exchange format between Web clients and backend services. Enabling HTML forms to submit JSON directly simplifies implementation, as it lets backend services accept a single input format, one that is moreover able to encode richer structure than other form encodings (where structure has traditionally had to be emulated).

User agents that implement this specification will transmit JSON data from their forms whenever the form's enctype attribute is set to application/json. During the transition period, user agents that do not support this encoding will fall back to using application/x-www-form-urlencoded. This can be detected on the server side, and the conversion algorithm described in this specification can be used to convert such data to JSON.
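
The server-side detection mentioned above can be sketched as follows. This is a minimal illustration in Python, not part of this specification: the function names media_type and fallback_entries are hypothetical, and the body is assumed to be UTF-8. It only recovers the list of name/value pairs from a fallback submission; converting those pairs to JSON is left to the encoding algorithm.

```python
from urllib.parse import parse_qsl


def media_type(content_type: str) -> str:
    """Return the lowercased media type, without parameters such as charset."""
    return content_type.split(";", 1)[0].strip().lower()


def fallback_entries(content_type: str, body: bytes):
    """Return (name, value) pairs if the request used the urlencoded fallback.

    Returns None for any other media type (including application/json),
    in which case no conversion is needed.
    """
    if media_type(content_type) != "application/x-www-form-urlencoded":
        return None
    # keep_blank_values=True so that empty fields are not silently dropped.
    return parse_qsl(body.decode("utf-8"), keep_blank_values=True)
```

Each returned name is still a raw path such as pet[species], which a server would then run through the same path parsing and value setting steps that a supporting user agent applies.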

The path format used in input names is straightforward. To begin with, when no structuring information is present, the information will simply be captured as keys in a JSON object:

Example 1: Basic Keys
<form enctype='application/json'>
  <input name='name' value='Bender'>
  <select name='hind'>
    <option selected>Bitable</option>
    <option>Kickable</option>
  </select>
  <input type='checkbox' name='shiny' checked>
</form>

// produces
{
  "name":   "Bender"
, "hind":   "Bitable"
, "shiny":  true
}

If a path is repeated, its value is captured as an array:

Example 2: Multiple Values
<form enctype='application/json'>
  <input type='number' name='bottle-on-wall' value='1'>
  <input type='number' name='bottle-on-wall' value='2'>
  <input type='number' name='bottle-on-wall' value='3'>
</form>

// produces
{
  "bottle-on-wall":   [1, 2, 3]
}

Deeper structures can be produced using sub-keys in the path, using either string keys for objects or integer keys for arrays:

Example 3: Deeper Structure
<form enctype='application/json'>
  <input name='pet[species]' value='Dahut'>
  <input name='pet[name]' value='Hypatia'>
  <input name='kids[1]' value='Thelma'>
  <input name='kids[0]' value='Ashley'>
</form>

// produces
{
    "pet":  {
        "species":  "Dahut"
    ,   "name":     "Hypatia"
    }
,   "kids":   ["Ashley", "Thelma"]
}

As you can see above, the keys for array values can be in any order. If the array is somehow sparse, then null values are inserted:

Example 4: Sparse Arrays
<form enctype='application/json'>
  <input name='heartbeat[0]' value='thunk'>
  <input name='heartbeat[2]' value='thunk'>
</form>

// produces
{
    "heartbeat":   ["thunk", null, "thunk"]
}

Paths can cause structures to nest to arbitrary depths:

Example 5: Even Deeper
<form enctype='application/json'>
  <input name='pet[0][species]' value='Dahut'>
  <input name='pet[0][name]' value='Hypatia'>
  <input name='pet[1][species]' value='Felis Stultus'>
  <input name='pet[1][name]' value='Billie'>
</form>

// produces
{
    "pet":  [
        {
            "species":  "Dahut"
        ,   "name":     "Hypatia"
        }
    ,   {
            "species":  "Felis Stultus"
        ,   "name":     "Billie"
        }
    ]
}

Really, any depth you might need.

Example 6: Such Deep
<form enctype='application/json'>
  <input name='wow[such][deep][3][much][power][!]' value='Amaze'>
</form>

// produces
{
    "wow":  {
        "such": {
            "deep": [
                null
            ,   null
            ,   null
            ,   {
                    "much": {
                        "power": {
                            "!":  "Amaze"
                        }
                    }
                }
            ]
        }
    }
}

The algorithm does not lose data in that every piece of information ends up being submitted. But given the path syntax, it is possible to introduce clashes such that one may attempt to set an object, an array, and a scalar value on the same key.

As seen in a previous example, trying to set multiple scalars on the same key will convert the value into an array. Trying to set a scalar value at a path that also contains an object will cause the scalar to be set on that object with the empty string key. Trying to set an array value at a path that also contains an object will cause the non-null values of that array to be set on the object using their array indices as keys. This is exemplified below:

Example 7: Merge Behaviour
<form enctype='application/json'>
  <input name='mix' value='scalar'>
  <input name='mix[0]' value='array 1'>
  <input name='mix[2]' value='array 2'>
  <input name='mix[key]' value='key key'>
  <input name='mix[car]' value='car key'>
</form>

// produces
{
    "mix":  {
        "":     "scalar"
    ,   "0":    "array 1"
    ,   "2":    "array 2"
    ,   "key":  "key key"
    ,   "car":  "car key"
    }
}

This may seem somewhat convoluted but it should be considered as a resilience mechanism meant to ensure that data is not lost rather than the normal usage of the JSON encoding.

As we have seen above, multiple values with the same key are upgraded to an array, and it is also possible to use array offsets directly. However, when generating a form from existing data, one may not know whether there will be one or more instances of a given key, so that without indices one will get back a scalar at times and an array at others. Generating array indices properly can also be slightly cumbersome, especially if the field may be modified on the client side, which would mean maintaining the indices there as well. In order to indicate that a given path must contain an array irrespective of the number of its items, and without resorting to indices, one may use the append notation (only as the final step in a path):

Example 8: Append
<form enctype='application/json'>
  <input name='highlander[]' value='one'>
</form>

// produces
{
    "highlander":  ["one"]
}

The JSON encoding also supports file uploads. The values of files are themselves structured as objects and contain a type field indicating the MIME type, a name field containing the file name, and a body field with the file's content as base64.

Example 9: Files
<form enctype='application/json'>
  <input type='file' name='file' multiple>
</form>

// assuming the user has selected two text files, produces:
{
    "file": [
        {
            "type": "text/plain",
            "name": "dahut.txt",
            "body": "REFBQUFBQUFIVVVVVVVVVVVVVCEhIQo="
        },
        {
            "type": "text/plain",
            "name": "litany.txt",
            "body": "SSBtdXN0IG5vdCBmZWFyLlxuRmVhciBpcyB0aGUgbWluZC1raWxsZXIuCg=="
        }
    ]
}
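
The shape of a single file value can be sketched as below. This is an illustrative Python helper, not part of this specification; the name file_entry and its parameters are hypothetical, and the file content is assumed to be available as bytes.

```python
import base64


def file_entry(name: str, mime_type: str, body: bytes) -> dict:
    """Build the object that represents one uploaded file."""
    return {
        "type": mime_type,  # MIME type of the file
        "name": name,       # file name
        # File content as Base64 text, so that it can be carried in a JSON string.
        "body": base64.b64encode(body).decode("ascii"),
    }
```

With the multiple attribute, each selected file yields one such object under the same path, which is why the example above produces an array.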

Still in the spirit of not losing information, whenever a path makes use of an invalid syntax, it is simply used whole as if it were just a key with no structure:

Example 10: Bad input
<form enctype='application/json'>
  <input name='error[good]' value='BOOM!'>
  <input name='error[bad' value='BOOM BOOM!'>
</form>

// Produces:
{
    "error": {
        "good":   "BOOM!"
    }
,   "error[bad":  "BOOM BOOM!"
}

2. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key word MUST is to be interpreted as described in [RFC2119].

3. Terminology

The following terms are defined in the HTML specification. [html51]

The following terms are defined in ECMAScript. [ECMA-262]

4. The application/json encoding algorithm

For the purposes of the algorithms below, an Object corresponds to the in-memory representation for a JSONObject and an Array corresponds to the in-memory representation for a JSONArray.

The following algorithm encodes form data as application/json. It operates on the form data set obtained from constructing the form data set.

  1. Let resulting object be a new Object.
  2. For each entry in the form data set, perform these substeps:
    1. If the entry's type is file, set the is file flag.
    2. Let steps be the result of running the steps to parse a JSON encoding path on the entry's name.
    3. Let context be set to the value of resulting object.
    4. For each step in the list of steps, run the following subsubsteps:
      1. Let the current value be the value obtained by getting the step's key from the current context.
      2. Run the steps to set a JSON encoding value with the current context, the step, the current value, the entry's value, and the is file flag.
      3. Update context to be the value returned by the steps to set a JSON encoding value run above.
  3. Let result be the value returned from calling the stringify operation with resulting object as its first parameter and the two remaining parameters left undefined.
  4. Encode result as UTF-8 and return the resulting byte stream.
Note

The algorithm above deliberately ignores any charset information (e.g. from accept-charset) and always encodes the resulting JSON as UTF-8. This is an intentionally sane behaviour.
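
The last two steps of the algorithm can be approximated outside a user agent as shown below. This is a sketch in Python under stated assumptions: the function name serialize is hypothetical, and json.dumps with compact separators is used as a stand-in for the stringify operation, which it resembles for plain objects, arrays, strings, numbers, booleans and null but does not match in every edge case.

```python
import json


def serialize(resulting_object) -> bytes:
    """Serialize the resulting object and always encode it as UTF-8."""
    # Compact separators mimic stringify called without an indentation argument.
    # ensure_ascii=False keeps non-ASCII characters unescaped in the text.
    text = json.dumps(resulting_object, ensure_ascii=False, separators=(",", ":"))
    # No charset negotiation takes place: the output is UTF-8 regardless.
    return text.encode("utf-8")
```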

The steps to parse a JSON encoding path are as follows:

  1. Let path be the path we are to parse.
  2. Let original be a copy of path.
  3. Let steps be an empty list of steps.
  4. Let first key be the result of collecting a sequence of characters that are not U+005B LEFT SQUARE BRACKET ("[") from the path.
  5. If first key is empty, jump to the step labelled failure below.
  6. Otherwise remove the collected characters from path and push a step onto steps with its type set to "object", its key set to the collected characters, and its last flag unset.
  7. If the path is empty, set the last flag on the last step in steps and return steps.
  8. Loop: While path is not an empty string, run these substeps:
    1. If the first two characters in path are U+005B LEFT SQUARE BRACKET ("[") followed by U+005D RIGHT SQUARE BRACKET ("]"), run these subsubsteps:
      1. Set the append flag on the last step in steps.
      2. Remove those two characters from path.
      3. If there are characters left in path, jump to the step labelled failure below.
      4. Otherwise jump to the step labelled loop above.
    2. If the first character in path is U+005B LEFT SQUARE BRACKET ("["), followed by one or more ASCII digits, followed by U+005D RIGHT SQUARE BRACKET ("]"), run these subsubsteps:
      1. Remove the first character from path.
      2. Collect a sequence of characters being ASCII digits, remove them from path, and let numeric key be the result of interpreting them as a base-ten integer.
      3. Remove the following character from path.
      4. Push a step onto steps with its type set to "array", its key set to the numeric key, and its last flag unset.
      5. Jump to the step labelled loop above.
    3. If the first character in path is U+005B LEFT SQUARE BRACKET ("["), followed by one or more characters that are not U+005D RIGHT SQUARE BRACKET, followed by U+005D RIGHT SQUARE BRACKET ("]"), run these subsubsteps:
      1. Remove the first character from path.
      2. Collect a sequence of characters that are not U+005D RIGHT SQUARE BRACKET, remove them from path, and let object key be the result.
      3. Remove the following character from path.
      4. Push a step onto steps with its type set to "object", its key set to the object key, and its last flag unset.
      5. Jump to the step labelled loop above.
    4. If this point in the loop is reached, jump to the step labelled failure below.
  9. For each step in steps, run the following substeps:
    1. If the step is the last step, set its last flag.
    2. Otherwise, set its next type to the type of the next step in steps.
  10. Return steps.
  11. Failure: return a list of steps containing a single step with its type set to "object", its key set to original, and its last flag set.
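
The parsing steps above can be sketched in Python as follows. This is an illustration under assumptions rather than a reference implementation: the Step class and its field names (type, key, last, append, next_type) are a hypothetical in-memory representation of a step, and regular expressions replace the character-by-character collection described in the steps.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Step:
    type: str                       # "object" or "array"
    key: Union[str, int]
    last: bool = False
    append: bool = False
    next_type: Optional[str] = None


_INDEX = re.compile(r"\[([0-9]+)\]")  # "[" + ASCII digits + "]"
_KEY = re.compile(r"\[([^\]]+)\]")    # "[" + one or more non-"]" + "]"


def parse_path(path: str) -> List[Step]:
    # On failure the whole original name is used as a single object key.
    failure = [Step("object", path, last=True)]
    first_key, bracket, rest = path.partition("[")
    if not first_key:
        return failure
    steps = [Step("object", first_key)]
    path = bracket + rest
    while path:
        if path.startswith("[]"):
            # Append notation is only valid as the final step.
            steps[-1].append = True
            path = path[2:]
            if path:
                return failure
            continue
        match = _INDEX.match(path)
        if match:
            steps.append(Step("array", int(match.group(1))))
        else:
            match = _KEY.match(path)
            if not match:
                return failure
            steps.append(Step("object", match.group(1)))
        path = path[match.end():]
    # Mark the last step and record, on every other step, the type of its successor.
    for i, step in enumerate(steps):
        if i == len(steps) - 1:
            step.last = True
        else:
            step.next_type = steps[i + 1].type
    return steps
```

For instance, parse_path("pet[0][name]") yields an object step for pet, an array step for 0 and a final object step for name, while parse_path("error[bad") falls back to a single object step whose key is the entire string.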

The steps to set a JSON encoding value are as follows:

  1. Let context be the context this algorithm is called with.
  2. Let step be the step of the path this algorithm is called with.
  3. Let current value be the current value this algorithm is called with.
  4. Let entry value be the entry value this algorithm is called with.
  5. Let is file be the is file flag this algorithm is called with.
  6. If is file is set then replace entry value with an Object that has its "name" property set to the file's name, its "type" property set to the file's type, and its "body" property set to the Base64 encoding of the file's body. [RFC2045]
  7. If step has its last flag set, run the following substeps:
    1. If current value is undefined, run the following subsubsteps:
      1. If step's append flag is set, set the context's property named by the step's key to a new Array containing entry value as its only member.
      2. Otherwise, set the context's property named by the step's key to entry value.
    2. Else if current value is an Array, then get the context's property named by the step's key and push entry value onto it.
    3. Else if current value is an Object and the is file flag is not set, then run the steps to set a JSON encoding value with context set to the current value; a step with its type set to "object", its key set to the empty string, and its last flag set; current value set to the current value's property named by the empty string; the entry value; and the is file flag. Return the result.
    4. Otherwise, set the context's property named by the step's key to an Array containing current value and entry value, in this order.
    5. Return context.
  8. Otherwise, run the following substeps:
    1. If current value is undefined, run the following subsubsteps:
      1. If step's next type is "array", set the context's property named by the step's key to a new empty Array and return it.
      2. Otherwise, set the context's property named by the step's key to a new empty Object and return it.
    2. Else if current value is an Object, then return the value of the context's property named by the step's key.
    3. Else if current value is an Array, then run the following subsubsteps:
      1. If step's next type is "array", return current value.
      2. Otherwise, run the following subsubsubsteps:
        1. Let object be a new empty Object.
        2. For each item and zero-based index i in current value, if item is not undefined then set a property of object named i to item.
        3. Set the context's property named by the step's key to object.
        4. Return object.
    4. Otherwise, run the following subsubsteps:
      1. Let object be a new Object with a property named by the empty string set to current value.
      2. Set the context's property named by the step's key to object.
      3. Return object.
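
The steps above can be sketched in Python as follows. This is an illustration under several assumptions, not a reference implementation: Step is a hypothetical representation of a path step, UNDEFINED is a sentinel standing in for the undefined value, array holes are stored as None (entry values are assumed never to be None), file entries are assumed to have already been replaced by their object form, and the helpers get, put and apply are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Union

UNDEFINED = object()  # sentinel: Python has no separate "undefined" value


@dataclass
class Step:
    type: str
    key: Union[str, int]
    last: bool = False
    append: bool = False
    next_type: Optional[str] = None


def get(context, key):
    """Read a property; holes and missing keys read as UNDEFINED."""
    if isinstance(context, list):
        if isinstance(key, int) and key < len(context) and context[key] is not None:
            return context[key]
        return UNDEFINED
    return context.get(str(key), UNDEFINED)


def put(context, key, value):
    """Write a property; lists are padded with None to emulate sparse arrays."""
    if isinstance(context, list):
        context.extend([None] * (key + 1 - len(context)))
        context[key] = value
    else:
        context[str(key)] = value  # object property names are strings


def set_value(context, step, current, entry_value, is_file=False):
    if step.last:
        if current is UNDEFINED:
            put(context, step.key, [entry_value] if step.append else entry_value)
        elif isinstance(current, list):
            current.append(entry_value)
        elif isinstance(current, dict) and not is_file:
            # A scalar landing on an object is stored under the empty string key.
            empty = Step("object", "", last=True)
            return set_value(current, empty, get(current, ""), entry_value, is_file)
        else:
            put(context, step.key, [current, entry_value])
        return context
    if current is UNDEFINED:
        new = [] if step.next_type == "array" else {}
    elif isinstance(current, dict):
        return current
    elif isinstance(current, list):
        if step.next_type == "array":
            return current
        # An array that now needs string keys becomes an object keyed by index.
        new = {str(i): item for i, item in enumerate(current) if item is not None}
    else:
        new = {"": current}
    put(context, step.key, new)
    return new


def apply(result, steps, entry_value, is_file=False):
    """Walk one entry's steps, as in the main encoding algorithm."""
    context = result
    for step in steps:
        context = set_value(context, step, get(context, step.key), entry_value, is_file)
```

Calling apply once per form entry, with the steps parsed from that entry's name, builds up the resulting object in place.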

5. Form Submission

Given that there exist deployed services using JSON and ambient authentication, and given that form requests are not protected by the same-origin policy by default, if this encoding were left wide open then a number of attacks would become possible. Because of this, when using the application/json form encoding the same-origin policy is enforced.

When the form submission algorithm is invoked in order to Submit as entity with enctype set to application/json and entity body set to the result of applying the application/json encoding algorithm, causing the browsing context to navigate, the user agent MUST invoke the fetch algorithm with the force same-origin flag.

6. Acknowledgements

Thanks to Philippe Le Hégaret for serving as a sounding board for the first version of the encoding algorithm.

A. References

A.1 Normative references

[ECMA-262]
Allen Wirfs-Brock. ECMA-262 6th Edition, The ECMAScript 2015 Language Specification. June 2015. Standard. URL: http://www.ecma-international.org/ecma-262/6.0/
[RFC2045]
N. Freed; N. Borenstein. Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. November 1996. Draft Standard. URL: https://tools.ietf.org/html/rfc2045
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[html51]
Ian Hickson; Robin Berjon; Steve Faulkner; Travis Leithead; Erika Doyle Navara; Edward O'Connor; Tab Atkins Jr.; Simon Pieters; Yoav Weiss; Marcos Caceres; Mathew Marquis. HTML 5.1. 31 August 2015. W3C Working Draft. URL: http://www.w3.org/TR/html51/