File API

1. Introduction

This section is informative.

Web applications should have the ability to manipulate as wide as possible a range of user input, including files that a user may wish to upload to a remote server or manipulate inside a rich web application. This specification defines the basic representations for files, lists of files, errors raised by access to files, and programmatic ways to read files. The interfaces and API defined in this specification can be used with other interfaces and APIs exposed to the web platform.

File reads should happen asynchronously on the main thread, with an optional synchronous API used within threaded web applications. An asynchronous API for reading files prevents blocking and UI "freezing" on a user agent's main thread. This specification defines an asynchronous API based on an event model to read and access a file's data. Moreover, this specification defines separate interfaces for files and for the objects used to read a file's data. While a File object provides a reference to a single file that a user has selected from a file picker (typically spawned by the HTML input element), a FileReader object provides asynchronous read methods to access that file's data through event handler attributes and the firing of events. The use of events and event handlers lets separate code blocks monitor the progress of the read (which is useful for remote drives that appear to be local but are slower than local drives), error conditions that may arise, and the successful reading of a file. The following example illustrates this.

Example

In the example below, different code blocks handle progress, error, and success conditions.

ECMAScript


function startRead() {  
  // obtain input element through DOM 
  
  var file = document.getElementById('file').files[0];
  if(file){
    getAsText(file);
  }
}

function getAsText(readFile) {

  var reader = new FileReader();

  // Handle progress, success, and errors.
  // Handlers are attached before the read starts so that no event is missed.
  reader.onprogress = updateProgress;
  reader.onload = loaded;
  reader.onerror = errorHandler;

  // Read file into memory, decoding it as UTF-16
  reader.readAsText(readFile, "UTF-16");
}

function updateProgress(evt) {
  if (evt.lengthComputable) {
    // evt.loaded and evt.total are ProgressEvent properties
    var loaded = (evt.loaded / evt.total);
    if (loaded < 1) {
      // Increase the prog bar length
      // style.width = (loaded * 200) + "px";
    }
  }
}

function loaded(evt) {
  // Obtain the read file data
  var fileString = evt.target.result;
  // Handle UTF-16 file dump
  // (utils.regexp.isChinese stands for an application-defined helper;
  // it is not part of this API)
  if(utils.regexp.isChinese(fileString)) {
    // Chinese characters + name validation
  }
  else {
    // run other charset test
  }
  // xhr.send(fileString)
}

function errorHandler(evt) {
  if(evt.target.error.code == evt.target.error.NOT_READABLE_ERR) {
    // The file could not be read
  }
}

2. Conformance

Everything in this specification is normative except for examples and sections marked as being informative.

The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY” and “OPTIONAL” in this document are to be interpreted as described in Key words for use in RFCs to Indicate Requirement Levels [RFC2119].

The following conformance classes are defined by this specification:

conforming implementation: A user agent is considered to be a conforming implementation if it satisfies all of the MUST-, REQUIRED- and SHALL-level criteria in this specification that apply to implementations.

3. Terminology and Algorithms

The terms and algorithms <fragment>, <scheme>, document base URL, event handler attributes, event handler event type, Function, origin, resolve a URL, same origin, task, task source, URL, URL character encoding, the "already started" flag for script processing, and queue a task are defined by the HTML 5 specification [HTML5].

This specification includes algorithms (steps) as part of the definition of methods. Conforming implementations (referred to as "user agents" from here on) MAY use other algorithms in the implementation of these methods, provided the end result is the same.

4. The FileList Sequence

This sequence parameterized type exposes the list of files that have been selected.

IDL


    typedef sequence<File> FileList;

Example

Sample usage typically involves DOM access to the <input type="file"> element within a form, and then accessing selected files.

ECMAScript


    // uploadData is a form element
    // fileChooser is input element of type 'file'
    var file = document.forms['uploadData']['fileChooser'].files[0];
    
    if(file)
    {
      // Perform file ops
    }

Note

The HTMLInputElement interface [HTML5] has a readonly FileList attribute, which is what is being accessed in the above example. Some conforming user agents support multiple file selections within HTML forms, in which case the FileList object MUST make available all selected files.

5. The Blob Interface

This interface represents raw data. It provides a method to slice data objects between ranges of bytes into further chunks of raw data. It also provides an attribute representing the size of the chunk of data. The File interface inherits from this interface.

IDL


    interface Blob {
      
      readonly attribute unsigned long long size;
      
      //slice Blob into byte-ranged chunks
      
      Blob slice(in long long start,
                 in long long length); // raises DOMException
    
    };

5.1. Attributes

size: Represents the size of the Blob object in bytes.

5.2. Methods and Parameters

The slice method

Returns a new Blob object between the ranges of bytes specified.

The start parameter is a value for the start point of a slice call.

The length parameter is the number of bytes in the slice, counted from start; start + length is the end point of the slice.

The slice method MUST clamp to size if index arithmetic exceeds the bounds of size. In particular, this means that for a given slice call:

If start + length > size, then a user agent MUST return a Blob object as if slice(start, size-start) was called.
If start > size, then a user agent MUST return a Blob object of size 0.
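
The clamping rules above can be sketched as a small pure function. This is an illustration only: clampSlice is a hypothetical helper, not part of this API, and it assumes non-negative integer arguments, since the text above does not say how negative values are treated.

```javascript
// Hypothetical helper (not part of the File API): computes the byte range
// that a clamped slice(start, length) call would cover on a Blob of `size`
// bytes. Assumes size, start, and length are non-negative integers.
function clampSlice(size, start, length) {
  if (start > size) {
    // Start lies past the end of the data: the result is an empty Blob.
    // Reporting `size` as the start of the empty range is a choice made here.
    return { start: size, length: 0 };
  }
  if (start + length > size) {
    // Behave as if slice(start, size - start) had been called.
    return { start: start, length: size - start };
  }
  return { start: start, length: length };
}

console.log(clampSlice(100, 10, 20));  // { start: 10, length: 20 }
console.log(clampSlice(100, 90, 20));  // { start: 90, length: 10 }
console.log(clampSlice(100, 150, 20)); // { start: 100, length: 0 }
```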

Editorial note

The alternative is throwing a DOMException with INDEX_SIZE_ERR.

6. The File Interface

This interface describes a single file in a FileList and exposes its name, media type and a URN to access the file. It inherits from Blob.

IDL


    interface File : Blob {

      readonly attribute DOMString name;
      readonly attribute DOMString type;
      readonly attribute DOMString urn;
    };

6.1. Attributes

name: The name of the file. There are numerous file name variations on different systems; this is merely the name of the file, without path information.
type: The ASCII-encoded string in lower case representing the media type of the file, expressed as an RFC2046 MIME type [RFC2046]. User agents SHOULD return the MIME type of the file, if it is known. If implementations cannot determine the media type of the file, they MUST return the empty string. A string is a valid MIME type if it matches the media-type token defined in section 3.7 "Media Types" of RFC 2616 [HTTP].
urn: The URN representing the File object.
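
As an illustration of the type attribute rules above, the sketch below lower-cases a well-formed media type and falls back to the empty string otherwise. typeAttributeValue is a hypothetical helper, and checking only the "type/subtype" part (without parameters) is a simplification assumed here rather than something this specification requires.

```javascript
// Hypothetical helper: returns the value a `type` attribute could expose for
// a media type string, or "" when the string is unknown or not well formed.
// Only the "type/subtype" part is checked; parameters such as ";charset=..."
// are rejected, which is a simplification made for this sketch.
var TOKEN = "[!#$%&'*+.^_`|~0-9A-Za-z-]+";
var MEDIA_TYPE = new RegExp("^" + TOKEN + "/" + TOKEN + "$");

function typeAttributeValue(mediaType) {
  if (typeof mediaType !== "string" || !MEDIA_TYPE.test(mediaType)) {
    return "";
  }
  return mediaType.toLowerCase();
}

console.log(typeAttributeValue("Text/Plain"));  // text/plain
console.log(typeAttributeValue("not a type") === ""); // true
```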

Editorial note

The type attribute section should refer to IANA and RFC4288, and the urn attribute may be renamed.

7. The FileReader Interface

This interface provides methods to read files in memory, and to access the data from those files using progress events and event handler attributes [DOM3Events]. It is desirable to read data from file systems asynchronously in the main thread of user agents. This interface provides such an asynchronous API, and is specified to be used within the context of the global object (Window [HTML5]) as well as Web Workers (WorkerUtils [WebWorkers]).

IDL



[Constructor]
interface FileReader {

  // async read methods 
  void readAsBinaryString(in Blob fileBlob);
  void readAsText(in Blob fileBlob, [Optional] in DOMString encoding);
  void readAsDataURL(in File file);

  void abort();

  // states
  const unsigned short EMPTY = 0;
  const unsigned short LOADING = 1;
  const unsigned short DONE = 2;
  
  
  readonly attribute unsigned short readyState;

  // file data
  readonly attribute DOMString result;
  
  readonly attribute FileError error;

  // event handler attributes
  attribute Function onloadstart;
  attribute Function onprogress;
  attribute Function onload;
  attribute Function onabort;
  attribute Function onerror;
  attribute Function onloadend;

};
FileReader implements EventTarget;

7.1. The FileReader Task Source

The FileReader interface enables asynchronous reads on individual files by dispatching events to event handler methods. Unless stated otherwise, the task source that is used in this specification is the FileReader. This task source is used for event tasks that are asynchronously dispatched, or for event tasks that are queued for dispatching.

7.2. Constructors

When the FileReader() constructor is invoked, the user agent MUST return a new FileReader object.

7.3. Event Handler Attributes

The following are the event handler attributes (and their corresponding event handler event types) that user agents MUST support on FileReader as DOM attributes:

event handler attribute	event handler event type
`onloadstart`	`loadstart`
`onprogress`	`progress`
`onabort`	`abort`
`onerror`	`error`
`onload`	`load`
`onloadend`	`loadend`

7.4. FileReader States

The FileReader object can be in one of 3 states. The readyState attribute, on getting, MUST return the current state, which MUST be one of the following values:

EMPTY (numeric value 0): The object has been constructed, and there are no pending reads.
LOADING (numeric value 1): A file is being read. One of the read methods is being processed.
DONE (numeric value 2): The entire file has been read into memory, or a file error occurred during read, or the read was aborted using abort(). The FileReader is no longer reading a file.

7.5. Reading a File

7.5.1. Multiple Reads

The FileReader interface makes available three asynchronous read methods -- readAsBinaryString, readAsText, and readAsDataURL, which read files into memory. If multiple read methods are called on the same FileReader object, user agents MUST only process the last call to a read method, which is the call that occurs last in a script block that has the "already started" flag set [HTML5].

7.5.2. The `result` attribute

On getting, the result attribute returns a file's data in string format, or null, depending on the read method that has been called on the FileReader object and on any errors that may have occurred. It can also return partial file data: the part of the file that has been read into memory so far while one of the two read methods readAsBinaryString and readAsText is being processed. The list below is normative for the result attribute:

On getting, if the readyState is EMPTY (no read method has been called) then the result attribute MUST return null.
On getting, if an error in reading the file has occurred (using any read method), then the result attribute MUST return null.
On getting, if the readAsDataURL read method is used, the result attribute MUST return a Data URL [DataURL] encoding of the file's data.
On getting, if the readAsBinaryString read method is called (and no error in reading the file has occurred), then the result attribute MUST return a string representing the file's data as a binary string, in which every byte is represented by an integer in the range [0..255]. On getting, while processing the readAsBinaryString read method, the result attribute SHOULD return partial file data in binary string format.
On getting, if the readAsText read method is called (and no error in reading the file has occurred), then the result attribute MUST return a string representing the file's data as a text string, and SHOULD decode the string in memory in the format specified by the encoding determination. On getting, while processing the readAsText read method, this attribute SHOULD return partial file data in the format specified by the encoding determination.
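
To make the binary string format above concrete, the sketch below converts raw bytes into a string in which each character's code is one byte value. toBinaryString is a hypothetical helper used for illustration; it is not an interface defined by this specification.

```javascript
// Hypothetical helper: turns raw bytes into a "binary string" in which each
// character's code unit equals one byte value in the range 0..255.
function toBinaryString(bytes) {
  var out = "";
  for (var i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i] & 0xff);
  }
  return out;
}

var sample = new Uint8Array([0x48, 0x69, 0x00, 0xff]);
var binary = toBinaryString(sample);
console.log(binary.length);        // 4
console.log(binary.charCodeAt(3)); // 255

// Partial file data corresponds to a prefix of the bytes read so far.
console.log(toBinaryString(sample.subarray(0, 2))); // Hi
```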

7.5.3. The `readAsBinaryString()` method

When the readAsBinaryString(fileBlob) method is called, the user agent MUST run the steps below (unless otherwise indicated).

Set readyState to EMPTY and set result to null.
If an error occurs during file read, set readyState to DONE and set result to null. Proceed to the error steps below.
1. Dispatch a progress event called loadend.
2. Dispatch a progress event called error. Set the error attribute; on getting, the error attribute MUST be a FileError object with a valid error code that indicates the kind of file error that has occurred.
3. Terminate this overall set of steps.
If no error has occurred, set readyState to LOADING.
Queue a task to dispatch a progress event called loadstart.
Make progress notifications. As the bytes from the fileBlob argument are read, user agents SHOULD ensure that on getting, the result attribute returns partial file data representing the number of bytes currently loaded (as a fraction of the total) [ProgressEvents], as a binary string.
When the file has been read into memory fully, set readyState to DONE.
Set the result attribute to be fileBlob's data content represented as a binary string; on getting, the result attribute returns the (complete) data of fileBlob as a binary string.
Terminate this overall set of steps.

7.5.4. The `readAsDataURL()` method

When the readAsDataURL(file) method is called, the user agent MUST run the steps below (unless otherwise indicated).

Set readyState to EMPTY and set result to null.
If an error occurs during file read, OR if a user agent's URL length limitations prevent returning data as a Data URL [DataURL], set readyState to DONE and set result to null. Proceed to the error steps below.
1. Dispatch a progress event called loadend.
2. Dispatch a progress event called error. Set the error attribute; on getting, the error attribute MUST be a FileError object with a valid error code that indicates the kind of file error that has occurred.
3. Terminate this overall set of steps.
If no error has occurred, set readyState to LOADING.
Queue a task to dispatch a progress event called loadstart.
Make progress notifications.
When the file has been read into memory fully, set readyState to DONE.
Set the result attribute to be file's data content represented as a Data URL [DataURL]; on getting, the result attribute returns the (complete) data of file as a Data URL [DataURL].
Terminate this overall set of steps.

7.5.5. The `readAsText()` method

When the readAsText(fileBlob, encoding) method is called (the encoding argument is optional), the user agent MUST run the steps below (unless otherwise indicated).

Set readyState to EMPTY and set result to null.
If an error occurs during file read, set readyState to DONE and set result to null. Proceed to the error steps below.
1. Dispatch a progress event called loadend.
2. Dispatch a progress event called error. Set the error attribute; on getting, the error attribute MUST be a FileError object with a valid error code that indicates the kind of file error that has occurred.
3. Terminate this overall set of steps.
If no error has occurred, set readyState to LOADING.
Queue a task to dispatch a progress event called loadstart.
Make progress notifications. As the bytes from the fileBlob argument are read, user agents SHOULD ensure that on getting, the result attribute returns partial file data representing the number of bytes currently loaded (as a fraction of the total) [ProgressEvents], decoded in memory according to the encoding determination.
When the file has been read into memory fully, set readyState to DONE.
Set the result attribute to be fileBlob's data content represented as a string in a format determined by the encoding determination; on getting, the result attribute returns the (complete) data of fileBlob as a string, decoded in memory according to the encoding determination.
Terminate this overall set of steps.

Editorial note

Issue: if it is determined that the type attribute is one of text/html, text/xml, or application/xml then the specification should allow HTML5 [HTML5] parsing (creation of Document) or XML parsing specified in XML specifications. Should there be normative text for this?

7.5.6. The `abort()` method

When the abort() method is called, the user agent MUST run the steps below:

Set readyState to DONE and result to null.
Terminate any steps while processing a read method.
Dispatch a progress event called error. Set the error attribute to a FileError object with the appropriate code (in this case, ABORT_ERR; see error conditions).
Dispatch a progress event called abort.
Dispatch a progress event called loadend.
Stop dispatching any further progress events.

7.5.7. Blob and File Parameters

Each of the read methods takes a mandatory File or Blob parameter.

file: This is a File object used to invoke the readAsDataURL() method. It will typically be a reference to a single file in a FileList.
fileBlob: This is a Blob object used to invoke the readAsText() and readAsBinaryString() methods. For the purposes of this specification, it will typically be a reference to a single file in a FileList.

7.5.8. Determining Encoding

When reading files using the readAsText() read method, the optional encoding string parameter MUST be a name or an alias of a character set used on the Internet [IANACHARSET], or else is considered invalid. If the encoding argument supplied is valid, user agents SHOULD decode the fileBlob using that encoding. If the encoding argument is invalid, or the optional encoding argument is not supplied, or the user agent cannot decode the fileBlob using encoding, the following encoding determination algorithm MUST be followed:

User agents SHOULD decode fileBlob data using encoding, if it is provided. If the encoding argument is invalid, or the optional encoding argument is not supplied, or the user agent cannot decode the fileBlob using encoding, then let charset be null.

For each of the rows in the following table, starting with the first one and going down, if the first bytes of fileBlob match the bytes given in the first column, then let charset be the encoding given in the cell in the second column of that row. If there is no match, charset remains null.

Bytes in Hexadecimal	Description
FE FF	UTF-16BE BOM
FF FE	UTF-16LE BOM
EF BB BF	UTF-8 BOM

If charset is null, let charset be UTF-8.
Return the result of decoding the fileBlob using charset; on getting, the result attribute of the FileReader object returns a string in charset format. The synchronous readAsText method of the FileReaderSync object returns a string in charset format. Replace bytes or sequences of bytes that are not valid according to the charset with a single U+FFFD character [Unicode].
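
The fallback part of the encoding determination above (BOM table, then UTF-8) can be sketched as follows. determineCharset and BOM_TABLE are hypothetical names used for illustration; the sketch covers only the case where no usable encoding argument is available, and it does not perform the decoding itself.

```javascript
// BOM table from the encoding determination algorithm, in table order.
var BOM_TABLE = [
  { bytes: [0xfe, 0xff], charset: "UTF-16BE" },
  { bytes: [0xff, 0xfe], charset: "UTF-16LE" },
  { bytes: [0xef, 0xbb, 0xbf], charset: "UTF-8" }
];

// Hypothetical helper: returns the charset chosen when no usable `encoding`
// argument is available. `bytes` is an array-like of byte values.
function determineCharset(bytes) {
  var charset = null;
  for (var i = 0; i < BOM_TABLE.length && charset === null; i++) {
    var bom = BOM_TABLE[i].bytes;
    var matches = bytes.length >= bom.length;
    for (var j = 0; matches && j < bom.length; j++) {
      matches = bytes[j] === bom[j];
    }
    if (matches) charset = BOM_TABLE[i].charset;
  }
  // No BOM matched: fall back to UTF-8.
  return charset === null ? "UTF-8" : charset;
}

console.log(determineCharset([0xff, 0xfe, 0x41, 0x00])); // UTF-16LE
console.log(determineCharset([0x41, 0x42]));             // UTF-8
```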

7.5.9. Events

When this specification says to make progress notifications for a read method, the following steps MUST be followed:

While the read method is processing, queue a task to dispatch a progress event called progress about every 50ms or for every byte read into memory, whichever is less frequent.
When the data from the file or fileBlob has been completely read into memory, queue a task to dispatch a progress event called load.
When the data from the file or fileBlob has been completely read into memory, queue a task to dispatch a progress event called loadend.
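
The 50ms pacing of progress notifications described above can be sketched as a simple throttle. progressTimes is a hypothetical helper; measuring the interval from the start of the read is an assumption made here, since the steps above only say "about every 50ms".

```javascript
// Hypothetical helper: given the times (in ms since the read started) at
// which data arrived, returns the times at which a `progress` task would be
// queued if at most one is queued per `intervalMs`.
// Assumption: the first interval is measured from the start of the read.
function progressTimes(arrivalTimesMs, intervalMs) {
  var queued = [];
  var last = 0;
  for (var i = 0; i < arrivalTimesMs.length; i++) {
    var now = arrivalTimesMs[i];
    if (now - last >= intervalMs) {
      queued.push(now);
      last = now;
    }
  }
  return queued;
}

console.log(progressTimes([10, 20, 55, 60, 120, 130, 171], 50));
// [ 55, 120, 171 ]
```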

When this specification says to dispatch a progress event called e (for some ProgressEvent e [DOM3Events] dispatched on a FileReader reader), the following list MUST be followed:

The progress event e does not bubble. e.bubbles MUST be false [DOM3Events].
The progress event e is NOT cancelable. e.cancelable MUST be false [DOM3Events].
The progress event e is dispatched on the FileReader object (which is the task source in this specification, and the EventTarget). User agents MUST call reader.dispatchEvent(e) [DOM3Events].

7.5.9.1. Event Summary

The following are the events that are dispatched on FileReader objects.

Event name	Interface	Dispatched when…
`loadstart`	`ProgressEvent`	When the read starts.
`progress`	`ProgressEvent`	While reading (and decoding) `file` or `fileBlob` data, and reporting partial file data (`progress.loaded`/`progress.total`).
`abort`	`ProgressEvent`	When the read has been aborted. For instance, by invoking the `abort()` method.
`error`	`ProgressEvent`	When the read has failed (see errors).
`load`	`ProgressEvent`	When the read has successfully completed.
`loadend`	`ProgressEvent`	When the read has completed (either in success or failure).
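
As a rough illustration of how these events line up for one read, the sketch below lists event names in the order in which the read method steps and the abort() steps mention them. eventSequence is a hypothetical helper; it assumes that an error is detected before loadstart is queued and that abort() is called after the read has started, neither of which is stated normatively above.

```javascript
// Hypothetical helper: lists the event names dispatched for one read,
// following the order of the steps given for each outcome in this section.
// `progressCount` is the number of progress notifications made (assumed).
function eventSequence(outcome, progressCount) {
  if (outcome === "error") {
    // Error steps: loadend is listed before error.
    return ["loadend", "error"];
  }
  var events = ["loadstart"];
  for (var i = 0; i < progressCount; i++) {
    events.push("progress");
  }
  if (outcome === "abort") {
    // abort() steps: error (ABORT_ERR), then abort, then loadend.
    return events.concat(["error", "abort", "loadend"]);
  }
  if (outcome === "success") {
    // Successful read: load, then loadend.
    return events.concat(["load", "loadend"]);
  }
  throw new Error("unknown outcome: " + outcome);
}

console.log(eventSequence("success", 2).join(" > "));
// loadstart > progress > progress > load > loadend
```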

8. Reading on Threads

Web Workers allow for the use of synchronous file read APIs, since the effect of such read mechanisms on the main thread is mitigated. This section defines a synchronous API, which can be used within Workers [WebWorkers]. Workers can use both the asynchronous API (the FileReader object) and the synchronous API (the FileReaderSync object).

8.1. The `FileReaderSync` Interface

This interface provides methods to read files in memory, and to access the data from those files as strings.

IDL



[Constructor]
interface FileReaderSync {

  // Synchronously return strings
  // All three methods raise FileException
  
  DOMString readAsBinaryString(in Blob fileBlob);
  DOMString readAsText(in Blob fileBlob, [Optional] in DOMString encoding);
  DOMString readAsDataURL(in File file);
};

Note

The FileReaderSync object's read methods -- namely readAsBinaryString, readAsText, and readAsDataURL -- have the same method signatures as the read methods of the FileReader object, and read files into memory. The difference is that these are specified to behave synchronously, with string return values. These methods raise FileException.

8.1.1. The `readAsBinaryString` method

When the readAsBinaryString(fileBlob) method is called, the following steps MUST be followed:

If an error occurs during file read, throw a FileException with the appropriate error code. Terminate these overall steps.
If no error has occurred, read fileBlob into memory. Return the data contents of fileBlob as a binary string.

8.1.2. The `readAsText` method

When the readAsText(fileBlob, encoding) method is called (the encoding argument is optional), the following steps MUST be followed:

If an error occurs during file read, throw a FileException with the appropriate error code. Terminate these overall steps.
If no error has occurred, read fileBlob into memory. Return the data contents of fileBlob using the encoding determination algorithm.

8.1.3. The `readAsDataURL` method

When the readAsDataURL(file) method is called, the following steps MUST be followed:

If an error occurs during file read, throw a FileException with the appropriate error code. Terminate these overall steps.
If no error has occurred, read file into memory. Return the data contents of file as a Data URL [DataURL].

Note
URL length limitations for Data URLs limit the usefulness of this call. A user agent may throw an ENCODING_ERR for file arguments which, when encoded as Data URLs, exceed URL length limitations for that user agent [DataURL].

Editorial note

TODO: Land sample code here

9. Errors and Exceptions

Error conditions can occur when reading files from the underlying filesystem. The list below of potential error conditions is informative, with links to normative descriptions of error codes:

The file being accessed may not exist at the time one of the asynchronous read methods or synchronous read methods is called. This may be due to it having been moved or deleted after a reference to it was acquired (e.g. concurrent modification with another application). See NOT_FOUND_ERR.
A file may be unreadable. This may be due to permission problems that occur after a reference to a file has been acquired (e.g. concurrent lock with another application). See NOT_READABLE_ERR.
User agents MAY determine that some files are unsafe for use within Web applications. A file may change on disk since the original file selection, thus resulting in an invalid read. Additionally, some file and directory structures may be considered restricted by the underlying filesystem; attempts to read from them may be considered a security violation. See the security considerations. See SECURITY_ERR.
Files may be too large to return to the data structures of a Web application. For example, URL length limitations imposed by user agents on Data URLs may make it impossible to return large files encoded as Data URLs [DataURL]. See ENCODING_ERR.
During the reading of a file, the Web application may itself wish to abort (see abort()) the call to an asynchronous read method. See ABORT_ERR.

9.1. The `FileError` Interface

This interface is used to report errors asynchronously. The FileReader object's error attribute is a FileError object, and is accessed asynchronously through the onerror event handler when error events are generated.

IDL


 interface FileError {
   // File error codes
   // Found in DOMException
   const unsigned short NOT_FOUND_ERR = 8;
   const unsigned short SECURITY_ERR = 18;
   const unsigned short ABORT_ERR = 20;
   
   // Added by this specification
   const unsigned short NOT_READABLE_ERR = 24;
   const unsigned short ENCODING_ERR = 26;
 
   readonly attribute unsigned short code;
};

The code attribute MUST return one of the constants of the FileError interface, which MUST be the most appropriate code from the table below.

9.2. The `FileException` exception

Errors in the synchronous read methods for Web Workers [WebWorkers] are reported using the FileException exception.

IDL


 exception FileException {
  
  const unsigned short NOT_FOUND_ERR = 8;
  const unsigned short SECURITY_ERR = 18;
  const unsigned short ABORT_ERR = 20;
  
  const unsigned short NOT_READABLE_ERR = 24;
  const unsigned short ENCODING_ERR = 26;
 
  unsigned short code;
};

The code attribute MUST return one of the constants of the FileException exception, which MUST be the most appropriate code from the table below.

9.3. Error Code Descriptions

Constant	Code	Situation
`NOT_FOUND_ERR`	8	User agents MUST use this code if the file resource could not be found at the time the read was processed
`SECURITY_ERR`	18	User agents MAY use this code if: it is determined that certain files are unsafe for access within a Web application; it is determined that too many read calls are being made on file resources; or it is determined that the file has changed on disk since the user selected it. This is a security error code to be used in situations not covered by any other error codes.
`ABORT_ERR`	20	User agents MUST use this code if the read operation was aborted, typically with a call to `abort()`
`NOT_READABLE_ERR`	24	User agents MUST use this code if the file cannot be read, typically due to permission problems that occur after a reference to a file has been acquired (e.g. concurrent lock with another application).
`ENCODING_ERR`	26	User agents MAY use this code if URL length limitations for Data URLs in their implementations place limits on the file data that can be represented as a Data URL [DataURL]. User agents MUST NOT use this code for the asynchronous `readAsText()` call and MUST NOT use this code for the synchronous `readAsText()` call, since encoding is determined by the encoding determination algorithm.
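
For illustration, the constants in the table above can be mirrored in script so that a numeric code can be turned back into a readable name, for example when logging. FILE_ERROR_CODES and fileErrorName are hypothetical names and are not defined by this specification.

```javascript
// Error code constants as listed in the table above.
var FILE_ERROR_CODES = {
  NOT_FOUND_ERR: 8,
  SECURITY_ERR: 18,
  ABORT_ERR: 20,
  NOT_READABLE_ERR: 24,
  ENCODING_ERR: 26
};

// Hypothetical helper: maps a numeric `code` back to its constant name,
// or returns null for a value that is not in the table.
function fileErrorName(code) {
  for (var name in FILE_ERROR_CODES) {
    if (FILE_ERROR_CODES[name] === code) return name;
  }
  return null;
}

console.log(fileErrorName(24)); // NOT_READABLE_ERR
console.log(fileErrorName(99)); // null
```

An onerror handler could, under the same assumptions, pass evt.target.error.code to fileErrorName to report which condition occurred.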

10. A UUID URN for File reference

Editorial note

This section does not enjoy the full consensus of the WG.

Will reusing the urn:uuid scheme described in RFC4122 [RFC4122] face origin-determination issues?
Are there further security caveats not cited sufficiently here?
Is using a subset of HTTP response codes acceptable practice, or should we forgo response codes in this specification?

RFC4122 [RFC4122] describes a UUID URN namespace. A File URN is a UUID URN described in RFC4122 [RFC4122] that is used to reference a File object, subject to an origin policy, a lifetime stipulation, and a processing model. A valid File URN takes the form urn:uuid:UUID where UUID matches the UUID production in Section 3 of RFC4122 [RFC4122], and is always a Version 1 UUID as described in Section 4.2.2 of RFC4122 [RFC4122]. Thus urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6 is a valid File URN.
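
The form of a valid File URN described above can be checked with a regular expression, as in the sketch below. isFileUrn is a hypothetical helper; it tests only the urn:uuid: prefix, the UUID layout, and the Version 1 digit, and it treats the string case-insensitively, which is an assumption made here.

```javascript
// Hypothetical helper: checks that a string has the form urn:uuid:UUID with
// a Version 1 UUID (the first hex digit of the third group is the version).
// Matching is case-insensitive; the variant bits are not checked here.
var FILE_URN =
  /^urn:uuid:[0-9a-f]{8}-[0-9a-f]{4}-1[0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isFileUrn(value) {
  return typeof value === "string" && FILE_URN.test(value);
}

console.log(isFileUrn("urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6")); // true
console.log(isFileUrn("urn:uuid:f81d4fae-7dec-41d0-a765-00a0c91e6bf6")); // false
```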

10.1. Origin Policy for File URNs

The origin of File URNs MUST be the origin of the script context that spawned the File object on which the urn attribute was called; this is defined as the "first script" in HTML5 [HTML5]. File URNs MUST only be valid within this script context. Retrieving them from within any other script context results in a 403 Not Allowed with an additional affiliated message that implementations MAY use (e.g. "Origin Violation"). [Processing Model for File URNs]

10.2. Lifetime Stipulation for File URNs

User agents MUST ensure that the lifetime of File URNs is the same as the lifetime of the Document [HTML5] of the origin script which spawned the File object on which the urn attribute was called. When this Document is destroyed, implementations MUST treat requests for File URNs created within this Document as 404 Not Found. [Processing Model for File URNs]

10.3. Processing Model for File URNs

The processing model for File URNs is that they MUST only support requests with GET [HTTP] and MUST only support a subset of responses [HTTP], which are:

10.3.1. 200 OK

This response [HTTP] MUST be used if the request has succeeded, namely the File URN has been requested with a GET, satisfies the origin requirement, and satisfies the lifetime requirement.

10.3.2. 403 Not Allowed

This response [HTTP] MUST be used if the request violates the origin requirement. Additionally, it MUST be used if the underlying file's permission structure has changed (thus preventing access from web content). User agents MAY accompany this response with a message (e.g. "Origin Violation").

10.3.3. 404 Not Found

This response [HTTP] MUST be used if the request violates the lifetime requirement (e.g. if a cached File URN persists after the specified lifetime of the URN has elapsed). Additionally, it MUST be used if the underlying file has moved.

10.3.4. 500 Internal Server Error

This response [HTTP] MAY be used as a generic error condition, including for security errors or access violations.
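
The response selection described in this processing model might be sketched as below. fileUrnStatus and the fields of its request argument are hypothetical names chosen for illustration. The order of the checks and the use of 500 for methods other than GET are assumptions, since the text above does not specify either.

```javascript
// Hypothetical helper: picks a response status for a File URN request.
// The `request` fields are illustrative names, not defined by this
// specification:
//   method         - request method, e.g. "GET"
//   sameOrigin     - request comes from the script context that owns the URN
//   documentAlive  - the originating Document has not been destroyed
//   fileMoved      - the underlying file has moved
//   fileAccessible - the file's permissions still allow access
function fileUrnStatus(request) {
  if (request.method !== "GET") {
    // The text does not name a response for other methods; using the
    // generic error here is an assumption made for this sketch.
    return 500;
  }
  if (!request.sameOrigin) return 403;                          // origin requirement
  if (!request.documentAlive || request.fileMoved) return 404;  // lifetime / moved
  if (!request.fileAccessible) return 403;                      // permissions changed
  return 200;
}

console.log(fileUrnStatus({
  method: "GET", sameOrigin: true, documentAlive: true,
  fileMoved: false, fileAccessible: true
})); // 200
```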

11. Security Considerations

This section is informative.

This specification allows web content to read files from the underlying file system, as well as provides a means for files to be accessed by unique identifiers, and as such is subject to some security considerations. This specification also assumes that the primary user interaction is with the <input type="file"/> element of HTML forms [HTML5], and that all files that are being read by FileReader objects have first been selected by the user. Important security considerations include preventing malicious file selection attacks (selection looping), preventing access to system-sensitive files, and guarding against modifications of files on disk after a selection has taken place.

Preventing selection looping. During file selection, a user may be bombarded with the file picker associated with <input type="file"/> (in a "must choose" loop that forces selection before the file picker is dismissed) and a user agent may prevent file access to any selections by making the FileList object returned be of size 0.
System-sensitive files (e.g. files in /usr/bin, password files, other native operating system executables) typically should not be exposed to web content, and should not be accessed via URNs. User agents MAY raise a SECURITY_ERR if such files are accessed or a read method is called on them.
Post-selection file modifications occur when a file changes on disk after it has been selected. In such cases, if a read method is called on a file, user agents MAY raise a SECURITY_ERR.

Editorial note

This section is provisional; more security data may supplement this in subsequent drafts.

12. Requirements and Use Cases ¶

This section describes the requirements for this API and illustrates some use cases. This version of the API does not satisfy all use cases; subsequent versions may elect to address these.

Once a user has given permission, user agents should provide the ability to read and parse data directly from a local file programmatically.
- Example: A lyrics viewer. User wants to read song lyrics from songs in his plist file. User browses for plist file. File is opened, read, parsed, and presented to the user as a sortable, actionable list within a web application. User can select songs to fetch lyrics. User uses the "browse for file" dialog.
Data should be able to be stored locally so that it is available for later use, which is useful for offline data access for web applications.
- Example: A Calendar App. User's company has a calendar. User wants to sync local events to company calendar, marked as "busy" slots (without leaking personal info). User browses for file and selects it. The text/calendar file is parsed in the browser, allowing the user to merge the files into one calendar view. The user then wants to save the file back to his local calendar file (using "Save As"?). The user can also send the integrated calendar file back to the server calendar store asynchronously.
User agents should provide the ability to save a local file programmatically given an amount of data and a file name.
- Example: A Spreadsheet App. User interacts with a form, and generates some input. The form then generates a CSV (Comma-Separated Values) output for the user to import into a spreadsheet, and uses "Save...". The generated output can also be directly integrated into a web-based spreadsheet, and uploaded asynchronously.
User agents should provide a streamlined programmatic ability to send data from a file to a remote server that works more efficiently than form-based uploads today.
- Example: A Video/Photo Upload App. User is able to select large files for upload, which can then be "chunk-transferred" to the server.
User agents should provide an API exposed to script that exposes the features above. The user is notified by UI anytime interaction with the file system takes place, giving the user full ability to cancel or abort the transaction. The user is notified of any file selections, and can cancel these. No invocations to these APIs occur silently without user intervention.
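To make the "chunk-transferred" upload use case above more concrete, the following is a minimal sketch of how a script might divide a large file into fixed-size ranges before sending them one at a time. The function name `chunkRanges` and the `{ start, length }` shape are hypothetical; how each range is turned into a smaller Blob and transmitted is left out and depends on the Blob interface and the upload mechanism in use.

```javascript
// Hypothetical helper: splits a file of `totalSize` bytes into consecutive
// ranges of at most `chunkSize` bytes each.
function chunkRanges(totalSize, chunkSize) {
  if (!(chunkSize > 0)) {
    throw new RangeError("chunkSize must be positive");
  }
  const ranges = [];
  for (let start = 0; start < totalSize; start += chunkSize) {
    // The final chunk may be shorter than chunkSize.
    ranges.push({ start: start, length: Math.min(chunkSize, totalSize - start) });
  }
  return ranges;
}
```

For example, a 10-byte file with a chunk size of 4 yields three ranges: two of length 4 and a final one of length 2.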

13. Acknowledgements ¶

This specification was originally developed by the SVG Working Group. Many thanks to Mark Baker and Anne van Kesteren for their feedback.

Thanks to Robin Berjon for editing the original specification.

Special thanks to Jonas Sicking, Olli Pettay, Nikunj Mehta, Garrett Smith, Michael Nordman, Ian Hickson, Sam Weinig, Aaron Boodman, and Julian Reschke.

Thanks to the W3C WebApps WG, and to participants on the public-webapps@w3.org listserv.

14. References ¶

Editorial note

ToDo: Add author names

14.1. Normative references ¶

RFC2119: Key words for use in RFCs to Indicate Requirement Levels
XMLHttpRequest: XMLHttpRequest Level 2
HTML5: HTML 5: A vocabulary and associated APIs for HTML and XHTML
ProgressEvents 1.0: Progress Events 1.0
RFC2397: The "data" URL Scheme
Web Workers: Web Workers
DOM 3 Core: DOM 3 Core
DOM 3 Events: DOM 3 Events
DOMException Extensions Defined in HTML5: DOM 3 Core DOMException Extensions Defined in HTML5
Unicode: The Unicode Standard, Version 5.2.0.
RFC4122: A Universally Unique IDentifier (UUID) URN Namespace
RFC2616: Hypertext Transfer Protocol -- HTTP/1.1
RFC2046: Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types
RFC3986: Uniform Resource Identifier (URI): Generic Syntax
RFC1738: Uniform Resource Locators (URL)

14.2. Informative References ¶

IANA Charsets: Official Names for Character Sets on the Internet
File Upload State of the input element: File Upload State of the HTML5 input Element
RFC5234: Augmented BNF for Syntax Specifications: ABNF
RFC4648: The Base16, Base32, and Base64 Data Encodings
Google Gears Blob API: Google Gears Blob API

File API

W3C Working Draft 17 November 2009
