Improving Access to Financial Data on the Web

Two of the items under the 'proposed technical work' section of the workshop report suggest concrete steps to be taken within a W3C Incubator Group:

Practices for naming business and financial information as a basis for combining different sources of information. (Naming)
The openness of the Web can lead to abuse. This makes it important for there to be a robust treatment of provenance. What does this imply in the context of the social Web? (Provenance)

Naming

A strength of the architecture of the Web is the central role played by the relationship between names and resources. We propose to leverage that strength in the public interest. The W3C would create specifications by which information collectors could publish up-to-date lists of the public-domain unique identifiers that they bind to the entities on which they collect information. Each list would include all of the public information that would help users and applications to infer, construct or publish their own relationships among identifiers (experience shows that geographic and postal address information is essential to the mapping process). To take two concrete examples:

Every SEC registrant has a "Central Index Key" (CIK). The CIK-to-company binding is (implicitly) updated nightly at the SEC ftp site. There are over 20,000 CIKs and they are never recycled.
Every FDIC-insured bank or bank holding company has an RSSD, a Certificate, or both, and a current listing can be derived from tab-delimited files published at the Central Data Repository public data distribution site (CDR PDD). There are over 20,000 Certificates and RSSDs, and they are never recycled.

Among those 40,000 identifiers there are only about 750 public companies that are also banks or bank holding companies. Moreover, a public company registered with the SEC generally owns its bank or bank holding company as a wholly owned subsidiary, so using the financial data from either source requires some awareness of that relationship. But unless such name registries can be identified as name registries, the mapping cannot even begin.
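The mapping problem described above can be sketched in a few lines. This is only an illustration under stated assumptions: the Entry record, the registry labels, the suffix list and the sample identifiers are all hypothetical, and the join key (normalized name plus five-digit postal code) is one crude heuristic, not a proposed specification. A parent company and its subsidiary bank often have different names, so a real mapping would need more than this.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    registry: str      # hypothetical label, e.g. "SEC-CIK" or "FDIC-CERT"
    identifier: str    # the key the registry binds to the entity
    name: str          # entity name as published by the registry
    postal_code: str   # postal code from the published address


# Hypothetical list of legal-form words to ignore when comparing names.
_SUFFIXES = {"inc", "corp", "corporation", "co", "company", "na"}


def match_key(entry):
    """Build a crude join key from a normalized name and postal code."""
    words = re.sub(r"[^a-z0-9 ]", "", entry.name.lower()).split()
    core = " ".join(w for w in words if w not in _SUFFIXES)
    return (core, entry.postal_code[:5])


def candidate_links(left, right):
    """Return (left id, right id) pairs whose join keys coincide."""
    index = {}
    for e in right:
        index.setdefault(match_key(e), []).append(e)
    return [
        (l.identifier, r.identifier)
        for l in left
        for r in index.get(match_key(l), [])
    ]


# Made-up sample rows; the identifiers are not real CIKs or Certificates.
sec = [Entry("SEC-CIK", "0000000001", "Example Bancorp, Inc.", "10001-1234")]
fdic = [Entry("FDIC-CERT", "99999", "EXAMPLE BANCORP", "10001")]
links = candidate_links(sec, fdic)
```

The postal code is part of the key because, as noted above, address information is what makes the mapping tractable when names alone are ambiguous.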

Several analogies to existing specifications were suggested:

It could be an RSS derivative with additional elements in a new namespace to represent business address, mailing address, country or state of business registration, domain names, etc., with the RSS feed providing a stream of updates.
It could involve a fundamental ontology of financial entities, starting with basic distinctions such as government vs. commercial, public vs. private, and the thirteen or so fundamental accounting concepts.
It could be analogous to UDDI in the sense that it provides a protocol for a registry, partial replication of registries, etc.
It could specify particular URI syntax such as "http://{authority}?namelookup=..." that would return a set of (scored) matches on names.
Such a name registry requires at least one unique key, but there is no reason its name entries could not also contain the keys of other name registries (e.g. an SEC name registry could provide the FINRA identifier if available).
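Two of the suggestions above, a lookup URI that returns scored matches and name entries that carry the keys of other registries, can be sketched together. Everything here is an assumption made for illustration: the NameEntry fields, the "registry.example.org" authority, the 0.6 threshold, and the use of a generic string similarity ratio as the score. The text does not say how matches would be scored.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from urllib.parse import urlencode


@dataclass
class NameEntry:
    key: str                                        # the registry's own unique key
    name: str                                       # name bound to that key
    other_keys: dict = field(default_factory=dict)  # keys held in other registries


def lookup_url(authority, name):
    """Fill in the 'namelookup' query for a (hypothetical) authority."""
    return f"http://{authority}?{urlencode({'namelookup': name})}"


def name_lookup(entries, query, min_score=0.6):
    """Return (score, entry) pairs at or above min_score, best match first."""
    scored = []
    for e in entries:
        score = SequenceMatcher(None, query.lower(), e.name.lower()).ratio()
        if score >= min_score:
            scored.append((round(score, 3), e))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Made-up entries; "FINRA" here only illustrates a cross-registry key.
registry = [
    NameEntry("0000000001", "Example Bancorp", {"FINRA": "X1"}),
    NameEntry("0000000002", "Example Bank Corporation"),
]
url = lookup_url("registry.example.org", "Example Bancorp")
matches = name_lookup(registry, "example bancorp")
```

An exact name match scores 1.0 and sorts first. Carrying other_keys in each entry lets a client move from one registry to another without a second name search.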

Focusing on this aspect of "Financial Data on the Web" lends itself to a bottom-up approach, with each additional government agency or public interest group publishing identifiers adding value and momentum.

Another strength of the W3C to leverage is that it is well positioned to develop internationally acceptable specifications, allowing, for the first time, financial entities registered in different countries to be identified on a common, non-proprietary basis.

Provenance

Incorrect name bindings arising from error or manipulation could undermine or completely defeat the reliability and usability of name registries. It may therefore be essential for the specifications to require that any such name registry record enough information for applications to discover the origin of each binding.
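As a rough illustration of what "sufficient information" might look like, the sketch below attaches a source, a retrieval date and a content digest to each binding. The field names, the sample values and the choice of SHA-256 are assumptions, not part of any proposal in the workshop report. A plain digest only reveals accidental or naive alteration; guarding against deliberate manipulation would need something stronger, such as a signature from the publishing registry.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    identifier: str   # key assigned by the registry
    name: str         # entity name bound to the key
    source: str       # who asserted the binding (hypothetical URI)
    retrieved: str    # ISO 8601 date on which the binding was read
    digest: str       # SHA-256 over the four fields above


def _digest(identifier, name, source, retrieved):
    """Hash the fields, joined by a separator unlikely to occur in data."""
    payload = "\x1f".join([identifier, name, source, retrieved])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_binding(identifier, name, source, retrieved):
    """Create a binding that records where and when it was obtained."""
    return Binding(identifier, name, source, retrieved,
                   _digest(identifier, name, source, retrieved))


def is_intact(b):
    """Check that the recorded digest still matches the binding's fields."""
    return b.digest == _digest(b.identifier, b.name, b.source, b.retrieved)


# Made-up example values.
b = make_binding("0000000001", "Example Bancorp",
                 "http://registry.example.org", "2009-10-06")
```

An application that receives such a record can see who asserted the binding and when, and can call is_intact to detect a record whose fields no longer match its digest.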
