W3C

Provenance Incubator Group Charter

The mission of the Provenance Incubator Group, part of the Incubator Activity, is to provide a state-of-the-art understanding and develop a roadmap in the area of provenance for Semantic Web technologies, development, and possible standardization.

Join the Provenance Incubator Group.

End date: 30 November 2010
Confidentiality: Proceedings are public
Initial Chair: Yolanda Gil, USC/ISI
Initiating Members:
Usual Meeting Schedule:
  Teleconferences: Weekly
  Face-to-face: 1-2, bound to major conferences like WWW201X or ISWC20XX

Scope

The provenance of information is crucial to determining whether information can be trusted, how to integrate diverse information sources, and how to give credit to originators when reusing information.  Broadly construed, provenance encompasses the initial sources of information used as well as any entity and process involved in producing a result.  In an open and inclusive environment such as the Web, users find information that is often contradictory or questionable.  People make trust judgements based on provenance that may or may not be explicitly offered to them.  Reasoners in the Semantic Web will need explicit representations of provenance information in order to make trust judgements about the information they use.  With the arrival of massive amounts of Semantic Web data (e.g., via the Linked Open Data community), information about the origin of that data, i.e., provenance, becomes an important factor in developing new Semantic Web applications.  Therefore, a crucial enabler of Semantic Web deployment is the explicit representation of provenance information that is accessible to machines, not just to humans.

Provenance is concerned with a very broad range of sources and uses.  Business applications may exploit provenance in trusting a product as they consider the manufacturing processes involved.  The provenance of a cultural artefact in terms of its origins and prior ownerships is crucial to determine its authenticity.  In a scientific context, data is integrated depending on the collection and pre-processing methods used, and the validity of an experimental result is determined based on how each analysis step was carried out.  Throughout this diversity, there are many common threads underpinning the representation, capture, and use of provenance that need to be better understood to enable a new generation of Semantic Web applications that takes provenance and trust into account.

There are many pockets of research and development that have studied relevant aspects of provenance.  The Semantic Web and agents communities have developed algorithms for reasoning about unknown information sources in a distributed network.  Logic reasoners can produce justifications of how an answer was derived, and explanations that help find and fix errors in ontologies.  The information retrieval and argumentation communities have investigated how to amalgamate alternative views and sources of contradictory and complementary information, taking into account their origins.  The database and distributed systems communities have looked into the issue of provenance in their respective areas.  Provenance has also been studied for workflow systems in e-Science to represent the processes that generate new scientific results.  Licensing standards bodies take into account the attribution of information as it is reused in new contexts.  However, these results are not well known to the Semantic Web community, nor are they necessarily expressed in terms that could facilitate their adoption.  Moreover, it is unclear whether this existing body of work could address all the needs for provenance management in the Semantic Web without a better understanding of what those needs are.

Many issues arise regarding provenance in the context of the Semantic Web.  An important issue is the basic level of provenance that is desirable to record for a given piece of information.  More detailed and finer-grained provenance is often preferable, but it comes at a cost in terms of performance and storage, so the needs and trade-offs have to be explored.  Another important question is how to verify the provenance information, that is, how to ensure its authenticity and how the provenance mechanisms would be integrated with cryptographic techniques (e.g., signatures).  Another important issue is the presentation of provenance information to the end user: selecting the aspects that are relevant to the user's context and designing the presentation to be understandable.

More generally, there is an open question about which aspects of the provenance problem are the direct concern of the Semantic Web and which belong to other areas of the Web architecture.

The goal of this incubator group is to provide a state-of-the-art understanding and develop a roadmap in the area of provenance for Semantic Web technologies, development, and possible standardization.  This includes:

Success Criteria

Publication of a comprehensive state-of-the-art report on the subject, and of a roadmap touching on all the subjects listed in the Scope section.

Out of Scope

  • Authentication of identity
  • Authorization to access a resource
  • Legal issues concerning provenance

Deliverables

The group will maintain a wiki site containing relevant information on the subject. Furthermore, the group will publish one or more reports on subjects listed in the Scope section. Finally, if the group decides that a particular technology is ripe for further standardization at the W3C, the group will consider preparing a W3C Member Submission and/or proposing a W3C group charter to be considered by the W3C.

Dependencies

W3C Groups

There are some W3C groups whose results and work this XG will need to take into account as enabling technologies.  These include:

SPARQL Working Group
Querying provenance may require particular query patterns, which may depend on the possibilities offered by the new developments in SPARQL.
RDB2RDF Working Group
The RDB2RDF Working Group plans to provide means to translate the content of relational databases to the Semantic Web. This translation process may well involve provenance-related issues; this XG will provide feedback to that group where necessary and will take into account the developments in that group.
Web Security Activity
The Web Security Activity is concerned with security technology, particularly in the context of fraud and deception.

There are some W3C groups that could provide requirements and use cases.  We will seek liaisons with these groups.  They include:

eGovernment Interest Group
This group is investigating how governments can improve information access through the use of the web.  In particular, they recognize provenance as a key issue in publishing open government data.
Semantic Web Health Care and Life Sciences (HCLS) Interest Group
This group is concerned with the use of semantic technologies for biomedical science and translational medicine.  This includes the development of use cases and guidelines to facilitate adoption in these areas, where provenance may be relevant.
Web Security Activity
The Web Security Activity is developing use cases that could be relevant for this group, such as authentication and access to information.
Social Web Incubator Group
This group is interested in data portability, privacy, and trust.

Participation

It is envisioned that the XG will begin with 60-minute teleconferences every week. This can be modified as the work proceeds. Additionally, it may be useful to have periodic face-to-face meetings at a venue that a significant number of XG participants are likely to attend (e.g., WWW and ISWC conferences).

Communication

This group primarily conducts its work on the public mailing list public-xg-prov@w3.org (archive). The group's Member-only list is member-xg-prov@w3.org (archive).

Information about the group (deliverables, participants, teleconferences, etc.) is available from the Provenance Incubator Group home page.

Decision Policy

As explained in the Process Document (section 3.3), this group will seek to make decisions when there is consensus. When the Chair puts a question and observes dissent, after due consideration of different opinions, the Chair should record a decision (possibly after a formal vote) and any objections, and move on.

Patent Policy

This Incubator Group provides an opportunity to share perspectives on the topic addressed by this charter. W3C reminds Incubator Group participants of their obligation to comply with patent disclosure obligations as set out in Section 6 of the W3C Patent Policy. While the Incubator Group does not produce Recommendation-track documents, when Incubator Group participants review Recommendation-track specifications from Working Groups, the patent disclosure obligations do apply.

Incubator Groups have as a goal to produce work that can be implemented on a Royalty Free basis, as defined in the W3C Patent Policy.

For more information about disclosure obligations for this group, please see the W3C Patent Policy Implementation.

Additional Information

About this Charter

This charter for the Provenance Incubator Group has been created according to the Incubator Group Procedures documentation. In the event of a conflict between this document or the provisions of any charter and the W3C Process, the W3C Process shall take precedence.

Charter update history:


Yolanda Gil, Ivan Herman

$Date: 2010/09/27 08:06:35 $