As part of its wider service, ORCID currently provides data about individuals in RDF. This document proposes a number of small changes to this service that, it is hoped, will help improve the semantics and robustness of the data. In making this proposal it should be emphasised that the current solution is already good: far more things are right than wrong, and the aim is to make it even better.
The proposal adheres to a number of guiding principles:
As an example, the following data is currently returned from http://orcid.org/0000-0003-0782-2704 with the Accept header set to text/turtle. Line numbers have been added for ease of reference.
 1 @prefix gn: <http://www.geonames.org/ontology#> .
 2 @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
 3 @prefix prov: <http://www.w3.org/ns/prov#> .
 4 @prefix foaf: <http://xmlns.com/foaf/0.1/> .
 5 @prefix pav: <http://purl.org/pav/> .
 6 @prefix owl: <http://www.w3.org/2002/07/owl#> .
 7 @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
 8 @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
 9 <http://orcid.org/0000-0003-0782-2704/>
10     a foaf:PersonalProfileDocument , foaf:OnlineAccount ;
11     rdfs:label "0000-0003-0782-2704" ;
12     pav:contributedOn "2012-12-07T14:37:24.441Z"^^xsd:dateTime ;
13     pav:createdBy <http://orcid.org/0000-0003-0782-2704> ;
14     pav:createdOn "2012-12-07T14:34:08.399Z"^^xsd:dateTime ;
15     pav:createdWith <http://orcid.org> ;
16     pav:lastUpdateOn "2015-02-16T03:21:12.933Z"^^xsd:dateTime ;
17     prov:generatedAtTime "2015-02-16T03:21:12.933Z"^^xsd:dateTime ;
18     prov:wasAttributedTo <http://orcid.org/0000-0003-0782-2704> ;
19     foaf:accountName "0000-0003-0782-2704" ;
20     foaf:accountServiceHomepage <http://orcid.org> ;
21     foaf:maker <http://orcid.org/0000-0003-0782-2704> ;
22     foaf:primaryTopic <http://orcid.org/0000-0003-0782-2704> .
23 <http://sws.geonames.org/2750405/>
24     a gn:Feature , <http://schema.org/Place> , rdfs:Resource , <http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing> ;
25     rdfs:label "Netherlands" , "Kingdom of the Netherlands" ;
26     gn:countryCode "NL" ;
27     gn:name "Netherlands" , "Kingdom of the Netherlands" .
28 <http://orcid.org/0000-0003-0782-2704>
29     a foaf:Person , prov:Person ;
30     rdfs:label "Ivan Herman" ;
31     foaf:account <http://orcid.org/0000-0003-0782-2704/> ;
32     foaf:based_near
33         [ a gn:Feature ;
34           gn:countryCode "NL" ;
35           gn:parentCountry <http://sws.geonames.org/2750405/>
36         ] ;
37     foaf:familyName "Herman" ;
38     foaf:givenName "Ivan" ;
39     foaf:name "Ivan Herman" ;
40     foaf:page <http://www.ivan-herman.name> , <http://www.w3.org/People/Ivan/> , <http://www.ivan-herman.net/professional/> ;
41     foaf:plan "See http://www.ivan-herman.net/professional/CV.html" ; foaf:publications <http://orcid.org/0000-0003-0782-2704/> .
Example 1: Ivan Herman's data 2015-03-06
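For readers who want to reproduce the request behind Example 1, the following is a minimal sketch using only the Python standard library. The helper names build_turtle_request and fetch_turtle are illustrative, not part of any ORCID API, and the response from the live service today may differ from the 2015 snapshot shown above.

```python
import urllib.request

ORCID_URL = "http://orcid.org/0000-0003-0782-2704"


def build_turtle_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks for Turtle via content negotiation."""
    return urllib.request.Request(url, headers={"Accept": "text/turtle"})


def fetch_turtle(url: str) -> str:
    """Send the request and return the body as text (needs network access)."""
    with urllib.request.urlopen(build_turtle_request(url)) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network; call fetch_turtle(ORCID_URL)
# to actually dereference the identifier.
request = build_turtle_request(ORCID_URL)
```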
It is important to note that this data includes two identifiers that differ in the presence or absence of the trailing slash, i.e. the two identifiers are:
http://orcid.org/0000-0003-0782-2704
http://orcid.org/0000-0003-0782-2704/
These are used consistently: the identifier without the trailing slash identifies the individual person, and the one with the trailing slash identifies the online account held by that person. Strictly speaking, this is perfectly correct; however, it is dangerous, as discussed recently on a W3C mailing list. Note in particular the contributions from Stian Soiland-Reyes, who contributed to the current ORCID implementation. There are several objections to the current implementation:
In short, everyday experience suggests that the presence or absence of a trailing slash on a URL is an insufficient and potentially hazardous method of distinguishing between a person and information associated with that person. As the recent online discussion shows, the debate about whether http://orcid.org/0000-0003-0782-2704 should identify Ivan Herman or an account held by him is unlikely to lead to consensus.
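One way the hazard can arise is sketched below. The naive_normalise function is a hypothetical clean-up step of the kind some tools apply to URLs; it is an assumption made for illustration, not something ORCID or any particular client is known to do.

```python
PERSON = "http://orcid.org/0000-0003-0782-2704"
ACCOUNT = "http://orcid.org/0000-0003-0782-2704/"


def naive_normalise(url: str) -> str:
    """Hypothetical clean-up step: drop any trailing slashes from a URL."""
    return url.rstrip("/")


# As published, the two identifiers are distinct strings.
distinct_before = PERSON != ACCOUNT

# After the clean-up step, the person and the account share one identifier.
collapsed_after = naive_normalise(PERSON) == naive_normalise(ACCOUNT)
```

Any system that applies a step like this before storing or comparing identifiers would silently merge statements about the person with statements about the account.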
Can the discussion be avoided altogether?
ORCIDs are defined in terms of what they do, not what they represent, i.e. “... a persistent digital identifier that distinguishes you from every other researcher and, through integration in key research workflows such as manuscript and grant submission, supports automated linkages between you and your professional activities ensuring that your work is recognized.”
The proposed way forward is consistent with that definition: that the semantics of an ORCID should be simply that it is an ORCID. On its own, it identifies neither the person nor their account, but dereferencing that identifier in a semantic workflow will return semantically accurate data. This includes information about the individual person, who should be identified within the data by appending the fragment #person. Similarly, the account would be identified by appending #account. There is a further subject in the example data above: the list of publications, which is neither the person nor the account and so should be identified by appending #pubList.
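The practical effect of fragment identifiers can be sketched with the Python standard library: a client removes the fragment before making the request, so all three proposed identifiers dereference to the same document while remaining distinct names within the data. The variable names here are illustrative only.

```python
from urllib.parse import urldefrag

ORCID = "http://orcid.org/0000-0003-0782-2704"
identifiers = [ORCID + "#person", ORCID + "#account", ORCID + "#pubList"]

# urldefrag splits a URI into the part that is requested and the fragment,
# which is resolved against the returned data rather than sent to the server.
for identifier in identifiers:
    document_url, fragment = urldefrag(identifier)
    print(f"{fragment:8} -> dereferences {document_url}")

# Three distinct identifiers, one document to fetch.
documents = {urldefrag(identifier).url for identifier in identifiers}
```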
If adopted, the current implementation would change such that, again within the data,
http://orcid.org/0000-0003-0782-2704
would be replaced by
http://orcid.org/0000-0003-0782-2704#person
http://orcid.org/0000-0003-0782-2704/
would be replaced by
http://orcid.org/0000-0003-0782-2704#account
in all cases except for the value of the foaf:publications property (line 41 in the example), which would become http://orcid.org/0000-0003-0782-2704#pubList.
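The substitutions described above can be sketched as a small text transformation. This is only an illustration of the mapping: the function name apply_proposal is hypothetical, the sample is an abbreviated extract of Example 1, and a real implementation would presumably change the identifiers where the RDF is generated rather than post-process serialised Turtle.

```python
import re

ORCID = "http://orcid.org/0000-0003-0782-2704"


def apply_proposal(turtle: str, orcid: str = ORCID) -> str:
    """Rewrite ORCID identifiers in Turtle text to the proposed fragment form."""
    base = re.escape(orcid)
    # 1. The value of foaf:publications becomes #pubList. This runs first
    #    because it currently shares the trailing-slash URI with the account.
    turtle = re.sub(
        r"(foaf:publications\s+)<" + base + r"/>",
        lambda m: f"{m.group(1)}<{orcid}#pubList>",
        turtle,
    )
    # 2. Every remaining trailing-slash identifier becomes #account.
    turtle = turtle.replace(f"<{orcid}/>", f"<{orcid}#account>")
    # 3. The bare identifier becomes #person.
    return turtle.replace(f"<{orcid}>", f"<{orcid}#person>")


# Abbreviated extract of Example 1, without the added line numbers.
sample = """\
<http://orcid.org/0000-0003-0782-2704/>
    a foaf:PersonalProfileDocument , foaf:OnlineAccount ;
    foaf:primaryTopic <http://orcid.org/0000-0003-0782-2704> .
<http://orcid.org/0000-0003-0782-2704>
    a foaf:Person ;
    foaf:account <http://orcid.org/0000-0003-0782-2704/> ;
    foaf:publications <http://orcid.org/0000-0003-0782-2704/> .
"""

rewritten = apply_proposal(sample)
print(rewritten)
```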
The advantages of this solution are:
The potential disadvantage of this or any change to the current implementation is that it might adversely affect other people's systems that use the data.
If individual operators are known to use ORCID's RDF data then they should be contacted and the issues discussed. The unknown users are harder to reach but this can probably be achieved through a variety of outreach mechanisms, such as an online call for comment that can be promoted through tweets, conference talks and more.
Any change should be signalled well in advance.
A further improvement in the data returned when dereferencing an ORCID would be to include more of the information available to human readers. Ivan Herman's ORCID Web page shows his education and a full list of his publications, but these are not included in the machine-readable output. One way forward might be to augment the HTML page with RDFa markup, but it is likely to be easy to add the information to the published RDF data too.
Furthermore, noting the proposal to use the #pubList fragment as the value of the foaf:publications property, it would be logically consistent if the Web page that humans see when dereferencing an ORCID in a regular browser were amended to include an id of pubList on the relevant HTML element.
The current implementation uses content negotiation to return data in HTML, RDF/XML, RDF Turtle, XML and JSON (as an aside, it would be good to add JSON-LD to this list). However, the availability of this functionality could be much more obvious. The usual method, exemplified in sites such as OpenCorporates and Ordnance Survey, is to:
For example:
<http://orcid.org/0000-0003-0782-2704.json>
    a <http://purl.org/dc/dcmitype/Text>, foaf:Document ;
    dcterms:isFormatOf <http://orcid.org/0000-0003-0782-2704> ;
    dcterms:format "application/json" .
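The same pattern could be generated for each available format, as in the sketch below. Only the .json suffix and its media type come from the example above; the .ttl and .rdf suffixes and the describe_format helper are hypothetical and shown purely to illustrate how the descriptions might be produced.

```python
ORCID = "http://orcid.org/0000-0003-0782-2704"

# Only the "json" entry is taken from the example above; the other suffixes
# are assumptions made for illustration.
FORMATS = {
    "json": "application/json",
    "ttl": "text/turtle",
    "rdf": "application/rdf+xml",
}


def describe_format(orcid: str, suffix: str, media_type: str) -> str:
    """Return a Turtle description of one format-specific URL."""
    return (
        f"<{orcid}.{suffix}> a <http://purl.org/dc/dcmitype/Text>, foaf:Document ;\n"
        f"    dcterms:isFormatOf <{orcid}> ;\n"
        f'    dcterms:format "{media_type}" .'
    )


descriptions = [
    describe_format(ORCID, suffix, media_type)
    for suffix, media_type in FORMATS.items()
]
print("\n\n".join(descriptions))
```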
Making these changes has several advantages:
As noted at the beginning of this short document, there is a lot more right with the current implementation than wrong with it. The changes proposed are incremental in nature and, it is hoped, will increase the utility of the services offered to ORCID's primary constituents in the research community through better discoverability and better semantics.
Phil Archer
W3C Data Activity Lead
March 2015