Tutorial/RDF Vocabularies
This is part 2 of the pages where John and Sandro are developing the slides for their Gov 2.0 Expo talk.
A simple script turns this page into the real slide version, in Slidy format.
Title | Mission Possible: Deploying Government Linked Data (Pt2) |
---|---|
Author | Sandro Hawke (sandro@w3.org), W3C/MIT, @sandhawke; John L. Sheridan, @johnlsheridan |
Event | Gov 2.0 Expo, May 25-26, 2010, Washington DC |
Contents
- 1 Part 2
- 2 Vocabulary as Interface
- 3 Use Case: Crime Reports
- 4 Identify Your Subjects
- 5 Assign Good URIs
- 6 Some Bad (Linked Data) URIs
- 7 Namespaces
- 8 URI Lifecycle
- 9 URI Lifecycle (cont)
- 10 Properties
- 11 Property As Question
- 12 Object and Data Properties
- 13 Data Types
- 14 Some Well-Known Properties
- 15 Overlapping, Competing Vocabularies
- 16 Subproperty
- 17 Messy Overlap
- 18 Conversion Rules
- 19 Applying Rules
- 20 Advice?
- 21 Classes and Subclasses
- 22 Domain and Range
- 23 OWL
- 24 Inference
- 25 SKOS
- 26 Finding Vocabularies
- 27 Browsing Vocabularies
- 28 Creating Vocabularies
- 29 Good Modeling
Part 2
Viewing Your Data as Triples
- Good URIs
- Properties (relationships and attributes)
- Overlap and Competition
- Classes (Description Logic)
- Finding and/or Creating Vocabularies
Vocabulary as Interface
How do programs communicate via triples?
Alice and Bob publish triples; Charlie's software tries to use their data.
What matters is the specific URIs they use: the vocabulary.
Essentially, when using RDF, the vocabulary is the syntax, the API.
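To make the point concrete, here is a minimal sketch in plain Python (no RDF library), assuming triples are (subject, property, value) tuples of strings. The publishers' data and Bob's property URI are hypothetical. Charlie's code only sees data that uses the exact property URI it was written against.

```python
# Hypothetical property URIs: FOAF's real "name" term, and one Bob made up.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
BOB_NAME = "http://example.org/bob-vocab#fullName"

# Triples as (subject, property, value) tuples.
alice_triples = [("http://example.org/alice#me", FOAF_NAME, "Alice")]
bob_triples = [("http://example.org/bob#me", BOB_NAME, "Bob")]


def names(triples, prop=FOAF_NAME):
    """Charlie's code: collect the values of one specific property URI."""
    return [value for (_subject, p, value) in triples if p == prop]


everything = alice_triples + bob_triples
print(names(everything))  # ['Alice']: Bob's data is invisible to this code
```

Bob's data only becomes usable if Charlie also codes against Bob's URI, or if Bob switches to the shared one.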
Use Case: Crime Reports
Data Providers:
- Springfield Police Department "Police Blotter"
- Springfield FBI Field Office
- Springfield Citizen's Watch
Data Consumers:
- Journalists
- Mobile Apps (AreYouSafe)
- Real Estate Listings
- Driving Directions
Identify Your Subjects
What are the things your data is about?
(items, entities, objects, individuals, resources)
In this scenario: incidents, locations, suspects/convicts, trials, stolen/damaged property, victims
Look for them in your UML models, database records, and web site.
Assign Good URIs
Give them good, long-term URI names.
Pick URIs that:
- no one will want for something else
- may become unfashionable, but won't become wrong
- someone will serve on the web forever
Such as:
See Designing URI Sets for the UK Public Sector
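One way to follow this advice is to build identifiers from stable keys under a fixed pattern. The sketch below assumes an `/id/{concept}/{reference}` pattern for things and a parallel `/doc/` pattern for the pages about them, similar in spirit to the UK guidance cited above; the domain `crime.example.gov` and both function names are hypothetical.

```python
from urllib.parse import quote

BASE = "http://crime.example.gov"  # hypothetical domain


def mint_uri(concept: str, reference: str) -> str:
    """Identifier URI for a thing: {BASE}/id/{concept}/{reference}.

    `reference` should be a stable key (e.g. a database primary key),
    not a description like "recent_crime_downtown" that can become wrong.
    """
    return f"{BASE}/id/{quote(concept, safe='')}/{quote(reference, safe='')}"


def document_uri(thing_uri: str) -> str:
    """URI of the web page *about* the thing: swap the /id/ segment for /doc/."""
    return thing_uri.replace("/id/", "/doc/", 1)


incident = mint_uri("incident", "2010-0417")
print(incident)                # http://crime.example.gov/id/incident/2010-0417
print(document_uri(incident))  # http://crime.example.gov/doc/incident/2010-0417
```

Keeping separate URIs for the incident and for the page describing it also avoids conflating an item with a page about the item.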
Some Bad (Linked Data) URIs
Misleading Names
- http://example.gov/recent_crime_downtown
- fine if it's a list of recent crimes downtown
- lousy identifier for a particular crime
Can't Dereference
- tag:hawke.org,2001:Sandro_Hawke
- urn:isbn:0451450523
- urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66
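A rough check for this problem is the URI scheme: `http:` and `https:` URIs can be looked up with an ordinary web client, while `tag:` and `urn:` URIs have no lookup mechanism that ordinary web clients support. This sketch (the function name is hypothetical) only inspects the scheme; it cannot tell whether an `http:` URI actually resolves.

```python
from urllib.parse import urlsplit

# Schemes an ordinary web client can fetch directly.
DEREFERENCEABLE_SCHEMES = {"http", "https"}


def can_dereference(uri: str) -> bool:
    """True if the URI's scheme is one a web client can look up."""
    return urlsplit(uri).scheme.lower() in DEREFERENCEABLE_SCHEMES


for uri in [
    "tag:hawke.org,2001:Sandro_Hawke",
    "urn:isbn:0451450523",
    "urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66",
    "http://dbpedia.org/resource/Boston",
]:
    print(can_dereference(uri), uri)
```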
Dereference Doesn't Work
Conflating Item and Page About Item
- http://en.wikipedia.org/wiki/Tim_Berners-Lee identifies a web page, not a person
- (It can be used indirectly, but that's different)
Unreadable names (usually):
Namespaces
Shared Leading Prefix:
- http://dbpedia.org/resource/Massachusetts
- http://dbpedia.org/resource/Boston
- http://dbpedia.org/resource/Deval_Patrick
Namespace name: "http://dbpedia.org/resource/"
Confusingly similar to XML namespaces; sometimes the same, sometimes different.
Namespace document:
- dereference http://dbpedia.org/resource/
- dereference http://xmlns.com/foaf/0.1/
- dereference http://vocab.deri.ie/dcat#
Often the same document you get by dereferencing the terms in the namespace.
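A namespace name is just the shared leading prefix, so splitting a term URI is string work. This sketch splits after the last `#` or `/` (a common convention, not a rule) and uses Python's `urllib.parse.urldefrag` to show why, for `#` namespaces like DCAT's, the namespace and its terms fetch the same document: the fragment is never sent to the server. The function names are hypothetical.

```python
from urllib.parse import urldefrag


def split_namespace(uri: str) -> tuple[str, str]:
    """Split a term URI into (namespace name, local name) after the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


def fetched_url(uri: str) -> str:
    """What an HTTP client actually requests: the #fragment is dropped."""
    return urldefrag(uri).url


print(split_namespace("http://dbpedia.org/resource/Boston"))
# ('http://dbpedia.org/resource/', 'Boston')
print(split_namespace("http://vocab.deri.ie/dcat#Catalog"))
# ('http://vocab.deri.ie/dcat#', 'Catalog')

# For "hash" namespaces, the namespace and every term in it fetch the same URL.
print(fetched_url("http://vocab.deri.ie/dcat#"))         # http://vocab.deri.ie/dcat
print(fetched_url("http://vocab.deri.ie/dcat#Catalog"))  # http://vocab.deri.ie/dcat
```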
URI Lifecycle
Stage 1: Unstable
- Coordinate with all users about every change in meaning
- Easy when you're the only user
- Gets harder, slower as user community grows
- Some users will avoid unstable terms
Stage 2: Stable
- No incompatible changes in meaning.
- okay to improve/rewrite documentation
- don't break people's running code
- maybe change meaning to resolve harmful ambiguity
- (break minority code)
URI Lifecycle (cont)
Stage 3: Deprecated
- Still stable, still served, but not recommended for use
- Should no longer be produced; still consumed for a while.
Stage 4: Dead
- The URI can no longer be dereferenced. Best to avoid this.
URIs in a namespace can be at different stages.
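A sketch of how a publisher or consumer might track stages per term. The `ex:` terms, their stages, and the produce/consume policy (including the opt-in for unstable terms) are assumptions drawn from the stage descriptions above, not a standard mechanism.

```python
from enum import Enum


class Stage(Enum):
    UNSTABLE = 1
    STABLE = 2
    DEPRECATED = 3
    DEAD = 4


# Hypothetical terms in one namespace, each at its own stage.
TERM_STAGE = {
    "ex:incidentDate": Stage.STABLE,
    "ex:crimeDate": Stage.DEPRECATED,    # superseded by ex:incidentDate
    "ex:severityScore": Stage.UNSTABLE,  # still being worked out
    "ex:oldCaseCode": Stage.DEAD,        # no longer served
}


def ok_to_produce(term, accept_unstable=False):
    """Publishers: use stable terms; unstable ones only if you opt in to tracking changes."""
    stage = TERM_STAGE.get(term)
    return stage is Stage.STABLE or (accept_unstable and stage is Stage.UNSTABLE)


def ok_to_consume(term):
    """Consumers: keep accepting deprecated terms for a while; dead terms can't be looked up."""
    return TERM_STAGE.get(term) in {Stage.UNSTABLE, Stage.STABLE, Stage.DEPRECATED}
```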
Don't version namespaces. Don't be fooled by:
(They were chosen before we knew this, and now they're stuck.)
Properties
The middle part of a triple, and essential to understanding it.
Also known as:
- relation, relationship,
- predicate
- column, column-name
- attribute
- field, member, slot (sort of)
Property As Question
Each triple states the answer to a question.
- The subject: the item the question is about
- you, me, Massachusetts, the moon, crude oil, ... whatever
- The property: the question
- who created it? when was it created? where is it located?
- The value: the answer to the question
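A minimal sketch of the question-and-answer view, assuming triples are plain (subject, property, value) tuples and using hypothetical `ex:` terms from the crime-report scenario. An empty answer means the data is silent, not that the answer is "no".

```python
# Each triple (subject, property, value) answers one question:
# "What is the <property> of <subject>?"  The ex: terms are hypothetical.
triples = [
    ("ex:incident42", "ex:location", "ex:MainAndElm"),
    ("ex:incident42", "ex:reportedBy", "ex:SpringfieldPD"),
    ("ex:incident42", "ex:reportedOn", "2010-04-17"),
]


def ask(subject, prop, data=triples):
    """Return every answer the data gives to the question <prop> about <subject>."""
    return [v for (s, p, v) in data if s == subject and p == prop]


print(ask("ex:incident42", "ex:location"))  # ['ex:MainAndElm']
print(ask("ex:incident42", "ex:suspect"))   # []: no answer recorded
```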
Object and Data Properties
A Data Property:
- value is a literal (string, number, date, etc.)
- date_created, height, weight, name, name_of_owner
An Object Property:
- value is another first-class entity (not just a literal)
- owner, near, friend, hometown, capital
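A small sketch of the distinction, assuming URIs are plain strings and literals are wrapped in a hypothetical `Literal` class carrying an XML Schema datatype URI. Note that an object property's value (here `ex:alice`) can itself be the subject of further triples; a literal cannot.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Literal:
    """A literal value: its lexical form plus a datatype URI."""
    lexical: str
    datatype: str = XSD + "string"


# Hypothetical data: URIs are plain str, literals are Literal objects.
triples = [
    ("ex:house7", "ex:owner", "ex:alice"),                      # object property
    ("ex:alice", "ex:hometown", "ex:Springfield"),              # ex:alice is a subject too
    ("ex:house7", "ex:name_of_owner", Literal("Alice Smith")),  # data property
    ("ex:house7", "ex:date_created", Literal("1921-06-01", XSD + "date")),
]


def property_kinds(data):
    """Label each property 'object' or 'data' from the kind of value it has."""
    return {p: ("data" if isinstance(v, Literal) else "object") for _, p, v in data}


print(property_kinds(triples))
```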
Data Types
RDF uses datatypes from XML Schema Part 2: Datatypes.
Each datatype is a mapping:
- from character strings, e.g. "3" or "3.0" or "0003"
- to their values, e.g. the number three
The ones RDF data usually uses include:
- xs:decimal
- xs:integer
- xs:int
- xs:byte
- xs:time
- xs:dayTimeDuration
... etc.
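To make the mapping concrete, here is a simplified sketch of two datatypes as functions from strings to values, using Python's `decimal.Decimal` and `int`. The regular expressions approximate the XML Schema lexical rules and skip details such as whitespace handling, so treat this as an illustration rather than a validator.

```python
import re
from decimal import Decimal

# Simplified lexical rules (no whitespace handling) for two XML Schema datatypes.
DECIMAL_LEXICAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def xs_decimal(lexical: str) -> Decimal:
    """Map an xs:decimal lexical form (a string) to its value."""
    if not DECIMAL_LEXICAL.fullmatch(lexical):
        raise ValueError(f"not an xs:decimal: {lexical!r}")
    return Decimal(lexical)


def xs_integer(lexical: str) -> int:
    """Map an xs:integer lexical form to its value; "3.0" is not allowed here."""
    if not INTEGER_LEXICAL.fullmatch(lexical):
        raise ValueError(f"not an xs:integer: {lexical!r}")
    return int(lexical)


# Different strings, same value.
print(xs_decimal("3") == xs_decimal("3.0") == xs_decimal("0003"))  # True
print(xs_integer("0003"))  # 3
```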
Some Well-Known Properties
rdfs:label
rdfs:comment
owl:sameAs
foaf:name
dc:creator
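Each prefix above stands for a namespace URI. This sketch expands the prefixed names to full URIs, using the standard namespace URIs for these vocabularies, and describes a hypothetical police report with them; the report and organization URIs and the `expand` helper are invented for illustration.

```python
# Standard namespace URIs behind the prefixes used on this slide.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'rdfs:label' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


# Hypothetical report and organization; values are plain strings (URIs or text).
report = "http://crime.example.gov/id/report/2010-0417"
police = "http://crime.example.gov/id/org/police"
triples = [
    (report, expand("rdfs:label"), "Burglary at Main and Elm"),      # short display name
    (report, expand("rdfs:comment"), "Reported by a neighbor, 2am"),  # longer description
    (report, expand("dc:creator"), police),                           # who made the report
    (police, expand("foaf:name"), "Springfield Police Department"),   # name of the agent
]
for triple in triples:
    print(triple)
```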
Overlapping, Competing Vocabularies
two terms for the same thing
owl:sameAs
owl:equivalentProperty
- dc:creator owl:equivalentProperty dcterms:creator
- foaf:name owl:equivalentProperty vcard:FN
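One way a consumer can use `owl:equivalentProperty` without a full reasoner is to copy each triple under its equivalent properties. This is a minimal sketch with prefixed names as plain strings and a hypothetical `add_equivalents` helper; it follows only the direct pairs listed here, whereas an OWL reasoner would also follow chains of equivalences.

```python
# owl:equivalentProperty pairs from the slide.
EQUIVALENT_PROPERTIES = [
    ("dc:creator", "dcterms:creator"),
    ("foaf:name", "vcard:FN"),
]


def add_equivalents(triples, pairs=EQUIVALENT_PROPERTIES):
    """Copy each triple under every property declared equivalent to its own.

    Only direct (one-step) equivalences are followed; chains are not closed.
    """
    same = {}
    for a, b in pairs:
        same.setdefault(a, set()).add(b)
        same.setdefault(b, set()).add(a)  # equivalence works in both directions
    result = set(triples)
    for s, p, o in triples:
        for q in same.get(p, ()):
            result.add((s, q, o))
    return result


data = [("ex:report1", "dcterms:creator", "ex:SpringfieldPD")]
print(sorted(add_equivalents(data)))
# [('ex:report1', 'dc:creator', 'ex:SpringfieldPD'), ('ex:report1', 'dcterms:creator', 'ex:SpringfieldPD')]
```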
Two terms for possibly identical things (splitting hairs):
- e.g. Peter Pan (various fictional works, productions, editions, copies, characters)
Compare: City of Boston (political entity) vs. City of Boston (geographic region)
The Pedantic Web
Subproperty
(dc refines)
rdfs:subPropertyOf
- sandro contact john
- sandro friend john
- sandro recentFriend john
- sandro presentedWith john
- friend rdfs:subPropertyOf contact
- recentFriend rdfs:subPropertyOf friend
- presentedWith rdfs:subPropertyOf contact
geographically near, overlapping, contained-within
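What `rdfs:subPropertyOf` buys you is inference: a triple using a more specific property also holds for every broader one. The sketch below uses the slide's short property names as plain strings (hypothetical helpers, not a library API) and follows the hierarchy transitively.

```python
# rdfs:subPropertyOf declarations from the slide: property -> its direct superproperties.
SUBPROPERTY_OF = {
    "friend": {"contact"},
    "recentFriend": {"friend"},
    "presentedWith": {"contact"},
}


def superproperties(p):
    """All properties implied by p, following rdfs:subPropertyOf transitively."""
    seen, stack = set(), [p]
    while stack:
        for q in SUBPROPERTY_OF.get(stack.pop(), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def infer(triples):
    """Add (s, q, o) for every (s, p, o) where p is a subproperty of q."""
    result = set(triples)
    for s, p, o in triples:
        for q in superproperties(p):
            result.add((s, q, o))
    return result


print(sorted(infer({("sandro", "recentFriend", "john")})))
# [('sandro', 'contact', 'john'), ('sandro', 'friend', 'john'), ('sandro', 'recentFriend', 'john')]
```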
Messy Overlap
Not all related properties are equivalent, or subproperties of one another.
foaf:firstName, foaf:givenName, foaf:lastName, foaf:familyName, foaf:name
Conversion Rules
if { ?x foaf:firstName ?first; foaf:lastName ?last }
then { ?x foaf:familyName ?last;
          foaf:givenName ?first;
          foaf:name func:string-join(?first " " ?last) }

if { ?x foaf:name ?name } and pred:contains(?name, " ")
then # unreliable for names with more than two parts, like Hillary Rodham Clinton
     { ?x foaf:firstName func:string-before(?name, " ");
          foaf:lastName func:string-after(?name, " ") }
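The same two rules written as ordinary Python (the "custom code" route), assuming each property holds one string value and that the rules only add missing values; the `func:string-join` step becomes a simple join with a space. The `convert` function is a hypothetical illustration, not a rules-engine API.

```python
def convert(person: dict) -> dict:
    """Apply both name rules to one person's properties.

    Assumes each property has a single string value, and never overwrites
    a value that is already present.
    """
    out = dict(person)

    # Rule 1: firstName + lastName -> givenName, familyName, name
    if "foaf:firstName" in person and "foaf:lastName" in person:
        first, last = person["foaf:firstName"], person["foaf:lastName"]
        out.setdefault("foaf:givenName", first)
        out.setdefault("foaf:familyName", last)
        out.setdefault("foaf:name", f"{first} {last}")

    # Rule 2: name containing a space -> firstName, lastName
    name = person.get("foaf:name")
    if name and " " in name:
        # Splits at the first space, so "Hillary Rodham Clinton" gets
        # lastName "Rodham Clinton": unreliable for names with 3+ parts.
        first, _, last = name.partition(" ")
        out.setdefault("foaf:firstName", first)
        out.setdefault("foaf:lastName", last)
    return out


print(convert({"foaf:firstName": "John", "foaf:lastName": "Sheridan"}))
print(convert({"foaf:name": "Hillary Rodham Clinton"}))
```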
Applying Rules
- In custom code, or using rules engine
- In producer (publish using many vocabs)
- In consumer (accept many vocabs)
Advice?
The world is full of competing standards. That's good, but painful.
Probably best to follow Postel's Law:
Be conservative in what you do; be liberal in what you accept from others.
Research topic: automatic downloading of conversion rules
Classes and Subclasses
Sets of objects with something in common.
Instances / rdf:type
Subclass hierarchy
plant / large_plant / tree / mature_horse_chestnut, the one in my back yard
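A minimal sketch of `rdf:type` plus `rdfs:subClassOf` using the slide's example. `back_yard_tree` is a made-up name for the individual tree, and each class has a single parent only to keep the code short (RDFS allows several).

```python
# rdfs:subClassOf links from the slide's example (one parent each, for simplicity).
SUBCLASS_OF = {
    "mature_horse_chestnut": "tree",
    "tree": "large_plant",
    "large_plant": "plant",
}
# rdf:type: the individual (hypothetical name) is an instance of the most specific class.
TYPE = {"back_yard_tree": "mature_horse_chestnut"}


def classes_of(individual):
    """Every class the individual belongs to, walking up the subclass chain."""
    result = []
    cls = TYPE.get(individual)
    while cls is not None:
        result.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return result


print(classes_of("back_yard_tree"))
# ['mature_horse_chestnut', 'tree', 'large_plant', 'plant']
```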
Domain and Range
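domain: the class of the things that have this property (in RDFS, anything used as the subject is inferred to be in this class)
range: the class of the values of this property (anything used as the value is inferred to be in this class)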
Domain | Property | Range |
---|---|---|
foaf:Person | foaf:firstName | (string) |
dcat:Catalog | dcat:record | dcat:CatalogRecord |
eg:Parent | eg:daughter | eg:Female |
rdf:Property | rdfs:domain | rdfs:Class |
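In RDFS these declarations drive inference rather than validation: using a property makes its subject a member of the domain class and its value a member of the range class, even if that was a mistake. A minimal sketch with the table's declarations as plain strings; `eg:pat` and `eg:sam` are hypothetical individuals.

```python
# rdfs:domain and rdfs:range declarations from the table above.
DOMAIN = {
    "foaf:firstName": "foaf:Person",
    "dcat:record": "dcat:Catalog",
    "eg:daughter": "eg:Parent",
}
RANGE = {
    "dcat:record": "dcat:CatalogRecord",
    "eg:daughter": "eg:Female",
}


def infer_types(triples):
    """rdf:type triples implied by domain (for subjects) and range (for values)."""
    inferred = set()
    for s, p, o in triples:
        if p in DOMAIN:
            inferred.add((s, "rdf:type", DOMAIN[p]))
        if p in RANGE:
            inferred.add((o, "rdf:type", RANGE[p]))
    return inferred


print(sorted(infer_types([("eg:pat", "eg:daughter", "eg:sam")])))
# [('eg:pat', 'rdf:type', 'eg:Parent'), ('eg:sam', 'rdf:type', 'eg:Female')]
```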
OWL
Powerful way of declaring how properties, classes, and individuals relate to each other.
"Ontologies"
http://www.w3.org/TR/owl2-primer/
- class expressions (US_Citizen and Irish_Citizen and not Minor)
- ChildOfSandro (anything with 'parent' being Sandro)
- BigTree (trees with height over 30 feet)
- inverse properties
- Sandro child Gregorian
- Gregorian parent Sandro
- => parent owl:inverseOf child
- negative assertions (some triple is false)
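Two of the class expressions above, sketched as plain Python membership tests over hypothetical individuals. One important difference: this sketch is closed-world (a missing fact counts as false), while OWL is open-world, so an OWL reasoner would not conclude that `pat` is "not Minor" just because no Minor fact is stated.

```python
# Hypothetical individuals: stated classes plus a data value for trees.
individuals = {
    "pat": {"types": {"US_Citizen", "Irish_Citizen"}},
    "sam": {"types": {"US_Citizen", "Irish_Citizen", "Minor"}},
    "oak1": {"types": {"Tree"}, "height_ft": 45},
    "oak2": {"types": {"Tree"}, "height_ft": 20},
}


def dual_citizen_adult(ind):
    """US_Citizen and Irish_Citizen and not Minor (closed-world reading)."""
    t = ind["types"]
    return "US_Citizen" in t and "Irish_Citizen" in t and "Minor" not in t


def big_tree(ind):
    """BigTree: trees with height over 30 feet."""
    return "Tree" in ind["types"] and ind.get("height_ft", 0) > 30


print([n for n, i in individuals.items() if dual_citizen_adult(i)])  # ['pat']
print([n for n, i in individuals.items() if big_tree(i)])            # ['oak1']
```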
OWL can be conveyed in triples, but it also has some easier-to-read syntaxes. When you don't need triples, I suggest the Manchester syntax.
Inference
machines ("reasoners") can process these ontologies
given:
- Gregorian parent Sandro
- parent owl:inverseOf child
they will infer
- Sandro child Gregorian
Which is great if you're querying for child and you have some parent data.
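A real reasoner does much more, but the single `owl:inverseOf` step shown here can be sketched in a few lines of plain Python (a hypothetical helper, not a reasoner API):

```python
# parent owl:inverseOf child, as declared on the slide.
INVERSE_OF = {"parent": "child"}


def infer_inverses(triples):
    """Add (o, q, s) for each (s, p, o) whenever q is declared the inverse of p."""
    inverse = dict(INVERSE_OF)
    inverse.update({q: p for p, q in INVERSE_OF.items()})  # inverseOf works both ways
    result = set(triples)
    for s, p, o in triples:
        if p in inverse:
            result.add((o, inverse[p], s))
    return result


data = {("Gregorian", "parent", "Sandro")}
# Querying for 'child' now finds an answer, though only 'parent' data was given.
children = [o for (s, p, o) in infer_inverses(data) if s == "Sandro" and p == "child"]
print(children)  # ['Gregorian']
```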
Also helps find errors in data and modeling.
Technology from AI research
http://www.w3.org/2001/sw/wiki/Category:Reasoner
SKOS
A less formal way to document your URIs.
Everything is a Concept, linked by general broader/narrower relationships.
Good when you want to quickly leverage existing controlled vocabulary.
http://www.w3.org/TR/2009/NOTE-skos-primer-20090818/
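A small sketch of navigating `skos:broader` over a hypothetical crime-type concept scheme. SKOS does not make `skos:broader` itself transitive; walking up the whole chain corresponds to `skos:broaderTransitive`, and `skos:narrower` is the inverse of `skos:broader`.

```python
# Hypothetical crime-type concept scheme: concept -> its skos:broader concept.
BROADER = {
    "Burglary": "PropertyCrime",
    "VehicleTheft": "PropertyCrime",
    "Assault": "ViolentCrime",
    "PropertyCrime": "Crime",
    "ViolentCrime": "Crime",
}


def broader_transitive(concept):
    """All ancestors of a concept, in the spirit of skos:broaderTransitive."""
    chain = []
    while concept in BROADER:
        concept = BROADER[concept]
        chain.append(concept)
    return chain


def narrower(concept):
    """Direct skos:narrower concepts: the inverse of skos:broader."""
    return sorted(c for c, parent in BROADER.items() if parent == concept)


print(broader_transitive("Burglary"))  # ['PropertyCrime', 'Crime']
print(narrower("PropertyCrime"))       # ['Burglary', 'VehicleTheft']
```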
Finding Vocabularies
http://www.w3.org/2001/sw/wiki/Category:Search_Engine
Browsing Vocabularies
use the HTML documentation
use an ontology viewer http://www.w3.org/2001/sw/wiki/Category:Visualizer
Creating Vocabularies
text editor
Protégé
TopBraid Composer
Neologism
http://www.w3.org/2001/sw/wiki/Category:Editor
Good Modeling
- Which items are you communicating about?
- What are the logical groups (classes) of those items?
- What properties can each kind of item have?
- use your existing models (UML, SQL, Spreadsheets)
- expect to change your design over time
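Existing tabular data often maps directly onto triples: each row is an item, each column a property, each cell a value. This sketch assumes a hypothetical CSV export of the Springfield police blotter, a hypothetical URI pattern, and made-up `ex:` properties; it uses only Python's standard `csv` module.

```python
import csv
import io

# Hypothetical spreadsheet export of police blotter rows.
BLOTTER_CSV = """incident_id,date,type,location
2010-0417,2010-04-17,burglary,Main and Elm
2010-0418,2010-04-18,vehicle theft,Oak Street
"""

# Hypothetical URI pattern and vocabulary: one property per column.
BASE = "http://crime.example.gov/id/incident/"
COLUMN_TO_PROPERTY = {
    "date": "ex:date",
    "type": "ex:crimeType",
    "location": "ex:locationName",
}


def rows_to_triples(text):
    """Each row becomes a subject, each mapped column a property, each cell a value."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = BASE + row["incident_id"]  # stable key, not a description
        for column, prop in COLUMN_TO_PROPERTY.items():
            triples.append((subject, prop, row[column]))
    return triples


for triple in rows_to_triples(BLOTTER_CSV):
    print(triple)
```

Since the design will change over time, keeping the column-to-property mapping in one table like this makes it easy to revise.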