This document is intended to provide simple but working examples of how clinical trial data can be expressed in, and can leverage, the Semantic Web. The Semantic Web uses a triples-based model called RDF. Here, I use a particularly mail-friendly serialization of RDF called "Turtle". The RDF in this document is from prot.ttl.
```turtle
@prefix prot: <http://www.w3.org/2013/02/ValueSet/prot#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix LOINC: <http://www.w3.org/2013/02/ValueSet/LOINC#> .
@prefix pharma1: <http://www.w3.org/2013/02/ValueSet/pharma1#> .
@prefix pharma2: <http://www.w3.org/2013/02/ValueSet/pharma2#> .
@prefix terminology1: <http://www.w3.org/2013/02/ValueSet/terminology1#> .
@prefix terminology2: <http://www.w3.org/2013/02/ValueSet/terminology2#> .
```
If terminology1 defines terms like `1+`, `2+`, etc., and terminology2 defines terms like `slight`, `moderate`, etc., then third (and fourth, ...) parties can define value sets that reuse those terms.
Each value set can serve as the answer set for multiple CRF (case report form) questions.
```turtle
#################################################################
#    Value Sets
#################################################################

pharma1:valueSet1234 owl:oneOf (
    # the value set "pharma1:valueSet1234" includes the terms "1Plus" ... "4Plus" from "terminology1".
    terminology1:1Plus terminology1:2Plus terminology1:3Plus terminology1:4Plus
) .

pharma2:valueSetPick owl:oneOf (
    # the value set "pharma2:valueSetPick" includes those plus "not_checked", "WNL" and "absent" from terminology2.
    terminology2:not_checked terminology2:WNL terminology2:absent
    terminology1:1Plus terminology1:2Plus terminology1:3Plus terminology1:4Plus
) .

pharma2:valueSetBoth owl:oneOf (
    # the value set pharma2:valueSetBoth includes overlapping values from terminology1 and terminology2.
    terminology2:not_checked terminology2:WNL terminology2:absent
    # more ontology statements are required to assert that, for a given question, 1Plus is the same as slight.
    terminology1:1Plus terminology1:2Plus terminology1:3Plus terminology1:4Plus
    terminology2:slight terminology2:moderate terminology2:severe terminology2:very_severe
) .
```
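To cross-check the relationships between these value sets, here is a minimal plain-Python sketch. It uses no RDF library and represents each term as a plain `prefix:local` string, which is an assumption of the sketch. Each `owl:oneOf` list is treated as a closed set of terms. The sketch shows that `valueSet1234` is contained in `valueSetPick`, which is in turn contained in `valueSetBoth`.

```python
# Minimal sketch: each owl:oneOf list modeled as a frozenset of term names.
# Terms are plain "prefix:local" strings (an assumption; no RDF library used).

TERMINOLOGY1 = ["terminology1:1Plus", "terminology1:2Plus",
                "terminology1:3Plus", "terminology1:4Plus"]
T2_STATUS = ["terminology2:not_checked", "terminology2:WNL", "terminology2:absent"]
T2_SEVERITY = ["terminology2:slight", "terminology2:moderate",
               "terminology2:severe", "terminology2:very_severe"]

VALUE_SETS = {
    "pharma1:valueSet1234": frozenset(TERMINOLOGY1),
    "pharma2:valueSetPick": frozenset(T2_STATUS + TERMINOLOGY1),
    "pharma2:valueSetBoth": frozenset(T2_STATUS + TERMINOLOGY1 + T2_SEVERITY),
}


def shared_terms(a: str, b: str) -> frozenset:
    """Terms re-used by both value sets a and b."""
    return VALUE_SETS[a] & VALUE_SETS[b]


if __name__ == "__main__":
    one, pick, both = (VALUE_SETS[k] for k in
                       ("pharma1:valueSet1234", "pharma2:valueSetPick",
                        "pharma2:valueSetBoth"))
    print(one < pick < both)  # True: each set strictly extends the previous one
    print(sorted(shared_terms("pharma1:valueSet1234", "pharma2:valueSetBoth")))
```

The sketch does not capture the overlap noted in the `valueSetBoth` comment, where `1Plus` and `slight` mean the same thing. Expressing that would take the additional ontology statements mentioned there.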
We then have definitions for the semantics of, and permissible values of, questions which appear on CRFs:
```turtle
#################################################################
#    Questions
#################################################################

pharma1:Q_Extol
    rdfs:label "Exercise tolerance" ;                  # Extol has a label of "Exercise tolerance"
    rdfs:subClassOf prot:CRFQuestion ,
        LOINC:pulmonary_functon_test ,                 # Extol is a subclass of CRFQuestion and pulmonary_functon_test
        [ owl:onProperty prot:obsValue ;
          owl:allValuesFrom pharma1:valueSet1234 ] .   # every obsValue of an Extol is a value in valueSet1234

pharma2:Q_Tread
    rdfs:label "Treadmill endurance" ;
    rdfs:subClassOf prot:CRFQuestion ,
        LOINC:cardiac_function_test ,
        [ owl:onProperty prot:obsValue ;
          owl:allValuesFrom pharma2:valueSetPick ] .

pharma2:Q_Sleep
    rdfs:label "Sleep disturbance" ;
    rdfs:subClassOf prot:CRFQuestion ,
        LOINC:interruption_of_REM_sleep ,
        [ owl:onProperty prot:obsValue ;
          owl:allValuesFrom pharma2:valueSetBoth ] .
```
These definitions use OWL (the Web Ontology Language) to express constraints, and they annotate the tests with some semantics.
In this case, we're pretending that LOINC terms are human-readable.
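An OWL reasoner will now mark as inconsistent any study that answers a question with data outside that question's value set. For instance, `terminology2:WNL` is NOT a member of `pharma1:valueSet1234`, so it is not a permitted value of a `pharma1:Q_Extol` result. OWL makes no unique name assumption, so this inconsistency only follows if the terms are also declared distinct (e.g. with `owl:AllDifferent`). Without that declaration, a reasoner may instead infer that `WNL` is the same individual as one of `1Plus` ... `4Plus`.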
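The check described above can be approximated outside a reasoner. The following is a minimal closed-world sketch in plain Python, and it is not OWL reasoning. It assumes that every term name denotes a distinct individual, which is exactly the unique name assumption OWL itself does not make. It also maps each question class directly to the value set named in its `owl:allValuesFrom` restriction. The observation identifiers `ex:extol_ok`, `ex:extol_bad` and `ex:tread_wnl` are hypothetical.

```python
# Closed-world sketch of the owl:allValuesFrom restrictions on prot:obsValue.
# Assumes distinct term names denote distinct individuals (unlike OWL).

T1 = ["terminology1:%dPlus" % n for n in range(1, 5)]
T2_STATUS = ["terminology2:not_checked", "terminology2:WNL", "terminology2:absent"]
T2_SEVERITY = ["terminology2:slight", "terminology2:moderate",
               "terminology2:severe", "terminology2:very_severe"]

VALUE_SETS = {
    "pharma1:valueSet1234": set(T1),
    "pharma2:valueSetPick": set(T2_STATUS + T1),
    "pharma2:valueSetBoth": set(T2_STATUS + T1 + T2_SEVERITY),
}

# Question class -> value set named in its owl:allValuesFrom restriction.
ALL_VALUES_FROM = {
    "pharma1:Q_Extol": "pharma1:valueSet1234",
    "pharma2:Q_Tread": "pharma2:valueSetPick",
    "pharma2:Q_Sleep": "pharma2:valueSetBoth",
}


def violations(observations):
    """Return (instance, value) pairs whose obsValue is outside the value set
    required by the instance's question class."""
    return [(inst, value)
            for inst, question, value in observations
            if value not in VALUE_SETS[ALL_VALUES_FROM[question]]]


if __name__ == "__main__":
    sample = [  # hypothetical observations: (instance, question class, obsValue)
        ("ex:extol_ok", "pharma1:Q_Extol", "terminology1:1Plus"),
        ("ex:extol_bad", "pharma1:Q_Extol", "terminology2:WNL"),
        ("ex:tread_wnl", "pharma2:Q_Tread", "terminology2:WNL"),
    ]
    print(violations(sample))  # [('ex:extol_bad', 'terminology2:WNL')]
```

In this sketch, a `pharma2:Q_Tread` answered with `WNL` passes, because `valueSetPick` includes that term.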
We'd like to query for the "compatible" data. These values have very weak semantics on their own, so we must leverage the annotations attached to the question definitions above. These annotations, while distinct, can leverage a hierarchy within the terminology from which they were drawn. I've used a rather flat `rdfs:subClassOf` hierarchy to illustrate this:
```turtle
LOINC:pulmonary_functon_test rdfs:subClassOf LOINC:exercise_evaluation .
LOINC:cardiac_function_test  rdfs:subClassOf LOINC:exercise_evaluation .
```
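To make the effect of the hierarchy concrete, here is a minimal plain-Python sketch of the reflexive, transitive closure of `rdfs:subClassOf` that an RDFS reasoner computes. The edge table combines the named superclasses from the question definitions with the two LOINC statements above. The anonymous `owl:Restriction` superclasses are deliberately left out, and the dictionary layout is an assumption of the sketch.

```python
from collections import deque

# Named rdfs:subClassOf edges from the question definitions and LOINC hierarchy
# (anonymous owl:Restriction superclasses omitted).
SUBCLASS_OF = {
    "pharma1:Q_Extol": ["prot:CRFQuestion", "LOINC:pulmonary_functon_test"],
    "pharma2:Q_Tread": ["prot:CRFQuestion", "LOINC:cardiac_function_test"],
    "pharma2:Q_Sleep": ["prot:CRFQuestion", "LOINC:interruption_of_REM_sleep"],
    "LOINC:pulmonary_functon_test": ["LOINC:exercise_evaluation"],
    "LOINC:cardiac_function_test": ["LOINC:exercise_evaluation"],
}


def superclasses(cls):
    """Every class reachable from cls by zero or more rdfs:subClassOf steps
    (breadth-first search, so shared ancestors are visited once)."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        for parent in SUBCLASS_OF.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    for q in ("pharma1:Q_Extol", "pharma2:Q_Tread", "pharma2:Q_Sleep"):
        print(q, "LOINC:exercise_evaluation" in superclasses(q))
    # pharma1:Q_Extol True / pharma2:Q_Tread True / pharma2:Q_Sleep False
```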
Presume we have acquired data for some combination of study, CRF, question and sampling time:
```turtle
#################################################################
#    Instantiations
#################################################################

pharma1:study1_crf1_Q_extol_t1 a pharma1:Q_Extol ;  # study1_crf1_Q_extol_t1 instantiates Extol ("Exercise tolerance" defined above).
    prot:obsValue terminology1:1Plus .              # study1_crf1_Q_extol_t1 has an obsValue of 1Plus (which is in valueSet1234).

pharma2:studyA_crf5_Q_tread_t1 a pharma2:Q_Tread ;
    prot:obsValue terminology1:2Plus .

pharma2:studyB_crf3_Q_sleep_t4 a pharma2:Q_Sleep ;
    prot:obsValue terminology1:3Plus .
```
A query for answers to questions with a common parent class will separate the results for exercise tolerance and treadmill endurance from those for sleep disturbance:
```sparql
PREFIX : <http://www.w3.org/2013/02/ValueSet/prot#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX LOINC: <http://www.w3.org/2013/02/ValueSet/LOINC#>
SELECT ?t ?val {
  ?t :obsValue ?val ;
     a LOINC:exercise_evaluation .
}
```
This query yields the values from the related tests:
?t | ?val |
---|---|
pharma1:study1_crf1_Q_extol_t1 | terminology1:1Plus |
pharma2:studyA_crf5_Q_tread_t1 | terminology1:2Plus |
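The above query requires runtime inference (RDFS subclass reasoning) to conclude, for example, that `pharma1:study1_crf1_Q_extol_t1` is a `LOINC:exercise_evaluation`. The same result can be obtained entirely in SPARQL 1.1, without an inference engine, by using property paths: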
```sparql
PREFIX : <http://www.w3.org/2013/02/ValueSet/prot#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX LOINC: <http://www.w3.org/2013/02/ValueSet/LOINC#>
SELECT ?t ?val {
  ?t :obsValue ?val ;
     rdf:type/rdfs:subClassOf* LOINC:exercise_evaluation .
}
```
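For readers without a SPARQL 1.1 engine at hand, the following minimal plain-Python sketch mimics how the property path `rdf:type/rdfs:subClassOf*` is evaluated. It follows one `rdf:type` step, then zero or more `rdfs:subClassOf` steps. The triple list is a hand-copied subset of the data above, with restrictions and labels omitted. Triples are stored as `(subject, predicate, object)` tuples of `prefix:local` strings. This is an illustration of the idea, not a SPARQL implementation.

```python
# Hand-copied subset of the triples above; :obsValue is prot:obsValue.
TRIPLES = [
    ("pharma1:study1_crf1_Q_extol_t1", "rdf:type", "pharma1:Q_Extol"),
    ("pharma1:study1_crf1_Q_extol_t1", "prot:obsValue", "terminology1:1Plus"),
    ("pharma2:studyA_crf5_Q_tread_t1", "rdf:type", "pharma2:Q_Tread"),
    ("pharma2:studyA_crf5_Q_tread_t1", "prot:obsValue", "terminology1:2Plus"),
    ("pharma2:studyB_crf3_Q_sleep_t4", "rdf:type", "pharma2:Q_Sleep"),
    ("pharma2:studyB_crf3_Q_sleep_t4", "prot:obsValue", "terminology1:3Plus"),
    ("pharma1:Q_Extol", "rdfs:subClassOf", "LOINC:pulmonary_functon_test"),
    ("pharma2:Q_Tread", "rdfs:subClassOf", "LOINC:cardiac_function_test"),
    ("pharma2:Q_Sleep", "rdfs:subClassOf", "LOINC:interruption_of_REM_sleep"),
    ("LOINC:pulmonary_functon_test", "rdfs:subClassOf", "LOINC:exercise_evaluation"),
    ("LOINC:cardiac_function_test", "rdfs:subClassOf", "LOINC:exercise_evaluation"),
]


def objects(subject, predicate):
    """Objects o of every triple (subject, predicate, o)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subclass_star(cls, target):
    """True if `cls rdfs:subClassOf* target` holds (zero or more steps)."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c == target:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(objects(c, "rdfs:subClassOf"))
    return False


def select_t_val(target):
    """Rows (?t, ?val) for { ?t :obsValue ?val ; rdf:type/rdfs:subClassOf* target }."""
    return [(t, val)
            for t, p, val in TRIPLES
            if p == "prot:obsValue"
            and any(subclass_star(c, target) for c in objects(t, "rdf:type"))]


if __name__ == "__main__":
    for row in select_t_val("LOINC:exercise_evaluation"):
        print(*row)
```

A SPARQL engine returns solutions in no guaranteed order. Here, the row order simply follows the triple list.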
$Revision: 1.9 $ of $Date: 2013/03/03 18:33:08 $ by $Author: eric $