See also: IRC log
[slide 3]
<Joanne_Luciano> slides aren't numbered :-(
<Joanne_Luciano> ah, but the browser numbers them!
<egombocz> If you look at them not in show mode, you can see the numbers on the side thumbnails
Cecil: an antibiotic-resistant airline passenger prompted a review of
the Tuberculosis Information Management System (TIMS)
... reporting a TB case required passing a brittle set of
messaging and business rules
[slide 4: Message Processing Integration]
Joanne_Luciano: each state wanted their own standard?
Cecil: CDC wanted a
standard
... states would take anything which makes reporting
easier
... [re: slide 4]
... choices about how to import messages to CDC
... .. after message had some processing
... .. as a Web Service RPC
[slide 5: Deployment Architecture]
Cecil: going with existing CDC
infrastructure
... starting from left:
... .. some source, usually state or large counties (53
jurisdictions) reports
<Joanne_Luciano> is going with the CDC one of those three options on slide 4 or is it another one (not listed on slide 4)?
Cecil: .. goes into data messaging broker, which validates syntax
<Joanne_Luciano> looks like it's option 1 on slide 4
Cecil: .. if a valid TB message,
off to content validation queue
... .. also split into components for e.g. line listing of
incoming cases
... .. after validation, email with contents of alert sent to
CDC's TB group
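A minimal sketch of the flow described for slide 5, assuming an in-memory queue in place of the real broker and a callback in place of email. The function names, the `|TB|` routing marker, and the content check are hypothetical; the notes do not say how the broker recognizes a TB message.

```python
from collections import deque

def route(raw, queue, listing):
    """Broker step: syntax check, then hand valid TB messages on."""
    if not raw.startswith("MSH|"):          # assumed stand-in for syntax validation
        return "rejected"
    if "|TB|" not in raw:                   # hypothetical TB marker
        return "not-tb"
    queue.append(raw)                       # on to the content validation queue
    listing.append(raw.split("\r")[0])      # header segment kept for the line listing
    return "queued"

def drain(queue, content_ok, send_alert):
    """Content validation step: send one alert per message that passes."""
    sent = 0
    while queue:
        raw = queue.popleft()
        if content_ok(raw):
            send_alert("TB alert: " + raw.split("\r")[0])
            sent += 1
    return sent

queue, listing, outbox = deque(), [], []
print(route("MSH|^~\\&|STATE_LAB|TB|CDC\rOBX|1|TX|note||example", queue, listing))
print(route("garbage", queue, listing))
print(drain(queue, lambda raw: "OBX|" in raw, outbox.append))
```

Passing `outbox.append` as the alert sender keeps the sketch free of any mail setup; a real deployment would substitute its own transport.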
Joanne_Luciano: this is slide 3 option 1?
Cecil: this is option 3 (RPC)
... we had tried driving real-time alerting from BioSense
... we took messages off the first transport, never queued in
DMB [slide 5 left]
... the HL7 2.x standard is fairly loose
... flexible, can take any payload
... can be structured in any way
... segments are well-defined, but segment structure requires
point to point negotiation
... point-to-point negotiation is a guideline
charlie: HL7 2.x is a syntactic standard and a semantics guideline
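To illustrate the "syntactic standard" point: HL7 2.x fixes the delimiters (carriage return between segments, `|` between fields, `^` between components), so a message can be split mechanically, while what the fields mean is left to agreement between sender and receiver. The sketch below assumes that layout; the sample message and the local codes `DOB` and `ONSET` are made up for illustration.

```python
def parse_segments(message):
    """Split an HL7 v2 message into one list of fields per segment."""
    return [seg.split("|") for seg in message.split("\r") if seg]

def obx_observations(message):
    """Map the code in each OBX-3 identifier to its OBX-5 value."""
    found = {}
    for fields in parse_segments(message):
        if fields[0] == "OBX" and len(fields) > 5:
            code = fields[3].split("^")[0]   # first component of OBX-3
            found[code] = fields[5]          # OBX-5, the observation value
    return found

# Hypothetical message; codes and values are placeholders, not real data.
sample = "\r".join([
    "MSH|^~\\&|STATE_LAB|STATE|CDC|CDC|20090101||ORU^R01|1|P|2.5",
    "OBX|1|DT|DOB^Birth date^L||19700101",
    "OBX|2|DT|ONSET^Date of problem^L||20081215",
])
print(obx_observations(sample))
```

The parser succeeds on any well-formed message; whether `DOB` is the code a given sender uses for a birth date is exactly the part that has to be negotiated.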
[slide 6: Message Content Validation Architecture]
<Joanne_Luciano> JMS?
Cecil: after leaving broker,
falls into JMS interface
... because this has the 2.5 validation, we don't need the 2.x
syntactic validation
... so we don't do the validation
... before we went live, we validated and found 2 errors in HL7
messaging
... (was a benefit of 2-tier validation)
... once live, we don't do syntactic validation
... but we do parse out components
... questions like birthday and date of problem were found via
OBX extractions
... an OWL ontology tells us how to process a message
... the ontology links all the knowledge
... it guides parsing the message by aligning the OBX-extracted
facts with an RDF graph
... we can then use the JESS reasoner for evaluating these
facts
... JESS (Java Expert System Shell) is a forward/backward (FW/BW)
chaining rules engine
... has a Protégé plugin, interprets SWRL
... good commercial tool for high-volume processing
... paid for by tax dollars, only free for government
use
... $75K otherwise
<Stuart> Drools
<iker> DROOLS
<mr_sticky> Drools is from JBoss
<mr_sticky> http://www.jboss.org/drools
Cecil: we tried Drools, which has
FW/BW chaining and similar fact structure
... use JESS if you're processing millions of facts
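For readers unfamiliar with forward chaining, here is a minimal sketch of the idea: rules fire on known facts and add new facts until nothing changes. It is not how JESS or Drools are implemented; the fact strings and both rules are illustrative assumptions.

```python
def forward_chain(facts, rules):
    """Apply rules until no rule adds a new fact (naive forward chaining)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are already known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical rules: (set of required facts, fact to conclude).
rules = [
    (frozenset({"resistant:isoniazid", "resistant:rifampicin"}), "profile:MDR"),
    (frozenset({"profile:MDR"}), "action:alert"),
]
facts = {"resistant:isoniazid", "resistant:rifampicin"}
print(sorted(forward_chain(facts, rules)))
```

This loop rescans every rule on every pass, which is fine for a handful of facts and is the cost that dedicated engines are built to avoid at high volume.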
Joanne_Luciano: and Jena?
Cecil: no experience with
it
... at OTR, we pass what we expect to see and what we got as
two graphs
... the choreography of the OTR framework works out that
something is a question about e.g. the resistance pattern of an
antibiotic
... we have a set of "listeners" (patterns)
... we built this on V3 semantics, but mapped back to V2
syntax
... once we've matched the graph against the patterns, we pass
it to JESS
... we give JESS the profile for e.g. a normal patient, an MDR
(multidrug-resistant) patient, an XDR (extensively drug-resistant)
patient (potential super-spreader)
... the reasoning framework decides if an event needs
action
... another listener strains through alerts from JESS for
outbound messaging
... we also use the output for visualization
... folks don't need to use SAS to extract this data from the
mid-tier; they can just use graph representations
... with agreement from CDC, we could have sent output messages
back to reporters
... output:
... .. drug resistant
... .. appropriateness of drugging (per WHO codes)
... .. predictive analysis of whether someone is likely to fall
off treatment based on patient history
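The "expected versus received" comparison and the pattern listeners can be sketched with graphs held as sets of (subject, predicate, object) triples. This is an assumption about representation only; the predicate names, the sample triples, and the listener name are hypothetical, and the real framework works on RDF with an OWL ontology.

```python
def diff_graphs(expected, received):
    """Return (missing, unexpected) triples between two graphs held as sets."""
    return expected - received, received - expected

def matches(pattern, triple):
    """A pattern is a triple in which None matches any value."""
    return all(p is None or p == t for p, t in zip(pattern, triple))

def listen(graph, listeners):
    """Return the names of listeners whose pattern matches some triple."""
    return sorted(name for name, pattern in listeners.items()
                  if any(matches(pattern, t) for t in graph))

# Hypothetical graphs: what the profile expects and what the message held.
expected = {("case1", "hasBirthDate", "1970-01-01"),
            ("case1", "hasSusceptibilityResult", "isoniazid")}
received = {("case1", "hasBirthDate", "1970-01-01"),
            ("case1", "resistantTo", "isoniazid")}
listeners = {"resistance-pattern": (None, "resistantTo", None)}

missing, unexpected = diff_graphs(expected, received)
print(missing)
print(unexpected)
print(listen(received, listeners))
```

Triples that survive this step are the kind of facts that would then be handed to the rules engine for the normal/MDR/XDR decision.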
[slide 7: Types of problems that could be solved by extending the TB framework]
Cecil: had to bend to time and
budget limitations
... we could have added a d2rq interface to retrofit the
pre-existing data
... a lot we could have done
[slide 8: The use of an OWL ontology]
Cecil
[slide 9: HL7 Message Artifact Taxonomy]
Cecil: this is how we mapped the OBX structure to the ontology
[slide 11: Rule Processing]
[slide 12: Message Content Validation Rule Implementation]
Cecil: this demonstrates the
advantage of using OWL
... the blue is what we deleted
... (from TIMS)
... went from 358 to 175
... reduces frustration of reporters facing conflicting
rules
... beyond OWL being able to do syntax, vocabulary, rule
processing, we see the advantage of declarative rules
[slide 13: Message Content Validation Rules]
Cecil: with tons of volume and response time requirements, you need a more efficient bw-chaining system (JESS)
[slide 14: Message Content Validation Results View]
Cecil: sample output
[slide 15: Processing Results]
Cecil: average processing time
3.5s round trip
... far faster than a human, and more accurate
... scales up to ~350k messages/day
... ~300K TB messages/year
... could scale to influenza
... at worst case (4 month window), 50-75M, so ~200K
messages/day
... in a surveillance, you're also looking at folks who don't
have it
... feeds from 800 VA hospitals, plus labs such as Quest and
LabCorp, ...
... Congress says we need response in 2 mins
... had to put everything in memory
... BioSense lost funding
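A back-of-envelope check on the slide 15 figures, under the assumption (not stated in these notes) that each message occupies one worker for the full 3.5 s round trip. The function names are made up for this sketch.

```python
import math

ROUND_TRIP_S = 3.5                 # average round trip quoted for slide 15
SECONDS_PER_DAY = 24 * 60 * 60

def per_worker_per_day(round_trip_s=ROUND_TRIP_S):
    """Messages one strictly serial worker could finish in a day."""
    return SECONDS_PER_DAY / round_trip_s

def workers_needed(messages_per_day, round_trip_s=ROUND_TRIP_S):
    """Parallel workers needed to keep up with a daily volume."""
    return math.ceil(messages_per_day / per_worker_per_day(round_trip_s))

print(round(300_000 / 365))          # average TB messages per day
print(round(per_worker_per_day()))   # one serial worker, messages per day
print(workers_needed(350_000))       # workers for ~350k messages/day
```

Under that assumption, the yearly TB volume fits comfortably on a single worker, while the quoted ~350k messages/day ceiling implies processing on the order of 15 messages in parallel.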
mscottm: summary of SemWeb
advantages is very different from our usual tech demos in
HCLS
... what are your SemWeb wins?
... what could be improved?
charlie: would like formal
continuation
... to help us find focal points in HCLS
Cecil: SemWeb is a flexible way
to extract knowledge
... we were given a TB messaging system and a deadline
... 7 days before the deadline, CDC said "we'd like to upgrade to
1.2 of our implementation guideline"
... had around 35 new rules and 100 terminology changes
... because everything CDC gave us was in the OWL, we expected to
do it in 4 days
... made it in 4 days with no additional charge to CDC
... big commercial motivation is the flexibility in responding
to rapidly changing knowledge
... at NCI, I wanted to build an EMR system
... ONC SHARP projects kind of get to this
... win 1: rapid software engineering
... win 2: rule validation
... win 3: can infer things that a human has problems
inspecting
<mscottm> Nice to hear that experience in the field confirms my main sales pitch about advantage of SemWeb tech for software: easier maintenance and change, agile development, effectively lower cost.
Cecil: .. (large systems (e.g. BRIDG's UML) hard to swap into a brain)