SV_MEETING_TITLE -- 14 May 2012

<mscottm> https://docs.google.com/document/d/1VrBWM133Pxxqe5jbjgg2IPiwv5JaHo98DDJdqOT1yh0/edit

<mscottm> http://www.w3.org/2012/05/07-HCLS-minutes.html

<joannne_luciano> joanne just joined.

<mscottm> Erich: How about HMDB (Human Metabolome Database), the big metabolite database?

<michael> mass spec data: http://www.peptideatlas.org/ (although there isn't much experimental context)

Discussion: directions and data sets of interest...

Joanne - what is the end product? Would it be interesting to look at patient stratification for personalized medicine?

Scott - rather than a specific product, meta-requirements for such a product...

<joannne_luciano> What if we created a Minimum Information Standard for an EHR/PHR

<joannne_luciano> (if it doesn't already exist)

Scott - working through an example that involves the EHR would lay the groundwork for patient stratification

<egonw_> sorry, cannot join today...

Scott: "patient stratification implies an EHR"

Michael Miller (Insightful): involved in a pre-term birth study... the intent is to make the data public, along with EHR and next-gen sequencing data...

<mscottm> Scott: Lots of interest in data description for the purposes of federation: Kalpana Krishnaswami (Metaome), Paul Rigor, Richard Boyce, Biohackathon participants, etc., plus interest in specific datasets (see above).

Biomarkers can pop out without the use of EHR

<joannne_luciano> it's from last year, but this paper is coming to mind... http://www.selventa.com/attachments/publications/early-patient-stratification-is-critical-to-enable-effective-and-personalised-drug-discovery-and-development.pdf

<mscottm> Richard: looking at data such as microarray for biomarkers (without EHR) looks like where it's at, but you can only get so far with it.

<mscottm> Richard: Quality and provenance are very important

Scott: how to combine these things... if some of the named datasets are going to be used by members of the task force, then we should take that dataset along...
... this gives us a better chance of engaging stakeholders and their needs...
... a problem relevant to the clinic and to PGx (pharmacogenomics): how can you assemble the various models (references to ontologies) in a patient record so as to cover the fundamental needs of a clinic (i.e., a genetics clinic)?
... a genealogy file might be linked to the hospital system: how do you declare privacy requirements?
... if we are talking about mutations, biomarkers, SNPs, etc., and have an unambiguous way to refer to them...

<mscottm> my line was dropped - calling in again

Scott: if you can integrate this into HL7, what you would have is a system that could be integrated into a hospital system
... Eric and Michael's work on HL7 might be relevant for gaining traction with hospitals
... what a lot of clinical departments are missing is a way to carry information on genetic markers and genealogy, together with clinical data, from one machine to another
... clinical genetics: currently you cannot look through patient or family history in a sharable, reusable, cross-platform way
... silos of clinical genetics data
... If I can get that data as XML and transform it to RDF, I now have a substrate for an exchange language for future platforms.
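The XML-to-RDF step Scott describes can be sketched with Python's standard library. This is a minimal illustration under stated assumptions: the `<patient>`, `<marker>`, and `<relative>` element names, the `http://example.org/clinic/` namespace, and predicates such as `hasMarker` are hypothetical placeholders, not an existing clinical schema or ontology. A real exchange format would reuse established vocabularies. Output is N-Triples, one triple per line.

```python
import xml.etree.ElementTree as ET

# Hypothetical base namespace; not a real clinical vocabulary.
EX = "http://example.org/clinic/"


def iri(local: str) -> str:
    """Wrap a local name in the placeholder namespace as an N-Triples IRI."""
    return f"<{EX}{local}>"


def literal(value: str) -> str:
    """Render a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def patient_xml_to_ntriples(xml_text: str) -> list:
    """Convert a hypothetical <patient> XML record into N-Triples lines."""
    root = ET.fromstring(xml_text)
    pid = root.get("id")
    patient = iri(f"patient/{pid}")
    triples = []
    # Each genetic marker becomes its own node so it can be referenced unambiguously.
    for i, marker in enumerate(root.findall("marker")):
        node = iri(f"patient/{pid}/marker/{i}")
        triples.append(f"{patient} {iri('hasMarker')} {node} .")
        triples.append(f"{node} {iri('gene')} {literal(marker.get('gene', ''))} .")
        triples.append(f"{node} {iri('variant')} {literal(marker.get('variant', ''))} .")
    # Family relations (genealogy) link patient nodes to each other.
    for rel in root.findall("relative"):
        relative = iri(f"patient/{rel.get('id')}")
        triples.append(f"{patient} {iri(rel.get('relation'))} {relative} .")
    return triples


SAMPLE = """<patient id="p001">
  <marker gene="GENE_A" variant="var-1"/>
  <relative id="p002" relation="hasMother"/>
</patient>"""

for line in patient_xml_to_ntriples(SAMPLE):
    print(line)
```

Once records are triples, data from different hospital systems can be merged by simple concatenation, which is what makes RDF attractive as an exchange substrate here.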

<mscottm> Richard: I know that personalized medicine requires information from several spaces, including molecular data, drugs, and other BioRDF areas of work.

<mscottm> Richard: Isn't TCGA centered on sequencing the tumors? And the clinical data is in their clinics?

<mscottm> Michael: Yes - I'm looking at some right now. :)

<joannne_luciano> is there a link for that dataset?

<mscottm> Michael: Yes, the personal "people" data is all private but there's sequence data, mRNA, and many types of related data that aren't protected.

<mscottm> Richard: The LODD cloud represents a network of chemical identity.

<mscottm> ..you can map the identities to the strings that are used to represent a drug.

<mscottm> ..Then the quality of the network of chemical identity and its provenance become important.
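One way to picture a "network of chemical identity" is a union-find structure over identity links between source identifiers and drug name strings, where each link keeps the source that asserted it so quality and provenance can be audited. This is a minimal sketch under that assumption. The identifiers (`sourceA:0001`, `sourceB:XYZ`) and source labels are hypothetical, not records from the LODD cloud.

```python
class IdentityNetwork:
    """Union-find over identity links between drug identifiers and name strings.

    Every asserted link is kept with its source so curators can audit
    where each identity claim came from.
    """

    def __init__(self) -> None:
        self.parent = {}  # term -> parent term in the union-find forest
        self.links = []   # (term_a, term_b, source) for every asserted link

    def _find(self, term):
        """Return the representative of term's cluster (path halving)."""
        self.parent.setdefault(term, term)
        while self.parent[term] != term:
            self.parent[term] = self.parent[self.parent[term]]
            term = self.parent[term]
        return term

    def link(self, a, b, source):
        """Record that a and b denote the same chemical, according to source."""
        self.links.append((a, b, source))
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same(self, a, b):
        """True if a and b end up in the same identity cluster."""
        return self._find(a) == self._find(b)

    def sources_for(self, term):
        """Sources of all links inside the cluster that contains term."""
        root = self._find(term)
        return {src for a, _, src in self.links if self._find(a) == root}


net = IdentityNetwork()
net.link("sourceA:0001", "aspirin", "sourceA")
net.link("sourceB:XYZ", "acetylsalicylic acid", "sourceB")
net.link("aspirin", "acetylsalicylic acid", "synonym-list")
print(net.same("sourceA:0001", "sourceB:XYZ"))  # True
print(sorted(net.sources_for("aspirin")))       # ['sourceA', 'sourceB', 'synonym-list']
```

Because identity is transitive in this model, a single wrong link silently merges two clusters, which is one concrete reason per-link provenance matters.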

Michel: the network of chemical identity is of interest

<mscottm> G+ post about chem identity (from Egon): https://plus.google.com/u/0/103703750118158205464/posts/Ld2cwfJM8Kd

Michel: his lab did some work on TMO (Translational Medicine Ontology)
... incorrect references to semantic types are a major problem
... for example, gene product vs. gene identifier
... source data contains semantic nonsense
... does this get propagated from the source to the linked data?
... what responsibility does the linked data group have in identifying the limitations of the source?
... initial work with TMO
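A simple check for the gene product vs. gene identifier confusion Michel mentions is to declare the expected subject and object types for each predicate and flag triples that violate them. The predicates, type names, and identifiers below are hypothetical; this is a sketch of the idea, not how TMO or any particular linked data project performs validation.

```python
# Hypothetical constraints: predicate -> (expected subject type, expected object type).
EXPECTED = {
    "encodes": ("Gene", "GeneProduct"),
    "interactsWith": ("GeneProduct", "GeneProduct"),
}


def type_violations(triples, types):
    """Return triples whose subject/object types disagree with EXPECTED.

    triples: iterable of (subject, predicate, object) strings
    types: dict mapping each identifier to its declared semantic type
    """
    bad = []
    for s, p, o in triples:
        if p not in EXPECTED:
            continue  # no constraint declared for this predicate
        want_s, want_o = EXPECTED[p]
        if types.get(s) != want_s or types.get(o) != want_o:
            bad.append((s, p, o))
    return bad


types = {"gene:1": "Gene", "protein:1": "GeneProduct", "gene:2": "Gene"}
data = [
    ("gene:1", "encodes", "protein:1"),        # consistent
    ("gene:2", "interactsWith", "protein:1"),  # gene id used where a product is expected
]
print(type_violations(data, types))
# [('gene:2', 'interactsWith', 'protein:1')]
```

Running such a check at conversion time makes it visible whether an error came from the source or was introduced by the linked data conversion.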

<mscottm> I'm still here (that was pc2)

Rich: what work can this group do to contribute to improving quality of data?
... what elements are source issues, linked data developer issues, and user issues?

<mscottm> Richard: Talked about the difficulty of provenance for drug interactions: different sources of selection, e.g., DrugBank 1 vs. 2 vs. 3

Michel: we need to clarify the relationship between linked data curators and the source datasets
... where does the responsibility lie when there are known problems with the source data?
... provenance relevant to linked data: where did the data come from, and how did you get from the source to the linked data resource?
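The two provenance questions above (where the data came from, and how it got from the source to the linked data resource) map naturally onto the W3C PROV-O vocabulary (`prov:wasDerivedFrom`, `prov:wasGeneratedBy`, `prov:used`). The sketch below emits such triples as N-Triples strings. All `example.org` IRIs are hypothetical, and a fuller record would also carry timestamps, agents, and source versions.

```python
PROV = "http://www.w3.org/ns/prov#"  # W3C PROV-O namespace
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def provenance_triples(dataset, source, script, run_id):
    """Describe how a linked data resource was derived from a source dump.

    All IRIs are supplied by the caller; the conversion run is modeled
    as a prov:Activity that used both the source file and the script.
    """
    run = f"<{run_id}>"
    return [
        f"<{dataset}> {RDF_TYPE} <{PROV}Entity> .",
        f"<{source}> {RDF_TYPE} <{PROV}Entity> .",
        f"{run} {RDF_TYPE} <{PROV}Activity> .",
        f"{run} <{PROV}used> <{source}> .",
        f"{run} <{PROV}used> <{script}> .",
        f"<{dataset}> <{PROV}wasGeneratedBy> {run} .",
        f"<{dataset}> <{PROV}wasDerivedFrom> <{source}> .",
    ]


record = provenance_triples(
    dataset="http://example.org/ld/drugs/v2",
    source="http://example.org/source/drugs-release-3.tsv",
    script="http://example.org/scripts/convert.py",
    run_id="http://example.org/run/42",
)
for t in record:
    print(t)
```

Recording the conversion script as something the run `used` also speaks to the question of how to share what the curator has done.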

Rich: also, how to share what the linked data curator has done, so that it can be community-developed

<mscottm> Michel: In Bio2RDF, we created a GitHub repository for the scripts

<egombocz> excellent meeting today! - apologies, but I need to sign out due to another upcoming meeting

<mscottm> tx Erich!

Michel: part of our mandate - make it possible to branch linked datasets

<joannne_luciano> i recommend looking at the conversion tools developed at RPI -- csv2rdf4lod for example

<joannne_luciano> also datafaqs that i mentioned earlier

Scott: also a need in the clinic - how to relate a DICOM-represented scan to its metadata

<joannne_luciano> and the recent work coming out of the provenance working group

<michel> in our experience, it is never the case that you simply take data and naively convert it in order to generate high quality (e.g. well represented) data

<michel> it takes substantial parsing to generate structured data

<joannne_luciano> agree michel - that's not what is being suggested/proposed. it's to convert and add the appropriate metadata to make quality linked data, not junk rdf
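To illustrate conversion that parses and adds metadata rather than translating naively, here is a small CSV-to-RDF sketch. It validates each row, types numeric values as `xsd:decimal`, skips malformed rows instead of emitting untyped literals, and links the dataset to its source with Dublin Core `dcterms:source`. The `drug,target,affinity` columns and the `example.org` namespace are assumptions for illustration; this is not how csv2rdf4lod works internally.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

EX = "http://example.org/interaction/"  # hypothetical namespace
DCT = "http://purl.org/dc/terms/"        # Dublin Core terms
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def lit(value):
    """Plain N-Triples string literal with backslashes and quotes escaped."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def csv_to_ntriples(text, dataset_iri, source_iri):
    """Convert drug,target,affinity rows into typed triples plus metadata.

    Returns (triples, skipped). Rows with an empty field or a non-numeric
    affinity are counted as skipped instead of becoming junk literals.
    """
    triples, skipped = [], 0
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        drug = (row.get("drug") or "").strip()
        target = (row.get("target") or "").strip()
        try:
            affinity = Decimal((row.get("affinity") or "").strip())
        except InvalidOperation:
            affinity = None
        if not drug or not target or affinity is None or not affinity.is_finite():
            skipped += 1
            continue
        node = f"<{EX}row/{i}>"
        value = format(affinity, "f")  # fixed-point: xsd:decimal forbids exponents
        triples += [
            f"{node} <{EX}drug> {lit(drug)} .",
            f"{node} <{EX}target> {lit(target)} .",
            f'{node} <{EX}affinity> "{value}"^^<{XSD_DECIMAL}> .',
            f"{node} <{DCT}isPartOf> <{dataset_iri}> .",
        ]
    # Dataset-level metadata: where the converted rows came from.
    triples.append(f"<{dataset_iri}> <{DCT}source> <{source_iri}> .")
    return triples, skipped


SAMPLE = "drug,target,affinity\nDRUG_A,TARGET_1,7.5\nDRUG_B,,3.1\nDRUG_C,TARGET_2,n/a\n"
triples, skipped = csv_to_ntriples(
    SAMPLE, "http://example.org/ds/1", "http://example.org/raw/interactions.csv"
)
print(len(triples), skipped)  # 5 2
```

Reporting the skipped count, rather than silently dropping rows, keeps the limitations of the source visible to downstream users.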

Scott: Daniel Rubin is working on links between EHR, DICOM, and genomic data
... basic work: a standard way of representing metadata in a DICOM file... proprietary issues
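As a sketch of a standard way to expose DICOM metadata, the function below maps a few standard DICOM header tags to RDF triples. It assumes the header has already been read into a dictionary keyed by `(group, element)` by some DICOM parser (not shown), and the `http://example.org/imaging/` predicates are hypothetical. Unmapped tags, including vendor-specific private tags (odd group numbers), are simply skipped, which is where proprietary content would fall through.

```python
# Standard DICOM (group, element) tags mapped to hypothetical RDF predicates.
TAG_TO_PREDICATE = {
    (0x0010, 0x0020): "patientId",  # Patient ID
    (0x0008, 0x0020): "studyDate",  # Study Date, encoded as YYYYMMDD
    (0x0008, 0x0060): "modality",   # Modality, e.g. "MR" or "CT"
}
EX = "http://example.org/imaging/"  # hypothetical vocabulary
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def dicom_header_to_ntriples(scan_iri, header):
    """Turn an already-parsed DICOM header dict into N-Triples lines.

    header maps (group, element) tuples to string values; parsing the
    DICOM file itself is out of scope for this sketch.
    """
    triples = []
    for tag, value in sorted(header.items()):
        predicate = TAG_TO_PREDICATE.get(tag)
        if predicate is None:
            continue  # unmapped tag, e.g. a vendor-specific private tag
        if predicate == "studyDate" and len(value) == 8 and value.isdigit():
            obj = f'"{value[:4]}-{value[4:6]}-{value[6:]}"^^<{XSD_DATE}>'
        else:
            obj = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        triples.append(f"<{scan_iri}> <{EX}{predicate}> {obj} .")
    return triples


header = {
    (0x0010, 0x0020): "PAT-001",
    (0x0008, 0x0020): "20120514",
    (0x0008, 0x0060): "MR",
    (0x0029, 0x1010): "opaque vendor data",  # odd group number: private tag
}
for line in dicom_header_to_ntriples("http://example.org/scan/1", header):
    print(line)
```

The patient identifier triple is the hook through which a scan could be joined to EHR and genomic data described with the same IRIs.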

Michael: the framework for running an analysis would be desired

(what kind of analysis?)

Michael: making use of the linked data that is available during analysis, how do you represent the analysis of a particular expression dataset?

e.g., what does it mean to take RDF, run an analysis, and incorporate the end results...

Michael: can you do that within the context of RDF?

(need to go in 1 minute)

<michel> HyQue does just this: http://www.slideshare.net/micheldumontier/hyque-evaluating-scientific-hypotheses-using-semantic-web-technologies

See slide 29

<mscottm> Michel: on slide 29, you can see how results are linked together with the source

<mscottm> Michel: SADI implements the policy of consuming RDF to produce RDF
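The RDF-in, RDF-out pattern discussed above can be sketched as a function that reads expression triples, computes a result, and returns new triples about the same subject IRIs plus a link to the analysis that produced them. This follows the spirit of SADI but is not the SADI API; the vocabulary, the gene IRIs, and the choice of a mean as the "analysis" are hypothetical.

```python
from statistics import mean

EX = "http://example.org/expr/"  # hypothetical vocabulary


def mean_expression_service(triples):
    """RDF in, RDF out: annotate each gene IRI with its mean expression.

    triples: iterable of (subject, predicate, object) tuples in which
    expression measurements use the predicate EX + "expressionValue".
    Output triples reuse the input subject IRIs, so every result stays
    linked to the data it was computed from.
    """
    values = {}
    for s, p, o in triples:
        if p == EX + "expressionValue":
            values.setdefault(s, []).append(float(o))
    results = []
    for gene in sorted(values):
        results.append((gene, EX + "meanExpression", round(mean(values[gene]), 3)))
        results.append((gene, EX + "computedBy", EX + "analysis/mean-v1"))
    return results


# Values hang directly off the gene for brevity; in a real RDF graph
# (a set of triples) identical measurements would collapse, so each
# measurement would get its own node.
data = [
    (EX + "gene/G1", EX + "expressionValue", "2.0"),
    (EX + "gene/G1", EX + "expressionValue", "4.0"),
    (EX + "gene/G2", EX + "expressionValue", "1.5"),
    (EX + "gene/G2", EX + "label", "not an expression value"),
]
for triple in mean_expression_service(data):
    print(triple)
```

Because the output shares subjects with the input, the results can be merged back into the source graph and queried alongside it.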

<mscottm> Thanks to everyone for the valuable input!

- DRAFT -
