HCLSIG BioRDF Subgroup/Tasks/Reagents/Status Reports/2006-04-17
This note describes first attempts at translating the Alzforum Antibody Directory into RDF.
Science Background: http://en.wikipedia.org/wiki/Antibody
Goal of this round: Review the contents of the database, start building a model (ontology), translate the database using it, and see what problems arise.
Technology: I am targeting OWL, using custom tools written in a Java-based Common Lisp that use the Pellet libraries.
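As a rough illustration of what translating one database row might produce, here is a minimal Python sketch that emits N-Triples. It is not the Lisp tooling mentioned above. The two namespace URIs are the ones proposed in this note; the `Antibody` class name, the `antibody_<id>` subject pattern, and the field names are assumptions made up for the example, and field names are assumed to be usable as URI fragments as they stand.

```python
# Namespaces proposed in this note; everything else here is hypothetical.
REAGENT = "http://www.w3.org/2001/sw/hcls/ontologies/reagent.owl#"
ALZ = "http://www.alzforum.org/res/com/ant#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def record_to_ntriples(record_id, fields):
    """Turn one database row into a list of N-Triples lines.

    record_id and the keys of fields are hypothetical. Each field value is
    kept as a plain literal, with no further modelling.
    """
    subject = "<%santibody_%s>" % (ALZ, record_id)
    # Type the subject with an assumed class in the reagent ontology.
    lines = ["%s <%s> <%sAntibody> ." % (subject, RDF_TYPE, REAGENT)]
    # Emit one literal-valued triple per source field, in a stable order.
    for name, value in sorted(fields.items()):
        lines.append('%s <%s%s> "%s" .'
                     % (subject, ALZ, name, escape_literal(value)))
    return lines
```

Calling `record_to_ntriples("42", {"isotype": "IgG"})` would yield one type triple and one literal triple for the row. Keeping fields as literals defers the modelling questions until the parsing problems are better understood.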
Selected Issues:
- Namespace - what form of URIs to use for the various entities
  - http://www.w3.org/2001/sw/hcls/ontologies/reagent.owl# for the ontology
  - http://www.alzforum.org/res/com/ant# for the Alzforum-specific fields
- Model - how to model the various entities
  - The antibody isotype (e.g. IgG versus IgM), light versus heavy chain
  - Experimental methods that the antibody can be used for (http://www.alzforum.org/res/com/ant/glossary.asp). Note that there are implicit relations between methods that should be modeled (e.g. more and less general terms).
  - Sample preparations - e.g. frozen sections, cells, etc.
  - Construction - how the antibody was created
  - Epitopes - the part of the protein the antibody binds to
  - Source - how the antibody was created, and in which species
  - Reactivity/Specificity - what forms of the protein (e.g. phosphorylated) in which species (and which not)
- Connecting to existing ontologies and vocabularies - which to use
  - Species: NCBI Taxonomy?
  - Gene: LSID? NCBI Entrez URL?
  - Methods: PSI-MI is insufficient; SNOMED?
  - Epitope: Sequence Ontology? But it has no way to specify e.g. a positional range of a sequence
  - Company information: FOAF? vCard?
- Parsing - important information is in a restricted but varying subset of natural language
  - Gene name is sometimes a standard name, sometimes not (e.g. embedded Greek-letter XML entities)
  - Epitope descriptions (a variety of sorts of description)
  - Reactivity/Specificity (mostly species, but with negative information, such as "not mouse")
  - Applicable experimental methods (some methods were not in the glossary). May be a list, and may include negative information
  - Method glossary (manipulated in Emacs into Lisp form, from which OWL is generated)
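Two of the parsing problems above, Greek-letter XML entities in gene names and negative information in reactivity lists, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Lisp code used for this round: the entity table is a small made-up sample, and the reactivity format (items separated by commas or semicolons, negatives introduced by "not") is a guess about how the field is written.

```python
import re

# Small illustrative entity table (an assumption); the real data may use others.
GREEK_ENTITIES = {
    "&alpha;": "alpha",
    "&beta;": "beta",
    "&gamma;": "gamma",
    "&delta;": "delta",
}


def normalize_gene_name(raw):
    """Replace known Greek-letter XML entities with spelled-out names.

    Entities that are not in the table are left untouched.
    """
    def replace(match):
        return GREEK_ENTITIES.get(match.group(0), match.group(0))
    return re.sub(r"&[A-Za-z]+;", replace, raw).strip()


def parse_reactivity(text):
    """Split a free-text species list into (positive, negative) lists.

    Assumes items are separated by commas or semicolons and that a
    negative item starts with the word "not".
    """
    positive, negative = [], []
    for item in re.split(r"[;,]", text):
        item = item.strip().lower()
        if not item:
            continue
        match = re.match(r"not\s+(.+)", item)
        if match:
            negative.append(match.group(1).strip())
        else:
            positive.append(item)
    return positive, negative
```

For example, `parse_reactivity("Human, rat; not mouse")` separates the two positive species from the negated one, which could then be modelled as distinct properties rather than as a single literal.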
Files
- antibody-db-dump-2006-04-16.txt.gz - Source spreadsheet from AlzForum (Thanks Colin!) **
- antibody1.owl.gz - Just the basics: annotation properties for some of the source fields as literals, plus a bit of modelling. **
- parsed-epitopes.txt - An initial attempt to take apart the epitope descriptions **
- antibody.lisp - Very messy, but that's real life. The next version will be cleaner.
- ** Note that these files are not publicly available for the moment, but are available on request from mailto:alanr-w@mumble.net