Computable evolutionary phenotype knowledge: a hands-on workshop

Synopsis

Interested in discovering, linking to, recombining, or otherwise computing with machine-interpretable evolutionary phenotypes extracted from published studies? The Phenoscape project is holding a hands-on workshop on Dec 11-14, 2017, at Duke University in Durham, North Carolina, to foster broader adoption, reuse, and interoperability of its flagship resource, the Phenoscape Knowledgebase (KB; http://kb.phenoscape.org). The KB offers programmatic (API) access to natural language phenotype descriptions annotated with formal ontologies, so that machines can understand and compute with the semantics of these descriptions at scale. The KB currently includes data extracted from more than 150 comparative vertebrate morphology studies, integrated with similarly augmented gene phenotype data drawn from vertebrate model organism databases. The KB’s API also provides online access to machine reasoning and other semantic computation algorithms for its data, including synthesizing presence/absence character matrices and finding evolutionary phenotype transitions that are semantically similar to gene phenotypes. The event will bring together a diverse group of people to collaboratively design, and work hands-on on, projects of their choosing that take advantage of, and promote reuse of, the KB’s data and services.

Motivation

Efficiently repurposing, integrating, and data mining the vast stores of phenotype data has long been hampered by the limited amount of data accessible online in standard formats, and by the challenges involved in enabling machines to compute with data that are largely recorded as natural language text descriptions. A variety of advances, including knowledge representation technologies, the development of shared domain ontologies, and the curation of large databases of ontology-linked phenotype data, has begun to address these challenges, providing new opportunities for computation-driven data science with phenotype data. For natural biodiversity, a unique resource for data science-enabled phenotype descriptions is the Phenoscape Knowledgebase (KB; http://kb.phenoscape.org). The KB offers online and programmatic access to phenotype data from more than 150 comparative morphology studies, focusing on the vertebrate fin-to-limb transition and comparative fish morphology. Using shared ontologies for morphological, spatial, and other requisite domain knowledge, these data are linked to phenotypes reported in genetic perturbation studies for pertinent model organisms (zebrafish and mouse), and to human genetic disease phenotypes. Querying the KB uses machine reasoning to match data by their semantics, and the KB also provides API access to other algorithms based on machine reasoning, such as synthesizing characters and states that are implied by, but not expressly asserted in, the original studies, and finding evolutionary phenotype transitions semantically similar to gene phenotypes.
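To make the idea of characters and states that are "implied by, but not expressly asserted in" a study more concrete, here is a toy Python sketch. It is not the KB's implementation or API. It assumes a hypothetical, simplified three-structure partonomy (`PART_OF`), hypothetical taxa, and two simple reasoning rules: if a part is present, every structure it is part of is also present; if a structure is absent, all of its parts are absent. The names `wholes`, `parts`, and `synthesize_matrix` are illustrative only.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical, simplified partonomy: structure -> structure it is directly part of.
PART_OF: Dict[str, str] = {
    "digit": "autopod",
    "autopod": "forelimb",
}
# Column order of the synthesized matrix (toy structures only).
STRUCTURES = ["forelimb", "autopod", "digit"]


def wholes(structure: str) -> Set[str]:
    """Return all structures that `structure` is transitively part of."""
    result: Set[str] = set()
    while structure in PART_OF:
        structure = PART_OF[structure]
        result.add(structure)
    return result


def parts(structure: str) -> Set[str]:
    """Return all structures that are transitively part of `structure`."""
    return {s for s in STRUCTURES if structure in wholes(s)}


def synthesize_matrix(
    asserted: Dict[Tuple[str, str], bool]
) -> Dict[str, Dict[str, Optional[bool]]]:
    """Build a taxon x structure presence/absence matrix.

    True = present, False = absent, None = unknown.
    Present part => its wholes are present; absent whole => its parts are absent.
    """
    taxa = sorted({taxon for taxon, _ in asserted})
    matrix = {t: {s: None for s in STRUCTURES} for t in taxa}
    # Fill in implied states first (the first implied value for a cell wins).
    for (taxon, structure), present in asserted.items():
        implied = wholes(structure) if present else parts(structure)
        for s in implied:
            if matrix[taxon][s] is None:
                matrix[taxon][s] = present
    # Overlay asserted states last so they always take precedence.
    for (taxon, structure), present in asserted.items():
        matrix[taxon][structure] = present
    return matrix


# Hypothetical assertions: (taxon, structure) -> present?
example = {
    ("taxon A", "digit"): True,
    ("taxon B", "forelimb"): False,
    ("taxon C", "autopod"): True,
}
for taxon, row in synthesize_matrix(example).items():
    print(taxon, row)
```

In this toy run, a single assertion per taxon fills in additional cells: "taxon A" gains present autopod and forelimb, "taxon B" gains absent autopod and digit, and "taxon C" gains a present forelimb while its digit state stays unknown. A real reasoner would also need to detect contradictory assertions, which this sketch ignores.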

Evolutionary phenotype data augmented in this way is still relatively new to the field, and therefore use-cases and applications that effectively exploit the new capabilities are only beginning to emerge. To foster broader adoption, reuse, and interoperability of the data in the KB, as well as the machine reasoning capabilities it implements, Phenoscape is holding a KB Data and Interoperability Codefest. The event will take place Dec 11-14, 2017, at Duke University in Durham, North Carolina, and aims to bring together a diverse group of data scientists, developers, and others interested in trait-oriented data sets, tools and resources.

Scope, Goals, and Objectives

The overall objective of the event is to enable more tools, resources, and researchers to integrate more seamlessly with, and take advantage of, both the data in the KB and the computational services it provides. Accordingly, one of the main goals we hope to accomplish is to establish concrete use cases, inform requirements, and prioritize needs for future development aimed at lowering the barriers to accessing the KB through its programmatic and other interfaces.

We are intentionally keeping the scope of qualifying target projects broad, so as not to rule out ideas we may not have thought of. Generally speaking, we expect work targets to take advantage of data in the KB (whether through the API or not) or of the computational services the KB provides, or to increase the visibility of the KB’s data content and lower the barriers to (re)using it in some other way. Examples include, but are not limited to, the following:

  • Developing links from other online resources to data in the KB
  • Mashing up KB data with other data
  • Integrating KB data and/or services with other tools (whether the necessary API methods exist already or not)
  • Developing documentation for using the KB to build a dataset needed for a certain type of research question
  • Developing visualizations (or visualization apps) that involve or describe KB data

In general, work targets that align well with participants’ own professional interests, and thus are more likely to be continued in some way after the event, are the most desirable.

Who should participate

We are looking to assemble a diverse group of people, including developers, data scientists, and researchers from a variety of fields such as evolutionary science, ecology, biodiversity, biomedical sciences, and bioinformatics. Example personas include, but aren’t limited to, the following:

  • Software / tool / resource developer
  • Data product developer
  • Biology / biodiversity / ecology scientist
  • Evolutionary medicine researcher
  • Training / documentation specialist

Organizing Team

  • Scott Chamberlain (rOpenSci)
  • Matthew Collins (University of Florida, iDigBio Informatics)
  • Melissa Haendel (Oregon Health & Science University, Monarch Initiative PI)
  • Hilmar Lapp (Duke University, Phenoscape co-PI)
  • Emily Jane McTavish (UC Merced, Open Tree of Life PI)