ktnyt edited this page Jun 17, 2016 · 7 revisions

Crick-chan BS - a question answering system with domain knowledge

  • Participants: Kotone Itaya, Nobuaki Kono, Yuki Yoshida, and Kazuharu Arakawa (Institute for Advanced Biosciences, Keio University)

Crick-chan's Demo queries and Introduction PDF

  • Oh I am so tired. How have you been Crick-chan?
  • I love you Crick-chan.
  • How many stone steps are there in Mt. Haguro?
  • What are the recent trends in the topic of "interoperability of biological databases"?
  • Who are the leaders in the field of "Semantic Web"?
  • What are the recent trends of "Toshiaki Katayama"?
  • What is the enzyme regulation of Cas9?
  • Can you write an Introduction on the topic of "Semantic Web" for biology?
  • PDF Available Here!

Introduction

The success of IBM Watson on the quiz show Jeopardy! highlighted the potential of state-of-the-art cognitive computing for answering natural language questions. IBM Watson, however, relies less on semantics or machine learning than on queries over unstructured data, combined with statistical identification of answer domains (Lexical Answer Type).

The IBM Watson software (DeepQA) is a system that answers a natural language quiz question with a matching "word" by searching through millions of pages of documents, including the entire text of Wikipedia. Scientific facts and knowledge are almost always written as natural language text in manuscripts, a resource that remains relatively unexplored in the Semantic Web context. We therefore aimed to develop a software system mimicking DeepQA that finds the most relevant "sentence" (as opposed to a "word" in Watson) from millions of scientific documents. Since the software handles biological knowledge as a counterpart of "Watson", we named it "Crick-chan". "-chan" is a Japanese suffix used when addressing children, chosen because our software is still quite immature compared to Watson.

Please see the Crick-chan page from BioHackathon 2015 for details about the basic architecture.

Key Updates

Understanding of special terms

Crick-chan now understands basic biological terms, their ontology, and their definitions. For example, if a sentence includes the word "Tardigrada", she understands that it is a phylum, that it is a subclass of "Invertebrates", and that it is defined as "A phylum of microscopic ecdysozoan invertebrates, closely related to ARTHROPODS. Members exhibit anabiosis and cryptobiosis, dormant states where metabolic activity is reduced or absent, thus making them tolerant to extreme environmental conditions. They are distributed worldwide and most are semi-aquatic."
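As a rough illustration of what term understanding involves, the sketch below keeps a tiny in-memory term table and looks up the terms mentioned in a sentence. It is not Crick-chan's actual implementation: the `Term` class, the `ONTOLOGY` table, and the `annotate` and `ancestors` helpers are hypothetical names, and only the Tardigrada entry comes from the example above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Term:
    name: str
    parent: Optional[str]  # name of the parent term, or None at the top
    definition: str


# Hypothetical toy table; a real system would load a full ontology.
ONTOLOGY = {
    "tardigrada": Term(
        "Tardigrada",
        "Invertebrates",
        "A phylum of microscopic ecdysozoan invertebrates, "
        "closely related to ARTHROPODS.",
    ),
    # Definition left empty here because the text above does not give one.
    "invertebrates": Term("Invertebrates", None, ""),
}


def annotate(sentence: str) -> List[Term]:
    """Return the known terms mentioned in a sentence, in order of appearance."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [ONTOLOGY[w] for w in words if w in ONTOLOGY]


def ancestors(name: str) -> List[str]:
    """Walk the parent links upward from a term and return the parent names."""
    chain = []
    term = ONTOLOGY.get(name.lower())
    while term is not None and term.parent is not None:
        chain.append(term.parent)
        term = ONTOLOGY.get(term.parent.lower())
    return chain


found = annotate("Tardigrada tolerate extreme environmental conditions.")
print([t.name for t in found])
print(ancestors("Tardigrada"))
```

Matching single lowercase words keeps the sketch short; multi-word terms and synonyms would need a proper dictionary matcher.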

Finding trends and Writing summaries

Crick-chan can identify keywords and their trends for a given topic or person, which can help you plan your next research project. Based on her survey, she can also write a short summary of the topic, which you can readily use as a draft of the Introduction section of your next work!
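One simple way to think about keyword trends is to compare how often each word appears in recent documents versus earlier ones. The sketch below does this over a hypothetical list of `(year, text)` records. The records, the stopword list, and the function names are all made up for illustration, and simple counting stands in for whatever method Crick-chan actually uses.

```python
import re
from collections import Counter, defaultdict

# Hypothetical minimal stopword list for the toy example.
STOPWORDS = {"the", "of", "and", "a", "in", "for", "to", "with", "on"}


def keyword_counts_by_year(records):
    """Count non-stopword tokens per publication year."""
    counts = defaultdict(Counter)
    for year, text in records:
        words = re.findall(r"[a-z]+", text.lower())
        counts[year].update(w for w in words if w not in STOPWORDS)
    return counts


def rising_keywords(records, split_year, top_n=3):
    """Rank words by (count from split_year onward) minus (count before it)."""
    recent, earlier = Counter(), Counter()
    for year, counter in keyword_counts_by_year(records).items():
        (recent if year >= split_year else earlier).update(counter)
    scores = {word: recent[word] - earlier[word] for word in recent}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Toy records, invented for this example only.
records = [
    (2012, "integration of biological databases"),
    (2013, "biological databases and ontology"),
    (2015, "linked data for biological databases"),
    (2016, "linked data and ontology with sparql"),
]
print(rising_keywords(records, split_year=2015))
# prints ['data', 'linked', 'sparql']
```

A raw count difference favors frequent words; normalizing by the number of documents per period would be a natural refinement.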

Artificial Intelligence

Crick-chan can now learn how to use web-based databases. Show her:

  1. The URL of the database
  2. Two example query keywords that produce successful search results

From these, she can "learn" how to use the web database and extract information from it.

No more coding to use different web pages!

See the autosearch repository for more information.
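To give a feel for why two example keywords are enough, the sketch below generalizes two example search URLs into a reusable query template. This is an assumption about one possible approach, not a description of how autosearch works: the function names are hypothetical, `db.example.org` is a placeholder host, and the sketch starts from result URLs that are already known rather than discovering them from the database page.

```python
from urllib.parse import quote_plus

PLACEHOLDER = "{query}"


def infer_template(url_a, keyword_a, url_b, keyword_b):
    """Return a URL template if the two example URLs differ only by keyword."""
    template_a = url_a.replace(quote_plus(keyword_a), PLACEHOLDER)
    template_b = url_b.replace(quote_plus(keyword_b), PLACEHOLDER)
    if template_a != template_b or PLACEHOLDER not in template_a:
        raise ValueError("examples do not generalize to a single template")
    return template_a


def build_query_url(template, keyword):
    """Fill the template with a new, URL-encoded keyword."""
    return template.replace(PLACEHOLDER, quote_plus(keyword))


# Hypothetical example URLs; example.org is a reserved placeholder domain.
template = infer_template(
    "https://db.example.org/search?q=Cas9&format=html", "Cas9",
    "https://db.example.org/search?q=Tardigrada&format=html", "Tardigrada",
)
print(template)
# prints https://db.example.org/search?q={query}&format=html
print(build_query_url(template, "semantic web"))
# prints https://db.example.org/search?q=semantic+web&format=html
```

Two examples matter because a single URL cannot show which part is the keyword and which part is fixed; agreement between the two templates is the check that the pattern generalizes.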

Crick-chan software

Crick-chan is accessible at: http://link.g-language.org/crick-chan/

Source code for the Crick-chan core API is available at https://github.com/gaou/crick-chan (requires the G-language Genome Analysis Environment).

Examples

Crick-chan interface

  1. What genes are associated with Alzheimer disease?
  2. What is G-language Genome Analysis Environment?
  3. Who is Luke Skywalker married to?
  4. How do semantic web technologies facilitate life science?

Acknowledgements

This software uses or is derived from Enju 2.4.2 for CentOS 5.5 for x86_64 software, Enju 2.4.2 for CentOS 5.5 for x86_64 modules, and/or Enju 2.4.2 for CentOS 5.5 for x86_64 itineraries, developed at the Tsujii Laboratory, the University of Tokyo. (c) Copyright 2011 the University of Tokyo

The character "Crick-chan" is created and owned by artist Paperu (ぱぺる).

Images courtesy of freepictureweb (server room image) and Gfycat (background animation).

The background music is "Poppin' Shower" by P*Light.