Crick chan BS
- Participants: Kotone Itaya, Nobuaki Kono, Yuki Yoshida, and Kazuharu Arakawa (Institute for Advanced Biosciences, Keio University)
- Oh I am so tired. How have you been Crick-chan?
- I love you Crick-chan.
- How many stone steps are there in Mt. Haguro?
- What are the recent trends in the topic of "interoperability of biological databases"?
- Who are the leaders in the field of "Semantic Web"?
- What are the recent trends of "Toshiaki Katayama"?
- What is the enzyme regulation of Cas9?
- Can you write an Introduction on the topic of "Semantic Web" for biology?
- PDF Available Here!
The success of IBM Watson on the quiz show Jeopardy! highlighted the potential of state-of-the-art cognitive computing for answering natural language questions. IBM Watson, however, does not rely heavily on semantics or machine learning; it is primarily based on queries over unstructured data, with statistical identification of the answer domain (Lexical Answer Type).
The IBM Watson software (DeepQA) is a system that answers a natural language quiz question with a matching "word", searching through millions of pages of documents, including the entire text of Wikipedia. Scientific facts and knowledge are almost always written as natural language text in manuscripts, a resource that remains relatively unexplored in the semantic web context. We therefore aimed to develop a software system, mimicking DeepQA, that finds the most relevant "sentence" (as opposed to a "word" in Watson) from millions of scientific documents. Since the software deals with biological knowledge as a counterpart of "Watson", we named it "Crick-chan". "-chan" is a Japanese suffix added to children's names, chosen because our software is still quite immature compared to Watson.
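To make the idea of picking the most relevant "sentence" concrete, here is a minimal sketch that ranks candidate sentences against a question using TF-IDF weighted cosine similarity. This is an assumption for illustration only: it is not the actual Crick-chan or DeepQA algorithm, the function names (`tokenize`, `most_relevant_sentence`) are hypothetical, and the three sentences are toy input rather than real corpus data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def most_relevant_sentence(question, sentences):
    """Return the sentence most similar to the question (TF-IDF cosine).

    Illustrative stand-in only; not the method used by Crick-chan.
    Raises ValueError if `sentences` is empty.
    """
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    # Document frequency: number of sentences in which each word occurs.
    df = Counter(word for doc in docs for word in doc)

    def idf(word):
        # Smoothed inverse document frequency; rare words weigh more.
        return math.log((1 + n) / (1 + df[word])) + 1.0

    def vector(counts):
        return {w: c * idf(w) for w, c in counts.items()}

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a if w in b)
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q = vector(Counter(tokenize(question)))
    return max(sentences, key=lambda s: cosine(q, vector(Counter(tokenize(s)))))


# Toy input, written for this example.
toy_sentences = [
    "Cas9 is an RNA-guided endonuclease.",
    "Tardigrades tolerate extreme environmental conditions.",
    "The semantic web links data across databases.",
]
best = most_relevant_sentence("What is Cas9?", toy_sentences)
print(best)
```

A real system working over millions of documents would need an index rather than a linear scan; the sketch only shows the ranking step.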
Please see the Crick-chan page from BioHackathon 2015 for details of the basic architecture.
Crick-chan now understands basic biological terms, their ontology, and their definitions. For example, if a sentence includes the word "Tardigrada", she understands that it is a phylum that is a subclass of "Invertebrates", and is "A phylum of microscopic ecdysozoan invertebrates, closely related to ARTHROPODS. Members exhibit anabiosis and cryptobiosis, dormant states where metabolic activity is reduced or absent, thus making them tolerant to extreme environmental conditions. They are distributed worldwide and most are semi-aquatic."
Crick-chan can identify keywords and their trends for a given topic or person. This can help you plan your next research project. Based on her survey, she can also write a short summary of the topic, which you can readily use as a draft of the Introduction section of your next work!
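As a rough illustration of what "keywords and their trends" could mean in practice, the sketch below counts the most frequent non-trivial words per year in a set of dated texts. It is an assumption, not Crick-chan's actual method: the function name `keyword_trends`, the small stopword list, and the two records are all made up for this example.

```python
import re
from collections import Counter, defaultdict

# Tiny hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"and", "the", "for", "with", "from", "that", "are", "was"}


def keyword_trends(records, top_n=3):
    """Map each year to its `top_n` most frequent keywords.

    `records` is an iterable of (year, text) pairs, for example
    publication years with abstracts. Illustrative sketch only.
    """
    per_year = defaultdict(Counter)
    for year, text in records:
        words = re.findall(r"[a-z]+", text.lower())
        # Skip stopwords and very short tokens.
        per_year[year].update(
            w for w in words if w not in STOPWORDS and len(w) > 2
        )
    return {
        year: counts.most_common(top_n)
        for year, counts in sorted(per_year.items())
    }


# Toy records, written for this example (not real publication data).
toy_records = [
    (2014, "RDF and SPARQL for databases"),
    (2015, "SPARQL endpoints and SPARQL queries"),
]
trends = keyword_trends(toy_records, top_n=1)
print(trends)
```

Comparing the per-year lists shows which terms rise or fall over time for the topic or author in question.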
Crick-chan can now learn how to use web-based databases. Just show her:
- The URL of the database
- Two example query keywords that produce successful search results
She can then "learn" how to use the web database and extract information from it.
No more coding to use different web pages!!
See the autosearch repository for more information.
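One plausible way such learning could start is sketched below: parse the database's front page, find a search form submitted via GET, and build a query URL for a keyword. This is an assumption about the approach, not the autosearch implementation. `FormFinder` and `build_query_url` are hypothetical names, `example.org` is a placeholder, and fetching the pages for the two example keywords (to confirm that the generated URLs return results) is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormFinder(HTMLParser):
    """Collect the action, method, and text-like input names of each <form>."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            input_type = (attrs.get("type") or "text").lower()
            if input_type in ("text", "search") and attrs.get("name"):
                self._current["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def build_query_url(page_url, page_html, keyword):
    """Return a GET search URL for `keyword`, or None if no usable form exists.

    Simplification: uses the first text field of the first GET form and
    assumes the form action carries no query string of its own.
    """
    finder = FormFinder()
    finder.feed(page_html)
    for form in finder.forms:
        if form["method"] == "get" and form["fields"]:
            target = urljoin(page_url, form["action"])
            return target + "?" + urlencode({form["fields"][0]: keyword})
    return None


# Hypothetical front page of a database, written for this example.
toy_html = (
    '<form action="/search" method="get">'
    '<input type="text" name="q"><input type="submit">'
    "</form>"
)
url = build_query_url("http://example.org/db/", toy_html, "Cas9")
print(url)
```

With two example keywords, one could call `build_query_url` once per keyword and compare the returned pages to locate where the results appear. That step depends on the target site and is not shown here.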
Crick-chan is accessible at: http://link.g-language.org/crick-chan/
Source code for the Crick-chan core API is here: https://github.com/gaou/crick-chan (requires the G-language Genome Analysis Environment).
- What genes are associated with Alzheimer disease?
- What is G-language Genome Analysis Environment?
- Who is Luke Skywalker married to?
- How do semantic web technologies facilitate life science?
This software uses or is derived from Enju 2.4.2 for CentOS 5.5 for x86_64 software, Enju 2.4.2 for CentOS 5.5 for x86_64 modules, and/or Enju 2.4.2 for CentOS 5.5 for x86_64 itineraries, developed at the Tsujii Laboratory, the University of Tokyo. (c) Copyright 2011 the University of Tokyo
The character "Crick-chan" is created and owned by artist Paperu (ぱぺる).
Images courtesy of freepictureweb (server room image) and Gfycat (background animation).
The background music is "Poppin' Shower" by P*Light.