Challenge 24 - Using transformer models to develop a search engine for datasets, charts, and documentation #9
Comments
Hi all, I am really interested in this project and want to work on it.
But I don't really understand the deliverable and the data that need to be
Thanks for your interest in our challenge! The deliverable is as described in the 'Solution' section of the challenge: a free text search box into which a user can type a natural language search question or phrase. Results in the form of answers or links should appear on submission. Users should ideally be asked for feedback on the usefulness of the results. The data would be a mix of our Confluence API (for documentation), our charts API, our data and parameters APIs, as well as the data from the search engine itself. During the course of the project we may discover other sources of data, maybe even external ones, that would improve the search results. The existing searches to review for replacement with the new search engine are the ones mentioned under the 'Implementation. Possible milestones' section (see the links in its last paragraph). Our parameter search is also relevant. We also recommend watching the presentation by one of the mentors, Myranda; the link is given in the 'Additional comment' section of the challenge. We hope this provides enough information for you. Of course we are also interested in your ideas and don't want to be too prescriptive!
Okay, thank you. Could you give me an estimate of the time you expect me to work per week?
Hi,
Challenge 24 - Using transformer models to develop a search engine for datasets, charts, and documentation
Goal
Develop a natural language search engine to improve the discoverability of ECMWF datasets, graphical products, and documentation.
Mentors and skills
Challenge description
It is difficult for users to find ECMWF data, whether they use external or internal searches. This is true even though we have added Google structured data to our dataset pages, because we only have limited content and metadata for our datasets.
There is a lot of documentation, and it is editorially inconsistent. The data and charts content has recently been reviewed and rewritten, but it can still be difficult for users to find the documentation they need.
Data/System to use
Solution
An ML-based search engine presents users with a simple free text search box into which they can type natural language search terms and questions. The search box then shows a list of matching results, selected by the ML search system.
An example user search might be "what data do you have for Oslo rainfall in 1963?"
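The retrieval step behind such a search box can be sketched as ranking catalogue entries by cosine similarity to the query. This is only a minimal illustration under stated assumptions: the `embed` function below is a toy bag-of-words stand-in for a transformer encoder (not a real model), and the catalogue entries and their ids are hypothetical, not actual ECMWF content.

```python
import math
import re
from collections import Counter

# Hypothetical catalogue entries; real ones would come from the dataset,
# chart, and documentation sources.
DOCUMENTS = {
    "dataset-precip": "Historical rainfall and precipitation data for European cities including Oslo",
    "chart-wind": "Wind speed forecast charts for the North Atlantic",
    "doc-api": "Documentation on how to access data through the API",
}


def embed(text):
    """Toy stand-in for a transformer encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, documents, top_k=3):
    """Return up to top_k (doc_id, score) pairs, best match first."""
    q = embed(query)
    scored = [(doc_id, cosine(q, embed(text))) for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    for doc_id, score in search(
        "what data do you have for Oslo rainfall in 1963?", DOCUMENTS
    ):
        print(f"{score:.3f}  {doc_id}")
```

Swapping `embed` for a sentence-embedding model would leave `search` unchanged in structure; only the vector type and the similarity computation over dense vectors would differ.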
Consideration should be given to users using other languages to search and read results.
It should be possible to weight results by, for example, population or proximity to ECMWF.
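One way to read the proximity idea is to blend each result's relevance score with a prior that decays with distance from a reference point. The sketch below assumes this interpretation; the reference coordinates (roughly Reading, UK), the `alpha` blend factor, the `scale_km` decay length, and the result dictionary layout are all illustrative assumptions, not part of the challenge specification. A population prior could be blended in the same way.

```python
import math

# Assumed reference point: approximate coordinates near Reading, UK.
REFERENCE_POINT = (51.42, -0.95)
EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def reweight(results, alpha=0.8, scale_km=1000.0):
    """Blend relevance with a proximity prior and re-sort, best first.

    results: list of dicts with 'id', 'score' (0..1) and 'location',
    where 'location' is a (lat, lon) tuple or None (hypothetical layout).
    """
    reweighted = []
    for r in results:
        if r.get("location") is None:
            prior = 0.0  # no location known: no proximity boost
        else:
            distance = haversine_km(r["location"], REFERENCE_POINT)
            prior = math.exp(-distance / scale_km)
        blended = alpha * r["score"] + (1 - alpha) * prior
        reweighted.append({**r, "score": blended})
    return sorted(reweighted, key=lambda r: r["score"], reverse=True)


if __name__ == "__main__":
    hits = [
        {"id": "oslo-station", "score": 0.7, "location": (59.91, 10.75)},
        {"id": "london-station", "score": 0.7, "location": (51.50, -0.12)},
    ]
    for hit in reweight(hits):
        print(hit["id"], round(hit["score"], 3))
```

Keeping `alpha` close to 1 means the prior only breaks near-ties, so a distant but clearly more relevant result is not pushed down.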
A possible extra for this project - time permitting - could be to write a Confluence plugin or macro, with parameters for search scope.
Implementation. Possible milestones
We will devise a reference set of questions, based on popular real user enquiries, to test search results before and after implementation.
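A before-and-after comparison of this kind could be scored with mean reciprocal rank (MRR) over the reference questions. The sketch below is one possible harness, not the agreed evaluation method: the reference questions, the expected result ids, and the two stub search functions are hypothetical placeholders for the existing and the new search.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1/rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(search_fn, reference_set):
    """Average reciprocal rank of search_fn over (question, relevant_ids) pairs."""
    scores = [
        reciprocal_rank(search_fn(question), relevant)
        for question, relevant in reference_set
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical reference questions with the ids a good search should return.
REFERENCE_SET = [
    ("what data do you have for Oslo rainfall in 1963?", {"dataset-precip"}),
    ("how do I access data through the API?", {"doc-api"}),
]


def old_search(question):
    """Stub for the existing search: same ranking whatever the question."""
    return ["chart-wind", "dataset-precip", "doc-api"]


def new_search(question):
    """Stub for the new search: puts the expected result first."""
    if "API" in question:
        return ["doc-api", "dataset-precip", "chart-wind"]
    return ["dataset-precip", "chart-wind", "doc-api"]


if __name__ == "__main__":
    print("before:", round(mean_reciprocal_rank(old_search, REFERENCE_SET), 3))
    print("after: ", round(mean_reciprocal_rank(new_search, REFERENCE_SET), 3))
```

MRR rewards placing the first useful result near the top, which suits a search box where users mostly look at the first few hits; other metrics such as recall at k could be computed from the same reference set.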
The search box could be used on the datasets search page, the chart search page, the chart browser search, and the support portal, and possibly as a replacement for the Confluence search features.
Additional comment
We hope to mentor this project in cooperation with Myranda Uselton Shirk at NOAA, who provided the following presentation from AMS that greatly inspired this proposal.