Please use plain English where possible. Try not to assume specialist knowledge or use dense technical terminology without brief explanation.
Please use third person impersonal, e.g. "this group aims to" not "we aim to" etc.
Aim for 6 words or fewer. A clear description of the project, can be different to official academic name of the project, please avoid acronyms/initialisms
"The Turing Way": A handbook for reproducible data science
Project leader 1: Kirstie Whitaker, The Alan Turing Institute
Contact name: Kirstie Whitaker (add email)
Project start date: November 2018
1 sentence, present tense, e.g. Using…, Developing…, Investigating…
Developing a handbook for best practice in academic data science.
Clear, concise, ~3 sentences – e.g. 1st sentence: the problem being addressed, 2nd sentence: the potential solution/method, 3rd sentence: applications, output
Reproducible research is work that can be independently verified. In practice, it means sharing the data and code that were used to generate published results - yet this is often easier said than done. The Turing Way is a guide to reproducible data science that will support students and academics as they develop their code, with the aim of helping them produce work that will be regarded as gold-standard examples of trustworthy and reusable research.
What is the work hoping to achieve? What would define success? Why is this work worth doing? 100-300 words
In the ideal case, all published results should be independently verifiable and suitable for other researchers to build upon. For this to happen, the data and code that support the publication need to be made available in an easy-to-use and open format.
Sharing these research outputs means understanding data management, library sciences, software development, and continuous integration techniques: skills that are not widely taught or expected of academic researchers and data scientists.
The Turing Way is a handbook to support students, their supervisors, funders and journal editors in ensuring that reproducible data science is "too easy not to do". It will include training material on version control, analysis testing, and open and transparent communication with future users, and build on Turing Institute case studies and workshops.
Is there theory or methods that would be good to explain to understand the project’s work better? Use plain English where possible. 100-300 words
[Can we skip this section? Check with comms.]
Where is this work being applied, what area/industry could it benefit? 100-300 words
The Turing Way will support everybody involved in data science research: the developers of the code (research engineers, postdocs and doctoral students), their supervisors and the business team members who coordinate these projects. The format will be easy for the reader to dip in and out of, depending on their level of experience in the various topics. The project will help to answer questions that researchers don't always ask: "How do I ensure that my code's existing functionality doesn't change as I extend the codebase?", "How do I make my project easy for someone else to run?", and many more.
Senior team members - Turing fellows, program directors and managers - will also be catered for: each topic covered will highlight key points tailored towards managing reproducible research projects. The project will build and curate checklists of what can be done to ensure all project outputs are reproducible. A chapter on Binder will be of interest to supervisors who want to regularly review their students' code. It will also include the technical details of how to set up a BinderHub, which will be useful for research software engineers.
Achievements/project milestones reached since project started, with month/year
The Turing Way team will host three workshops in March 2019, all focussed around Binder, an easy-to-use service that runs version-controlled computational environments.
- Friday 1st March, University of Manchester: Boost your research reproducibility with Binder
- Tuesday 12th March, The Alan Turing Institute: Boost your research reproducibility with Binder
- Monday 18th March, University of Sheffield: Build a BinderHub
During this free workshop we will discuss reproducible computing environments, show examples of others’ projects on myBinder.org and help you learn how to prepare a Binder-ready project. At the end of the workshop you will be able to take some of your own content (in an R or Jupyter notebook, or scripts that can be run in the terminal) and prepare it so that it can be used by others on myBinder.org.
This workshop is for people who are:
- Interested in reproducibility, containers, Docker or continuous integration;
- Already familiar with R Markdown or Jupyter notebooks;
- Looking to communicate their research more effectively.
During this free workshop we will demonstrate how to build your own BinderHub on Microsoft Azure cloud computing resources. We will help you get started with building a BinderHub on your institution's computing platform and discuss the challenges of maintaining a BinderHub. At the end of the workshop you will know why this would be a useful resource for your team, and will know where to look for help and support building your institution's BinderHub.
This workshop is for Research Software Engineers and IT staff who are:
- Interested in reproducibility, containers, Docker or continuous integration;
- Already familiar with Binder and R Markdown or Python for data science;
- Interested in setting up their own local BinderHub.
Please include titles and affiliations for all participants
- Dr Rachael Ainsworth, University of Manchester
- Becky Arnold, University of Sheffield
- Dr Louise Bowler, The Alan Turing Institute
- Dr Sarah Gibson, The Alan Turing Institute
- Patricia Herterich, University of Birmingham
- Rosie Higman, University of Manchester
- Dr Anna Krystalli, University of Sheffield
- Alexander Morley, University of Oxford
- Dr Martin O’Reilly, The Alan Turing Institute
- Dr Kirstie Whitaker, The Alan Turing Institute
Please include their roles as part of the project, e.g. funder, collaborator, data supplier etc
N/A
If there is any additional content that should be included, or doesn’t fit in other fields, please add it here. This could include links to images, videos, or figures (with plain English captions) that would be helpful in communicating the project
This project is openly developed; any and all questions, comments and recommendations are welcome at our GitHub repository.
To hear about our events and monthly project updates, sign up to our newsletter.
This is an extra piece of content added to the pages of projects that the Research Engineering Group is involved with
Members of the Research Engineering Group at the Turing are contributing their expertise to this project.
They are working on a set of guidelines for research software engineers and others who want to set up a BinderHub of their own, and will set up an internal BinderHub for Turing staff and students that offers more computing power than the current public service. In addition, they are contributing to several projects that will feature as case studies in The Turing Way.
The group's experience in best practices for software engineering and data science will be captured in the guidance set out in The Turing Way.