The goal of this course is to develop a broad and innovative toolbox for reproducible research. It will focus on Python for data collection, handling and analysis, and on GitHub for collaboration and versioning. While Python is very versatile, and many courses on Python exist, we want to focus on applications that are perhaps less straightforward in other programming languages or statistical packages.
This includes:
- Access to online data sources through API interaction and web scraping;
- Preparing, handling and analyzing (very) large datasets and databases;
- Interactive statistics and graphs for teaching and presentation through Python notebooks;
- Object-oriented programming and efficient, scalable coding;
- A very high-level introduction to machine learning.
Since learning a programming language requires a lot of practice, the course is built around learning by doing. And since the course is entirely voluntary, we learn by doing together.
Everyone is expected to prepare the material at home before the next session. This includes watching some videos and/or reading additional references, and doing some exercises. We hope to keep the workload low, targeting 2 to 3 hours of preparation per week.
Exercises can be downloaded from the GitHub page. You can then code up your solutions and commit them to the GitHub repository. This way, all participants' code is visible to the whole group for discussion and for sharing best practices. Along the way, we learn how to use GitHub as a collaborative tool.
Don't forget to bring your laptop to the sessions!