Session 04 - collaborative work and reproducible research (michaelschiltz/bit-by-bit)

Collaboration and openness as a paradigm; the threat of fraudulent science

Linus's law: "Given enough eyeballs, all bugs are shallow"; and The Cathedral and the Bazaar.
open source and open access
crowdsourcing and the success of Wikipedia
importantly, and possibly counterintuitively, openness and transparency have become a means of incentivizing quality rather than sloppiness;
however, note predatory open access publishing
professional editors, too, can be fooled: SCIgen and Mathgen; for a history of hoaxes, see the Wikipedia page
Retraction Watch
pitfalls of popular, proprietary formats:
- Fowler, Dan. 2017. “Excel Is Threatening the Quality of Research Data — Data Packages Are Here to Help.” February 22. http://blogs.lse.ac.uk/impactofsocialsciences/2017/02/22/excel-is-threatening-the-quality-of-research-data-data-packages-are-here-to-help/.
- Strasser, Carly. 2015. “Introduction to Open Science: Why Data Versioning and Data Care Practices Are Key for Science and Social Science.” February 9. http://blogs.lse.ac.uk/impactofsocialsciences/2015/02/09/data-versioning-open-science/.
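One way to guard against the silent type conversion that these posts warn about is to read data as plain text and convert each field explicitly against a declared schema. The following is only a minimal sketch under stated assumptions: the two-column CSV, the `SCHEMA` dictionary and the `read_checked` helper are hypothetical and are not taken from the Data Package specification.

```python
import csv
import io

# Hypothetical schema: column name -> function that converts the raw text.
SCHEMA = {"gene": str, "expression": float}


def read_checked(handle, schema):
    """Read CSV rows as text and convert each field explicitly.

    Raises ValueError if the header differs from the schema or if a
    field cannot be converted, instead of guessing a type.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames != list(schema):
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, raw in enumerate(reader, start=2):
        try:
            rows.append({name: convert(raw[name]) for name, convert in schema.items()})
        except (TypeError, ValueError) as exc:
            raise ValueError(f"line {line_no}: {exc}") from exc
    return rows


# Illustrative data: gene symbols that a spreadsheet may turn into dates.
SAMPLE = "gene,expression\nSEPT2,1.5\nMARCH1,0.25\n"
rows = read_checked(io.StringIO(SAMPLE), SCHEMA)
print(rows)
```

Because the `csv` module returns every field as a string, a symbol such as `SEPT2` stays text unless the schema says otherwise, and a malformed number stops the run instead of slipping through.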

Tools and packages

general: GitHub
- Perkel, Jeffrey. 2016. “Democratic Databases: Science on GitHub.” Nature News 538 (7623): 127.
- Why we need a GitHub for science
- and why don't you try it out yourself?
collaborative learning
- Stack Overflow
- Code Project
- A Japanese alternative: Qiita
collaborative writing:
- from Google Docs to Coda - "we need a doc that can keep up with today’s super-collaborative world."
- Overleaf, ShareLaTeX; also, note the existence of CTAN, an archive of packages you may need to install when using LaTeX.
- Authorea for collaborative writing
- Fidus Writer and Manuscripts.io
- these days, Google Docs too has a nice LaTeX (or more accurately, MathJax) integration: Auto-LaTeX-equations
the rise of dynamic documents
- knitr with R and LaTeX: "The knitr package was designed to be a transparent engine for dynamic report generation with R, solve some long-standing problems in Sweave, and combine features in other add-on packages into one package (knitr ≈ Sweave + cacheSweave + pgfSweave + weaver + animation::saveLatex + R2HTML::RweaveHTML + highlight::HighlightWeaveLatex + 0.2 * brew + 0.1 * SweaveListingUtils + more)."
- Overleaf with CodeOcean; compare this (rather long) case-study
- comparable to the aforementioned CTAN, there is an archive of R packages, which is called CRAN.
- Jupyter: "Project Jupyter is an open source project was born out of the IPython Project in 2014 as it evolved to support interactive data science and scientific computing across all programming languages. Jupyter will always be 100% open source software, free for all to use and released under the liberal terms of the modified BSD license".
- Since November 2017, Google has made its internal tool for data science and machine learning workflows publicly available. It is called Colaboratory, and it is based on the Jupyter notebook environment mentioned above. The difference is that Colaboratory lets you use and share Jupyter notebooks with others without having to download, install, or run anything on your own computer other than a browser.
- Stencila: "The calls for research to be transparent and reproducible have never been louder. But today's tools for reproducible research can be intimidating - especially if you're not a coder. We're building software for reproducible research with the intuitive, visual interfaces that you and your colleagues are used to."
- Julia and Weave.jl: is this the 'dynamic documents' of the (immediate) future?
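The idea shared by the tools listed above is that the numbers in a report are computed from the data when the document is rendered, rather than typed in by hand. The sketch below illustrates that idea with the Python standard library only; it does not show how knitr, Jupyter, Stencila or Weave.jl work internally, and the measurements and the `render` helper are hypothetical.

```python
import statistics
from string import Template

# Hypothetical measurements; a real report would load these from a data file.
measurements = [4.1, 3.9, 4.4, 4.0, 4.6]

# The prose lives in a template; the numbers are placeholders, not typed-in values.
REPORT = Template(
    "We collected $n measurements. "
    "The mean was $mean and the standard deviation was $sd."
)


def render(values):
    """Fill the template with statistics computed from the data."""
    return REPORT.substitute(
        n=len(values),
        mean=f"{statistics.mean(values):.2f}",
        sd=f"{statistics.stdev(values):.2f}",
    )


report = render(measurements)
print(report)
```

If the data change, re-running the script updates every number in the text, which is what makes such a document reproducible rather than merely readable.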
code editors
- Sublime Text: "Sublime Text is a sophisticated text editor for code, markup and prose. You'll love the slick user interface, extraordinary features and amazing performance." Sublime Text also has Zotero support; for Chromebook users, see Caret and Zed.
- I am a fan of Atom, developed by the good people at GitHub.
- A free alternative to Atom is VS Code; its source code is open (MIT-licensed), although Microsoft's official builds are distributed under a proprietary license.
data sharing: figshare, Zenodo, and Dryad; OpenAIRE, and Harvard's Dataverse project
- as an important player, Nature has started to take reproducibility very seriously.
- the quest for transparency may even change the process of scientific publishing: see, for instance, ScienceOpen, "a professional networking platform for scholars to enhance their research in the open, make an impact, and receive credit for it. We provide context building services for publishers, to bring researchers closer to the content than ever before."
data management tool
- data management plan tool
a case study: data science with Python
- Jupyter project documentation
- A whirlwind tour of Python (by Jake VanderPlas)
- The Python Data Science Handbook, also available as a Colab notebook (also by Jake VanderPlas)
- Getting started with Jupyter Notebooks for Python, i.e. working with Jupyter Notebooks locally (installation either with pip or Anaconda).
- Yamamoto Jun gave us a nice example of how to use Julia in Jupyter notebooks
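When working with notebooks locally, it helps to record which Python and package versions produced the results, since a pip or Anaconda installation on another machine may differ. A minimal sketch, assuming it is run as the first cell of a notebook; the `describe_environment` helper and the package names are illustrative only.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Return the Python version, platform, and installed versions of the given packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


# The package names are examples; list whatever the notebook actually imports.
env = describe_environment(["notebook", "numpy", "pandas"])
print(env)
```

Keeping this output in the saved notebook gives collaborators a starting point for recreating the environment.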

To watch:

14mech14. 2017. LaTeX Tutorial 10a: LaTeX + R, Knitr. Accessed April 10. https://www.youtube.com/watch?v=LrWBHqN3TUE.
CEBM Oxford. 2017. John Ioannidis - Why Most Clinical Research Is Not Useful. Accessed April 10. https://www.youtube.com/watch?v=Uok-7NPFn4k.
Talks at Google. 2017. John Ioannidis: “Reproducible Research: True or False?” | Talks at Google. Accessed April 10. https://www.youtube.com/watch?v=GPYzY9I78CI.

Further reading

Aruoba, S. Borağan, and Jesús Fernández-Villaverde. 2014. “A Comparison of Programming Languages in Economics.” NBER Working Paper 20263. http://www.nber.org/papers/w20263.
Balbaert, Ivo. 2015. Getting Started with Julia Programming Language. Birmingham: Packt Publishing.
Balbaert, Ivo, Avik Sengupta, and Malcolm Sherrington. 2016. Julia: High Performance Programming. 1st ed. Packt Publishing.
Chambers, Chris. 2017. The Seven Deadly Sins of Psychology: A Manifesto for Reforming the Culture of Scientific Practice. Princeton, NJ: Princeton University Press.
Colquhoun, David. 2017. “The Reproducibility Of Research And The Misinterpretation Of P Values.” bioRxiv, June, 144337. doi:10.1101/144337.
Colquhoun, David. 2016. “It’s Time for Science to Abandon the Term ‘Statistically Significant’.” Aeon. Accessed June 22, 2020. https://aeon.co/essays/it-s-time-for-science-to-abandon-the-term-statistically-significant.
Gandrud, Christopher. 2015. Reproducible Research with R and R Studio. 2nd ed. Boca Raton: Chapman and Hall/CRC.
Kitzes, Justin, Daniel Turek, and Fatma Deniz, eds. 2017. The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. University of California Press.
Rule, Adam, Amanda Birmingham, Cristal Zuniga, Ilkay Altintas, Shih-Cheng Huang, Rob Knight, Niema Moshiri, et al. 2019. “Ten Simple Rules for Writing and Sharing Computational Analyses in Jupyter Notebooks.” PLOS Computational Biology 15 (7): e1007007. https://doi.org/10.1371/journal.pcbi.1007007.
Somers, James. 2018. “The Scientific Paper Is Obsolete.” The Atlantic, April 5, 2018. https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/.
Stodden, Victoria, Friedrich Leisch, and Roger D. Peng, eds. 2014. Implementing Reproducible Research. Boca Raton: Chapman and Hall/CRC.
Xie, Yihui. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton: Chapman and Hall/CRC.