
Contributing to American Gut

The American Gut Project is the largest crowd-funded science project looking to map the topology of the human superorganism. One of the project's goals is transparency about data processing and technique development. The source code used in American Gut analyses is under public revision in the American Gut repository on GitHub.

This document covers what you should do to get started contributing to American Gut. You should read this whole document before you consider submitting code to American Gut. This will save time for both you and the American Gut developers.

Types of Submissions

We welcome new analyses (although it may be a good idea to discuss your analysis with the development team before submitting your pull request), bug fixes, and documentation additions and fixes.

Before submitting a contribution to American Gut, begin by posting an issue on the American Gut Issue Tracker. The information to include in this post depends on the type of contribution. Your contribution will also need to be tested (see Testing Guidelines below).

  • For new features, describe why the proposed functionality is relevant. You must demonstrate that it is relevant to other users or other analyses. If appropriate, you may be encouraged to push the functionality to other biocore packages, such as QIIME or scikit-bio.

  • For new analyses, describe the new approach and explain why you selected it, citing as necessary. Your analysis should be presented as a Jupyter Notebook (see below).

  • For bug fixes, provide a detailed description of the bug so other developers can reproduce it. Bugs may relate to errors in code, documentation, tests, or the data hosted in the repository.
    You should include the following information in your bug report:

    • The exact command or function call that can be run to reproduce the bug.
    • A link to all necessary input files for reproducing the bug, or a list of samples which produce the bug. This is extremely useful to other developers, and it is likely that if you don’t provide this information, you will not get the response you’re asking for.
      Often, this process will help you understand the bug, as well.
  • For documentation additions, describe what you propose to add, where you'd like to add it, and why it is an important addition. For documentation improvements and fixes, describe what is currently wrong or missing, and how you propose to address it.

When you post your issue, the American Gut developers will respond to let you know if we agree with the addition or change. It’s important to go through this step to avoid wasting time working on a feature that will not be included.

Code Review

When you submit code to American Gut, it will be reviewed by one or more American Gut developers. These reviews are intended to confirm a few points:

  • Your code is sufficiently well tested (see Testing Guidelines below)
  • Your analysis adheres to submission guidelines (see Jupyter Notebooks below)
  • Your code and analysis are sufficiently well documented
  • Your update provides relevant changes or additions.

This process is designed to ensure the quality of American Gut submissions, and can provide useful experience for new developers.

For big changes, if you’d like feedback on your code as you work, you should request help in the issue that you created, and one of the American Gut developers will work with you to perform regular code review.

Submitting Code to American Gut

American Gut is hosted on GitHub, and we use GitHub's Pull Request mechanism for accepting submissions. You should go through the following steps to submit code or analyses to American Gut.

  • Begin by creating an issue describing your proposed analysis or change, and note in the issue that you'd like to work on it. Once a maintainer confirms that the change is welcome, they will assign the issue to you.
  • Fork the American Gut repository on the GitHub website to your GitHub account.
  • Clone your forked repository to the system where you'll be developing using git clone.
  • Ensure that you have the latest version of all the files (especially important if you cloned a long time ago). To do this, add American Gut as a remote repository and then pull from it. You'll only need to run the git remote step once:
git checkout master
git remote add upstream https://github.com/biocore/american-gut.git
git pull upstream master
  • Create a new topic branch that you will use to make your changes with git checkout -b:
    git checkout -b my-topic-branch

  • Make changes to your branch. You can add them using git add and git commit. Don't forget to update the associated scripts and tests. You should make incremental commits, rather than waiting to make one massive commit. Write a descriptive message for each commit.

  • When you think you're ready to submit your contribution, again ensure that you have the latest version of the repository, in case something changed while you were working on your edits.

  • Test your code, to make sure nothing is unexpectedly broken.

  • Once the tests pass, push your changes to your forked GitHub repository. From the command line, run:

git push origin my-topic-branch
  • Issue a pull request on the GitHub website to request that we merge your changes. One of the American Gut developers will review your code. If we request changes (which is highly probable), do not issue a new pull request; push the new commits to the same branch, and your pull request will be updated automatically.

Coding Guidelines

We adhere to the PEP 8 Python coding guidelines for code and documentation standards. Before submitting any code to American Gut, you should read these carefully and apply the guidelines in your code.

Testing Guidelines

Code submitted to American Gut should be unit tested as much as possible. Tests should be added to the test directory.
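As an illustration, a unit test might look like the sketch below. It assumes the standard library `unittest` framework, and `relative_abundance` is a hypothetical helper used only to show the pattern; it is not part of the American Gut codebase.

```python
import unittest


def relative_abundance(counts):
    """Convert raw per-taxon counts to proportions that sum to 1.

    Hypothetical helper, used only to illustrate a unit test.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must not sum to zero")
    return [c / total for c in counts]


class RelativeAbundanceTests(unittest.TestCase):
    def test_proportions(self):
        # Known input with an exactly representable expected output.
        obs = relative_abundance([1, 1, 2])
        self.assertEqual(obs, [0.25, 0.25, 0.5])
        self.assertAlmostEqual(sum(obs), 1.0)

    def test_zero_total_raises(self):
        # An empty sample should fail loudly rather than divide by zero.
        with self.assertRaises(ValueError):
            relative_abundance([0, 0])
```

Saved in a file such as `test_relative_abundance.py` (a hypothetical name), these tests could be run with `python -m unittest`. Testing both the expected output and the failure case catches regressions that a notebook run alone may not reveal.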

Jupyter Notebooks for Analysis

We use Jupyter Notebooks to document and describe analyses performed on American Gut data. These notebooks have two primary goals. First, they provide a way to reproducibly generate results. Second, they provide an opportunity for scientific communication and public outreach. Unlike the earlier IPython Notebook, Jupyter supports a variety of backend kernels, including R and Julia.

Audience

Jupyter Notebooks are intended to be read by a broad audience. For instance, they may be useful to individual participants interested in a particular topic, or to members of the scientific community looking for more information about the analysis performed in a paper.

A notebook represents a sample calculation for an analysis. If a similar set of analysis techniques is applied to the same data, you do not need to produce a separate notebook for each analysis; the additional results can be hosted at another location.

We may publish links to analyses on the American Gut blog. We recommend documenting both the code used and the logic behind the analysis. A lengthy introduction section, or text describing the variables, may also be useful.

Index Notebook

Please update the index notebook when a new analysis is added.

Sections

Jupyter Notebooks should roughly follow the outline of a scientific paper, using the sections below.

Credits and License

American Gut analyses are distributed under a BSD license. Please also provide the date of the analysis. Typical text is:

License: BSD
Copyright: Copyright American Gut Project, 2015

You may also add the following optional information:

  • Author name
  • Author contact (e.g. email, Twitter)
  • Last update

Introduction

A brief introduction to the topic addressed in the notebook. This may include theory behind the analysis technique, a brief review of relevant literature and/or a description of the data being analyzed. Scientific citations may be included in this section. The introduction may also be subdivided, as appropriate.

Notebook Requirements

This section lists the versions of the software used to generate the notebook, so that others can recreate a consistent environment. Typically, software compatible with the current version of QIIME is recommended.
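One way to produce this listing is a short cell that reports installed package versions. The sketch below assumes Python 3.8 or later (for `importlib.metadata`); the package list is a hypothetical example and should be replaced with whatever your notebook actually imports.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Hypothetical package list; adjust to match the notebook's imports.
PACKAGES = ["numpy", "pandas", "scikit-bio", "biom-format"]


def package_versions(packages):
    """Return a {name: version} mapping, including the Python version.

    Packages that are not installed are reported as "not installed"
    instead of raising, so the cell always runs to completion.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = "not installed"
    return versions


for name, ver in package_versions(PACKAGES).items():
    print(f"{name:>12}: {ver}")
```

Running this cell at the top of the notebook records the environment alongside the results, which makes version mismatches easier to diagnose later.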

Function Imports

Libraries may be imported into Jupyter. We recommend keeping most code in importable modules and calling it from the notebook, rather than defining functions in the notebook itself. Separating code from presentation makes the code easier to test, and makes it more likely that the code can be run in other environments.
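For example, a filtering step could live in a small module rather than in a notebook cell. The module name `ag_utils` and the function `filter_samples_by_depth` below are hypothetical, chosen only to illustrate the pattern.

```python
"""ag_utils.py: a hypothetical helper module imported by a notebook."""


def filter_samples_by_depth(sample_depths, min_depth):
    """Return IDs of samples sequenced to at least ``min_depth``.

    Parameters
    ----------
    sample_depths : dict
        Maps each sample ID to the number of sequences in that sample.
    min_depth : int
        Minimum number of sequences a sample must have to be kept.

    Returns
    -------
    list of str
        Sample IDs that meet the threshold, sorted for reproducibility.
    """
    return sorted(sid for sid, depth in sample_depths.items()
                  if depth >= min_depth)
```

The notebook then only needs `from ag_utils import filter_samples_by_depth`, and the function can be unit tested independently of the notebook.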

Parameter and File Path Definition

We suggest setting analysis parameters and file paths before the analysis is performed. This makes the inputs to the analysis explicit, and can help avoid bugs if parameters or file paths need to be changed.
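A parameter cell might look like the sketch below. All names, values, and file paths are hypothetical placeholders; the point is that every tunable value and path is defined once, near the top of the notebook.

```python
from pathlib import Path

# --- Analysis parameters (hypothetical values for illustration) ---
RAREFACTION_DEPTH = 1000   # sequences per sample after rarefaction
ALPHA_METRIC = "shannon"   # alpha diversity metric to compute
RANDOM_SEED = 42           # fixed seed so subsampling is reproducible

# --- File paths (hypothetical directory layout) ---
DATA_DIR = Path("data")
OTU_TABLE_FP = DATA_DIR / "otu_table.biom"
METADATA_FP = DATA_DIR / "mapping_file.txt"
RESULTS_DIR = Path("results") / ALPHA_METRIC
```

Later cells refer only to these names, so changing the depth or moving the data requires editing a single cell.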

Analysis Steps

The analysis pipeline should then be detailed. We suggest using markdown cells to provide relevant theory about the analysis step, as well as comments in the code cells.
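As an illustration of a well-commented code cell, the sketch below computes Shannon diversity for one sample. This standalone function is an assumption for illustration only, not the implementation used in American Gut analyses; the choice of base 2 is likewise illustrative.

```python
import math

# Shannon diversity: H = -sum(p_i * log(p_i)), where p_i is the
# proportion of reads assigned to taxon i in a single sample.


def shannon(counts, base=2):
    """Shannon diversity of a list of per-taxon counts.

    Taxa with zero counts are skipped, since p * log(p) -> 0 as p -> 0.
    """
    total = sum(counts)
    proportions = (c / total for c in counts if c > 0)
    return -sum(p * math.log(p, base) for p in proportions)
```

With base 2, a sample containing two equally abundant taxa has a diversity of 1 bit, and a sample dominated by a single taxon has a diversity of 0. A markdown cell above this step would explain why the metric was chosen.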

Discussion or Conclusions

The notebook should end with a discussion of the relevant results. Depending on the style of the notebook, it may be more appropriate to do result-by-result discussion. In that case, a short conclusion or prospectus should be provided.

References

Sources should be cited using a PubMed ID (PMID) or DOI. Citations should be linked to the cited article, if possible.
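Links can be built directly from these identifiers, since PubMed and the DOI resolver use stable URL patterns. The helper names below are hypothetical, and preferring the DOI over the PMID when both are given is a design choice made for this sketch.

```python
def pubmed_url(pmid):
    """Return the PubMed page URL for a PubMed ID."""
    return f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"


def doi_url(doi):
    """Return the resolver URL for a DOI."""
    return f"https://doi.org/{doi}"


def markdown_citation(label, pmid=None, doi=None):
    """Format a citation as a markdown link, preferring the DOI."""
    if doi is not None:
        return f"[{label}]({doi_url(doi)})"
    if pmid is not None:
        return f"[{label}]({pubmed_url(pmid)})"
    raise ValueError("provide a PMID or a DOI")
```

The returned string can be pasted into a markdown cell in the References section.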

Getting help with git

If you're new to git, you'll probably find gitref.org helpful.