
Directory tree

Clara Qin edited this page Nov 11, 2019
├── data  # processed data - can be read into R using readRDS()
|   |
|   ├── NEON_ITS_phyloseq_DL08-13-2019.Rds  # phyloseq object built from NEON ITS sequences
|   |                                       # downloaded on the date after "DL" (MM-DD-YYYY)
|   ├── NEON_ITS_seqtab_nochim_DL08-13-2019.Rds  # sequence table with chimeras removed
|   |
|   └── NEON_ITS_taxa_DL08-13-2019.Rds  # taxonomy table assigned using the UNITE database
|
├── raw_data  # NOTHING IN THIS DIRECTORY IS PUSHED TO GITHUB; ACCESS IT ON THE SERVER (see below)
|   |
|   ├── sequence_metadata  # metadata for linking ITS sequence data to soil and site data
|   |
|   ├── tax_ref  # taxonomic reference database (UNITE) used to assign taxonomy to sequences
|   |   └── sh_general_release_dynamic_02.02.2019.fasta
|   |
|   └── Illumina  # raw fastq files from Illumina sequencing
|       ├── DoB
|       |   └── ITS
|       |       ├── Run1  # contains raw fastq files
|       |       ├── Run2  # contains raw fastq files
|       |       └── Run3  # contains raw fastq files
|       |   
|       └── NEON
|           ├── 16S  # contains raw fastq files
|           └── ITS  # contains raw fastq files, and directories with processed files:
|               ├── filtN  # after filtering out reads containing ambiguous bases ("N")
|               └── cutadapt2  # after removing primers/adapters
|                   └── filtered  # after passing quality filter
|
└── code  # if running R scripts in RStudio, set the working directory
    |     # to the git root directory (e.g. "NEON_DoB_analysis"),
    |     # not the "code" subdirectory
    |
    ├── utils.R  # contains various functions, including one that downloads all
    |            # NEON raw microbial sequence data
    |
    ├── dada2_workflow.R  # follows https://benjjneb.github.io/dada2/ITS_workflow.html
    |                     # to process NEON ITS raw sequence data
    |
    └── dada2_to_phyloseq.R  # assembles outputs of dada2_workflow.R, plus soil data
                             # and sequence metadata, to create phyloseq object
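The processed files in `data/` are plain `.Rds` files, so base R's `readRDS()` is enough to load them. If several dated downloads accumulate, something like the sketch below could pick the newest one. `parse_download_date()` and `read_latest_rds()` are hypothetical helpers (they are not in `utils.R`), and they assume every file name ends in `_DL<MM-DD-YYYY>.Rds`, as the current files do.

```r
# Hypothetical helpers (not part of utils.R): locate and load the most recent
# processed-data file, assuming names end in "_DL<MM-DD-YYYY>.Rds".

# Extract the download date from file names; NA for names without a DL tag.
parse_download_date <- function(filename) {
  pattern <- ".*_DL([0-9]{2}-[0-9]{2}-[0-9]{4})\\.Rds$"
  dates <- rep(NA_character_, length(filename))
  hit <- grepl(pattern, filename)
  dates[hit] <- sub(pattern, "\\1", filename[hit])
  as.Date(dates, format = "%m-%d-%Y")
}

# Read the newest file in data_dir whose name starts with `prefix`
# (e.g. "NEON_ITS_phyloseq").
read_latest_rds <- function(data_dir, prefix) {
  files <- list.files(data_dir,
                      pattern = paste0("^", prefix, "_DL.*\\.Rds$"),
                      full.names = TRUE)
  if (length(files) == 0) {
    stop("No files starting with '", prefix, "' found in ", data_dir)
  }
  dates <- parse_download_date(files)
  # which.max() skips NA dates and returns the first maximum on ties
  readRDS(files[which.max(as.numeric(dates))])
}
```

Run from the git root, `ps <- read_latest_rds("data", "NEON_ITS_phyloseq")` would load the phyloseq object. The phyloseq package must be loaded to work with it beyond printing its structure.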
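The subdirectories under `raw_data/Illumina/NEON/ITS` hold the output of each processing step. Here is a minimal sketch of how the stage paths relate to a raw fastq file. It assumes each step keeps the raw file's base name, mirroring how the dada2 ITS tutorial builds output paths with `file.path()` and `basename()`. `its_stage_paths()` is a hypothetical helper, not a function from `dada2_workflow.R`.

```r
# Hypothetical helper (not part of dada2_workflow.R): given raw ITS fastq files,
# return where each processing stage's output would live under raw_data/.
# Assumes every stage keeps the raw file's base name.
its_stage_paths <- function(raw_fastq,
                            its_dir = file.path("raw_data", "Illumina", "NEON", "ITS")) {
  fname <- basename(raw_fastq)
  list(
    filtN    = file.path(its_dir, "filtN", fname),                 # reads with N removed
    cutadapt = file.path(its_dir, "cutadapt2", fname),             # primers/adapters removed
    filtered = file.path(its_dir, "cutadapt2", "filtered", fname)  # passed quality filter
  )
}
```

Because `file.path()` is vectorized, the helper accepts a whole vector of raw file names and returns one path vector per stage.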
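The scripts in `code/` use paths relative to the git root, so running them from the `code` subdirectory breaks file lookups. One possible guard, placed at the top of a script, is sketched below. `check_repo_root()` is a hypothetical helper. It assumes `code/` and `data/` are tracked in the repository. `raw_data/` is deliberately not used as a marker, because it is not pushed to GitHub.

```r
# Hypothetical check: stop early unless the working directory is the git root.
# "code" and "data" are assumed to be tracked in the repository; raw_data/ is
# not pushed to GitHub, so it is not used as a marker.
check_repo_root <- function(markers = c("code", "data")) {
  missing <- markers[!dir.exists(markers)]
  if (length(missing) > 0) {
    stop("Set the working directory to the git root (e.g. NEON_DoB_analysis); ",
         "missing: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}
```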