- pandas 1.0.5
- biopython 1.77
- Data_filt.py: This Python script cleans the data and writes the cleaned data to new files. It removes samples with NA or unknown values from the clinical dataset. The dataset originally contained 4592 samples; 3620 remained after running the script.
- Input: The script takes as input the GISAID clinical data containing all samples, along with the complete GISAID genomic data for the 4592 samples.
- Output: Two files containing filtered samples.
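
The filtering step described above can be sketched roughly as follows. This is a minimal illustration with pandas, not the actual contents of Data_filt.py: the column names (`strain`, `patient_status`, `age`), the `UNKNOWN_VALUES` list, and the `filter_clinical` helper are all hypothetical, and the real script may define "unknown" differently.

```python
import pandas as pd

# Hypothetical placeholder strings treated as "unknown" (assumption, not from the script).
UNKNOWN_VALUES = ["unknown"]


def filter_clinical(df, columns, unknown_values=UNKNOWN_VALUES):
    """Return the rows of df with no NA and no 'unknown' entry in the given columns."""
    subset = df[columns]
    # True where a cell is missing (NaN or None).
    is_missing = subset.isna()
    # True where a cell, compared as trimmed lower-case text, is an "unknown" placeholder.
    is_unknown = subset.apply(
        lambda col: col.astype(str).str.strip().str.lower().isin(unknown_values)
    )
    # Keep a row only if none of its checked cells is missing or unknown.
    keep = ~(is_missing | is_unknown).any(axis=1)
    return df.loc[keep].reset_index(drop=True)


# Toy clinical table with hypothetical column names.
clinical = pd.DataFrame(
    {
        "strain": ["s1", "s2", "s3", "s4", "s5"],
        "patient_status": ["Hospitalized", "unknown", None, "Released", "Released"],
        "age": [34, 51, 47, 29, None],
    }
)

filtered = filter_clinical(clinical, ["patient_status", "age"])
# IDs of the retained samples; these could then be used to subset the genomic data.
kept_ids = set(filtered["strain"])
```

The set of retained sample IDs is what would link the two outputs: the same IDs can be used to select the matching sequences from the genomic file, for example with Biopython.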