This workflow compares different species at the pan-genomic level and provides base-level alignment information to identify syntenic (SYN) and hyper-divergent (HYD) sequences. In addition, homologous genes located in collinear regions can be identified from the orthogroups inferred by OrthoFinder.
The workflow is built on Snakemake and comprises several interdependent parts. Before comparing genomes, the corresponding configuration files must be edited manually.
#? example data directory: test/
- raw_genome.config #? for genome soft-masking
- mashTree.config #? for the phylogenetic tree
- evolution.txt #? for Progressive-Cactus
- species_A.txt #? for Minigraph-Cactus
- species_B.txt #? for Minigraph-Cactus
- Genome alignment
  - soft-masked genome
  - phylogenetic tree
  - Progressive-Cactus (aligns the reference samples of different species)
  - Minigraph-Cactus (aligns samples of the same species)
  - identify SYN and HYD
- Homologous genes in SYN
  - OrthoFinder
  - identify homoeologous genes in SYN regions
- Python (v3.10.12)
- pandas (v2.2.0)
- pysam (v0.22.0)
- pybedtools (v0.9.1)
- Perl (v5.34.0)
- Snakemake (v7.25.0)
- Cactus (v6.0.0)
- mashtree (v1.4.6)
- quicktree (v2.5)
- seqtk (v1.4-r122)
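
When running outside the container, it can help to confirm that the command-line tools listed above are on `PATH` before starting. The sketch below is only an illustration: the helper `missing_tools` is hypothetical, and the command names (`python3`, `cactus`, `quicktree`, and so on) are assumed from the dependency list rather than taken from the pipeline itself. It does not check versions.

```shell
# Print the name of every listed command that is not found on PATH.
# (Hypothetical helper, not part of GGCW.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Command names are assumptions based on the dependency list above.
missing_tools python3 perl snakemake cactus mashtree quicktree seqtk
```

An empty output means every listed command was found. Inside `GGCW.sif` the check should not be needed, since the container is meant to ship these dependencies.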
#* pull the container
singularity pull GGCW.sif library://zpliu/bioinfomatic/ggcw:v1.0
#* pull the Snakemake pipeline
wget -c https://github.com/HZAU-CottonLab/GGCW/archive/refs/tags/v1.0.tar.gz
Test data download
wget https://zenodo.org/api/records/10697234/files-archive
- example_data.tar.gz #? example test data
- example_result.tar.gz #? result for example data
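
Once the `.tar.gz` archives above are on disk, they need to be unpacked before use. A minimal sketch follows; `extract_archive` is a hypothetical helper, and the target directory names in the usage note are arbitrary examples, not paths the pipeline requires.

```shell
# Extract a .tar.gz archive into a target directory (created if missing).
# Usage: extract_archive ARCHIVE [DEST]   (DEST defaults to the current directory)
# (Hypothetical helper, not part of GGCW.)
extract_archive() {
    archive=$1
    dest=${2:-.}
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}
```

For example, `extract_archive v1.0.tar.gz` would unpack the pipeline release in place, and `extract_archive example_data.tar.gz test_data` would unpack the example data into a `test_data/` directory.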
#* pull container
singularity pull GGCW.sif library://zpliu/bioinfomatic/ggcw:v1.0
#TODO run snakemake
#* show the pipeline (dry run: lists the jobs and prints their shell commands without executing them)
singularity exec GGCW.sif snakemake --cores 1 -np
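
To avoid retyping the container prefix, the call can be wrapped in a small shell function. This is only a sketch: `ggcw_snakemake`, `GGCW_SIF`, and `GGCW_RUNNER` are hypothetical names introduced here, and the real run shown in the comments assumes the configuration files have already been edited and that the Snakefile's default target is the one you want.

```shell
# Path to the container image pulled above; override with GGCW_SIF=/path/to/GGCW.sif.
GGCW_SIF=${GGCW_SIF:-GGCW.sif}
# Container runtime (hypothetical override hook, e.g. GGCW_RUNNER=apptainer).
GGCW_RUNNER=${GGCW_RUNNER:-singularity}

# Run snakemake inside the container, forwarding all arguments unchanged.
ggcw_snakemake() {
    "$GGCW_RUNNER" exec "$GGCW_SIF" snakemake "$@"
}

# Typical sequence (left commented out so that sourcing this file runs nothing):
#   ggcw_snakemake --cores 1 -np   # dry run, same as the command above
#   ggcw_snakemake --cores 8       # real run; the core count is an arbitrary example
```

Running the dry run first makes it easier to catch configuration mistakes before any alignment job is started.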