
HZAU-CottonLab/GGCW


Graph genome comparison workflow

GGCW

This workflow is designed to compare different species at the pan-genome level. It provides base-level alignment information to identify syntenic (SYN) and hyper-divergent (HYD) sequences. In addition, using the orthogroups inferred by OrthoFinder, it identifies homologous genes located in collinear regions.
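As a rough illustration of how SYN and HYD relate along one genome, the sketch below makes a hypothetical simplification: syntenic alignment blocks on a single chromosome are given as half-open `(start, end)` intervals. It merges them into SYN segments and reports the uncovered gaps as candidate HYD segments. This is not GGCW's actual criterion, which is derived from the Cactus base-level alignment. The function names are illustrative only.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge_intervals(blocks: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching syntenic blocks into SYN segments."""
    merged: List[Interval] = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1]:
            # Extend the previous segment when blocks overlap or abut.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def split_syn_hyd(blocks: List[Interval], chrom_len: int) -> Tuple[List[Interval], List[Interval]]:
    """Return (SYN, HYD): merged syntenic segments and the gaps between them.

    Hypothetical simplification: every base not covered by a syntenic
    block is treated as a candidate hyper-divergent base.
    """
    syn = merge_intervals(blocks)
    hyd: List[Interval] = []
    cursor = 0
    for start, end in syn:
        if start > cursor:
            hyd.append((cursor, start))  # gap before this SYN segment
        cursor = max(cursor, end)
    if cursor < chrom_len:
        hyd.append((cursor, chrom_len))  # trailing gap to the chromosome end
    return syn, hyd


syn, hyd = split_syn_hyd([(100, 400), (350, 600), (900, 1000)], chrom_len=1200)
print(syn)  # [(100, 600), (900, 1000)]
print(hyd)  # [(0, 100), (600, 900), (1000, 1200)]
```

In practice a real pipeline would do this per chromosome on BED intervals (for example with bedtools merge/complement). The pure-Python version only shows the idea.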

Workflow

The workflow is built on Snakemake and consists of several interdependent parts. Before comparing a new set of genomes, the corresponding configuration files must be edited manually.

#? example data directory: test/
- raw_genome.config  #? for genome soft-masking
- mashTree.config  #? for the phylogenetic tree
- evolution.txt #? for Progressive-Cactus
- species_A.txt #? for Minigraph-Cactus
- species_B.txt #? for Minigraph-Cactus
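Before launching the pipeline, it can help to confirm that the configuration files listed above are present. The sketch below only checks that the files exist; it does not validate their contents, whose formats are defined by the workflow itself. `species_A.txt` and `species_B.txt` are the example names, so adjust the list for your own species. The helper `missing_configs` is hypothetical and not part of GGCW.

```python
from pathlib import Path
from typing import List

# File names taken from the example data directory test/.
REQUIRED_CONFIGS = [
    "raw_genome.config",  # genome soft-masking
    "mashTree.config",    # phylogenetic tree
    "evolution.txt",      # Progressive-Cactus
    "species_A.txt",      # Minigraph-Cactus (example species name)
    "species_B.txt",      # Minigraph-Cactus (example species name)
]


def missing_configs(directory: str) -> List[str]:
    """Return the required configuration files not found in `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_CONFIGS if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_configs("test")
    if missing:
        print("Missing configuration files:", ", ".join(missing))
    else:
        print("All configuration files found.")
```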

Table of contents

  • Genome alignment
    • softmasked genome
    • phylogenetic tree
    • Progressive-Cactus (aligning reference samples of different species)
    • Minigraph-Cactus (aligning samples of the same species)
    • identify SYN and HYD
  • homologous genes in SYN regions
    • OrthoFinder
    • identify homoeologous genes in SYN regions
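The idea of combining orthogroups with syntenic regions can be sketched as follows. All data structures here are hypothetical and are not GGCW's actual file formats:

- each orthogroup maps a species name to its gene IDs, a shape into which OrthoFinder's `Orthogroups.tsv` could be parsed;
- gene positions are `(chrom, start, end)` tuples;
- each SYN block is a pair of regions, one per genome.

A gene pair from the same orthogroup is kept when the two genes fall inside the two sides of the same SYN block. The gene IDs and species names in the example are invented for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]     # (chrom, start, end), half-open
SynBlock = Tuple[Region, Region]  # (region in species A, region in species B)


def inside(gene: Region, region: Region) -> bool:
    """True if the gene lies entirely within the region on the same chromosome."""
    return gene[0] == region[0] and region[1] <= gene[1] and gene[2] <= region[2]


def syntenic_homologs(
    orthogroups: Dict[str, Dict[str, List[str]]],
    positions: Dict[str, Region],
    syn_blocks: List[SynBlock],
    sp_a: str,
    sp_b: str,
) -> List[Tuple[str, str, str]]:
    """Return (orthogroup, gene_a, gene_b) pairs located in the same SYN block."""
    pairs = []
    for og, members in orthogroups.items():
        # Every cross-species gene pair within the orthogroup is a candidate.
        for gene_a, gene_b in product(members.get(sp_a, []), members.get(sp_b, [])):
            pos_a, pos_b = positions[gene_a], positions[gene_b]
            if any(inside(pos_a, ra) and inside(pos_b, rb) for ra, rb in syn_blocks):
                pairs.append((og, gene_a, gene_b))
    return pairs


# Hypothetical example data.
orthogroups = {"OG0000001": {"A": ["A01G0001"], "B": ["B01G0001", "B05G0100"]}}
positions = {
    "A01G0001": ("A01", 1_000, 2_000),
    "B01G0001": ("B01", 5_000, 6_000),
    "B05G0100": ("B05", 9_000, 9_800),
}
syn_blocks = [(("A01", 0, 10_000), ("B01", 0, 20_000))]
print(syntenic_homologs(orthogroups, positions, syn_blocks, "A", "B"))
# [('OG0000001', 'A01G0001', 'B01G0001')]
```

The linear scan over `syn_blocks` keeps the sketch short. For whole genomes an interval index, such as one built with pybedtools, would be the more practical choice.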

Requirements

  • Python (v3.10.12)
    • pandas (v2.2.0)
    • pysam (v0.22.0)
    • pybedtools (v0.9.1)
  • Perl (v5.34.0)
  • Snakemake (v7.25.0)
  • Cactus (v6.0.0)
  • mashtree (v1.4.6)
  • quicktree (v2.5)
  • seqtk (v1.4-r122)
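When running outside the container, you can compare the local Python packages against the versions listed above using the standard library's `importlib.metadata`. The check only reports differences. Other versions may still work, and the container already ships the listed versions. The helper name `version_report` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict

# Python package versions listed under Requirements.
EXPECTED = {"pandas": "2.2.0", "pysam": "0.22.0", "pybedtools": "0.9.1"}


def version_report(expected: Dict[str, str]) -> Dict[str, str]:
    """Map each package that is missing or differs from `expected` to a status string."""
    report = {}
    for pkg, wanted in expected.items():
        try:
            found = version(pkg)
        except PackageNotFoundError:
            report[pkg] = "not installed"
            continue
        if found != wanted:
            report[pkg] = f"installed {found}, README lists {wanted}"
    return report


if __name__ == "__main__":
    for pkg, status in version_report(EXPECTED).items():
        print(f"{pkg}: {status}")
```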

Install

Use the container

#* pull the container
singularity pull GGCW.sif library://zpliu/bioinfomatic/ggcw:v1.0

#* download the Snakemake pipeline
wget -c https://github.com/HZAU-CottonLab/GGCW/archive/refs/tags/v1.0.tar.gz

Usage example

Download the test data
wget https://zenodo.org/api/records/10697234/files-archive

  • example_data.tar.gz: example test data
  • example_result.tar.gz: results for the example data
#* pull the container
singularity pull GGCW.sif library://zpliu/bioinfomatic/ggcw:v1.0
#TODO run snakemake
#* dry run: list the jobs and print their shell commands without executing them
singularity exec GGCW.sif snakemake --cores 1 -np