
2. Format your data

T. Latrille edited this page Jun 7, 2023 · 2 revisions

BayesCode requires a tree and an alignment file to run.

I. Alignment file

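Your alignment file must follow the Phylip format: the first line gives the number of sequences and the number of sites, and each following line gives a sequence name followed by its sequence. In addition, the number of sites must be a multiple of 3, because the sequences are interpreted as codons. For example, the following file (8 sequences of 6 bases, i.e. 2 codons each) would be a valid alignment for BayesCode (nodemutsel or mutselomega):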

8 6
S0      TCCTGA
S1      AATAGT
S2      GGATTT
S3      AATTCA
S4      CGAAGG
S5      AACGCT
S6      ACGAGT
S7      AATATT
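
The constraints described above can be checked before launching a run. The following is a minimal sketch, not part of BayesCode: the function name read_phylip is hypothetical, and it assumes a sequential Phylip file with one sequence per line and whitespace between the name and the sequence.

```python
def read_phylip(text):
    """Parse a sequential Phylip alignment and return {name: sequence}.

    Hypothetical helper (not part of BayesCode). Raises ValueError if the
    header does not match the content or if the number of sites is not a
    multiple of 3.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty alignment")
    # Header line: number of sequences, then number of sites.
    n_taxa, n_sites = (int(x) for x in lines[0].split())
    alignment = {}
    for line in lines[1:]:
        name, seq = line.split(None, 1)
        alignment[name] = "".join(seq.split()).upper()
    if len(alignment) != n_taxa:
        raise ValueError(f"header announces {n_taxa} sequences, "
                         f"found {len(alignment)} distinct names")
    for name, seq in alignment.items():
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(seq)}")
    if n_sites % 3 != 0:
        raise ValueError(f"{n_sites} sites is not a multiple of 3")
    return alignment


example = """8 6
S0      TCCTGA
S1      AATAGT
S2      GGATTT
S3      AATTCA
S4      CGAAGG
S5      AACGCT
S6      ACGAGT
S7      AATATT
"""
alignment = read_phylip(example)
print(sorted(alignment))
```

Raising an error on the first problem keeps the sketch short; a fuller checker might collect all problems and report them together.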

A Python 3 script, fasta_to_ali.py, is available to convert a FASTA file to Phylip:

python3 fasta_to_ali.py --input ENSG00000000457_SCYL3_NT.fasta --output ENSG00000000457_SCYL3_NT.phy
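
For readers who want to see what such a conversion involves, here is an independent sketch. It is not the code of fasta_to_ali.py, whose internals are not shown on this page; the function name fasta_to_phylip is hypothetical, and the sketch assumes that each sequence name is the first word of its FASTA header and that all sequences are already aligned to the same length.

```python
def fasta_to_phylip(fasta_text):
    """Convert aligned FASTA text to sequential Phylip text.

    Hypothetical helper, written for illustration only.
    """
    records = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a name")
            name = fields[0]  # assumption: the name is the first word
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first header")
        else:
            records[name].append(line)
    # Join sequences that were wrapped over several lines.
    seqs = {n: "".join(parts) for n, parts in records.items()}
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences are missing or differ in length")
    n_sites = lengths.pop()
    width = max(len(n) for n in seqs)
    # Header line: number of sequences, then number of sites.
    out = [f"{len(seqs)} {n_sites}"]
    out += [f"{n.ljust(width)}  {s}" for n, s in seqs.items()]
    return "\n".join(out) + "\n"


fasta = ">S0 first sequence\nTCC\nTGA\n>S1\nAATAGT\n"
print(fasta_to_phylip(fasta), end="")
```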

II. Tree file

Your tree file must follow the Newick format. The tree does not need to have branch lengths. In addition, the leaves of the tree must have the same names as the sequences in your alignment file. For example, the following file would be a valid tree file for BayesCode matching the alignment file above:

(((S0,S1),(S2,S3)),(S4,S5),(S6,S7));
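
The name-matching requirement can be checked with a short script. This is a sketch under stated assumptions, not a BayesCode utility: the function names newick_leaves and compare_names are hypothetical, and the regular expression assumes unquoted leaf names without spaces or Newick punctuation. The tree and names in the usage example are made up for illustration.

```python
import re


def newick_leaves(newick):
    """Return the set of leaf names in a Newick string.

    A leaf label directly follows "(" or ","; internal node labels follow
    ")" and branch lengths follow ":", so neither is captured.
    """
    return set(re.findall(r"[(,]\s*([^\s(),:;]+)", newick))


def compare_names(newick, alignment_names):
    """Return (names only in the tree, names only in the alignment)."""
    leaves = newick_leaves(newick)
    names = set(alignment_names)
    return sorted(leaves - names), sorted(names - leaves)


# Illustrative input: S4 is only in the tree, S3 only in the alignment.
tree = "((S0:0.1,S1:0.2)n1,(S2,S4));"
only_in_tree, only_in_alignment = compare_names(tree, ["S0", "S1", "S2", "S3"])
print("only in tree:", only_in_tree)
print("only in alignment:", only_in_alignment)
```

Both lists being empty means the tree and the alignment use exactly the same set of names.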

III. Example files

The data folder in the BayesCode root folder contains examples of data files usable with BayesCode. The whole folder can be downloaded here: github.com/ThibaultLatrille/bayescode/releases/download/v1.1.6/data.zip.
