Skip to content

fasta-extractor.pl extracts ORFs from a genomic fasta file based on coordinates in an ID list, generating corresponding sequences from two input files: genomic_fasta-file and id_list-file.

Notifications You must be signed in to change notification settings

vsmicrogenomics/fasta-extractor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 

Repository files navigation

This Perl script fasta-extractor.pl is designed to extract open reading frames (ORFs) from a genomic fasta file based on the coordinates specified in an ID list file. The script takes two input files, genomic_fasta-file and id_list-file, and generates the corresponding ORF sequences based on the specified coordinates.

genomic_fasta-file is a fasta file containing genomic sequences, each starting with a header line that begins with > followed by the sequence identifier. The sequence itself follows on the next line.

id_list-file is a tab-delimited file containing the following columns: sequence identifier, start position, end position, and frame (+ or -). Each row represents the coordinates of an ORF to be extracted.

The script reads the id_list-file line by line and extracts the corresponding sequences from the genomic_fasta-file using the start and end positions specified in the id_list-file. It then generates the complementary and reverse complementary sequences, depending on the frame, and prints them to the standard output in FASTA format.

Test Files:

The script comes with two test files, genomic_fasta-file and id_list-file, which can be used to test the functionality of the script.

genomic_fasta-file contains two short genomic sequences, seq1 and seq2, with lengths of 30 and 15 nucleotides, respectively.

id_list-file contains two lines, each specifying the coordinates of an ORF to be extracted from the genomic_fasta-file.

Example Usage:

perl fasta-extractor.pl

The script generates the ORF sequences in FASTA format and prints them to the standard output.

If you are using the fasta-extractor.pl script, please cite it as follows: Sharma, V. (2023). fasta-extractor.pl [Perl script]. Retrieved from https://github.com/vsmicrogenomics/fasta-extractor

About

fasta-extractor.pl extracts ORFs from a genomic fasta file based on coordinates in an ID list, generating corresponding sequences from two input files: genomic_fasta-file and id_list-file.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages