sequencework

Mainly Python scripts related to nucleic acid or protein sequence work

I sorely need to put an index here with links to better guide you to the appropriate folders. <== TO DO
(For now look at the title of the folders to try and discern if it is something of interest.)

Descriptions of the scripts are found within README.md files in the sub folders.

Several have demonstrations in sessions served by MyBinder.org from my repo for command line-associated sequence work; however, it is probably best to follow the guide listed with the individual scripts so that you quickly find the right location. If you already know where you are going, you can launch a session via this button:

Binder

Related gist by me

Related 'Binderized' Utilities

Collection of links to launchable Jupyter environments where various sequence analysis tools work WITHOUT ANY NEED FOR ADDITIONAL EFFORT/INSTALLS. Many of my recent scripts are built with use in these environments in mind:

(Many of these include or feature Biopython, too; I haven't made a single all-encompassing one for it yet, since I mostly use it as an underlying library.)

  • patmatch-binder - launchable Jupyter sessions for running command line-based PatMatch in Jupyter environment provided via Binder (Perl and Python-based).

  • blast-binder - launchable Jupyter sessions for running command line-based BLAST+ in Jupyter environment provided via Binder.

  • Demonstration Jupyter notebooks for my improved version of Adam Bessa's Fasta2Structure - Fasta2Structure-cli - To make Fasta2Structure more convenient to use, I modified the script to allow more ways to run it, producing improved_Fasta2Structure (a.k.a. Fasta2Structure-cli). It runs on the command line if you supply arguments specifying the input files, and it also falls back to running on the command line if Tkinter cannot connect to a graphical display. The original is a user-friendly tool for converting multiple aligned FASTA files to STRUCTURE format; the improved version does not require selecting files in a Tkinter-based GUI, so it runs well anywhere, such as on a computer cluster, in Jupyter running remotely, or within pipelines built with workflow software like Snakemake and Nextflow. For those familiar with computation, that makes the improved script easier to use and allows scaling up.

  • InterMine-binder - InterMine Web Services available in a Jupyter environment running via the Binder service. (See the guide to getting started with using InterMine sites and Jupyter using MyBinder-served Jupyter notebooks.)

  • mcscan-binder - MCscan software available in a launchable Jupyter environment running via the Binder service (Python 2-based), with an example workflow and some other use examples.

  • mcscan-blast-binder - MCscan and BLAST+ command line software available in a launchable Jupyter environment running via the Binder service (Python 2-based).

  • synchro-binder - SynChro software available in a launchable Jupyter environment running via the Binder service with Quick start and some other illustrations of its use.

  • cl_sq_demo-binder - launchable, working Jupyter-based environment that has a collection of demonstrations of useful resources on the command line (or useable in Jupyter sessions) for manipulating sequence files. (Note: this was started after several other demo notebooks (many meant to be static) were made for sequence scripts; hopefully those will slowly be added here as well so they are available in active form.)

  • clausen_ribonucleotides binder - Analyze ribonucleotide incorporation data from Clausen et al. 2015 using the script plot_5prime_end_from_bedgraph.py.

  • circos-binder - Circos software available in a launchable Jupyter environment running via the Binder service, with tutorials illustrating use (TBD) (Perl and Python-based).
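
The run-mode behavior described above for Fasta2Structure-cli (use the command line when input files are given as arguments, and also when Tkinter cannot reach a graphical display) can be sketched roughly as below. This is only an illustration of the general pattern, not the code in the actual script: the function names `gui_available` and `choose_mode` and the argument name `fasta_files` are hypothetical.

```python
import argparse


def gui_available():
    """Return True if Tkinter can open a root window (a display is reachable)."""
    try:
        import tkinter

        root = tkinter.Tk()  # raises tkinter.TclError when no display is available
        root.destroy()
        return True
    except Exception:
        # Covers a missing tkinter install as well as a missing display.
        return False


def choose_mode(argv, gui_probe=gui_available):
    """Decide how to run: returns ('cli' or 'gui', list of input files).

    Hypothetical helper; the argument name 'fasta_files' is an assumption.
    """
    parser = argparse.ArgumentParser(
        description="Sketch of a converter that works with or without a GUI."
    )
    parser.add_argument(
        "fasta_files", nargs="*", help="aligned FASTA files to convert"
    )
    args = parser.parse_args(argv)

    if args.fasta_files:
        # Files supplied as arguments: no file-selection dialog needed.
        return "cli", args.fasta_files
    if not gui_probe():
        # No files and no display: stay on the command line.
        return "cli", []
    # No files but a display is reachable: let the user pick files in the GUI.
    return "gui", []
```

Passing the display check in as `gui_probe` keeps the decision logic usable on a cluster or in a remote Jupyter session, where no display exists; for example, `choose_mode(["aln1.fasta", "aln2.fasta"])` selects command-line mode without ever touching Tkinter.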

Related resources by others

"Install and use genomes & gene annotations the easy way!
genomepy is designed to provide a simple and straightforward way to download and use genomic data. This includes (1) searching available data, (2) showing the available metadata, (3) automatically downloading, preprocessing and matching data and (4) generating optional aligner indexes. All with sensible, yet controllable defaults. Currently, genomepy supports UCSC, Ensembl and NCBI." - Includes an S. cerevisiae example.

See also

My simulated data repo has some useful scripts and resources for generating simulated (mock / fake) sequence data, gene expression data, or gene lists.

My structurework repo - for utilities and code dealing with molecular structures

My proteomicswork repo - for utilities and code dealing with proteomics analysis