Mutable provides tools and scripts for self-hosting a web platform for searching, analyzing, and visualizing de novo variants (DNVs).
Follow the instructions to install Docker
Clone the repository
git clone https://github.com/ShenLab/mutable-sh; cd mutable-sh
Generating databases for extended data requires R and the following R packages:
- bio3d
- dplyr
- DBI
- stringr
- RSQLite
To self-host with the example datasets, download the data from here and unzip instance.zip.
Make sure all the databases (dnvs.sqlite, genes.sqlite, samples.sqlite, distance.sqlite, constraint.sqlite, plddt.sqlite, users.sqlite) are in the directory mutable-sh/instance. Inside mutable-sh/instance, create a new file config.py containing SECRET_KEY=$YOUR_KEY, replacing $YOUR_KEY with your own secret key, which Flask uses to secure session data for the web app. You can also generate a random value with python -c 'import os; print(os.urandom(12))'.
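The two setup steps above can also be scripted. The following is a minimal sketch and not part of the Mutable repository: the helper names missing_databases and write_config are hypothetical, and it assumes config.py is read as a Python file, so the key is written as a quoted string literal. It uses secrets.token_hex rather than os.urandom only to get a printable value.

```python
import secrets
from pathlib import Path

# Database files this README lists as required in the instance directory.
REQUIRED_DATABASES = [
    "dnvs.sqlite", "genes.sqlite", "samples.sqlite", "distance.sqlite",
    "constraint.sqlite", "plddt.sqlite", "users.sqlite",
]


def missing_databases(instance_dir):
    """Return the required database files that are absent from instance_dir."""
    root = Path(instance_dir)
    return [name for name in REQUIRED_DATABASES if not (root / name).is_file()]


def write_config(instance_dir, overwrite=False):
    """Write config.py with a random SECRET_KEY.

    An existing config.py is left untouched unless overwrite=True, because
    replacing the key invalidates existing sessions.
    ASSUMPTION: the app loads config.py as Python source.
    """
    path = Path(instance_dir) / "config.py"
    if path.exists() and not overwrite:
        return path
    key = secrets.token_hex(32)  # 64 hexadecimal characters
    path.write_text(f"SECRET_KEY = {key!r}\n")
    return path
```

For example, missing_databases("mutable-sh/instance") returns an empty list once all seven files are in place, and write_config("mutable-sh/instance") creates the config file if it does not exist yet.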
We use Docker to host and deploy Mutable. To self-host Mutable, run docker build -t mutable:latest . from the mutable-sh directory to build the Docker image. Then run docker run -v $INSTANCE_DIRECTORY:/mutable/instance -p 8000:8000 mutable:latest -b 0.0.0.0 "mutable:create_app()" to start Mutable, replacing $INSTANCE_DIRECTORY with the absolute path to the instance folder. You can change the Docker image tag if needed.
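Because the volume mount needs an absolute path, it can be convenient to assemble the run command programmatically. This is only a sketch: docker_run_command is a hypothetical helper, and the host_port parameter is an added convenience that changes only the host side of the port mapping shown above.

```python
import shlex
from pathlib import Path


def docker_run_command(instance_dir, tag="mutable:latest", host_port=8000):
    """Build the docker run argument list from the README command.

    instance_dir may be relative; it is resolved to an absolute path
    before being used in the -v mount.
    """
    instance = Path(instance_dir).resolve()
    return [
        "docker", "run",
        "-v", f"{instance}:/mutable/instance",
        "-p", f"{host_port}:8000",
        tag,
        "-b", "0.0.0.0", "mutable:create_app()",
    ]


def as_shell(command):
    """Render the argument list as a single shell-quoted string."""
    return shlex.join(command)
```

The list returned by docker_run_command("instance") can be passed directly to subprocess.run, or printed with as_shell to copy into a terminal.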
Seven databases and one additional dataset folder are required to host Mutable. Example data can be retrieved here. To extend the data with unpublished DNVs and samples, modify dnvs.sqlite, samples.sqlite, and distance.sqlite under mutable-sh/instance accordingly.
Our published data for de novo variants can be retrieved here. It contains the curated, annotated data for published DNVs. See dnvs.sql for the schema required to create and import the DNV database. The schema lists the attributes that Mutable requires for variant data; you can add more fields if needed. We annotate the de novo variants using the annotation pipelines here.
samples.sqlite contains sample-level data for published variants. See samples.sql for the schema required to create the samples database. The schema lists the attributes that Mutable requires for sample-level data; you can add more fields if needed.
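Before extending dnvs.sqlite or samples.sqlite, it can help to compare the tables and columns of an existing database against dnvs.sql and samples.sql. The sketch below uses only Python's standard sqlite3 module; describe_database is a hypothetical helper name, and no table or column names of the Mutable schema are assumed.

```python
import sqlite3
from pathlib import Path


def describe_database(path):
    """Map each table name to a list of (column name, declared type) pairs.

    The file is opened read-only, so a mistyped path raises an error
    instead of silently creating an empty database.
    """
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in con.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
            info = con.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [(col[1], col[2]) for col in info]
        return schema
    finally:
        con.close()
```

Calling describe_database("mutable-sh/instance/dnvs.sqlite") on the example data shows which columns any new rows need to provide.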
Our example pairwise distance data for the published DNVs can be found here.
The pairwise 1D and 3D distances are stored in this database. We calculate the spatial distances using the script here. First unzip the downloaded UP000005640_9606_HUMAN_v4.zip, then run the script to generate the pairwise distances between variants. Modify the paths in the script if needed.
By default, the script takes four arguments: the paths to dnvs.sqlite, genes.sqlite, UP000005640_9606_HUMAN_v4, and distance.sqlite, in that order. Make sure all the required R packages are installed. Then, from the instance directory, run Rscript ../protein_link.R dnvs.sqlite genes.sqlite UP000005640_9606_HUMAN_v4 distance.sqlite. distance.sqlite will be updated, or created if it does not already exist.
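A long-running script is easier to launch when its inputs are checked first. The following sketch wraps the Rscript invocation described above; protein_link_command and run_protein_link are hypothetical helper names, and the script location ../protein_link.R is taken from the command in this README.

```python
import subprocess
from pathlib import Path


def protein_link_command(instance_dir,
                         script="../protein_link.R",
                         structures="UP000005640_9606_HUMAN_v4"):
    """Return the Rscript argument list after checking that the inputs exist.

    distance.sqlite is not checked because the script creates it if needed.
    """
    root = Path(instance_dir)
    inputs = ["dnvs.sqlite", "genes.sqlite", structures]
    missing = [name for name in inputs if not (root / name).exists()]
    if missing:
        raise FileNotFoundError(
            f"missing in {root}: {', '.join(missing)}"
        )
    return ["Rscript", script, "dnvs.sqlite", "genes.sqlite",
            structures, "distance.sqlite"]


def run_protein_link(instance_dir):
    """Run the script with the instance directory as working directory."""
    command = protein_link_command(instance_dir)
    subprocess.run(command, cwd=instance_dir, check=True)
```

With check=True, a non-zero exit status from Rscript raises subprocess.CalledProcessError, so a failed run is not mistaken for a completed one.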
Other databases can also be retrieved here.
The users.sqlite in the example data only contains access for guest users. To host Mutable and authorize users to register for access to the complete data, add each user's email address as a new username in users.sqlite, following user.sql. Authorized users can then register on Mutable with that username and their own password. Passwords are hashed to protect credentials.
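Adding authorized usernames can be done with Python's sqlite3 module. This is a sketch under stated assumptions: the actual table and column names are defined in user.sql, and the defaults used here (a table named users with a column named username) are placeholders that must be checked against that file. add_usernames is a hypothetical helper.

```python
import sqlite3


def add_usernames(db_path, emails, table="users", column="username"):
    """Insert each email as a username unless it is already present.

    ASSUMPTION: the default table and column names are placeholders;
    take the real ones from user.sql. Returns the emails that were added.
    """
    # Identifiers cannot be bound as SQL parameters, so validate them.
    for identifier in (table, column):
        if not identifier.isidentifier():
            raise ValueError(f"unexpected identifier: {identifier!r}")

    con = sqlite3.connect(db_path)
    added = []
    try:
        with con:  # commits on success, rolls back on error
            for email in emails:
                exists = con.execute(
                    f'SELECT 1 FROM "{table}" WHERE "{column}" = ?', (email,)
                ).fetchone()
                if exists is None:
                    con.execute(
                        f'INSERT INTO "{table}" ("{column}") VALUES (?)',
                        (email,),
                    )
                    added.append(email)
    finally:
        con.close()
    return added
```

Only the username is inserted, since users set their own password when they register. If the real schema has other columns without defaults that must be non-null, the insert will fail and the statement needs to be adapted.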
constraint.sqlite, genes.sqlite, and plddt.sqlite do not require modification, as they contain complete curated gene-level information for all human genes. You can still modify these databases if needed.
The gene-centered page displays selected variant-level annotation data for computational estimates and selection coefficients. You can modify config.json to specify which data columns are shown in the table.