DeePromoter combines a convolutional neural network (CNN) with a long short-term memory (LSTM) network. Additionally, instead of using non-promoter regions of the genome as a negative set, we derive a more challenging negative set from the promoter sequences themselves. The proposed negative set reconstruction method improves the discrimination ability and significantly reduces the number of false positive predictions.
Please install torch==1.9 from
You can install the other Python dependencies with
pip3 install -r requirements.txt
The currently supported dataset is:
- EPDnew: a collection of experimentally validated promoters for selected model organisms. Evidence comes from TSS mapping in high-throughput experiments such as CAGE and oligo-capping.
The datasets for human and mouse have already been processed and are stored in ./data
Procedure for creating the negative dataset, as described in the paper:
Step 1: Break each promoter sequence into N parts (N = 20 in the paper)
Step 2: Randomly choose M of the original parts to keep, and randomly re-initialize the rest
Step 3: At every training step, mix the positive batch with the negative batch and train on the combined batch
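The three steps above can be sketched in plain Python as follows. This is a minimal illustration, not the code used in this repository: the function names `make_negative` and `mixed_batch` are hypothetical, the sketch assumes an A/C/G/T alphabet and a sequence length divisible by N, and the value of M is left as a parameter because it is not specified here.

```python
import random


def make_negative(seq, n_parts, n_keep, rng=random, alphabet="ACGT"):
    """Return a negative sample derived from one positive sequence.

    Hypothetical helper: n_parts is N and n_keep is M from the steps above.
    """
    if not seq or len(seq) % n_parts != 0:
        raise ValueError("sequence length must be a non-zero multiple of n_parts")
    if not 0 <= n_keep <= n_parts:
        raise ValueError("n_keep must be between 0 and n_parts")
    part_len = len(seq) // n_parts
    # Step 1: break the sequence into n_parts equal parts.
    parts = [seq[i:i + part_len] for i in range(0, len(seq), part_len)]
    # Step 2: keep n_keep randomly chosen parts and randomize the others.
    keep = set(rng.sample(range(n_parts), n_keep))
    return "".join(
        part if idx in keep
        else "".join(rng.choice(alphabet) for _ in range(part_len))
        for idx, part in enumerate(parts)
    )


def mixed_batch(positives, n_parts, n_keep, rng=random):
    """Step 3: pair each positive (label 1) with a fresh negative (label 0)."""
    batch = [(seq, 1) for seq in positives]
    batch += [(make_negative(seq, n_parts, n_keep, rng), 0) for seq in positives]
    rng.shuffle(batch)
    return batch
```

Because the negatives are regenerated on every call, each training step sees different negative samples for the same positives. Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible.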
python3 -d data/human/nonTATA/hs_pos_nonTATA.txt --experiment_name human_nonTATA
Early stopping is implemented: training stops automatically once the Matthews correlation coefficient saturates
The results will be saved to ./output/experiment_name
You can continue training from saved weights by passing their path with the -w or --weight flag
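To make the early-stopping criterion concrete, here is a small sketch of how the Matthews correlation coefficient (MCC) can be computed from confusion-matrix counts and used to decide when to stop. The `EarlyStopper` class and its `patience` and `min_delta` parameters are assumptions for illustration; the stopping rule actually used in this repository may differ.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


class EarlyStopper:
    """Hypothetical helper: stop when the score has not improved
    by more than min_delta for `patience` consecutive evaluations."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_evals = 0

    def step(self, score):
        """Record a new score and return True if training should stop."""
        if score > self.best + self.min_delta:
            self.best = score
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

In a training loop, one would call `stopper.step(mcc)` after each evaluation and break out of the loop when it returns `True`.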
Prepare your dataset as a txt file with one DNA sequence (length 300) per line
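Before running inference, it can help to check that a file matches this format. The sketch below is not part of this repository: `parse_sequences` and `load_sequences` are hypothetical helpers, and restricting the alphabet to A, C, G and T is an assumption (adjust `alphabet` if your data contains other symbols such as N).

```python
def parse_sequences(lines, expected_len=300, alphabet="ACGT"):
    """Validate an iterable of text lines and return the cleaned sequences."""
    sequences = []
    for line_no, line in enumerate(lines, start=1):
        seq = line.strip().upper()
        if not seq:
            continue  # ignore blank lines
        if len(seq) != expected_len:
            raise ValueError(
                f"line {line_no}: expected length {expected_len}, got {len(seq)}"
            )
        unexpected = set(seq) - set(alphabet)
        if unexpected:
            raise ValueError(
                f"line {line_no}: unexpected characters {sorted(unexpected)}"
            )
        sequences.append(seq)
    return sequences


def load_sequences(path, **kwargs):
    """Read a txt file with one sequence per line and validate it."""
    with open(path) as handle:
        return parse_sequences(handle, **kwargs)
```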
Run inference with
python3 -d data/human/nonTATA/hs_pos_nonTATA.txt -w path_to_weight
The output will be saved to the file infer_results.txt in the main folder
- In addition to the negative sampling described in the paper (see Preprocessing), I added a random dataset to help the model generalize.
- The authors used grid search to find the optimal parameters for the network. I used the final set of parameters from the paper: kernel sizes = [27, 14, 7] and max pooling with kernel size = 6.
- If this repository is useful for your research, consider citing the DeePromoter paper.
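As a rough aid for reasoning about the kernel sizes listed in the notes above, the following sketch computes the output length of a single convolution followed by max pooling for a 300-base input. It assumes stride 1 and no padding for the convolution, and a pooling stride equal to the pooling kernel size; these settings are assumptions and may not match the configuration used in this repository. `conv_pool_length` is a hypothetical helper.

```python
def conv_pool_length(seq_len, kernel, pool, padding=0):
    """Output length after a stride-1 1D convolution and max pooling.

    Assumes the pooling stride equals the pooling kernel size and that
    incomplete pooling windows are dropped (floor division).
    """
    conv_len = seq_len + 2 * padding - kernel + 1
    return conv_len // pool


for kernel in (27, 14, 7):
    print(kernel, conv_pool_length(300, kernel, pool=6))
```

Under these assumptions, the three kernel sizes give different output lengths, so a model that concatenates them along the length axis would need padding or another alignment step.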