From d0d967473c49309343e51e9acdd401d6a63fab8d Mon Sep 17 00:00:00 2001
From: justin-a-sanders <60298590+justin-a-sanders@users.noreply.github.com>
Date: Tue, 12 Mar 2024 12:36:34 -0700
Subject: [PATCH] Update README.md

---
 README.md | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 582145eb..76e4adc7 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
-# Casanovo
+# PSM scoring with Casanovo-DB
 
-This branch of the Casanovo project contains code that implements the Casanovo-DB database search procedure. The preprint version of the paper can be found [here](https://www.biorxiv.org/content/10.1101/2024.01.26.577425v2). Our eventual goal is to provide the full database search functionality as part of Casanovo. For now, however, this branch allows for testing of the methodology by making use of some important functionality available in the Crux mass spectrometry toolkit (http://crux.ms).
-You can install this branch (ideally, in an appropriately named Conda environment) using the following command:
+This branch of the Casanovo project contains code that implements the Casanovo-DB score function for database search. The preprint version of the paper can be found [here](https://www.biorxiv.org/content/10.1101/2024.01.26.577425v2). Our eventual goal is to provide the full database search functionality as part of Casanovo. For now, however, this branch allows for testing of the methodology by making use of some important functionality available in the Crux mass spectrometry toolkit (http://crux.ms).
+You can install this branch using the following command:
 ```
 pip install git+https://github.com/Noble-Lab/casanovo.git@db_search
 ```
@@ -11,7 +11,7 @@ To use Casanovo-DB, you must also install the Crux toolkit. Given a set of spec
 Please note that your `.fasta` file cannot contain any 'U' amino acids because it is not in the vocabulary of Casanovo. 
 Replace all occurrences of this character with 'X' to denote a missing amino acid.
 
-2. Identify candidate peptides for each spectrum (be sure to set `top-match` to a very high number):
+2. Identify candidate peptides for each spectrum. Be sure to set `top-match` to a very high number so every candidate PSM is considered:
 - `crux tide-search --output-dir search_results --top-match 1000000 spectra.mgf my_proteome`
 3. Extract the candidate peptides from the search results into a format readable by Casanovo-DB (`annotated.mgf`).
 - `casanovo --mode=annotate --peak_path spectra.mgf --tide_dir_path search_results --output annotated.mgf`
@@ -21,7 +21,6 @@ Please note that `spectra.mgf` must contain the `SCANS=` field.
 4. Run Casanovo-DB:
 - `casanovo --mode=db --peak_path annotated.mgf --output casanovo_db_result.mztab`
 
-
 The resulting file is in mztab format, similar to that produced by Casanovo's `sequence` command, except that there are scores for every candidate peptide against their respective spectrum (pairs as specified in `annotated.mgf`).
 
 **_De Novo_ Mass Spectrometry Peptide Sequencing with a Transformer Model**
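
The README text in this patch asks for every 'U' in the `.fasta` file to be replaced with 'X' before searching. A minimal sketch of that preprocessing step is shown below. It is not part of Casanovo or Crux: the function name `replace_u_with_x` is hypothetical, and the sketch assumes that only sequence lines should be changed, so header lines beginning with `>` are left untouched.

```python
def replace_u_with_x(fasta_text: str) -> str:
    """Return FASTA text with 'U' replaced by 'X' on sequence lines only.

    Assumption: header lines (starting with '>') may legitimately contain
    the letter 'U' in protein names or identifiers, so they are kept as-is.
    """
    cleaned = []
    # keepends=True preserves the original line endings.
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            cleaned.append(line)  # header line: copy unchanged
        else:
            cleaned.append(line.replace("U", "X"))  # sequence line
    return "".join(cleaned)


if __name__ == "__main__":
    # Small in-memory example; the record below is made up for illustration.
    example = ">example_protein UNKNOWN\nMKUAU\nGGU\n"
    print(replace_u_with_x(example), end="")
```

To apply it to a real file, read the proteome with `pathlib.Path(...).read_text()`, pass the text through the function, and write the result to a new file before building the Crux index.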