Step 2: BEANIE pipeline
To create a new BEANIE object, specify the following parameters: counts_path, metad_path, sig_path, normalised, and output_dir. For example:
import beanie as bn
bobj = bn.Beanie(counts_path = "./data/adata_subset.h5ad",
metad_path = "./data/metad.csv",
sig_path = "./data/signatures.gmt",
normalised = True,
output_dir = "./beanie_out/")
Additional parameters can be tuned according to the user’s preferences:
- min_cells: QC parameter that removes samples with fewer than the specified number of cells. The default value is 50, and values equal to or higher than the default are recommended.
- bins: boolean parameter that sets whether binning by signature size is used when generating background gene signatures. The default value is False. If set to True, bin_size can be used to tune the bin size.
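To make the min_cells filter concrete, here is a minimal sketch of the kind of per-sample QC it describes. It is not BEANIE's internal code: the function name filter_samples_by_cell_count, the dictionary input layout, and the toy sample labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def filter_samples_by_cell_count(cell_to_sample, min_cells=50):
    """Keep only cells whose sample contains at least `min_cells` cells.

    cell_to_sample: dict mapping cell ID -> sample ID (hypothetical layout).
    """
    # Count how many cells each sample contributes.
    counts = Counter(cell_to_sample.values())
    kept_samples = {s for s, n in counts.items() if n >= min_cells}
    # Drop every cell that belongs to an under-populated sample.
    return {c: s for c, s in cell_to_sample.items() if s in kept_samples}


# Toy data: sample "P1" has 60 cells, sample "P2" has only 10.
cells = {f"P1_cell{i}": "P1" for i in range(60)}
cells.update({f"P2_cell{i}": "P2" for i in range(10)})
filtered = filter_samples_by_cell_count(cells, min_cells=50)
```

With min_cells = 50, all cells from the smaller toy sample are dropped and only the cells from "P1" remain.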
BEANIE offers multiple signature scoring methods, including the default AUCell-inspired method ("beanie"). Other available methods are the weighted mean ("mean") and z-scoring ("combined-z"). The implementation details can be found in the paper.
To run signature scoring:
bobj.SignatureScoring(scoring_method="beanie")
The no_random_sigs argument can also be provided to tune the number of background signatures used for p-value correction. The default value is 1000, and values equal to or higher than the default are recommended.
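As a rough illustration of what background signatures are for, the sketch below draws random gene sets of the same size as a signature and computes a generic empirical p-value against them. This is an assumption about the general technique, not BEANIE's actual procedure, which is described in the paper and may differ. The helper names random_signatures and empirical_p_value and the toy gene pool are hypothetical.

```python
import random


def random_signatures(gene_pool, sig_size, no_random_sigs=1000, seed=0):
    """Draw `no_random_sigs` random gene sets, each of size `sig_size`."""
    rng = random.Random(seed)
    # rng.sample draws without replacement, so genes are unique within a set.
    return [rng.sample(gene_pool, sig_size) for _ in range(no_random_sigs)]


def empirical_p_value(observed, background):
    """Fraction of background statistics at least as large as the observed one.

    The +1 terms keep the estimate from ever being exactly zero.
    """
    exceed = sum(1 for b in background if b >= observed)
    return (1 + exceed) / (1 + len(background))


# Toy gene pool (hypothetical names) and a signature of 10 genes.
genes = [f"gene{i}" for i in range(200)]
background_sigs = random_signatures(genes, sig_size=10, no_random_sigs=1000)
```

The smallest p-value this estimator can return is 1 / (1 + number of background signatures), which is one reason a larger no_random_sigs gives finer resolution.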
BEANIE’s differential enrichment consists of subsampling, leave-one-out cross-validation, and biological contextualization of p-values. It can be run as follows:
# run the Differential Testing workflow
bobj.DifferentialExpression()
# View the results
bobj.GetDifferentialExpressionSummary()
Note: This is the most time-consuming step, and can take longer than an hour depending on the size of your dataset. A progress bar is displayed so that you can estimate the remaining time.
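The leave-one-out idea can be sketched in a few lines. The example below is a simplified stand-in, not BEANIE's implementation: it uses a plain difference of group means in place of the statistical tests and subsampling that BEANIE performs, and the function name leave_one_out_robustness and the sample IDs are hypothetical.

```python
from statistics import mean


def leave_one_out_robustness(group_a, group_b):
    """Check whether the direction of a group difference survives dropping
    any single sample.

    group_a, group_b: dict mapping sample ID -> per-sample signature score.
    Returns (full_diff, fraction of folds agreeing in sign with full_diff).
    """
    full_diff = mean(group_a.values()) - mean(group_b.values())
    fold_diffs = []
    # Exclude each sample of group A in turn.
    for left_out in group_a:
        rest = [v for s, v in group_a.items() if s != left_out]
        fold_diffs.append(mean(rest) - mean(group_b.values()))
    # Exclude each sample of group B in turn.
    for left_out in group_b:
        rest = [v for s, v in group_b.items() if s != left_out]
        fold_diffs.append(mean(group_a.values()) - mean(rest))
    agree = sum(1 for d in fold_diffs if (d > 0) == (full_diff > 0))
    return full_diff, agree / len(fold_diffs)


# Toy data: the difference in group A is driven entirely by sample "P1".
outlier_a = {"P1": 10.0, "P2": 0.0, "P3": 0.0}
flat_b = {"P4": 1.0, "P5": 1.0, "P6": 1.0}
diff, robustness = leave_one_out_robustness(outlier_a, flat_b)
```

In this toy case the sign of the difference flips when "P1" is excluded, so the robustness falls below 1. That is the kind of sample-driven result that leave-one-out checks are meant to flag.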
BEANIE can also be used to identify top genes based on their ranking by log2 fold change and robustness to sample exclusion. This feature is particularly useful for gene signatures containing a large number of genes. To run the gene rank workflow:
bobj.RankGenes()
bobj.GetRankGenesSummary()
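For intuition about ranking by log2 fold change, the following is a minimal sketch assuming two groups of expression values per gene and a pseudocount to avoid division by zero. It does not reproduce BEANIE's RankGenes workflow, which also accounts for robustness to sample exclusion. The function name rank_genes_by_log2fc, the pseudocount, and the gene names are hypothetical.

```python
import math
from statistics import mean


def rank_genes_by_log2fc(expr_a, expr_b, pseudocount=1.0):
    """Rank genes by log2 fold change of mean expression (group A vs. group B).

    expr_a, expr_b: dict mapping gene -> list of expression values per group.
    Returns a list of (gene, log2fc) pairs sorted from highest to lowest.
    """
    log2fc = {
        gene: math.log2(
            (mean(expr_a[gene]) + pseudocount) / (mean(expr_b[gene]) + pseudocount)
        )
        for gene in expr_a
    }
    return sorted(log2fc.items(), key=lambda kv: kv[1], reverse=True)


# Toy expression values for three hypothetical genes.
expr_a = {"GENE_A": [7.0, 7.0], "GENE_B": [1.0, 1.0], "GENE_C": [0.0, 0.0]}
expr_b = {"GENE_A": [1.0, 1.0], "GENE_B": [1.0, 1.0], "GENE_C": [3.0, 3.0]}
ranked = rank_genes_by_log2fc(expr_a, expr_b)
```

Here GENE_A ranks first (higher in group A), GENE_B sits in the middle (no change), and GENE_C ranks last (higher in group B).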