
Step 2: BEANIE pipeline

Shreya Johri edited this page Apr 12, 2023 · 3 revisions

Create BEANIE object

To create a new BEANIE object, the following parameters should be specified: counts_path, metad_path, sig_path, normalised, and output_dir. For example:

import beanie as bn

bobj = bn.Beanie(counts_path="./data/adata_subset.h5ad",
                 metad_path="./data/metad.csv",
                 sig_path="./data/signatures.gmt",
                 normalised=True,
                 output_dir="./beanie_out/")

Additional parameters that can be tuned to the user's preferences:

  • min_cells - QC parameter that removes samples with fewer cells than the number specified here. The default value is 50; setting it to 50 or higher is recommended.
  • bins - bool parameter that controls whether background gene signatures are generated with binning by signature size. The default value is False. If set to True, bin_size can be used to tune the bin size.
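
To make the effect of min_cells concrete, the following is a minimal, standalone sketch of this kind of QC filter. It is not BEANIE's implementation: the function name filter_samples_by_min_cells and the cell-to-sample dictionary layout are assumptions made for illustration only.

```python
from collections import Counter


def filter_samples_by_min_cells(cell_to_sample, min_cells=50):
    """Keep only cells whose sample has at least `min_cells` cells.

    cell_to_sample: dict mapping a cell ID to its sample ID
    (hypothetical layout, chosen for this illustration).
    Returns (kept_cells, dropped_samples).
    """
    cells_per_sample = Counter(cell_to_sample.values())
    dropped = sorted(s for s, n in cells_per_sample.items() if n < min_cells)
    kept = {c: s for c, s in cell_to_sample.items()
            if cells_per_sample[s] >= min_cells}
    return kept, dropped


# Toy metadata: sample "P1" has 60 cells, sample "P2" has only 10.
toy = {f"P1_cell{i}": "P1" for i in range(60)}
toy.update({f"P2_cell{i}": "P2" for i in range(10)})

kept, dropped = filter_samples_by_min_cells(toy, min_cells=50)
print(len(kept), dropped)
```

With the default threshold of 50, the toy sample "P2" is removed and only the 60 cells of "P1" remain.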

Signature scoring

BEANIE offers multiple signature scoring methods, including the default AUCell-inspired method ("beanie"). Other available methods are the weighted mean ("mean") and z-scoring ("combined-z"). The implementation details can be found in the paper.
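
As a rough illustration of what a z-score based signature score looks like, the sketch below implements the generic combined z-score idea: z-score each signature gene across cells, sum the z-scores per cell, and divide by the square root of the number of genes used. This is an assumption about the general technique, not BEANIE's code; the function combined_z_scores and the gene-to-values dictionary are hypothetical, and the paper remains the reference for the actual implementation.

```python
import math
import statistics


def combined_z_scores(expr, signature):
    """Score each cell for one signature with a combined z-score.

    expr: dict mapping gene name -> list of expression values, one per cell
    (hypothetical layout). Genes missing from `expr` or with zero variance
    are skipped.
    """
    genes = [g for g in signature if g in expr]
    if not genes:
        raise ValueError("no signature genes found in expression data")
    n_cells = len(expr[genes[0]])
    totals = [0.0] * n_cells
    used = 0
    for gene in genes:
        values = expr[gene]
        mu = statistics.fmean(values)
        sd = statistics.pstdev(values)
        if sd == 0:
            continue  # a constant gene carries no information
        used += 1
        for i, v in enumerate(values):
            totals[i] += (v - mu) / sd
    if used == 0:
        raise ValueError("all signature genes have zero variance")
    return [t / math.sqrt(used) for t in totals]


# Toy data: three cells, two informative genes, one constant gene,
# and one signature gene ("GENE_D") absent from the data.
expr = {
    "GENE_A": [0.0, 1.0, 2.0],
    "GENE_B": [1.0, 3.0, 5.0],
    "GENE_C": [4.0, 4.0, 4.0],
}
scores = combined_z_scores(expr, ["GENE_A", "GENE_B", "GENE_C", "GENE_D"])
print(scores)
```

In the toy data, the first cell scores below zero, the middle cell scores zero, and the last cell scores above zero, reflecting how far each cell sits from the per-gene mean.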

To run signature scoring:

bobj.SignatureScoring(scoring_method="beanie")

The no_random_sigs argument can also be provided to tune the number of background signatures used for p-value correction. The default value is 1000; setting it to 1000 or higher is recommended.
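
The reason more background signatures help can be seen in a generic empirical p-value calculation, sketched below. This is an illustrative assumption about how random background signatures are commonly used, not BEANIE's correction procedure (see the paper for that). The helpers random_signatures and empirical_p_value, and the toy per-gene weights, are hypothetical.

```python
import random


def random_signatures(all_genes, size, n_random, seed=0):
    """Draw `n_random` random gene sets of the same size as the signature."""
    rng = random.Random(seed)
    return [rng.sample(all_genes, size) for _ in range(n_random)]


def empirical_p_value(observed, null_scores):
    """Fraction of background scores >= observed.

    The +1 in numerator and denominator keeps the p-value above zero,
    so its smallest possible value is 1 / (len(null_scores) + 1).
    """
    n_extreme = sum(1 for s in null_scores if s >= observed)
    return (n_extreme + 1) / (len(null_scores) + 1)


# Toy setup: 200 genes with a made-up per-gene weight; a signature's
# score is simply the mean weight of its genes.
all_genes = [f"g{i}" for i in range(200)]
weight = {g: float(i) for i, g in enumerate(all_genes)}


def score(genes):
    return sum(weight[g] for g in genes) / len(genes)


signature = all_genes[-10:]  # the ten highest-weight genes
background = random_signatures(all_genes, size=len(signature), n_random=1000)
p = empirical_p_value(score(signature), [score(s) for s in background])
print(p)
```

Because the smallest attainable p-value in this scheme is 1 / (number of background signatures + 1), using fewer background signatures limits the resolution of the resulting p-values.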

Differential Enrichment of Gene Signatures

BEANIE’s differential enrichment consists of subsampling, leave-one-out cross-validation, and biological contextualization of p-values. It can be run as follows:

# Run the differential testing workflow
bobj.DifferentialExpression()

# View the results
bobj.GetDifferentialExpressionSummary()

Note: This is the most time-consuming step and can take more than an hour, depending on the size of your dataset. A progress bar is displayed to help estimate the time remaining.
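
The leave-one-out idea mentioned above can be sketched in a few lines: each sample is excluded in turn, and a result counts as robust only if it holds in most of the resulting folds. This is a simplified illustration, not BEANIE's workflow; leave_one_out_folds, robustness, and the mean-based significance rule are hypothetical stand-ins for the real statistical test.

```python
def leave_one_out_folds(samples):
    """Yield (excluded_sample, remaining_samples) for every sample."""
    for i, excluded in enumerate(samples):
        yield excluded, samples[:i] + samples[i + 1:]


def robustness(samples, is_significant):
    """Fraction of leave-one-out folds in which the result still holds.

    is_significant: callback taking the remaining samples and returning
    a bool (a hypothetical stand-in for a real statistical test).
    """
    folds = list(leave_one_out_folds(samples))
    hits = sum(1 for _, remaining in folds if is_significant(remaining))
    return hits / len(folds)


# Toy per-sample signature scores; sample "P3" is an outlier.
samples = [("P1", 2.0), ("P2", 2.1), ("P3", 9.0), ("P4", 1.9)]


def mean_above_three(remaining):
    """Toy rule: call the result significant if the mean score exceeds 3."""
    return sum(score for _, score in remaining) / len(remaining) > 3.0


ratio = robustness(samples, mean_above_three)
print(ratio)
```

In this toy case the result disappears as soon as "P3" is excluded, which is the kind of single-sample dependence that leave-one-out testing is designed to expose.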

Gene Importance Ranking

BEANIE can also be used to identify top genes based on their ranking by log2 fold change and robustness to sample exclusion. This feature is particularly useful for gene signatures containing a large number of genes. To run the gene rank workflow:

bobj.RankGenes()
bobj.GetRankGenesSummary()
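
For intuition about the log2 fold change part of the ranking, the sketch below computes a per-gene log2 fold change between two groups and sorts genes by its absolute value. It is a minimal illustration under stated assumptions, not BEANIE's ranking: the pseudocount of 1, the sort by absolute value, and the names log2_fold_change and rank_genes are all hypothetical, and the robustness-to-sample-exclusion component is omitted.

```python
import math


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean expression in group A versus group B.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def rank_genes(expr_a, expr_b):
    """Return (gene, log2FC) pairs sorted by descending absolute log2FC."""
    fold_changes = {
        gene: log2_fold_change(expr_a[gene], expr_b[gene])
        for gene in expr_a
        if gene in expr_b
    }
    return sorted(fold_changes.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy expression values for three genes in two groups of cells.
expr_a = {"G1": [7.0, 7.0], "G2": [1.0, 1.0], "G3": [1.0, 1.0]}
expr_b = {"G1": [1.0, 1.0], "G2": [1.0, 1.0], "G3": [0.0, 0.0]}

ranked = rank_genes(expr_a, expr_b)
for gene, lfc in ranked:
    print(gene, lfc)
```

Here "G1" ranks first, "G3" second, and "G2", which does not differ between the groups, last.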