SCITUNA: a novel single-cell data integration approach that combines both graph-based and anchor-based techniques. SCITUNA constructs a graph for each batch to represent intra-batch cell similarities, and a bipartite graph to capture inter-batch similarities. This transforms the integration problem into a many-to-one matching problem, where cells from a query batch are matched with cells from a reference batch. The resulting matches are then used to transform the query cell space to the reference cell space.
- SCITUNA operates directly in the original gene expression space.
- The method introduces a novel batch ordering strategy based on optimal transport cost.
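To make the many-to-one matching idea concrete, here is a minimal toy sketch. It is not SCITUNA's algorithm: it swaps in plain nearest-neighbor matching and a single global mean shift, and it skips the intra-batch graphs, the bipartite inter-batch graph, and anchor selection entirely. The function names `match_many_to_one` and `shift_query` and the toy data are assumptions made up for illustration.

```python
from math import dist


def match_many_to_one(query, reference):
    """Return, for each query cell, the index of its closest reference cell.

    Several query cells may map to the same reference cell (many-to-one).
    Assumption: plain Euclidean nearest neighbor, not SCITUNA's matching.
    """
    return [
        min(range(len(reference)), key=lambda j: dist(q, reference[j]))
        for q in query
    ]


def shift_query(query, reference, matches):
    """Move all query cells by the mean offset to their matched reference cells.

    Assumption: one global correction vector, used here only for illustration.
    """
    n_dims = len(query[0])
    offsets = [
        [reference[j][d] - q[d] for d in range(n_dims)]
        for q, j in zip(query, matches)
    ]
    mean_offset = [sum(o[d] for o in offsets) / len(offsets) for d in range(n_dims)]
    return [[q[d] + mean_offset[d] for d in range(n_dims)] for q in query]


# Toy expression values (2 genes): the query batch is offset by roughly +1.0 in gene 0.
reference = [[0.0, 0.0], [5.0, 5.0]]
query = [[1.0, 0.0], [1.2, 0.1], [6.0, 5.0]]

matches = match_many_to_one(query, reference)
corrected = shift_query(query, reference, matches)
```

In this toy case the first two query cells share one reference cell, which is what makes the matching many-to-one rather than one-to-one.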
For more information, please refer to the article, which can be found here.
The main stages of the SCITUNA workflow: a) preprocessing and normalization, b) dimensionality reduction and clustering, c) construction of intra-graphs and the inter-graph, d) anchor selection, e) integration, and f) visualization of the integration results.
Below are the steps to obtain the results in the paper.
To download the employed datasets, follow these steps:
- Navigate to the "data" directory:
  cd data
- Run the script to download the dataset. The "dataset" argument can be one of pancreas, lung, small_atac_peaks, or small_atac_windows:
  python get_data.py [dataset]
Example usage:
python get_data.py pancreas
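To fetch all four datasets in one pass, a small wrapper along the following lines could be used. This is a sketch, not part of the repository: the helper names `build_download_command` and `download_all` are hypothetical, and it assumes it is run from inside the "data" directory, where get_data.py lives.

```python
import subprocess
import sys

# Dataset names accepted by get_data.py, as listed above.
DATASETS = ["pancreas", "lung", "small_atac_peaks", "small_atac_windows"]


def build_download_command(dataset):
    """Build the command line for downloading one dataset."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    # sys.executable reuses the interpreter that is running this wrapper.
    return [sys.executable, "get_data.py", dataset]


def download_all(run=subprocess.run):
    """Download every dataset in turn, stopping at the first failure."""
    for dataset in DATASETS:
        run(build_download_command(dataset), check=True)
```

Calling `download_all()` from the "data" directory runs the four downloads sequentially; `check=True` makes a failed download raise instead of passing silently.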
To integrate multiple batches using SCITUNA, run the following command:
python multi_batch_integration.py --i [input_dataset] --b [batch_id] --c [num_cores]
Arguments
--i (input_dataset): The dataset file located in "data/" (supported formats: H5AD).
--b (batch_id): The column name in ".obs" that indicates batch labels for integration.
--c (num_cores): Number of CPU cores to use for parallel processing.
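As an illustration of how these three arguments fit together, the sketch below parses them with argparse. It is an assumption about the interface, not the script's actual parser: the function name `build_parser`, the default of one core, the file name "pancreas.h5ad", and the column name "batch" are all hypothetical.

```python
import argparse


def build_parser():
    """Parser mirroring the three documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="SCITUNA integration (sketch)")
    parser.add_argument("--i", dest="input_dataset", required=True,
                        help='H5AD dataset file located in "data/"')
    parser.add_argument("--b", dest="batch_id", required=True,
                        help='column name in ".obs" holding the batch labels')
    # Assumption: default to a single core when --c is omitted.
    parser.add_argument("--c", dest="num_cores", type=int, default=1,
                        help="number of CPU cores for parallel processing")
    return parser


# Hypothetical invocation: file name and batch column are placeholders.
args = build_parser().parse_args(["--i", "pancreas.h5ad", "--b", "batch", "--c", "4"])
```

With these placeholder values, the equivalent command line would pass the dataset file to --i, the ".obs" batch column to --b, and the core count to --c.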
To perform pairwise batch integration using SCITUNA, run the following command:
python pairwise_integration.py --i [input_dataset] --b [batch_id] --c [num_cores]
We provide t-SNE and UMAP plots for a deeper analysis of the results. You can access them through this Google Drive link.