SEGCECO: Subgraph Embedding of Gene expression matrix for prediction of CEll-cell COmmunication

Introduction

In this work, we present SEGCECO, a novel method that uses an attributed graph convolutional neural network to predict cell-cell communication from single-cell RNA-seq (scRNA-seq) data. SEGCECO captures the latent and explicit attributes of undirected, attributed graphs constructed from the gene expression profiles of individual cells. Our proposed method consists of three main steps:

Preprocessing step
Cell-cell communication network (CCN) creation
Applying the graph convolutional network technique

Before any machine learning technique is applied, the data must be preprocessed for downstream analysis. Once the data is preprocessed, the CCN is constructed using the SoptSC method. The last module of our pipeline is the SEGCECO method itself, which takes the processed scRNA-seq data and the CCN, creates an attributed graph dataset, and outputs the predictions.
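As a rough illustration of what an attributed graph dataset can look like, the sketch below combines an undirected edge list with one attribute vector per node. It is not the code used in this repository: the function name `build_attributed_graph`, the toy values, and the assumption that nodes are cells and attributes are per-cell expression values are all illustrative.

```python
def build_attributed_graph(edges, attributes):
    """Combine an undirected edge list with per-node attribute vectors.

    edges: iterable of (u, v) integer node ids.
    attributes: dict mapping node id -> list of attribute values.
    Returns (adjacency, attributes); adjacency maps node -> set of neighbours.
    """
    adjacency = {node: set() for node in attributes}
    for u, v in edges:
        if u not in adjacency or v not in adjacency:
            raise KeyError(
                "edge ({}, {}) refers to a node without attributes".format(u, v)
            )
        adjacency[u].add(v)
        adjacency[v].add(u)  # undirected graph: store both directions
    return adjacency, attributes


# Toy input (hypothetical): three CCN edges, two expression values per cell.
edges = [(1, 2), (1, 3), (1, 4)]
attributes = {1: [0.0, 2.5], 2: [1.2, 0.0], 3: [0.4, 0.9], 4: [3.1, 0.0]}
adjacency, node_attrs = build_attributed_graph(edges, attributes)
print(sorted(adjacency[1]))
```

Raising an error for edges that point at nodes without attributes is a design choice of this sketch; it surfaces mismatches between the CCN and the expression data early.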

Pipeline of the proposed framework

Pre-processing step

To run the pre-processing steps on a single-cell RNA sequencing dataset, execute preprocessing.py from the code folder in the project home directory.

Cell-cell communication network (CCN) creation

To create the CCN from the pre-processed single-cell RNA sequencing data, execute CCN.R from the code folder in the project home directory. The output of this step is the edge list of the graph, which is passed to graphInput.py to perform label encoding and produce output in the following format:

1 2
1 3
1 4
...
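The label encoding step can be pictured with the minimal sketch below, which maps arbitrary node labels to consecutive integers and prints them in the space-separated format shown above. This is an illustration only, not the contents of graphInput.py; the function name, the cell labels, and the choice to start ids at 1 (inferred from the sample output) are assumptions.

```python
def encode_edgelist(edges):
    """Map arbitrary node labels to consecutive integer ids starting at 1."""
    ids = {}
    encoded = []
    for u, v in edges:
        for node in (u, v):
            if node not in ids:
                ids[node] = len(ids) + 1  # next unused id
        encoded.append((ids[u], ids[v]))
    return encoded, ids


# Hypothetical cell labels, used only for illustration.
raw_edges = [("cell_A", "cell_B"), ("cell_A", "cell_C"), ("cell_A", "cell_D")]
encoded, mapping = encode_edgelist(raw_edges)
lines = ["{} {}".format(u, v) for u, v in encoded]
print("\n".join(lines))
```

Keeping the returned mapping makes it possible to translate predicted links back to the original cell labels later.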

SEGCECO module

This module consists of two steps:

Gene selection from the pooling layer: The pooling layer in our proposed method selects genes (with a threshold of 300) using the Information Gain feature selection method. Execute Features.py from the code folder in the project home directory. The input to this step is the preprocessed single-cell RNA sequencing dataset, and the output is attributes_IG.csv in the data folder for the corresponding dataset.
Run the SEGCECO module: Execute SEGCECO/Main_LinkPredict.py from the code folder in the project home directory. The inputs to this step are the encoded edge list (e.g. edgelist_encoded_HumanD1.txt, the CCN) and the selected attributes (e.g. attributes_IG_HumanD1.csv, the explicit attributes) in the data folder for the corresponding dataset.
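To make the gene selection step more concrete, here is a minimal, self-contained sketch of ranking genes by information gain and keeping the top k. It is not the code in Features.py. It assumes that the threshold of 300 means keeping the 300 highest-scoring genes, that each cell carries a discrete label (for example a cell type), and that expression is binarized as expressed or not expressed; the gene names and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for a discrete feature X."""
    total = len(labels)
    groups = {}
    for x, y in zip(values, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_genes(expression, labels, k=300):
    """Return the k genes with the highest information gain, plus all scores.

    expression: dict mapping gene name -> list of expression values per cell.
    Assumption: each gene is binarized as expressed (> 0) or not expressed.
    """
    scores = {
        gene: information_gain([v > 0 for v in values], labels)
        for gene, values in expression.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data: 4 cells, 3 hypothetical genes, 2 hypothetical cell labels.
labels = ["alpha", "alpha", "beta", "beta"]
expression = {
    "GENE1": [5, 3, 0, 0],  # separates the two labels perfectly
    "GENE2": [1, 0, 1, 0],  # carries no information about the labels
    "GENE3": [2, 2, 2, 0],  # partially informative
}
top, scores = select_genes(expression, labels, k=2)
print(top)
```

With real data, a library routine such as a mutual information estimator for continuous features may be a better fit than binarization; the binarized version is used here only to keep the arithmetic easy to follow.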

Version Requirements for SEGCECO module

python 3.5.5
networkx 2.0
tensorflow 1.7.0
numpy 1.16.3

Dataset Used

Single-cell RNA sequencing dataset

Dataset	Tissue	Accession	#Cells	#Genes
Baron-human1	Human-Pancreas	GSM2230757	1,937	20,125
Baron-human2	Human-Pancreas	GSM2230758	1,724	20,125
Baron-human3	Human-Pancreas	GSM2230759	3,605	20,125
Baron-human4	Human-Pancreas	GSM2230760	1,303	20,125
Baron-mouse1	Mouse-Pancreas	GSM2230761	822	14,878
Baron-mouse2	Mouse-Pancreas	GSM2230762	1,064	14,878

Comparison Study

To evaluate the performance of SEGCECO, we compared it with latent feature methods (i.e., Node2vec, LINE, DeepWalk, SpectralClustering, GAE, and VGAE) and a state-of-the-art method, WLNM. The code for each method can be found in the Embedding_Methods folder inside the code folder in the project home directory. The steps to generate embeddings are described in code/Embedding_Methods/Generate embeddings.txt. The node embedding methods (i.e., Node2vec, LINE, DeepWalk, and SpectralClustering) produce a feature representation for each node in a network, known as a node embedding. These node embedding outputs are stored in the Embedding_Results folder inside the code folder. Because these embeddings describe nodes rather than edges, an additional step (EdgeFeatures.py) is required to learn edge features from the node embeddings so that link prediction can be treated as a binary classification problem.
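The idea of turning node embeddings into edge features can be sketched as follows. This is not the code in EdgeFeatures.py: the Hadamard (element-wise) product is used here only because it is one common edge operator, and the function names, the one-negative-per-positive sampling, and the toy embeddings are all assumptions.

```python
import random


def hadamard(u_vec, v_vec):
    """Element-wise product of two node embeddings (one common edge operator)."""
    return [a * b for a, b in zip(u_vec, v_vec)]


def build_edge_dataset(embeddings, edges, seed=0):
    """Return (features, labels) with one sampled non-edge per observed edge.

    Assumes edges contains no self-loops.
    """
    rng = random.Random(seed)
    nodes = sorted(embeddings)
    existing = {frozenset(e) for e in edges}
    # Never ask for more negatives than there are unconnected node pairs.
    max_negatives = len(nodes) * (len(nodes) - 1) // 2 - len(existing)
    negatives = set()
    while len(negatives) < min(len(edges), max_negatives):
        pair = frozenset(rng.sample(nodes, 2))
        if pair not in existing:
            negatives.add(pair)
    features, labels = [], []
    for u, v in edges:  # observed links are the positive class
        features.append(hadamard(embeddings[u], embeddings[v]))
        labels.append(1)
    for pair in sorted(negatives, key=sorted):  # sampled non-links
        u, v = sorted(pair)
        features.append(hadamard(embeddings[u], embeddings[v]))
        labels.append(0)
    return features, labels


# Toy 2-dimensional embeddings for four nodes (hypothetical values).
embeddings = {1: [1.0, 0.0], 2: [0.5, 2.0], 3: [0.0, 1.0], 4: [2.0, 2.0]}
edges = [(1, 2), (1, 3), (1, 4)]
features, labels = build_edge_dataset(embeddings, edges)
print(len(features), labels)
```

The resulting feature vectors and 0/1 labels can then be passed to any binary classifier; other edge operators (average, L1, L2 distance) fit the same interface.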

Acknowledgements

I would like to express my gratitude to my supervisor, Dr. Luis Rueda, for his assistance and encouragement; to Akram Vasighizaker, a PhD student, for her collaboration on this project; and to the University of Windsor Office of Research and Innovation.

References

Code for "Zhang, Muhan, and Yixin Chen. Weisfeiler-Lehman neural machine for link prediction. KDD 2017.": https://github.com/KienMN/Weisfeiler-Lehman-Neural-Machine
Code for "Grover, Aditya, and Jure Leskovec. node2vec: Scalable feature learning for networks. KDD 2016.": https://github.com/aditya-grover/node2vec
Code for "Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. KDD 2014.": https://github.com/phanein/deepwalk
SoptSC R package: https://mkarikom.github.io/RSoptSC
Code for "Zhang, Muhan, and Yixin Chen. Link prediction based on graph neural networks. Advances in Neural Information Processing Systems 31 (2018).": https://github.com/XuSShuai/SEAL-for-link-prediction
Code for "Tang, Jian, et al. LINE: Large-scale information network embedding. Proceedings of the 24th International Conference on World Wide Web. 2015.": https://github.com/tangjianpku/LINE
Code for "Kipf, Thomas N., and Max Welling. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308 (2016).": https://github.com/tkipf/gae
Code for Spectral Clustering: https://github.com/lucashu1/link-prediction
