4-template_Processing.Rmd

---
title: "Final Processing"
author: "Jane Velghe"
date: '2022-08-03'
output: pdf_document
---

This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the results appear beneath the code. 

Try executing this chunk by clicking the *Run* button within the chunk or by placing your cursor inside it and pressing *Ctrl+Shift+Enter*. 
```{r}
library(Seurat)
library(sctransform)
library(ggplot2)
```

```{r}
name <- '/R441P' #change
folder <- '/R441' #change
```


```{r}
#path to project folder
data_path <- c('~/Verchere/jdrf2')
```

```{r}
dir.create(paste0(data_path,folder,name,"/process"))
path <- paste0(data_path,folder,name,"/process")

objects <- paste0(data_path,folder,name,"/objects")
```

```{r}
# Run the standard workflow for visualization and clustering
obj <- SCTransform(obj, verbose = FALSE) #2
obj <- ScaleData(obj, verbose = FALSE)
obj <- RunPCA(obj, npcs = 30, verbose = FALSE)
obj <- RunUMAP(obj, reduction = "pca", dims = 1:30)
obj <- FindNeighbors(obj, reduction = "pca", dims = 1:30)
obj <- FindClusters(obj, resolution = 0.4) 
# You will have to adjust this cluster resolution based on your data and what make biological sense, you may need to do this a few times to get the proper number of clusters. My mentality is it's better to over-cluster than under-cluster as you can always group clusters together that you think are the same cell type. 

UMAPPlot(obj)
ggsave(plot=last_plot(), file = paste0(path,name,"_origID_UMAP.pdf"), height = 8, width = 10)
UMAPPlot(obj, split.by = "orig.ident")
ggsave(plot=last_plot(), file = paste0(path,namee,"_split_origID_UMAP.pdf"), height = 8, width = 30)
saveRDS(obj, file = paste0(path,name,"_processed.rds"))
```

Pull up the UMAP plot of the clusters
```{r}
UMAPPlot(obj)
```

Easy access to change assay types. For differential gene expression down the line, use SCT or RNA assays, never use integrated
```{r}
DefaultAssay(obj) <- "RNA"
DefaultAssay(obj) <- "SCT"
```

Find cluster markers
```{r}
# find markers for every cluster compared to all remaining cells, report only the positive
# ones
cluster.markers <- FindAllMarkers(obj, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
cluster.markers %>%
    group_by(cluster) %>%
    slice_max(n = 2, order_by = avg_log2FC)
write.csv(cluster.markers, file = paste0(path,name,"_cluster_markers.csv"))
```

Get information on cluster markers and show it as a heatmap
```{r}
cluster.markers %>%
    group_by(cluster) %>%
    top_n(n = 10, wt = avg_log2FC) -> top10
DoHeatmap(obj, features = top10$gene) + NoLegend()
ggsave(plot=last_plot(), file = paste0(int_path,obj_int_name,"_cluster_marker_heatmap.pdf"), height = 20, width = 30)
```

Label cell types
```{r}
levels(obj)
cell_type <- c("Naive T", "Classical monocyte", "Intermediate monocyte", "CD8+ T", "B", "Non-classical monocytes", "gdT-cell", "NK", "DC")
names(cell_type) <- levels(obj)
obj <- RenameIdents(obj, cell_type)
DimPlot(obj, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()
ggsave(plot=last_plot(), file = paste0(int_path,obj_int_name,"_celltype_UMAP.pdf"), height = 20, width = 30)
```