---
title: "scSorter"
author: "Nina-Lydia Kazakou"
date: "22/06/2021"
output: html_document
---

Here, I annotate the dataset with known cell-type markers using scSorter.
The scSorter package implements the semi-supervised cell type assignment algorithm described in 
"scSorter: assigning cells to known cell types according to known marker genes". 
This algorithm assigns cells to known cell types, assuming that the identities of the marker genes are 
given but their exact expression levels are unavailable.

The paper is available at https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02281-7
and a relevant vignette at https://cran.r-project.org/web/packages/scSorter/vignettes/scSorter.html
 

# Load libraries
```{r message=FALSE}
library(SingleCellExperiment)
library(Seurat)
library(dplyr)
library(ggsci)
library(tidyverse)
library(Matrix)
library(scales)
library(here)
library(scSorter)
```

# Set the colour palette
```{r include=FALSE}
mypal <- pal_npg("nrc", alpha = 0.7)(10)
mypal2 <- pal_tron("legacy", alpha = 0.7)(7)
mypal3 <- pal_lancet("lanonc", alpha = 0.7)(9)
mypal4 <- pal_simpsons(palette = c("springfield"), alpha = 0.7)(16)
mypal5 <- pal_rickandmorty(palette = c("schwifty"), alpha = 0.7)(6)
mypal6 <- pal_futurama(palette = c("planetexpress"), alpha = 0.7)(5)
mypal7 <- pal_startrek(palette = c("uniform"), alpha = 0.7)(5)
mycoloursP <- c(mypal, mypal2, mypal3, mypal4, mypal5, mypal6, mypal7)
show_col(mycoloursP, labels = FALSE)
```

# Load the normalised Seurat object
```{r}
norm.co.seu <- readRDS(here("data", "norm.co.seu.rds"))

dim(norm.co.seu) # 22735 features x 12923 cells

head(norm.co.seu@meta.data)
```

scSorter takes two inputs: the expression matrix from single-cell RNA sequencing 
and an annotation table that specifies the names of the marker genes for each cell type of interest. 
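
To make the two inputs concrete, here is a minimal sketch on toy data. All `toy_*` objects and their values are made up for illustration. It shows a genes x cells matrix, a `Type`/`Marker` table, and a quick look at which markers have no matching row in the matrix. I have not checked how scSorter treats markers that are absent from the matrix, so it seems worth checking before running it.

```{r}
# Toy inputs (hypothetical values), mirroring the shapes described above
toy_expr <- matrix(
  c(5, 0, 3,
    0, 2, 0,
    1, 1, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OLIG2", "GFAP", "SOX2"), c("cell1", "cell2", "cell3"))
)

toy_anno <- data.frame(
  Type   = c("Oligodendroglia", "Astrocytes", "Microglia"),
  Marker = c("OLIG2", "GFAP", "CD68")
)

# Markers that have no matching row in the expression matrix
toy_missing <- setdiff(toy_anno$Marker, rownames(toy_expr))
toy_missing
```

On the real data, the same comparison would use `annotation_df$Marker` and the row names of the expression matrix.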

# Create the annotation file that will be used as input for scSorter
```{r}
# The annotation file should contain two columns: "Type" and "Marker"
annotation_df <- data.frame(
  Type = rep(
    c("Oligodendroglia", "Astrocytes", "oRG", "Radial_Glia", "preOPC", "NPCs", "Pericytes",
      "Inhibitory_Neurons", "Immature_Neurons", "Early_Neurons", "Mature_Neurons", "Microglia", "CyP"),
    times = c(8, 3, 2, 4, 4, 1, 1, 2, 1, 2, 4, 2, 2)
  ),
  Marker = c(
    "OLIG1", "OLIG2", "SOX10", "PDGFRA", "PCDH15", "MBP", "PLP1", "CNP", # Oligodendroglia
    "AQP4", "S100B", "GFAP",                                             # Astrocytes
    "PTPRZ1", "HOPX",                                                    # oRG
    "HES1", "HES5", "VIM", "PAX6",                                       # Radial_Glia
    "EGFR", "BCAN", "DLL1", "DLL3",                                      # preOPC
    "SOX2",                                                              # NPCs
    "PDGFRB",                                                            # Pericytes
    "GAD1", "GAD2",                                                      # Inhibitory_Neurons
    "TUBB3",                                                             # Immature_Neurons
    "STMN1", "STMN2",                                                    # Early_Neurons
    "RBFOX3", "MAP2", "DCX", "SYP",                                      # Mature_Neurons
    "CD68", "CD40",                                                      # Microglia
    "MKI67", "TOP2A"                                                     # CyP
  )
)

# A third, optional column can also be used: "Weight".
# Weights can be assigned to each marker gene to represent its relative importance during cell type assignment.
# To keep the approach unbiased, I won't set per-marker weights for now. Instead, I will set the
# default_weight option to a constant value of 2.

head(annotation_df)

write.csv(annotation_df, file = here("outs", "scSorter_annotationDF.csv"))
```
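
For reference, this is roughly what the annotation table could look like with the optional `Weight` column mentioned in the comments above. This is only a sketch: the weights are hypothetical and are not used anywhere in this analysis. As far as I understand the vignette, `default_weight` only applies when no weights are supplied.

```{r}
# Toy annotation table with per-marker weights (all values are hypothetical)
toy_anno_w <- data.frame(
  Type   = c("Oligodendroglia", "Oligodendroglia", "Astrocytes"),
  Marker = c("OLIG2", "SOX10", "GFAP"),
  Weight = c(4, 2, 2) # a larger weight marks a marker as more important
)

str(toy_anno_w)
```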

# Pre-process the data that will be used as input for scSorter
```{r}
# Take the top 2000 highly variable features
top_genes <- head(VariableFeatures(norm.co.seu), 2000)

exp <- GetAssayData(norm.co.seu)

# Keep only the genes with non-zero expression in more than 10% of all cells
# (the rows are subset first, which avoids a dense copy of the whole matrix)
top_genes_filt <- Matrix::rowSums(exp[top_genes, ] != 0) > ncol(exp) * 0.1
top_genes <- top_genes[top_genes_filt]
```
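
The 10% filter is easier to follow on a toy matrix. The sketch below uses made-up counts (all `toy_*` names are hypothetical) and the same comparison as above. Note that the `>` is strict, so a gene detected in exactly 10% of cells is dropped.

```{r}
# Toy genes x cells matrix with 10 cells (hypothetical counts)
toy_counts <- matrix(
  c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0,  # geneA: detected in 1 of 10 cells
    1, 1, 0, 0, 0, 0, 0, 0, 0, 0,  # geneB: detected in 2 of 10 cells
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), # geneC: never detected
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:10))
)

# Same rule as above: keep genes detected in more than 10% of cells
toy_keep <- rowSums(toy_counts != 0) > ncol(toy_counts) * 0.1
names(toy_keep)[toy_keep]
```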

# Subset the data
```{r}
picked_genes <- unique(c(annotation_df$Marker, top_genes))
exp <- exp[rownames(exp) %in% picked_genes, ]
```
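
A small sketch of what the subsetting step does, on toy data (all `toy_*` names and values are hypothetical). Because the marker genes are combined with the filtered variable genes before subsetting, a marker stays in the matrix even if it is not among the variable genes.

```{r}
# Toy genes x cells matrix (hypothetical values)
toy_mat <- matrix(
  1:12, nrow = 4,
  dimnames = list(c("OLIG2", "GFAP", "geneX", "geneY"), paste0("cell", 1:3))
)

toy_markers <- c("OLIG2", "GFAP")  # marker genes from the annotation table
toy_hvgs    <- c("geneX", "OLIG2") # variable genes that passed the filter

# Union of markers and variable genes, then row subsetting as above
toy_picked <- unique(c(toy_markers, toy_hvgs))
toy_sub <- toy_mat[rownames(toy_mat) %in% toy_picked, ]

rownames(toy_sub) # GFAP is kept although it is not a variable gene
```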

# Run scSorter
```{r}
rts <- scSorter(exp, annotation_df, default_weight = 2) # 2 is an arbitrary choice for the constant marker weight
```
```

```{r}
cell_type <- rts$Pred_Type

norm.co.seu@meta.data$sc_sorter_pred_id <- cell_type

DimPlot(norm.co.seu, group.by = "sc_sorter_pred_id", label = TRUE, cols = mycoloursP)
```


# Explore the results 
```{r}
print(table(rts$Pred_Type))
```
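
Raw counts are easier to compare as percentages. Below is a minimal sketch on a made-up vector of predicted labels (`toy_pred` is hypothetical). It includes an "Unknown" label, which, as far as I understand the vignette, marks cells that could not be assigned to any of the listed types.

```{r}
# Toy vector of predicted labels (hypothetical)
toy_pred <- c("Astrocytes", "Astrocytes", "Unknown", "Microglia", "Astrocytes")

# Counts per predicted type, largest first
toy_tab <- sort(table(toy_pred), decreasing = TRUE)

# The same counts as percentages of all cells
round(100 * prop.table(toy_tab), 1)
```

On the real output, `toy_pred` would be replaced by `rts$Pred_Type`.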

# Add the scSorter predicted IDs to the Seurat object meta.data
```{r}
cell_type <- rts$Pred_Type

norm.co.seu@meta.data$scSorter_predID <- cell_type

# Plot
DimPlot(norm.co.seu, group.by = "scSorter_predID", label = TRUE, cols = mycoloursP)

# Save the updated object (this overwrites the input file)
saveRDS(norm.co.seu, file = here("data", "norm.co.seu.rds"))
```

```{r}
sessionInfo()
```