src/58_Validation_PDGFRA_PAX3.Rmd · Luise-Seeker-Human-WM-Glia

---
title: "Validation of PAX3 and PDGFRA positive OPCs in BA4, CB and CSC"
author: "Luise A. Seeker"
date: "15/04/2022"
output: html_document
---

# Background

Six BA4, seven CSC and five CB samples were stained using BaseScope duplex
kits (ACD) with a PDGFRA probe in C1 (blue) and a custom PAX3 probe in C2 (red). 
QuPath was used to select regions of interest (squares with 500 micrometre 
side lengths), which were saved separately as .tif files. File names were 
blinded and the images were evaluated in QuPath in random order. 

One BA4 sample (SD042/18) was of poor quality and was removed. For one CSC
sample (SD024/14), only three fields of view could be selected and the sample 
is of inferior quality.

A cell had to have at least three blue dots to be counted as a PDGFRA positive 
OPC, because other cells (for example pericytes) can express PDGFRA as well, and 
because PAX3 expression is sparse: detecting it where a cell is cut through its 
middle is more convincing than where it is cut through the tip of a process.
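
The counting rule above can be written down as a small helper. This is only a
sketch: the actual counting was done by eye in QuPath, `classify_cell()` and its
arguments are hypothetical names, and the threshold for calling a cell PAX3
positive (`min_red`) is an assumption, since only the three-dot PDGFRA threshold
is stated here.

```{r}
# Hypothetical helper illustrating the counting rule; not part of the analysis.
# min_blue = 3 is the PDGFRA threshold stated above.
# min_red = 1 is an ASSUMPTION (the PAX3 threshold is not stated in this file).
classify_cell <- function(blue_dots, red_dots, min_blue = 3, min_red = 1) {
  pdgfra_pos <- blue_dots >= min_blue
  pax3_pos <- red_dots >= min_red
  ifelse(pdgfra_pos & pax3_pos, "double_pos",
         ifelse(pdgfra_pos, "PDGFRA_pos",
                ifelse(pax3_pos, "PAX3_pos", "negative")))
}

# Toy dot counts for four imaginary cells
toy_classes <- classify_cell(blue_dots = c(4, 3, 2, 0),
                             red_dots = c(1, 0, 2, 0))
toy_classes
```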


```{r}

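library(ggplot2)
library(lme4)
library(lmerTest)
library(here)
library(tidyr) # gather() is used below for the wide-to-long conversion
# Note: the colour palette mycoloursP used in the bar plots is not defined in
# this file and has to be available in the session before knitting.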
```


```{r}
pax3_data <- read.csv(here("data", 
                   "validation_data", 
                   "Pax3_PDGFRA_basescope_dupl_data.csv" ))

```


```{r}
names(pax3_data)

pax3_data$donor_tissue <- paste(pax3_data$donor_id, 
                                pax3_data$tissue,
                                sep = "_")

pax3_data$tissue <- as.factor(pax3_data$tissue)

pax3_data$pc_pax3_pos_opcs <- (pax3_data$double_pos/
  (pax3_data$double_pos + pax3_data$PDGFRA_pos)) *100 


```
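
As a quick sanity check of the percentage above, here is a toy calculation with
made-up counts. It assumes, as the denominator in the formula implies, that
`PDGFRA_pos` holds cells that are PDGFRA positive only, so that
`double_pos + PDGFRA_pos` is the total number of PDGFRA positive OPCs. The data
frame `toy_counts` is invented for illustration.

```{r}
# Made-up counts for three imaginary fields of view
toy_counts <- data.frame(double_pos = c(5, 0, 12),
                         PDGFRA_pos = c(15, 20, 12))

# Same formula as used for pax3_data above
toy_counts$pc_pax3_pos_opcs <- toy_counts$double_pos /
  (toy_counts$double_pos + toy_counts$PDGFRA_pos) * 100

toy_counts$pc_pax3_pos_opcs
```

A field of view without any PDGFRA positive cells would give 0/0, which R
reports as `NaN`, so such rows drop out of the box plots.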


```{r}
ggplot(pax3_data, aes(x = tissue, y = pc_pax3_pos_opcs)) +
  geom_boxplot()+ geom_jitter(width = 0.2, aes(colour = donor_tissue)) +
  theme_bw(19) + ylab("PAX3+PDGFRA+ (%)") + xlab("Tissue")

```

```{r}
ggplot(pax3_data, aes(x = tissue, y = pc_pax3_pos_opcs)) +
  geom_boxplot()+ geom_jitter(width = 0.2, aes(colour = Tissue_mod)) +
  theme_bw(19) + ylab("PAX3+PDGFRA+ (%)") + xlab("Tissue")

```




```{r}

mod <- glmer( double_pos ~ 
                PDGFRA_pos+
                tissue + 
                Tissue_mod + 
                (1| donor_tissue), 
               data = pax3_data, family = poisson(link = "log"))
summary(mod)

anova(mod)
```
```{r}

mod_int <- glmer( double_pos ~ 
                PDGFRA_pos+
                tissue *  Tissue_mod + 
                (1| donor_tissue), 
               data = pax3_data, family = poisson(link = "log"))
summary(mod_int)

anova(mod_int)
```


```{r}
mod_2 <- glmer( double_pos ~ 
                PDGFRA_pos+
                tissue +  
                (1| donor_tissue), 
               data = pax3_data, family = poisson(link = "log"))
summary(mod_2)

anova(mod_2)

AIC(mod_2)



```


```{r}
mod_3 <- glm(double_pos ~ 
                PDGFRA_pos+
                tissue, 
               data = pax3_data, family = poisson(link = "log"))
summary(mod_3)

anova(mod_2, mod_3)



```

# Subset for white matter only

```{r}
wm_dat <- subset(pax3_data, tissue_region == "WM")

ggplot(wm_dat, aes(x = tissue, y = pc_pax3_pos_opcs)) +
  geom_boxplot()+ geom_jitter(width = 0.2, aes(colour = donor_tissue)) +
  theme_bw(19) + ylab("PAX3+PDGFRA+ (as % of PDGFRA+)") + xlab("Tissue")


```



```{r}
names(pax3_data)

pax3_data$tissue <- as.factor(pax3_data$tissue)


pax3_data$sample_id <- pax3_data$donor_tissue


```

Convert from wide to long format to see how many single and double
positive cells are in each sample.

```{r}

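# gather() uses the key and value names literally, so the long table gets
# columns called "keycol" and "valuecol" (these names are used in the plots below)
gathercols <- c("PDGFRA_pos",
                "PAX3_pos", 
                "double_pos")

long_data <- tidyr::gather(pax3_data,
                           key = "keycol",
                           value = "valuecol",
                           tidyr::all_of(gathercols))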

head(long_data)
nrow(long_data)
names(long_data)
```
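
A small made-up example may help to see what the reshaping does. Each count
column becomes a set of rows, with the former column name stored in `keycol` and
the count in `valuecol`, which are the column names the plotting code further
down relies on. `toy_wide` and its counts are invented for illustration.

```{r}
library(tidyr)

# Two imaginary samples in wide format, one column per cell category
toy_wide <- data.frame(sample_id = c("A_BA4", "B_CB"),
                       PDGFRA_pos = c(15, 20),
                       PAX3_pos = c(1, 3),
                       double_pos = c(5, 10))

# Long format: one row per sample and cell category
toy_long <- gather(toy_wide,
                   key = "keycol",
                   value = "valuecol",
                   PDGFRA_pos, PAX3_pos, double_pos)

toy_long
```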






```{r}

long_data$sample_id <- factor(long_data$sample_id, levels =
                                c("SD008_18_BA4", 
                                  "SD012_15_BA4",
                                  "SD021_17_BA4",
                                  "SD031_14_BA4",
                                  "SD038_15_BA4",
                                  "SD004_12_CB",
                                  "SD011_18_CB",
                                  "SD031_14_CB",
                                  "SD035_15_CB",
                                  "SD046_17_CB",
                                  "SD012_15_CSC",# is that a typo?
                                  "SD015_12_CSC", # is that a typo?
                                  "SD022_13_CSC",
                                  "SD024_14_CSC",
                                  "SD026_16_CSC",
                                  "SD030_12_CSC",
                                  "SD039_14_CSC"
                                  ))

ggplot(long_data, aes(x = sample_id , 
                          y = valuecol, 
                          fill = keycol, 
                          colour = tissue)) +
  geom_bar(position="stack", stat="identity")+ 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  scale_fill_manual(values= mycoloursP[25:30])





ggplot(long_data, aes(x = sample_id , 
                          y = valuecol, 
                          fill = keycol)) +
  geom_bar(position="fill", stat="identity")+ 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  scale_fill_manual(values= mycoloursP[25:30])


```
```{r}


long_data$staining <- "PAX3_PDGFRA"

ggplot(long_data, aes(x = staining , 
                          y = valuecol, 
                          fill = keycol)) +
  geom_bar(position="fill", stat="identity")+ 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  scale_fill_manual(values= mycoloursP[15:30])

pax3_subs_long <- subset(long_data, long_data$keycol != "PAX3_pos")

ggplot(pax3_subs_long, aes(x = staining , 
                          y = valuecol, 
                          fill = keycol)) +
  geom_bar(position="fill", stat="identity")+ 
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  scale_fill_manual(values= mycoloursP[16:30])
```


# Results

BaseScope staining confirms that there are statistically significantly more PAX3
positive OPCs in the spinal cord and cerebellum than in the WM underlying the 
primary motor cortex (BA4). 

I also tested an interaction between anatomical (BA4, CB, CSC) and histological 
(GM vs. WM) tissue region. In BA4 there are statistically significantly more 
PAX3 positive OPCs in the grey matter (which was not included in the snRNAseq 
part of the experiment), in CB there is no significant difference between GM 
and WM, and in CSC there are more PAX3 positive OPCs in the WM. 

I fitted donor_tissue as a random effect to account for donor variability and 
also for sample quality, which seems to improve the model fit compared to a 
generalised linear model without random effects. 
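
To illustrate what such a comparison looks like, here is a minimal sketch on
simulated data. It is not the analysis itself: the data frame `sim`, the effect
sizes and the names `fit_glmm` and `fit_glm` are all made up, and the only
point is that a sample-level random intercept can lower the AIC of a Poisson
model when counts from the same sample are correlated.

```{r}
library(lme4)

set.seed(1)
n_samples <- 12 # imaginary samples, four per tissue
n_fields <- 8   # imaginary fields of view per sample

sim <- data.frame(
  sample_id = factor(rep(seq_len(n_samples), each = n_fields)),
  tissue = factor(rep(c("BA4", "CB", "CSC"),
                      each = n_fields * n_samples / 3))
)

# Sample-level random intercepts plus a tissue effect on the log scale
sample_effect <- rnorm(n_samples, sd = 0.8)
tissue_effect <- c(BA4 = 0, CB = 0.7, CSC = 1)
log_mu <- 1 + tissue_effect[as.character(sim$tissue)] +
  sample_effect[as.integer(sim$sample_id)]
sim$count <- rpois(nrow(sim), lambda = exp(log_mu))

# Poisson model with and without the random intercept
fit_glmm <- glmer(count ~ tissue + (1 | sample_id),
                  data = sim, family = poisson(link = "log"))
fit_glm <- glm(count ~ tissue, data = sim, family = poisson(link = "log"))

# A lower AIC indicates the better trade-off between fit and complexity
AIC(fit_glmm, fit_glm)
```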

```{r}

sessionInfo()
```