---
title: "Lemurs Rescue"
---

```{r setup, include=FALSE}
library(tidyverse)
library(colorspace)
knitr::opts_chunk$set(echo = TRUE)
```

```{r message = FALSE}
lemurs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-24/lemur_data.csv')

lemurs
```

More information about the dataset can be found here:
https://github.com/rfordatascience/tidytuesday/tree/master/data/2021/2021-08-24 and https://www.nature.com/articles/sdata201419.


**Questions:**

1. Does being a current resident at the Duke Lemur Center (DLC) contribute to higher life expectancy or more offspring?
2. How does weight relate to life expectancy? How does this differ by species? How does weight relate to the number of offspring?

**Introduction:** 

Our purpose for this project is to analyze and explore ways we can save the most endangered mammals on our planet: lemurs. We will be exploring the `lemurs` dataset (Zehr, Roach, et al., 2014), which contains 82,609 observations of lemurs living at the Duke Lemur Center (DLC) and 54 attributes/variables detailing age, sex, species, life expectancy, number of offspring, etc.

Our first question investigates whether keeping lemurs at DLC increases their life expectancy and number of offspring (i.e., whether there is a notable difference between current residents and lemurs that are not current residents). The second question investigates whether weight is related to both life expectancy and number of offspring. We will consider only the following attributes/variables to answer our questions:

- The independent variables `weight_g` (lemur weight in grams), `taxon` (lemur species) and `current_resident` (whether the lemur is a current resident at DLC)
- The dependent variables `age_max_live_or_dead_y` (maximum age in years reached by living or dead lemurs by the time of the study) and `n_known_offspring` (total number of offspring the lemur is known to have produced)

**Approach:**

Our analysis follows these steps:

1. Clean up the dataset by selecting the needed columns.
2. Create two boxplots to help answer question 1. A boxplot shows the median of each group, so the groups can be compared directly, and also conveys the variability of each distribution, which strengthens our conclusion.
3. Create two linear regression plots with confidence intervals to answer question 2: one for the relationship between weight and life expectancy, and one for the relationship between weight and number of offspring, both faceted by species. A single-variable regression model is best shown as a fitted line (with confidence limits) overlaid on a scatter plot of the data. Overlaying the plots this way shows whether the model is one we can draw conclusions from or whether it misrepresents reality. Again, this lets us draw more solid conclusions.
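
Step 1 above can be sketched as follows. One assumption worth flagging: the TidyTuesday file appears to hold one row per weight measurement, so the same animal can appear many times, and per-animal values such as `n_known_offspring` would then be repeated on every one of its rows. The helper name `one_row_per_animal()` and the toy values are illustrative only and are not part of the original analysis; `dlc_id` is the animal identifier column in the dataset.

```{r}
library(dplyr)

# Hypothetical helper: keep the per-animal analysis columns and one row per animal.
one_row_per_animal <- function(data) {
  data %>%
    select(dlc_id, taxon, current_resident,
           age_max_live_or_dead_y, n_known_offspring) %>%
    distinct(dlc_id, .keep_all = TRUE)
}

# Made-up toy rows (not real DLC records): animal "A1" is weighed twice.
toy_weights <- data.frame(
  dlc_id = c("A1", "A1", "B2"),
  taxon = c("PCOQ", "PCOQ", "LCAT"),
  current_resident = c("Y", "Y", "N"),
  weight_g = c(3500, 3650, 2200),
  age_max_live_or_dead_y = c(12.4, 12.4, 20.1),
  n_known_offspring = c(2, 2, 5)
)

toy_animals <- one_row_per_animal(toy_weights)
toy_animals
```

Calling `one_row_per_animal(lemurs)` would give a per-animal table for question 1. Question 2 relies on the individual `weight_g` measurements, so it may be better served by the full table.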

**Analysis:**

```{r}
# Question 1 - Box Plots

# Life Expectancy vs. Residency in DLC
plot_1_1 <- lemurs %>%
  select(current_resident, age_max_live_or_dead_y, n_known_offspring) %>%
  na.omit() %>% # drop rows with a missing value in any selected column
  ggplot(aes(current_resident, age_max_live_or_dead_y)) +
  geom_boxplot(fill = "#56B4E9") +
  scale_x_discrete(name = "Current Resident") +
  scale_y_continuous(name = "Life Expectancy (years)") +
  ggtitle("Life Expectancy vs. Residency in DLC") +
  theme_bw(12)

# Number of Offspring vs. Residency in DLC
plot_1_2 <- lemurs %>%
  select(current_resident, age_max_live_or_dead_y, n_known_offspring) %>%
  na.omit() %>%
  ggplot(aes(current_resident, n_known_offspring)) +
  geom_boxplot(fill = "#56B4E9") +
  scale_x_discrete(name = "Current Resident") +
  # log10 y-axis: any rows with 0 offspring would become -Inf and be
  # dropped by ggplot2 with a warning
  scale_y_log10(name = "No. of Offspring") +
  ggtitle("Number of Offspring vs. Residency in DLC") +
  theme_bw(12)

plot_1_1
plot_1_2
```

```{r fig.height=7, fig.width=10}
# Question 2 - Faceted Linear Regression Plots

# Life Expectancy vs. Weight by Species
plot_2_1 <- lemurs %>%
  select(weight_g, taxon, age_max_live_or_dead_y, n_known_offspring) %>%
  na.omit() %>%
  ggplot(aes(weight_g, age_max_live_or_dead_y)) +
  geom_point() +
  # The log10 x scale is applied before the fit, so lm models y ~ log10(weight_g)
  geom_smooth(method = "lm") +
  facet_wrap(vars(taxon)) +
  scale_x_log10(name = "Weight (g)") +
  scale_y_continuous(name = "Life Expectancy (years)") +
  ggtitle("Life Expectancy vs. Weight by Species") +
  theme_bw(12)

# No. of Offspring vs. Weight by Species
plot_2_2 <- lemurs %>%
  select(weight_g, taxon, age_max_live_or_dead_y, n_known_offspring) %>%
  na.omit() %>%
  ggplot(aes(weight_g, n_known_offspring)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(vars(taxon)) +
  scale_x_log10(name = "Weight (g)") +
  scale_y_continuous(name = "No. of Offspring") +
  ggtitle("No. of Offspring vs. Weight by Species") +
  theme_bw(12)

plot_2_1
plot_2_2
```

**Discussion:**

For the first question, the box plots suggest that:

1. Lemurs that are not current residents of DLC reached higher maximum ages than lemurs that are. This raises a follow-up question: why would lemurs at DLC die younger, when the main purpose of the institute is to help lemurs live longer and, therefore, reproduce and expand their population? One caveat is that current residents are still alive, so `age_max_live_or_dead_y` records only their age so far, which may explain part of the gap. Other possible causes include poor living conditions, changes in immunity to diseases, and psychological or behavioural changes. The next step is to investigate the causes of death for DLC lemurs so we can try to address the most influential ones.
2. Lemurs kept at DLC generally have more offspring. This is expected, since DLC's purpose is to save lemurs from extinction. Possible causes include the greater availability of potential mates in close vicinity, which increases conception frequency; exposure to conditions that encourage breeding; and better care throughout pregnancy and after birth, which lowers death rates.
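
The comparison read off the box plots can be backed with numbers. The sketch below computes the quartiles and median that a boxplot draws, per residency group. The helper name `summarise_by_residency()` and the toy values are hypothetical and only illustrate the shape of the output; they are not results from the DLC data.

```{r}
library(dplyr)

# Hypothetical helper: row count, quartiles, and median of one outcome per group.
summarise_by_residency <- function(data, outcome) {
  data %>%
    group_by(current_resident) %>%
    summarise(
      n = n(),
      q1 = unname(quantile({{ outcome }}, 0.25, na.rm = TRUE)),
      med = median({{ outcome }}, na.rm = TRUE),
      q3 = unname(quantile({{ outcome }}, 0.75, na.rm = TRUE)),
      .groups = "drop"
    )
}

# Made-up toy values (not real DLC records).
toy_ages <- data.frame(
  current_resident = c("N", "N", "N", "Y", "Y", "Y"),
  age_max_live_or_dead_y = c(10, 20, 30, 5, 8, 11)
)

toy_summary <- summarise_by_residency(toy_ages, age_max_live_or_dead_y)
toy_summary
```

The same helper could be called as `summarise_by_residency(lemurs, age_max_live_or_dead_y)` or with `n_known_offspring` as the outcome.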

For the second question, the linear regression plots suggest that weight is positively associated with both life expectancy and number of offspring (with the exception of a few species), although the positive slope is more consistent for life expectancy vs. weight. This suggests that a simple, easy-to-track, cost-efficient metric like weight could be used to gauge how likely lemurs are to survive the risk of extinction. In simple terms, keeping lemurs within an ideal weight range may help them both live longer and produce more offspring.
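
Which species are the exceptions can be checked by computing the fitted slope for each one. Because the plots use `scale_x_log10()`, `geom_smooth(method = "lm")` fits against log10 weight, so the sketch below does the same. The helper name `slope_by_taxon()` and the toy values are hypothetical; a negative slope would flag a species that goes against the overall trend.

```{r}
# Hypothetical helper: slope of outcome ~ log10(weight_g) within each taxon.
# `outcome` is the column name given as a string.
slope_by_taxon <- function(data, outcome) {
  fits <- lapply(split(data, data$taxon), function(d) {
    coef(lm(d[[outcome]] ~ log10(d$weight_g)))[[2]]
  })
  data.frame(taxon = names(fits), slope = unlist(fits, use.names = FALSE))
}

# Made-up toy values (not real DLC records): one rising and one falling trend.
toy_species <- data.frame(
  taxon = rep(c("toy_A", "toy_B"), each = 3),
  weight_g = rep(c(100, 1000, 10000), times = 2),
  age_max_live_or_dead_y = c(5, 10, 15, 12, 10, 8)
)

toy_slopes <- slope_by_taxon(toy_species, "age_max_live_or_dead_y")
toy_slopes
```

On the real data this could be called as `slope_by_taxon(lemurs, "age_max_live_or_dead_y")` or with `"n_known_offspring"`. A slope alone does not say whether the trend is statistically distinguishable from zero, so it complements the confidence bands in the plots rather than replacing them.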