# Load the tidyverse (readr, dplyr, ggplot2 and the %>% pipe used below)
library(tidyverse)
lemurs <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-24/lemur_data.csv')
lemurs
## # A tibble: 82,609 × 54
## taxon dlc_id hybrid sex name current_resident stud_book dob
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <date>
## 1 OGG 0005 N M KANGA N <NA> 1961-08-25
## 2 OGG 0005 N M KANGA N <NA> 1961-08-25
## 3 OGG 0006 N F ROO N <NA> 1961-03-17
## 4 OGG 0006 N F ROO N <NA> 1961-03-17
## 5 OGG 0009 N M POOH BEAR N <NA> 1963-09-30
## 6 OGG 0009 N M POOH BEAR N <NA> 1963-09-30
## 7 OGG 0009 N M POOH BEAR N <NA> 1963-09-30
## 8 OGG 0010 N M EEYORE N <NA> 1964-05-20
## 9 OGG 0010 N M EEYORE N <NA> 1964-05-20
## 10 OGG 0014 N F ROOLETTE N <NA> 1964-10-27
## # ℹ 82,599 more rows
## # ℹ 46 more variables: birth_month <dbl>, estimated_dob <chr>,
## # birth_type <chr>, birth_institution <chr>, litter_size <dbl>,
## # expected_gestation <dbl>, estimated_concep <date>, concep_month <dbl>,
## # dam_id <chr>, dam_name <chr>, dam_taxon <chr>, dam_dob <date>,
## # dam_age_at_concep_y <dbl>, sire_id <chr>, sire_name <chr>,
## # sire_taxon <chr>, sire_dob <date>, sire_age_at_concep_y <dbl>, …
More information about the dataset can be found here: https://github.com/rfordatascience/tidytuesday/tree/master/data/2021/2021-08-24 and https://www.nature.com/articles/sdata201419.
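The preview above shows more than one row for some animals (for example, dlc_id 0005 appears twice), presumably because each row is a separate record for that animal. Per-animal variables such as n_known_offspring are then repeated across rows. A minimal sketch of reducing a table to one row per animal follows; toy_lemurs and all of its values are hypothetical and are not taken from the dataset.

```r
# Hypothetical toy data: two rows for animal "0005", one for "0006".
toy_lemurs <- data.frame(
  dlc_id = c("0005", "0005", "0006"),
  weight_g = c(1086, 1190, 947),
  n_known_offspring = c(2, 2, 5),
  stringsAsFactors = FALSE
)

# Keep the first row seen for each dlc_id so that per-animal
# variables are counted once per animal rather than once per row.
toy_unique <- toy_lemurs[!duplicated(toy_lemurs$dlc_id), ]

nrow(toy_unique)
```

In a dplyr pipeline the same step could be written as distinct(dlc_id, .keep_all = TRUE).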
Question:
Introduction:
Our purpose for this project is to analyze and explore ways we can help save the most endangered group of mammals on our planet, lemurs. We will be exploring the lemurs dataset (published in 2014 by Zehr, Roach, et al.), which contains 82,609 observations of lemurs living at the Duke Lemur Center (DLC) and 54 attributes/variables detailing age, sex, species, life expectancy, number of offspring, etc.
Our first question investigates whether residency at the DLC is associated with a higher life expectancy and number of offspring (i.e., is there a significant change vs. living in the wild). The second question investigates whether weight is related to both life expectancy and number of offspring. We will consider only the following attributes/variables to answer our questions:
weight_g (describing lemur weight), taxon (describing lemur species) and current_resident (denoting whether the lemur is a current resident at DLC)
age_max_live_or_dead_y (maximum age reached by living or dead lemurs by the time of the study) and n_known_offspring (total number of offspring the lemur is known to have produced)
Approach:
To start our analysis, we will take the following steps:
Analysis:
# Question 1 - Box Plots
# Life Expectancy vs. Residency in DLC
plot_1_1 <- lemurs %>%
  select(current_resident, age_max_live_or_dead_y, n_known_offspring) %>%
  na.omit() %>%
  ggplot(aes(current_resident, age_max_live_or_dead_y)) +
  geom_boxplot(fill = "#56B4E9") +
  scale_x_discrete(
    name = "Current Resident" # x-axis name
  ) +
  scale_y_continuous(
    name = "Life Expectancy" # y-axis name
  ) +
  ggtitle("Life Expectancy vs. Residency in DLC") +
  theme_bw(12)
# Number of Offspring vs. Residency in DLC
plot_1_2 <- lemurs %>%
  select(current_resident, age_max_live_or_dead_y, n_known_offspring) %>%
  na.omit() %>%
  ggplot(aes(current_resident, n_known_offspring)) +
  geom_boxplot(fill = "#56B4E9") +
  scale_x_discrete(
    name = "Current Resident" # x-axis name
  ) +
  scale_y_log10(
    name = "No. of Offspring" # y-axis name
  ) +
  ggtitle("Number of Offspring vs. Residency in DLC") +
  theme_bw(12)
plot_1_1
plot_1_2
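The box plots give a visual comparison only. One way to check whether the two residency groups differ is a Wilcoxon rank-sum test, available in base R as wilcox.test(). The choice of test is an assumption and is not part of the original analysis; toy_ages and its values are invented for illustration.

```r
# Hypothetical toy data: maximum age (years) for non-residents ("N")
# and current residents ("Y"). All values are invented.
toy_ages <- data.frame(
  current_resident = rep(c("N", "Y"), each = 6),
  age_max_live_or_dead_y = c(4.1, 7.3, 9.8, 12.5, 15.2, 18.6,
                             6.4, 10.9, 13.7, 16.1, 20.3, 24.8)
)

# Two-sided Wilcoxon rank-sum test of maximum age by residency group.
test_age <- wilcox.test(age_max_live_or_dead_y ~ current_resident,
                        data = toy_ages)

test_age$p.value
```

Because age_max_live_or_dead_y is the age reached so far for animals that are still alive, a difference between groups would need careful interpretation.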
# Question 2 - Faceted Linear Regression Plots
# Life Expectancy vs. Weight by Species
plot_2_1 <- lemurs %>%
  select(weight_g, taxon, age_max_live_or_dead_y, n_known_offspring) %>%
  na.omit() %>%
  ggplot(aes(weight_g, age_max_live_or_dead_y)) +
  geom_point() +
  scale_x_log10(
    name = "Weight (g)" # x-axis name
  ) +
  scale_y_continuous(
    name = "Life Expectancy" # y-axis name
  ) +
  ggtitle("Life Expectancy vs. Weight by Species") +
  theme_bw(12) +
  geom_smooth(method = "lm") +
  facet_wrap(vars(taxon))
# No. of Offspring vs. Weight by Species
plot_2_2 <- lemurs %>%
  select(weight_g, taxon, age_max_live_or_dead_y, n_known_offspring) %>%
  na.omit() %>%
  ggplot(aes(weight_g, n_known_offspring)) +
  geom_point() +
  scale_x_log10(
    name = "Weight (g)" # x-axis name
  ) +
  scale_y_continuous(
    name = "No. of Offspring" # y-axis name
  ) +
  ggtitle("No. of Offspring vs. Weight by Species") +
  theme_bw(12) +
  geom_smooth(method = "lm") +
  facet_wrap(vars(taxon))
plot_2_1
## `geom_smooth()` using formula = 'y ~ x'
plot_2_2
## `geom_smooth()` using formula = 'y ~ x'
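The fitted line in each facet can also be summarised as a number. The sketch below fits lm() of maximum age on log10(weight) separately for each taxon and keeps the slope of each fit. The data frame toy_w, its values, and the taxon codes "AAA" and "BBB" are hypothetical placeholders, not part of the dataset.

```r
# Hypothetical toy data for two placeholder taxa.
toy_w <- data.frame(
  taxon = rep(c("AAA", "BBB"), each = 5),
  weight_g = c(100, 200, 400, 800, 1600,
               1000, 1500, 2000, 2500, 3000),
  age_max_live_or_dead_y = c(5, 7, 9, 11, 13,
                             20, 18, 19, 17, 16)
)

# Fit one linear model per taxon on log10(weight), matching the
# log10 x-axis of the plots, and keep the slope of each fit.
slopes <- sapply(split(toy_w, toy_w$taxon), function(d) {
  fit <- lm(age_max_live_or_dead_y ~ log10(weight_g), data = d)
  unname(coef(fit)[2])
})

slopes
```

A positive slope means maximum age rises with weight within that taxon, and a negative slope means it falls, which makes the exceptions among species easy to list.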
Discussion:
For the first question, it clearly seems from the box plots that:
For the second question, the linear regression plots suggest that weight is positively associated with both life expectancy and number of offspring (with the exception of a few species), although the positive slope for life expectancy vs. weight is more consistent. This suggests that a simple, easy-to-track, cost-efficient metric like weight could be used as an indicator of how likely lemurs are to survive the risk of extinction. In simple terms, keeping lemurs within an ideal weight range may help them both live longer and produce more offspring.