---
title: "Wind Turbine Capacities"
---

```{r setup, include=FALSE}
library(tidyverse)
library(colorspace)
library(broom)
knitr::opts_chunk$set(echo = TRUE)
```

```{r message = FALSE}

wind_turbine <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-10-27/wind-turbine.csv')

```

More information about this dataset can be found here: 
https://github.com/rfordatascience/tidytuesday/tree/master/data/2020/2020-10-27

**Question:**

How does a wind turbine's rotor diameter relate to its rated capacity (in kilowatts)? Does this relationship contain distinct clusters of turbines?

**Introduction:** 

In this project, we use the `wind_turbine` dataset to compare wind turbine designs, with the goal of understanding how to maximize a turbine's rated capacity. The dataset (collected in 2020 by the Canadian government) contains 6,698 observations of wind turbines across Canada and 15 variables describing each turbine's location, specifications, project name, etc.

Our question concerns the relationship between a wind turbine's rotor diameter and its rated capacity. We consider only the following variables from the dataset:

- `rotor_diameter_m`, each wind turbine's rotor diameter in meters
- `turbine_rated_capacity_k_w`, each wind turbine's rated capacity in kilowatts

**Approach:** 

To perform our analysis, we will use two main plots, supported by a scree plot:

1. The first plot is a scatter plot with a fitted linear regression line. It shows the nature of the relationship between rotor diameter and rated capacity for wind turbines.
2. The second plot is a k-means clustering plot, which shows how turbines group within the rotor diameter vs. rated capacity relationship. To choose the number of clusters, we supplement it with a scree plot of the total within-cluster sum of squares against the number of clusters.

**Analysis:**

```{r }

wind_turbine_analysis <- wind_turbine %>%
  select(rotor_diameter_m, turbine_rated_capacity_k_w, province_territory)

# First Plot - Linear Regression
wind_turbine %>%
  select(rotor_diameter_m, turbine_rated_capacity_k_w) %>%
  na.omit() %>%
  ggplot(aes(rotor_diameter_m, turbine_rated_capacity_k_w)) +
  geom_point() +
  geom_smooth(method = "lm") + # least-squares line with confidence band
  scale_x_continuous(
    name = "Rotor Diameter (m)", # x-axis name
    breaks = seq(0, 150, by = 10)
  ) +
  scale_y_continuous(
    name = "Turbine Rated Capacity (kW)" # y-axis name
  ) +
  ggtitle("Rated Capacity vs. Rotor Diameter") +
  theme_bw(12)

# Second Plot - Scree Plot (inspired by lecture notes)

# Total within-cluster sum of squares of a k-means fit with `centers` clusters
withinss <- function(data, centers) {
  km_fit <- select(data, where(is.numeric)) %>%
    kmeans(centers = centers, nstart = 10)
  km_fit$tot.withinss
}

set.seed(1234) # k-means uses random starts; fix the seed for reproducibility
tibble(centers = 1:15) %>%
  mutate(
    within_sum_squares = map_dbl(
      centers, ~withinss(na.omit(wind_turbine_analysis), .x)
    )
  ) %>%
  ggplot() +
  aes(centers, within_sum_squares) +
  geom_point() +
  geom_line() +
  ggtitle("Scree Plot") +
  theme_bw(12)

# Third Plot - k-Means Plot with 4 clusters (inspired by lecture notes)

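# Perform k-Means clustering on the numeric columns (complete rows only)
set.seed(1234) # fix the random starts so the cluster assignment is reproducible
km_fit <- na.omit(wind_turbine_analysis) %>%
  select(where(is.numeric)) %>%
  kmeans(centers = 4, nstart = 10)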

# Plotting results
km_fit %>%
  augment(na.omit(wind_turbine_analysis)) %>%
  ggplot() +
  aes(rotor_diameter_m, turbine_rated_capacity_k_w) +
  geom_point(
    aes(color = .cluster)
  ) +
  geom_point(
    data = tidy(km_fit),
    aes(fill = cluster),
    shape = 21, color = "black", size = 4
  ) +
  guides(color = "none") +
  scale_x_continuous(
    name = "Rotor Diameter (m)", # x-axis name
    breaks = seq(0, 150, by = 10)
  ) +
  scale_y_continuous(
    name = "Turbine Rated Capacity (kW)" # y-axis name
  ) +
  ggtitle("k-Means Clusters (k = 4): Rated Capacity vs. Rotor Diameter") +
  theme_bw(12)

```
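
One caveat about the clustering above: `kmeans()` measures Euclidean distance in the raw units, so a variable with a much wider numeric range (here, most likely capacity in kW) can dominate the cluster assignment. A common remedy is to standardize each variable first. The sketch below is one way to do that, not part of the original analysis; `kmeans_scaled()` is a hypothetical helper, and the `toy` values are made up purely to show the call.

```r
library(tidyverse)
library(broom)

# Hypothetical helper (not in the original report): k-means on z-scored columns.
kmeans_scaled <- function(data, centers, nstart = 10, seed = 1) {
  complete <- na.omit(data) # drop rows with any missing value
  scaled <- complete %>%
    select(where(is.numeric)) %>%
    # z-score each numeric column so both variables contribute comparably
    mutate(across(everything(), ~ as.numeric(scale(.x))))
  set.seed(seed) # reproducible random starts
  fit <- kmeans(scaled, centers = centers, nstart = nstart)
  # Attach cluster labels to the original (unscaled) rows for plotting
  augment(fit, complete)
}

# Made-up illustrative values, NOT taken from the wind_turbine data
toy <- tibble(
  rotor_diameter_m = c(40, 45, 50, 100, 105, 110, NA),
  turbine_rated_capacity_k_w = c(600, 660, 750, 2000, 2300, 3000, 1500)
)

toy_clusters <- kmeans_scaled(toy, centers = 2)
count(toy_clusters, .cluster)
```

In this report the equivalent call would be `kmeans_scaled(wind_turbine_analysis, centers = 4)`; comparing its clusters with the unscaled ones would show how much the grouping depends on the choice of units.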

**Discussion:** 

Given the plots above, we can observe the following:

1. There is a clear positive, approximately linear relationship between a wind turbine's rotor diameter and its rated capacity, which both the scatter plot and the fitted line show. Turbines with larger rotors tend to have a higher rated capacity. Because the data are observational, this is an association; it does not by itself prove that enlarging the rotor of a given turbine will raise its rated capacity.
2. With the 4 clusters chosen from the scree plot, the turbines fall into four groups: small rotor diameter with low rated capacity, medium rotor diameter with lower rated capacity, medium rotor diameter with higher rated capacity, and large rotor diameter with high rated capacity. Among medium rotor diameter turbines (roughly 60 to 115 meters), some have a markedly higher rated capacity than others. Understanding why rated capacity differs so much over a narrow range of rotor diameters is key, and it may also point to ways of getting more power out of medium rotor diameter designs.
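
The first observation can be backed with numbers by fitting the same linear model that `geom_smooth(method = "lm")` draws and reading off the slope and R². This is a minimal sketch: `fit_capacity_model()` is a hypothetical helper, and the `toy` values are invented only to demonstrate the call.

```r
library(tidyverse)
library(broom)

# Hypothetical helper: least-squares fit of rated capacity on rotor diameter.
# lm() drops rows with missing values by default.
fit_capacity_model <- function(data) {
  lm(turbine_rated_capacity_k_w ~ rotor_diameter_m, data = data)
}

# Made-up illustrative values, NOT taken from the wind_turbine data
toy <- tibble(
  rotor_diameter_m = c(40, 60, 80, 100, 120),
  turbine_rated_capacity_k_w = c(500, 1000, 1600, 2100, 3000)
)

toy_fit <- fit_capacity_model(toy)

# Slope = estimated change in rated capacity (kW) per extra meter of diameter
tidy(toy_fit, conf.int = TRUE)

# Share of the variance in rated capacity explained by rotor diameter
glance(toy_fit) %>% select(r.squared, adj.r.squared, sigma)
```

Calling `fit_capacity_model(wind_turbine)` would give the corresponding estimates for the real data.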

Proposed next steps:

1. Investigate the cost per kWh of generated electricity for each cluster of turbines. This would help us understand how to design large rotor diameter turbines at the lowest possible cost.
2. Understand why turbines with similar rotor diameters differ so much in rated capacity. This could help generate more power from existing turbine designs, increasing the output of wind farms across the country.
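
One possible starting point for the second step is to express rated capacity relative to the area swept by the rotor, $A = \pi (d/2)^2$, so that turbines of similar diameter can be compared on a common scale. This is only a sketch under that assumption: `add_specific_rating()` and the two new column names are hypothetical, and the `toy` values are made up.

```r
library(tidyverse)

# Hypothetical helper: rated capacity per unit of rotor swept area.
add_specific_rating <- function(data) {
  data %>%
    mutate(
      # Swept area of the rotor disc in square meters
      swept_area_m2 = pi * (rotor_diameter_m / 2)^2,
      # Rated capacity converted from kW to W, divided by swept area
      specific_rating_w_per_m2 =
        turbine_rated_capacity_k_w * 1000 / swept_area_m2
    )
}

# Made-up illustrative values, NOT taken from the wind_turbine data
toy <- tibble(
  rotor_diameter_m = c(80, 100),
  turbine_rated_capacity_k_w = c(1500, 3000)
)

toy_rated <- add_specific_rating(toy)
toy_rated
```

Applied to `wind_turbine_analysis`, the new column could then be compared across the four clusters.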