wind_turbine <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-10-27/wind-turbine.csv')
More information about this dataset can be found here: https://github.com/rfordatascience/tidytuesday/tree/master/data/2020/2020-10-27
Question:
How does a turbine’s rotor diameter relate to its rated capacity in kilowatts? Are there observable clusters in this relationship?
Introduction:
In this project, we use the wind_turbine dataset to analyze several wind turbine designs and to understand ways to maximize a turbine’s rated capacity.
The wind_turbine dataset (collected in 2020 by the Canadian Government) includes 6,698 observations of wind turbines across Canada, with 15 attributes/variables providing details such as location, turbine specifications, project name, etc.
Our question aims to understand the relationship between a wind turbine’s rotor diameter and its rated capacity. We consider only the following variables from the dataset in our analysis:
rotor_diameter_m, which denotes each wind turbine’s rotor diameter expressed in meters
turbine_rated_capacity_k_w, which denotes each wind turbine’s rated capacity expressed in kilowatts
Approach:
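To perform our analysis, we will use three plots: a scatter plot of rated capacity against rotor diameter with a linear regression line, a scree plot to help choose the number of clusters, and a k-means cluster plot with 4 clusters.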
Analysis:
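library(tidyverse) # dplyr, ggplot2, purrr, tibble
library(broom)     # augment() and tidy() for the k-means fit
# Keep only the columns used in the analysis
wind_turbine_analysis <- wind_turbine %>%
  select(rotor_diameter_m, turbine_rated_capacity_k_w, province_territory)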
# First Plot - Linear Regression
wind_turbine %>%
  select(rotor_diameter_m, turbine_rated_capacity_k_w) %>%
  na.omit() %>%
  ggplot(aes(rotor_diameter_m, turbine_rated_capacity_k_w)) +
  geom_point() +
  scale_x_continuous(
    name = "Rotor Diameter (m)", # x-axis name
    breaks = seq(0, 150, by = 10)
  ) +
  scale_y_continuous(
    name = "Turbine Rated Capacity (kW)" # y-axis name
  ) +
  ggtitle("Rated Capacity vs. Rotor Diameter") +
  theme_bw(12) +
  geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
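The regression line in the first plot can also be summarized numerically. The following is a minimal sketch in base R, assuming synthetic stand-in data rather than the real wind_turbine values; the slope, intercept, and noise level used to generate it are invented for illustration only.

```r
# Synthetic stand-in data (NOT the real wind_turbine values).
set.seed(1)
turbines <- data.frame(rotor_diameter_m = runif(200, min = 40, max = 140))
turbines$turbine_rated_capacity_k_w <-
  25 * turbines$rotor_diameter_m - 500 + rnorm(200, sd = 150)

# Fit the same y ~ x model that geom_smooth(method = "lm") draws.
fit <- lm(turbine_rated_capacity_k_w ~ rotor_diameter_m, data = turbines)

# Slope: change in rated capacity (kW) per extra meter of rotor diameter.
slope <- coef(fit)[["rotor_diameter_m"]]
# R-squared: share of the variance in capacity explained by the line.
r_squared <- summary(fit)$r.squared

print(slope)
print(r_squared)
```

Running the same lm() call on the real data (after na.omit()) would give the slope and R-squared behind the plotted line.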
# Second Plot - Scree Plot (inspired by lecture notes)
# Total within-cluster sum of squares for a k-means fit with `centers` clusters,
# using only the numeric columns of `data`
withinss <- function(data, centers) {
  km_fit <- select(data, where(is.numeric)) %>%
    kmeans(centers = centers, nstart = 10)
  km_fit$tot.withinss
}
tibble(centers = 1:15) %>%
  mutate(
    within_sum_squares = map_dbl(
      centers, ~withinss(na.omit(wind_turbine_analysis), .x)
    )
  ) %>%
  ggplot() +
  aes(centers, within_sum_squares) +
  geom_point() +
  geom_line() +
  ggtitle("Scree Plot") +
  theme_bw(12)
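To illustrate how a scree plot is read, here is a small base R sketch. It assumes synthetic points scattered around three made-up centers (not the real turbine data) and computes the relative drop in total within-cluster sum of squares gained by each additional cluster. The "elbow" is the point after which these drops become small.

```r
set.seed(42)
# Three hypothetical cluster centers: (rotor diameter in m, capacity in kW).
centers_true <- rbind(c(50, 800), c(90, 2000), c(120, 3200))

# 100 synthetic points around each center.
pts <- do.call(rbind, lapply(1:3, function(i) {
  cbind(
    rotor_diameter_m = rnorm(100, centers_true[i, 1], 5),
    turbine_rated_capacity_k_w = rnorm(100, centers_true[i, 2], 100)
  )
}))

# Total within-cluster sum of squares for k = 1..8 clusters.
wss <- sapply(1:8, function(k) {
  kmeans(pts, centers = k, nstart = 10)$tot.withinss
})

# Relative reduction achieved by going from k to k + 1 clusters.
rel_drop <- -diff(wss) / wss[-length(wss)]
print(round(rel_drop, 2))
```

Because the synthetic data were generated from three groups, most of the reduction should occur by k = 3; on the real data the elbow has to be judged from the scree plot itself.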
# Third Plot - k-Means Plot with 4 clusters (inspired by lecture notes)
# Perform k-Means clustering
km_fit <- na.omit(wind_turbine_analysis) %>%
  select(where(is.numeric)) %>%
  kmeans(centers = 4, nstart = 10)
# Plotting results
km_fit %>%
  augment(na.omit(wind_turbine_analysis)) %>%
  ggplot() +
  aes(rotor_diameter_m, turbine_rated_capacity_k_w) +
  geom_point(
    aes(color = .cluster)
  ) +
  geom_point(
    data = tidy(km_fit),
    aes(fill = cluster),
    shape = 21, color = "black", size = 4
  ) +
  guides(color = "none") +
  scale_x_continuous(
    name = "Rotor Diameter (m)", # x-axis name
    breaks = seq(0, 150, by = 10)
  ) +
  scale_y_continuous(
    name = "Turbine Rated Capacity (kW)" # y-axis name
  ) +
  ggtitle("Rated Capacity vs. Rotor Diameter") +
  theme_bw(12)
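A per-cluster summary table can complement the cluster plot. The sketch below uses base R and assumes synthetic stand-in data with four invented groups (not the real wind_turbine values); it reports the mean rotor diameter, mean rated capacity, and number of turbines in each k-means cluster.

```r
set.seed(7)
# Synthetic stand-in data: four hypothetical groups of 50 turbines each.
turbines <- data.frame(
  rotor_diameter_m = c(
    rnorm(50, 45, 4), rnorm(50, 80, 4), rnorm(50, 100, 4), rnorm(50, 125, 4)
  ),
  turbine_rated_capacity_k_w = c(
    rnorm(50, 650, 50), rnorm(50, 1600, 50), rnorm(50, 2300, 50), rnorm(50, 3200, 50)
  )
)

# Cluster on the two numeric columns, then attach the assignments.
km_fit <- kmeans(turbines, centers = 4, nstart = 10)
turbines$cluster <- factor(km_fit$cluster)

# Mean of each variable within each cluster.
cluster_summary <- aggregate(
  cbind(rotor_diameter_m, turbine_rated_capacity_k_w) ~ cluster,
  data = turbines,
  FUN = mean
)
# Number of turbines per cluster (same ordering as the factor levels).
cluster_summary$n <- as.vector(table(turbines$cluster))

print(cluster_summary)
```

The per-cluster means match the cluster centers returned by kmeans(), so the table gives the coordinates of the large center points in a plot like the one above, plus the cluster sizes.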
Discussion:
Given the plots above, we can observe the following:
Proposed next steps from here would be to:
1. Investigate the cost per kWh of generated electricity for each cluster of turbines. This would help us understand how to design turbines with large rotor diameters at the lowest possible cost.
2. Understand the reason behind reduced performance for similar rotor diameter ranges. This would help generate more power with existing turbines, thus increasing the throughput of turbine farms across the country.
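One possible starting point for the second next step, offered as an assumption rather than part of the analysis above, is to normalize rated capacity by rotor swept area, A = pi * (D / 2)^2, which gives a rating in watts per square meter. The sketch below uses three hypothetical turbines; the column specific_power_w_per_m2 is a name introduced here for illustration.

```r
# Hypothetical turbines (NOT rows from the wind_turbine dataset).
turbines <- data.frame(
  rotor_diameter_m = c(100, 100, 82),
  turbine_rated_capacity_k_w = c(2000, 3000, 2000)
)

# Rotor swept area in square meters: A = pi * (D / 2)^2.
swept_area_m2 <- pi * (turbines$rotor_diameter_m / 2)^2

# Rated capacity per unit of swept area, converted from kW to W.
turbines$specific_power_w_per_m2 <-
  turbines$turbine_rated_capacity_k_w * 1000 / swept_area_m2

print(turbines)
```

Turbines with the same rotor diameter but different ratings then show up as different values of this ratio, which makes them easier to compare across diameter ranges.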