---
title: "Data Visualization in R using ggplot2"
author: "Chiranjit Dutta"
date: "7/20/21"
output:
  html_document:
    df_print: paged
    toc: yes
    number_sections: yes
---

# Why ggplot2?

- More elegant & compact code than with base graphics
- More aesthetically pleasing defaults than lattice
- Very powerful for exploratory data analysis
- ‘gg’ is for ‘grammar of graphics’ (term by Lee Wilkinson)
  - A set of terms that defines the basic components of a plot
  - Used to produce figures using coherent, consistent syntax
- Supports a continuum of expertise:
  - Easy to get started, plenty of power for complex figures

```{r}
# Load library ggplot2
library(ggplot2)
```

# Basics of ggplot2

## Data

- Must be a data.frame
- Gets pulled into the ggplot() object

```{r}
head(iris)
```

## Aesthetics

- How your data are represented visually
  - a.k.a. mapping
- which data on the x
- which data on the y
- but also: color, size, shape, transparency

## Geometry

- The geometric objects in the plot
- points, lines, polygons, etc
- shortcut functions: geom_point(), geom_bar(), geom_line()

### Basic Structure

- Specify the data and variables inside the ggplot function.
- Anything else that goes in here becomes a global setting.
- Then add layers: geometric objects, statistical models, and facets.
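
To illustrate the "global setting" point, here is a minimal sketch that contrasts a mapping placed in `ggplot()` with one placed in a single layer. The object names `p_global` and `p_layer` are hypothetical, and `geom_smooth(method = "lm")` is only one possible choice of "statistical model" layer, assumed here for illustration.

```{r}
library(ggplot2)

# Global mapping: colour is set in ggplot(), so every layer inherits it.
# The smoother therefore fits one line per species.
p_global <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Layer-level mapping: colour is mapped only inside geom_point(),
# so the smoother ignores Species and fits a single line to all points.
p_layer <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(aes(color = Species)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

p_global
p_layer
```

Printing both plots side by side shows the difference: the same layers are used, and only the place where `color` is mapped changes.
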

### An Example:

```{r}
# Equivalent one-liner:
# ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
myplot <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))
myplot + geom_point()
```

### Changing the aesthetics of a geom: Increase the size of points

```{r}
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(size = 3)
```

### Changing the aesthetics of a geom: Add some color

```{r}
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 3)
```

### Changing the aesthetics of a geom: Differentiate points by shape

```{r}
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(aes(shape = Species), size = 3)
```

## Stats

- Statistical transformations and data summary
- All geoms have associated default stats, and vice-versa
- e.g. binning for a histogram or fitting a linear model

### Example:

```{r}
# Box-plot illustrating birth weight by race
library(MASS) # for loading birthwt data
ggplot(birthwt, aes(factor(race), bwt)) +
  geom_boxplot()
```

## Facets

- Split the data into subsets and draw one panel per subset (lattice-style plots)
- Useful for comparing groups side by side

### Faceting: single column, multiple rows

```{r}
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  facet_grid(Species ~ .)
```

### Faceting: single row, multiple columns

```{r}
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  facet_grid(. ~ Species)
```

### or just wrap your facets

```{r}
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  facet_wrap(~ Species) # notice lack of .
```

## Scales

- Control the mapping from data to aesthetics
- Often used for adjusting color mapping

### Example

```{r}
# Each `+` must end a line; a line that starts with `+` is parsed as a new expression
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  facet_grid(Species ~ .) +
  scale_color_manual(values = c("red", "green", "blue"))
```

### Commonly used scales

```{r,eval=FALSE}
scale_fill_discrete(); scale_colour_discrete()
scale_fill_hue(); scale_color_hue()
scale_fill_manual(); scale_color_manual()
scale_fill_brewer(); scale_color_brewer()
scale_linetype(); scale_shape_manual()
```

# Histogram

```{r}
h <- ggplot(faithful, aes(x = waiting))
h + geom_histogram(binwidth = 8, fill = "steelblue", colour = "black")
```

# Line plot

```{r}
# Read the climate data:
climate <- read.csv("C:/Users/Chiranjit Dutta/Dropbox/Chiranjit Dutta/R Tutorial Summer 2022/R Tutorial 2022/Lecture_materials/Data/climate.csv", header = TRUE)
```

```{r}
ggplot(climate, aes(Year, Anomaly10y)) +
  geom_line()
```

We can also plot confidence regions:

```{r}
ggplot(climate, aes(Year, Anomaly10y)) +
  geom_ribbon(aes(ymin = Anomaly10y - Unc10y, ymax = Anomaly10y + Unc10y),
              fill = "blue", alpha = .1) +
  geom_line(color = "steelblue")
```

# Bar Plot

```{r}
library(tidyr)
# Reshape iris to long format: one row per (Species, variable, value).
# gather() still works but is superseded by pivot_longer() in current tidyr.
df <- gather(iris, variable, value, -Species)
```

```{r}
# Stacked bars (the default position)
ggplot(df, aes(Species, value, fill = variable)) +
  geom_bar(stat = "identity")
```

```{r}
# Side-by-side bars
ggplot(df, aes(Species, value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge")
```

```{r}
# Side-by-side bars with a black outline
ggplot(df, aes(Species, value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge", color = "black")
```

# Density Plot

```{r}
ggplot(faithful, aes(waiting)) +
  geom_density()
```

```{r}
ggplot(faithful, aes(waiting)) +
  geom_density(fill = "blue", alpha = 0.1)
```

# Themes

### Adding themes

Themes control the non-data parts of a plot (legend, facet strips, axis text, background), which makes them a convenient way to customize its look.

```{r,eval=FALSE}
+ theme()
# see ?theme() for more options
```

### Example of a themed plot

```{r}
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 1.2, shape = 16) +
  facet_wrap(~ Species) +
  theme(legend.key = element_rect(fill = NA),
        legend.position = "bottom",
        strip.background = element_rect(fill = NA),
        axis.title.y = element_text(angle = 0))
```

# Plotting multiple time series on a single graph:

This example uses the `economics` dataset (US economic time series) that ships with ggplot2. In current ggplot2 releases it is a data frame with 574 rows and 6 variables.

- date: Month of data collection
- psavert: personal savings rate
- pce: personal consumption expenditures, in billions of dollars
- unemploy: number of unemployed in thousands
- uempmed: median duration of unemployment, in weeks
- pop: total population, in thousands

```{r}
head(economics)
```

```{r}
ggplot(economics, aes(x = date)) +
  geom_line(aes(y = psavert), color = "darkred") +
  geom_line(aes(y = uempmed), color = "steelblue", linetype = "twodash") +
  ggtitle("Multiple time series plot on a single graph") # putting the title on the graph
```

# Saving plots

- If the plot is on your screen

```{r,eval=FALSE}
ggsave("~/path/to/figure/filename.png")
```

- If your plot is assigned to an object

```{r,eval=FALSE}
ggsave("~/path/to/figure/filename.png", plot = plot1)
```

- Specify a size

```{r,eval=FALSE}
ggsave("/path/to/figure/filename.png", width = 6, height = 4)
```

- or any format (pdf, png, eps, svg, jpg)

```{r,eval=FALSE}
ggsave("/path/to/figure/filename.eps")
ggsave("/path/to/figure/filename.jpg")
ggsave("/path/to/figure/filename.pdf")
```
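
The saving chunks above use placeholder paths and are not evaluated. As a runnable sketch of the same idea, the chunk below writes a plot to R's temporary directory. The object `plot1`, the file name `iris_scatter.png`, and the `dpi` value are assumptions made for illustration only.

```{r}
library(ggplot2)

# Hypothetical plot object and output location, used only for illustration
plot1 <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point()
out_file <- file.path(tempdir(), "iris_scatter.png")

# Passing `plot` explicitly avoids depending on whichever plot was drawn last;
# the file extension (.png) selects the output device.
ggsave(filename = out_file, plot = plot1,
       width = 6, height = 4, units = "in", dpi = 300)

# Confirm that the file was written
file.exists(out_file)
```

Naming `filename` and `plot` explicitly keeps the call readable and makes it safe to use in scripts where no plot has been printed yet.
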