---
title: "Data Visualization in R using ggplot2"
author: "Chiranjit Dutta"
date: "7/20/21"
output:
  html_document:
    df_print: paged
    toc: yes
    number_sections: yes
---


# Why ggplot2?

- More elegant and compact code than with base graphics
- More aesthetically pleasing defaults than lattice
- Very powerful for exploratory data analysis
- ‘gg’ stands for ‘grammar of graphics’ (a term from Leland Wilkinson's book *The Grammar of Graphics*)
    - a set of terms that defines the basic components of a plot
    - used to produce figures with a coherent, consistent syntax
- Supports a continuum of expertise:
    - easy to get started, with plenty of power for complex figures

```{r}
# Load library ggplot2
library(ggplot2)
```

# Basics of ggplot2
## Data

- Must be a data frame (a tibble, which is a kind of data frame, also works)
- Is passed to `ggplot()` and stored in the plot object

```{r}
head(iris)
```

## Aesthetics

- How your data are represented visually
    - a.k.a. mapping
- which variable goes on the x-axis
- which variable goes on the y-axis
- but also: color, size, shape, transparency
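A common stumbling block is the difference between *mapping* an aesthetic (inside `aes()`, so it varies with the data) and *setting* it to a constant (outside `aes()`, as an argument of the geom). The sketch below draws `iris` both ways; `layer_data()` returns the data ggplot2 actually drew, which lets us inspect the colours. Writing `aes(colour = "blue")` instead would map the literal string "blue" as a one-level factor, so the points would get the default palette's first colour and a legend entry labelled "blue".

```{r}
library(ggplot2)

# Mapping: colour varies with Species, and a legend is created
mapped <- ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species)) +
  geom_point()

# Setting: every point gets the same constant colour, and no legend is drawn
fixed <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(colour = "blue")

length(unique(layer_data(mapped)$colour))  # one colour per species
unique(layer_data(fixed)$colour)           # a single constant colour
```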


## Geometry

- The geometric objects in the plot
- points, lines, polygons, etc.
- shortcut functions: `geom_point()`, `geom_bar()`, `geom_line()`

### Basic Structure

- Specify the data and variables inside the `ggplot()` function.
- Anything else that goes in here becomes a global setting, inherited by every layer.
- Then add layers with `+`: geometric objects, statistical models, and facets.
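To illustrate global versus layer-level settings, here is a small sketch (the `geom_smooth()` layer is added only for illustration). The x/y mapping given in `ggplot()` is inherited by both layers, while the `colour` mapping inside `geom_point()` applies to the points only, so a single regression line is fitted to all 150 flowers rather than one line per species.

```{r}
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +  # global: used by every layer
  geom_point(aes(colour = Species)) +                        # local: points only
  geom_smooth(method = "lm", formula = y ~ x)                 # inherits x and y, not colour
p
```

Moving `colour = Species` into the `ggplot()` call would instead give three coloured points groups *and* three separate regression lines.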

### An Example:

```{r}
# One-liner: ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
# Or store the base plot in an object and add layers to it later:
myplot <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))
myplot + geom_point()
```

### Changing the aesthetics of a geom: Increase the size of points

```{r}
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(size = 3)
```

### Changing the aesthetics of a geom: Add some color

```{r}
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 3)
```

### Changing the aesthetics of a geom: Differentiate points by shape

```{r}
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(aes(shape = Species), size = 3)
```

## Stats

- Statistical transformations and data summaries
- Every geom has a default stat, and every stat has a default geom
- e.g. binning for a histogram or fitting a linear model
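To see the geom/stat pairing in action: `geom_bar()` uses `stat_count()` by default, and `stat_count()` uses the bar geom by default, so the two calls below should draw the same chart of species counts. This is a minimal sketch; `layer_data()` exposes the `count` column that the stat computed.

```{r}
library(ggplot2)

p1 <- ggplot(iris, aes(Species)) + geom_bar()    # geom with its default stat ("count")
p2 <- ggplot(iris, aes(Species)) + stat_count()  # stat with its default geom ("bar")

layer_data(p1)$count  # 50 50 50: one bar per species
```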

### Example:

```{r}
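# Boxplot of birth weight (grams) by mother's race (coded 1 = white, 2 = black, 3 = other);
# geom_boxplot() uses stat_boxplot() to compute the quartiles and whiskers
library(MASS) # provides the birthwt data set
ggplot(birthwt, aes(factor(race, labels = c("white", "black", "other")), bwt)) +
  geom_boxplot() +
  labs(x = "Mother's race", y = "Birth weight (g)")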
```

## Facets

- Split the data into subsets and draw one panel per subset (like lattice plots)
- A very quick way to compare groups side by side

### Faceting: single column, multiple rows

```{r}
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  facet_grid(Species ~ .)   # rows ~ columns: one row per species
```

### Faceting: single row, multiple columns

```{r}
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  facet_grid(. ~ Species)   # one column per species
```

### Or just wrap your facets

```{r}
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  facet_wrap(~ Species) # one-sided formula: no "." needed
```
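Since ggplot2 3.0.0, facet variables can also be given with `vars()` instead of a formula, and `facet_wrap()` accepts `nrow`/`ncol` to control the layout. A sketch that reproduces the single-column layout both ways (the object names are illustrative):

```{r}
library(ggplot2)

p_base <- ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species)) +
  geom_point()

# Equivalent to facet_grid(Species ~ .): one row of panels per species
p_grid <- p_base + facet_grid(rows = vars(Species))

# facet_wrap() forced into a single column gives a similar stacked layout
p_wrap <- p_base + facet_wrap(vars(Species), ncol = 1)
p_grid
```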


## Scales

- Control the mapping from data values to aesthetics (colours, shapes, axis positions, ...)
- Most often used to adjust the colour mapping

### Example

```{r}
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  facet_grid(Species ~ .) +
  scale_color_manual(values = c("red", "green", "blue"))
```
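Passing a *named* vector to `values` ties each colour to a specific factor level, so the result no longer depends on the order of the levels. A minimal sketch (the colour choices are arbitrary):

```{r}
library(ggplot2)

species_cols <- c(setosa = "red", versicolor = "green", virginica = "blue")

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  scale_color_manual(values = species_cols)
p
```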

### Commonly used scales

```{r,eval=FALSE}
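# scale_fill_*() applies to filled areas (bars, ribbons); scale_colour_*() / scale_color_*() to points and lines
scale_fill_discrete(); scale_colour_discrete()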
scale_fill_hue(); scale_color_hue()
scale_fill_manual(); scale_color_manual()
scale_fill_brewer(); scale_color_brewer()
scale_linetype(); scale_shape_manual()
```

# Histogram

```{r}
h <- ggplot(faithful, aes(x = waiting))  # waiting time between eruptions (minutes)
h + geom_histogram(binwidth = 8, fill = "steelblue",
                   colour = "black")
```
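The look of a histogram depends strongly on the bin width. Instead of `binwidth` you can request a number of bins with `bins` (if you give neither, ggplot2 uses 30 bins and prints a message asking you to pick a better value). Whatever the binning, the bar heights are counts, so they add up to the 272 rows of `faithful`. A short sketch:

```{r}
library(ggplot2)

h <- ggplot(faithful, aes(x = waiting))
h_bins <- h + geom_histogram(bins = 20, fill = "steelblue", colour = "black")
h_bins

sum(layer_data(h_bins)$count)  # total count across all bins
```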

# Line plot

```{r}
# Read the climate data (adjust the path to wherever climate.csv is saved,
# e.g. the Data folder of the course materials):
climate <- read.csv("Data/climate.csv", header = TRUE)
```

```{r}
ggplot(climate, aes(Year, Anomaly10y)) + geom_line()
```

We can also draw an uncertainty band around the line with `geom_ribbon()`, spanning Anomaly10y ± Unc10y:

```{r}
ggplot(climate, aes(Year, Anomaly10y)) +
  geom_ribbon(aes(ymin = Anomaly10y - Unc10y, ymax = Anomaly10y + Unc10y),
              fill = "blue", alpha = 0.1) +  # drawn first so the line sits on top
  geom_line(color = "steelblue")
```

# Bar Plot

```{r}
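library(tidyr)
# Long format: one row per flower and measurement (columns Species, variable, value).
# gather() still works but is superseded by pivot_longer() in current tidyr.
df <- gather(iris, variable, value, -Species)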
```

```{r}
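# stat = "identity" uses the raw values as bar heights, so the 50 flowers
# per species are stacked: each segment is the sum of 50 measurements
ggplot(df, aes(Species, value, fill = variable)) +
  geom_bar(stat = "identity")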
```

```{r}
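# Bars side by side; stat = "summary" draws the mean of each species/variable pair
# (with stat = "identity" the 50 raw bars per pair would overlap)
ggplot(df, aes(Species, value, fill = variable)) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge")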
```

```{r}
# Same as above, with black outlines around the bars
ggplot(df, aes(Species, value, fill = variable)) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge", color = "black")
```
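With `stat = "identity"`, every one of the 50 rows per species and variable becomes its own bar: stacked bars show the *sum* of 50 measurements, and dodged bars overlap so only the largest value is visible. Rather than plotting raw values, one option is to summarise first and plot the means with `geom_col()` (shorthand for `geom_bar(stat = "identity")`). This sketch rebuilds the long table in base R with `stack()` so that it runs without tidyr; `long` and `means` are illustrative names.

```{r}
library(ggplot2)

# Long format in base R: one row per flower and measurement
long <- data.frame(Species = rep(iris$Species, times = 4), stack(iris[, 1:4]))
names(long) <- c("Species", "value", "variable")

# One mean per species/measurement pair: 3 x 4 = 12 rows
means <- aggregate(value ~ Species + variable, data = long, FUN = mean)

ggplot(means, aes(Species, value, fill = variable)) +
  geom_col(position = "dodge", colour = "black")
```

Summarising explicitly keeps the plotted numbers in a data frame you can print and check, at the cost of one extra step.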

# Density Plot

```{r}
ggplot(faithful, aes(waiting)) + geom_density()
```

```{r}
ggplot(faithful, aes(waiting)) + geom_density(fill = "blue", alpha = 0.1)
```
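A density curve is often overlaid on a histogram. For the two to share a y-axis, the histogram has to be drawn on the density scale, which `aes(y = after_stat(density))` does (assuming ggplot2 3.3.0 or later, where `after_stat()` was introduced). The bar areas then sum to 1, which we can check from `layer_data()`:

```{r}
library(ggplot2)

p <- ggplot(faithful, aes(waiting)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 8,
                 fill = "steelblue", colour = "black") +
  geom_density(fill = "blue", alpha = 0.1)
p

# Total area of the histogram bars: height x width, summed over the bins
bars <- layer_data(p, 1)
sum(bars$density * (bars$xmax - bars$xmin))
```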

# Themes

## Adding themes

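Themes control the non-data parts of a plot, such as the background, grid lines, fonts, legend and facet strips, without changing how the data are drawn.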

```{r,eval=FALSE}
p + theme()  # p is an existing ggplot object; list the elements to change inside theme()
# see ?theme for the full list of options
```

## Example of a themed plot

```{r}
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 1.2, shape = 16) +
  facet_wrap(~ Species) +
  theme(legend.key = element_rect(fill = NA),        # no box behind the legend keys
        legend.position = "bottom",                  # legend below the plot
        strip.background = element_rect(fill = NA),  # transparent facet strips
        axis.title.y = element_text(angle = 0))      # horizontal y-axis title
```
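Besides tweaking individual elements with `theme()`, ggplot2 ships complete themes such as `theme_bw()`, `theme_minimal()` and `theme_classic()`. Add the complete theme first and then adjust it with `theme()`: a complete theme added afterwards replaces the whole theme and would undo the earlier tweaks. A sketch:

```{r}
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  theme_bw(base_size = 14) +          # complete theme, with a larger base font size
  theme(legend.position = "bottom")   # then fine-tune individual elements
p

# theme_set() changes the default for all later plots and returns the old theme
old <- theme_set(theme_minimal())
theme_set(old)                        # restore the previous default
```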

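# Plotting multiple time series on a single graph

We use the US economics time series data set `economics`, which ships with ggplot2. It is a data frame of monthly data with 6 variables; the number of rows depends on the ggplot2 version (574 in recent releases, 478 in older ones).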

- date: Month of data collection
- psavert: personal savings rate
- pce: personal consumption expenditures, in billions of dollars
- unemploy: number of unemployed in thousands
- uempmed: median duration of unemployment, in weeks
- pop: total population, in thousands

```{r}
head(economics)
```

```{r}
ggplot(economics, aes(x = date)) +
  geom_line(aes(y = psavert), color = "darkred") +                         # savings rate (%)
  geom_line(aes(y = uempmed), color = "steelblue", linetype = "twodash") + # unemployment duration (weeks)
  ggtitle("Multiple time series plot on a single graph") # adds a title to the graph
```
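Setting a colour by hand in each `geom_line()` call does not produce a legend. The usual ggplot2 idiom is to reshape the series into long format (one row per date and variable) and map `colour` to the variable name. A sketch in base R; `econ_long` is an illustrative name. As in the plot above, both series share one y-axis even though `psavert` is a percentage and `uempmed` is in weeks.

```{r}
library(ggplot2)

# Stack the two series into long format: date, variable, value
econ_long <- data.frame(
  date     = rep(economics$date, times = 2),
  variable = rep(c("psavert", "uempmed"), each = nrow(economics)),
  value    = c(economics$psavert, economics$uempmed)
)

p <- ggplot(econ_long, aes(date, value, colour = variable, linetype = variable)) +
  geom_line() +
  scale_colour_manual(values = c(psavert = "darkred", uempmed = "steelblue")) +
  ggtitle("Multiple time series plot on a single graph")
p
```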

# Saving plots

- If the plot is on your screen, `ggsave()` saves the last plot displayed (see `?last_plot`)

```{r,eval=FALSE}
ggsave("~/path/to/figure/filename.png")
```


- If your plot is assigned to an object

```{r,eval=FALSE}
ggsave("~/path/to/figure/filename.png", plot = plot1)  # plot1 is a saved ggplot object
```


- Specify a size (width and height are in inches unless you set `units`)

```{r,eval=FALSE}
ggsave(filename = "/path/to/figure/filename.png", width = 6, height = 4)
```

- or any format (pdf, png, eps, svg, jpg): the file type is taken from the extension (svg output needs the svglite package)

```{r,eval=FALSE}
ggsave(filename = "/path/to/figure/filename.eps")
ggsave(filename = "/path/to/figure/filename.jpg")
ggsave(filename = "/path/to/figure/filename.pdf")
```
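A runnable sketch that saves to a temporary file (created with `tempfile()`) instead of a real path, so it does not clutter your working directory. `width`, `height` and `units` set the physical size; `dpi` mainly matters for raster formats such as PNG or JPEG.

```{r}
library(ggplot2)

p <- ggplot(faithful, aes(waiting)) + geom_density()

out <- tempfile(fileext = ".pdf")   # file type inferred from the extension
ggsave(out, plot = p, width = 6, height = 4, units = "in")
file.exists(out)                    # TRUE if the file was written
```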