Goals

  • An introduction to the Grammar of Graphics
  • An introduction to ggplot

ggplot2

ggplot components

Basic concepts of ggplot2

A ggplot graphic has at least three key components:

  1. Data
  2. A set of aesthetic mappings between variables in the data and visual properties (x and y axis, colour, point size etc.)
  3. At least one geometry layer which describes how to render each observation (lines, points, bars etc.). Layers are usually created with a geom function.

Task

  • Copy and execute these lines:
  • Axes: displ (engine displacement) and hwy (highway miles per gallon)
library(tidyverse)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point()
Please stop the video here! Continue after completing the task!

library(tidyverse)
ggplot(mpg, aes(x = displ, y = hwy)) + 
  geom_point()

Task

  • Copy and execute these lines:
  • Axes: displ (engine displacement) and hwy (highway miles per gallon)
  • Colour: drv (f = front-wheel drive, r = rear-wheel drive, 4 = four-wheel drive)
  • Size: cyl (number of cylinders)
ggplot(mpg, aes(x = displ, y = hwy, color = drv, size = cyl)) + 
  geom_point()
Please stop the video here! Continue after completing the task!

ggplot(mpg, aes(x = displ, y = hwy, color = drv, size = cyl)) + 
  geom_point()

The ggplot() function

The main function is ggplot(). Its two most important arguments are:

  1. data : A data frame
  2. mapping : Aesthetic mappings provided with the aes() function.

Additional layers are added with a + sign.

ggplot(mpg, aes(x = displ, y = hwy, color = drv, size = cyl)) + 
  geom_point()
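
Because layers are added with +, a plot can also be built step by step and stored in an object. A minimal sketch of this idea; the object names p_base and p_points are chosen here purely for illustration:

```r
library(ggplot2)

# Step 1: data and aesthetic mappings only.
# Printing this object would draw empty axes, because there is no layer yet.
p_base <- ggplot(mpg, aes(x = displ, y = hwy))

# Step 2: add a geom layer with +
p_points <- p_base + geom_point()

# Printing the object draws the plot
p_points
```

Storing the base plot is handy when several variants of the same plot are needed, since each variant only has to add its own layers.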

Task

  • Take the mpg data frame.
  • Plot a graph with …
    • cty and hwy displayed on the axes,
    • the colour mapped to the variable class, and
    • the shape mapped to the variable drv (shape = drv).
  • Use the geom_point() layer.
Please stop the video here! Continue after completing the task!

ggplot(mpg, aes(cty, hwy, colour = class, shape = drv)) + 
  geom_point()

Fixed aesthetics

  • Aesthetics can also be provided in the geom function.
  • Here they are not mapped to variables but fixed.
ggplot(diamonds, aes(carat, price)) + 
  geom_point(colour = "red", shape = "+", size = 1)
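
The difference between a mapped and a fixed aesthetic is easiest to see side by side. The following sketch uses the mpg data frame; the object names are illustrative. It also shows a common pitfall: a colour name placed inside aes() is treated as data, not as a colour.

```r
library(ggplot2)

# Mapped: colour depends on the variable drv, so a legend is drawn
p_mapped <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = drv))

# Fixed: every point is red and no legend is needed
p_fixed <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(colour = "red")

# Pitfall: inside aes(), "red" is treated as a data value, not as a colour.
# The points get the default palette colour and a legend entry labelled "red".
p_pitfall <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = "red"))

p_mapped
p_fixed
p_pitfall
```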

Some geoms

  • geom_point() : Dots for each data point.
  • geom_line() : Lines connecting the data points in order of the x variable
  • geom_bar() : Bars
  • geom_text() : Text at x and y positions
  • geom_smooth() : Smoothed conditional means
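
Several geoms can be stacked in one plot. A small sketch combining geom_line(), geom_point() and geom_text(); the data frame df_labels is made up for this illustration:

```r
library(ggplot2)

# Hypothetical example data: three points with a text label each
df_labels <- data.frame(
  x = c(1, 2, 3),
  y = c(2, 4, 3),
  name = c("first", "second", "third")
)

p_text <- ggplot(df_labels, aes(x, y)) +
  geom_line() +                              # connects the points along x
  geom_point() +                             # one dot per row
  geom_text(aes(label = name), vjust = -1)   # label drawn above each dot

p_text
```

geom_text() needs the additional label aesthetic, which is mapped here only in that layer.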

Task

  • Take the economics data frame
  • Create a line plot of unemploy (number of unemployed) by date. (geom_line())
  • Add a second layer for red dots of size = 1 (geom_point())
Please stop the video here! Continue after completing the task!

ggplot(economics, aes(date, unemploy)) + 
  geom_line() + 
  geom_point(color = "red", size = 1)

geom_bar()

  • geom_bar() draws bars
  • By default, it counts the number of observations in each category of the x variable
# Number of cars in each class:
ggplot(mpg, aes(class)) +
  geom_bar()
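
To check what geom_bar() counts, the same numbers can be computed by hand with count() from dplyr. A minimal sketch; the names class_counts and p_class are illustrative:

```r
library(ggplot2)
library(dplyr)

# The counts that geom_bar() computes behind the scenes
class_counts <- mpg %>% count(class)
class_counts

# The bar heights in this plot correspond to the column n above
p_class <- ggplot(mpg, aes(class)) +
  geom_bar()
p_class
```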

Task

  • Take the mpg data frame.
  • Create a barplot with the counts of categories for the drv variable.
  • Colour the bars red with the fill argument.
  • Set the argument width = 0.8 to resize the bar width.
Please stop the video here! Continue after completing the task!

ggplot(mpg, aes(drv)) +
  geom_bar(fill = "red", width = 0.8)

geom_bar()

  • With the argument stat = "identity", bar categories are taken from the x variable and bar heights from the y variable:

Example

df <- data.frame(
  type = c("A", "B", "C"), 
  mean = c(2.5, 4.4, 6.3)
)
ggplot(df, aes(x = type, y = mean)) +
  geom_bar(stat = "identity")
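
ggplot2 also offers geom_col() as a shortcut for bars whose heights are already in the data. The sketch below should draw the same plot as the example above, using the same small data frame:

```r
library(ggplot2)

df <- data.frame(
  type = c("A", "B", "C"),
  mean = c(2.5, 4.4, 6.3)
)

# geom_col() uses the y values as bar heights without counting
p_col <- ggplot(df, aes(x = type, y = mean)) +
  geom_col()
p_col
```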

Task

  • Take the starwars data frame.
  • Calculate the BMI. (mutate(bmi = mass / (height / 100)^2))
  • Summarise the median of the BMI grouped by species. (summarise(median_bmi = median(bmi, na.rm = TRUE)))
  • Create a fitting barplot.
Please stop the video here! Continue after completing the task!

starwars %>% 
  mutate(bmi = mass / (height / 100)^2) %>% 
  group_by(species) %>% 
  summarise(
    median_bmi = median(bmi, na.rm = TRUE)
  ) %>%
  ggplot(aes(species, median_bmi)) +
    geom_bar(stat = "identity")

geom_smooth()

geom_smooth() is used to add smoothed conditional means in scatterplots.

ggplot(mpg, aes(displ, hwy)) + 
  geom_point() +
  geom_smooth()
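
By default, geom_smooth() chooses a smoothing method based on the size of the data (loess for fewer than 1,000 observations) and draws a confidence band around the line. Both can be changed. A sketch, assuming a straight regression line without the band is wanted:

```r
library(ggplot2)

p_lm <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  # method = "lm" fits a linear model; se = FALSE hides the confidence band
  geom_smooth(method = "lm", se = FALSE)

p_lm
```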

Task

  • Take the economics data frame.
  • Create a scatterplot of the number of unemployed (unemploy) by population (pop).
  • Add a geom_smooth layer.
Please stop the video here! Continue after completing the task!

economics %>%
  ggplot(aes(pop, unemploy)) + 
    geom_point() + 
    geom_smooth()

Task

  • Install and load the package dslabs.
  • Take the data frame gapminder.
  • Group the data by year and continent. (group_by(year, continent))
  • Use the summarize() function to calculate the mean of infant_mortality.
  • Create a line and dot plot with year on the x-axis, the mean of infant_mortality on the y-axis, and continent as line/dot colour.
  • Add a smooth layer.
Please stop the video here! Continue after completing the task!

library(dslabs)
gapminder %>% 
  group_by(year, continent) %>%
  summarize(m_infant_mortality = mean(infant_mortality, na.rm = TRUE)) %>%
  ggplot(aes(x = year, y = m_infant_mortality, color = continent)) +
    geom_line() + 
    geom_point() + 
    geom_smooth()

Distributions of multiple datapoints in categories

When many values fall into each category of a categorical variable, simple scatterplots become messy because the points overlap:

ggplot(mpg, aes(drv, hwy)) +
  geom_point()

Distributions of multiple datapoints in categories (2)

Solutions

  • geom_jitter() : Adds a little random jitter to each data point
  • geom_boxplot() : Draws a boxplot
  • geom_violin() : Draws a violin plot

Task

  • Take the mpg dataset
  • Create the following plots for the variables x = drv and y = hwy
    • geom_jitter()
    • geom_boxplot()
    • geom_violin()
Please stop the video here! Continue after completing the task!

ggplot(mpg, aes(drv, hwy)) + geom_jitter(width = 0.2)
ggplot(mpg, aes(drv, hwy)) + geom_boxplot()
ggplot(mpg, aes(drv, hwy)) + geom_violin()
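
These geoms can also be combined in one plot, for example a boxplot with the jittered raw data on top. A sketch of one possible combination; the alpha value is an arbitrary choice for this illustration:

```r
library(ggplot2)

p_combined <- ggplot(mpg, aes(drv, hwy)) +
  # outlier.shape = NA hides the boxplot outliers,
  # which would otherwise be drawn twice (once per layer)
  geom_boxplot(outlier.shape = NA) +
  # alpha makes the dots semi-transparent so the boxes stay visible
  geom_jitter(width = 0.2, alpha = 0.5)

p_combined
```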

Further layers

  • Add a title: ggtitle() (e.g. ggtitle("My first plot"))
  • Change axis labels: labs() (e.g. labs(x = "Categories", y = "Mean"))
  • Change axis limits: ylim() and xlim() (e.g. ylim(0, 10); xlim(0, 10))
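
A short sketch that puts the three functions together on the mpg data; the title, the label texts and the limits are example values only:

```r
library(ggplot2)

p_labelled <- ggplot(mpg, aes(cty, hwy)) +
  geom_point() +
  ggtitle("My first plot") +
  labs(x = "City miles per gallon", y = "Highway miles per gallon") +
  ylim(0, 50)   # y axis from 0 to 50

p_labelled
```

Note that observations outside the limits set with xlim() or ylim() are dropped from the plot with a warning, so the limits should cover the full range of the data.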

ggplot(mpg, aes(cty, hwy)) + 
  geom_point() +
  facet_wrap(~class)

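# Summarise each class: number of cars and mean highway mileage
class_summary <- mpg %>%
  group_by(class) %>%
  summarise(n = n(), hwy = mean(hwy))

ggplot(mpg, aes(class, hwy)) +
  geom_jitter(width = 0.2) +
  # Class means as large red dots, taken from the summary data frame
  geom_point(data = class_summary, mapping = aes(class, hwy), colour = "red", size = 6) +
  # Number of cars per class, printed at y = 10
  geom_text(data = class_summary, aes(class, 10, label = paste0("n=", n))) + 
  ylim(10, 45)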