Version: 30 June 2020

- arrow keys: move through slides
- f: toggle full-screen

dyplyr

  • dplyr is a package tailed towards data manipulation.
  • filter() picks cases based on their values.
  • select() picks variables based on their names.
  • mutate() adds new variables that are functions of existing variables
  • summarise() reduces multiple values down to a single summary.
  • group_by() allows to perform any operation by grouping variables.
  • arrange() changes the ordering of the rows.

filter() picks cases based on their values

filter(.data, ...)

# Without %>%:
filter(mtcars, disp > 350 & mpg < 11)

# with %>%
mtcars %>%
  filter(disp > 350 & mpg < 11)

Note: You don’t have to provide the name of a data object within the dplyr function!

not: filter(mtcars, mtcars$disp > 350 & mtcars$mpg < 11)

but: filter(mtcars, disp > 350 & mpg < 11)

Task

  • Take the starwars database.
  • Filter all cases with brown hair color and female gender (Hint: variables hair_color and sex)
Please stop the video here! Continue after completing the task!

library(tidyverse)
starwars %>% 
  filter(hair_color == "brown" & sex == "female")

select() picks variables based on their names

select(.data, ...)

# Either column numbers or column names
mtcars %>% select(1, 3, carb)
# Use the `:` symbol for ranges
mtcars %>% select(1:3, am:carb)
# use `-` to drop variables and keep the others
mtcars %>% select(-am, -(1:3), -(disp:drat))

starts_with() and ends_with() can be used to select variables by parts of their names:

mtcars %>% select(starts_with("d"))
mtcars %>% select(-ends_with("b"))

Task

  • Take the starwars database.
  • Filter for Human and select the variables name, height, and mass (Hint: variable species)
Please stop the video here! Continue after completing the task!

starwars %>% 
  filter(species == "Human") %>% 
  select(name:mass)

everything() use select to rearrange columns

The everything() function helps to rearrange the columns of a data frame

starwars %>% 
  select(name, birth_year, sex, everything())

mutate() adds new variables that are functions of existing variables

mutate(.data, ...)

mtcars %>%
  mutate(
    efficiency = mpg / disp,
    effect = round(hp / mpg * 100)
  )

Task

  • Take the starwars database.
  • Create a new variable with the body mass index bmi. Hint: bmi = kg / m²
  • Filter all humans with an bmi >= 26.
  • Select name, bmi, height, and mass
Please stop the video here! Continue after completing the task!

starwars %>%
  mutate(bmi = mass/ (height/100)^2) %>%
  filter(bmi >= 26 & species == "Human") %>%
  select(name, bmi, height, mass)

summarise() reduces multiple values down to a single summary.

mtcars %>%
  summarise(
    mean_mpg = mean(mpg),
    sd_mpg = sd(mpg),
    n = n()
  )

Task

  • Take the starwars database.
  • Filter Humans
  • Calculate the mean and the median for height and mass (Note: don’t forget to remove NA values: na.rm = TRUE)
Please stop the video here! Continue after completing the task!

starwars %>%
  filter(species == "Human") %>%
  summarise(
    mean_height = mean(height, na.rm = TRUE),
    median_height = median(height, na.rm = TRUE),
    mean_mass = mean(mass, na.rm = TRUE),
    median_mass = median(mass, na.rm = TRUE)
  )

group_by() allows to perform any operation by grouping variables.

mtcars %>% group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),
    sd_mpg = sd(mpg),
    n = n()
  )

mtcars %>% group_by(cyl, am) %>%
  summarise(
    mean_mpg = mean(mpg),
    sd_mpg = sd(mpg),
    n = n()
  )

Task

  • Take the starwars database.
  • Group by sex
  • Calculate the median for height and mass
  • And the number of cases per group (Hint: n())
Please stop the video here! Continue after completing the task!

starwars %>%
  group_by(sex) %>%
  summarise(
    median_height = median(height, na.rm = TRUE),
    median_mass = median(mass, na.rm = TRUE),
    n = n()
  )

arrange() changes the ordering of the rows

arrange(.data, ...)

dat <- data.frame(age = c(5,5,6,7,6), sen = c(0, 1, 1, 0, 0), points = c(34, 55, 22, 11, 9))
dat %>%
  arrange(sen, age) 

desc() specifies a variable to be arranged descending

dat %>%
  arrange(sen, desc(points))

Task

  • Take the starwars database.
  • Calculate the bmi.
  • Summarize the median of the bmi grouped by species.
  • round the bmi to one decimal.
  • Also calculate the n for each group.
  • Only show cases with n > 2.
  • Arrange the resulting table. Sort by n in descending order.
Please stop the video here! Continue after completing the task!

starwars %>% 
  mutate(bmi = mass / (height / 100)^2) %>% 
  group_by(species) %>% 
  summarise(
    mean_bmi = median(bmi, na.rm = TRUE) %>% round(1), 
    n = n()
  ) %>%
  filter(n >2) %>%
  arrange(desc(n))