Version: 10 November 2020

Goals

  • Basic concepts of R: library, function, argument
  • Object types: numbers, logic values, characters
  • Object structures: vectors, data frames
  • Sophisticated subsetting
  • Import and export data (Microsoft Excel, CSV files, SPSS)
  • Install and update packages (CRAN, source, and other repositories)
  • Organize your work in RStudio (Projects)

Basic concepts

Functions

  • With a function you command the computer to do something.
  • Functions have a function name (e.g., mean, sqrt).
  • Functions take arguments that specify what to do.
  • Arguments have argument names as well.
  • A function call always consists of the function name followed by parentheses (round brackets).

function_name(argument_name = value, argument_name = value, ...)

Examples

# sqrt calculates the square root
sqrt(x = 16)
## [1] 4
# You can omit the argument name if it is the first argument
sqrt(16)
## [1] 4
# Even without arguments you still need the brackets
date()
## [1] "Tue Nov 10 10:26:09 2020"
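
As a further sketch of named versus positional arguments, consider the base R function round(), which is not otherwise used in these slides. It takes a number x and the number of decimal places digits. The object names r1 to r3 are only illustrative.

```r
# Both arguments named
r1 <- round(x = 3.14159, digits = 2)
r1
## [1] 3.14

# Without names, arguments are matched by their position
r2 <- round(3.14159, 2)
r2
## [1] 3.14

# With names, the order of the arguments does not matter
r3 <- round(digits = 2, x = 3.14159)
r3
## [1] 3.14
```

Naming arguments makes a call easier to read once a function takes more than one argument.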

Help files

# Function
help("sqrt")

# Shortcut
?sqrt

… or use the bottom-right help panel in RStudio

Task

Take a look at the mean() function:
What arguments could be specified?

Please stop the video here! Continue after completing the task!

help("mean")
?mean

The mean function takes three arguments:

  • x : A vector with values
  • trim : A fraction (e.g. 0.3) of values that are trimmed from each end of the vector x
  • na.rm : If TRUE (na.rm = TRUE), missing values (NAs) are removed before calculating the mean
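
A small sketch of what the trim argument does, using made-up values (the vector name scores is only illustrative). With five values and trim = 0.2, one value is dropped from each end before the mean is calculated.

```r
# Five made-up values with one extreme value
scores <- c(1, 2, 3, 4, 100)

# Ordinary mean: the extreme value pulls the result up
plain_mean <- mean(x = scores)
plain_mean
## [1] 22

# Trimmed mean: the lowest and the highest value are dropped first
trimmed_mean <- mean(x = scores, trim = 0.2)
trimmed_mean
## [1] 3
```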

Operations

Operators are a special kind of function that has a shortcut notation.

# function `assign` and its shortcut
assign(x = "y", value = 10)

y <- 10
# function `+`
"+"(e1 = 10, e2 = 10)

10 + 10
# function `print`
print(x = y)

y

Objects

Objects (also called variables) have an object name and contain data.
The data are assigned to an object with the <- or = operator.

x <- 10

You can see the value(s) of an object with the print() function, or by just typing the object name:

print(x)
x

Objects can be used with operators and as arguments in functions:

x <- 16
y <- 13

x * y
sqrt(x)

You can write the return values of a function into a new object:

z <- sqrt(x)
z

And you can combine these:

exp(z) + sqrt(y)

Task

Assign the values 40 and 24 to the variables a and b. Calculate the square root of the sum of a and b.

Please stop the video here! Continue after completing the task!

Task - solution

Assign the values 40 and 24 to the variables a and b. Calculate the square root of the sum of a and b.

a <- 40
b <- 24
sqrt(a + b)
## [1] 8

Data types

The data of objects can be numbers, text, or TRUE/FALSE values. These are called data types.

  • Numeric: Integer or decimal numbers, e.g. 1, 1.35
  • Character: Always enclosed in double or single quotes: "A", 'House'
  • Logical: TRUE, FALSE

x <- 10
y <- "Hello world!"
z <- FALSE
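
If you are unsure which data type an object has, you can ask R. The following sketch uses the base R functions class(), is.numeric(), and is.character() on the three objects from above.

```r
x <- 10
y <- "Hello world!"
z <- FALSE

# class() returns the data type of an object
class(x)
## [1] "numeric"
class(y)
## [1] "character"
class(z)
## [1] "logical"

# The is.*() functions test for one specific data type
is.numeric(x)
## [1] TRUE
is.character(x)
## [1] FALSE
```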

Data structures

Data are organized in structures:

  • Single values: In R, a single value is a vector of length one
  • Vectors: A chain of values of the same data type (also called atomic vectors)
  • Factors: Values with assigned labels
  • List: A series of elements, each one containing one or more (atomic) vectors
  • Data Frames: A list with one vector for each element and all vectors of the same length
  • Matrix: A two-dimensional table with values of the same data type.
  • Array: Like a matrix but with more dimensions.
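
Vectors, factors, and data frames are covered step by step in the following sections. As a rough sketch of the remaining three structures, the example below builds a list, a matrix, and an array with the base R functions list(), matrix(), and array(). All object names and values are made up for illustration.

```r
# A list can hold elements of different data types and lengths
person <- list(name = "Dustin", age = 9, curly = TRUE)
length(person)
## [1] 3

# A matrix has two dimensions and a single data type
m <- matrix(1:6, nrow = 2, ncol = 3)
m
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

# An array can have more dimensions: here 2 rows, 3 columns, 2 layers
a <- array(1:12, dim = c(2, 3, 2))
dim(a)
## [1] 2 3 2
```

Note that matrix() fills the values column by column unless you set byrow = TRUE.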

How to build a vector

You create a vector with the c() function:

c(2, 4, 6, 3, 7)
## [1] 2 4 6 3 7
y <- c(2, 4, 6, 3, 7)
y
## [1] 2 4 6 3 7

The colon : operator creates a numerical sequence:

1:10
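
The colon operator always counts in steps of one. For other step sizes, the base R function seq() is one option. The following is a short sketch with illustrative object names.

```r
# The colon operator also counts downwards
countdown <- 5:1
countdown
## [1] 5 4 3 2 1

# seq() creates a sequence with a chosen step size
even <- seq(from = 2, to = 10, by = 2)
even
## [1]  2  4  6  8 10
```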

You can build a vector of any data type:

firstname <- c("Dustin", "Mike", "Will")
curly <- c(TRUE, FALSE, FALSE)
age <- c(9, 11, 10)

But do not mix data types in a vector. R does not report an error; instead, all values are silently converted to a common data type (here: character):

age <- c("quite young", 10, 12, "very old")
age
## [1] "quite young" "10"          "12"          "very old"
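
The conversion follows a fixed order: logical values become numbers, and numbers become characters. A short sketch (the object names are only illustrative) makes this visible with class():

```r
# Logical and numeric values mixed: TRUE becomes 1, FALSE becomes 0
mixed_1 <- c(TRUE, FALSE, 5)
mixed_1
## [1] 1 0 5
class(mixed_1)
## [1] "numeric"

# As soon as one value is a character, everything becomes a character
mixed_2 <- c(10, TRUE, "old")
class(mixed_2)
## [1] "character"
```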

Task

Create a vector (named friends) comprising four names of your friends.

Please stop the video here! Continue after completing the task!

Task - solution

Create a vector (named friends) comprising four names of your friends.

friends <- c("Matthias", "Markus", "Thomas", "Christian")

Combining vectors to new vectors

When an object is a vector it can be reused within the c() function to build a new vector:

x <- c(3, 5, 7)
c(x, 5, 8, 9)
## [1] 3 5 7 5 8 9

Combining vectors to new vectors

Be careful not to confuse an object name with a character:

x <- c("A", "B", "C")
c("x", "D", "E", "F")
## [1] "x" "D" "E" "F"
c(x, "D", "E", "F")
## [1] "A" "B" "C" "D" "E" "F"
c(A, B, C)
## Error in eval(expr, envir, enclos): object 'A' not found

Missing values

A missing value is represented with NA (Not Available).

age <- c(9, NA, 11)
name <- c("Tick", "Trick", NA)
age
## [1]  9 NA 11
name
## [1] "Tick"  "Trick" NA
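
To find out where values are missing, you can use the base R function is.na(). The sketch below also shows that a calculation with NA returns NA again.

```r
age <- c(9, NA, 11)

# is.na() returns TRUE for every missing value
is.na(age)
## [1] FALSE  TRUE FALSE

# TRUE counts as 1, so the sum is the number of missing values
sum(is.na(age))
## [1] 1

# Calculations with NA result in NA
age + 1
## [1] 10 NA 12
```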

Task

Create a vector with the values 2, 5, 7, 4, 7, 2, 6. Calculate the mean of these values. (Note: Use the mean() function to calculate the mean).

Please stop the video here! Continue after completing the task!

Task - solution

Create a vector with the values 2, 5, 7, 4, 7, 2, 6. Calculate the mean of these values. (Note: Use the mean() function to calculate the mean).

x <- c(2, 5, 7, 4, 7, 2, 6)
mean(x)
## [1] 4.714286

Task

Create a vector with the values 2, NA, 7, 4, NA, 2, 6. Calculate the mean of these values.
(Note: Read through ?mean if you encounter problems.)

Please stop the video here! Continue after completing the task!

Task - solution

Create a vector with the values 2, NA, 7, 4, NA, 2, 6. Calculate the mean of these values.
(Note: Read through ?mean if you encounter problems.)

x <- c(2, NA, 7, 4, NA, 2, 6)
mean(x, na.rm = TRUE)
## [1] 4.2

Selecting elements with square brackets

names <- c("Sheldon", "Leonard", "Penny", "Amy")
names[1]
## [1] "Sheldon"
# Pass a vector to extract multiple elements:
names[c(1,4)]
## [1] "Sheldon" "Amy"
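
Besides positive numbers, square brackets also accept negative numbers and logical values. This is a brief sketch of both variants using the same vector.

```r
names <- c("Sheldon", "Leonard", "Penny", "Amy")

# A negative number drops the element at that position
names[-1]
## [1] "Leonard" "Penny"   "Amy"

# A logical vector keeps the elements where the value is TRUE
names[c(TRUE, FALSE, TRUE, FALSE)]
## [1] "Sheldon" "Penny"

# The colon operator selects a range of positions
names[2:3]
## [1] "Leonard" "Penny"
```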

Task

Take the vector
names <- c("Sheldon", "Leonard", "Penny", "Amy")
and reorder it to get the following result:
[1] "Sheldon" "Amy" "Sheldon" "Amy" "Leonard" "Penny"

Please stop the video here! Continue after completing the task!

Task - solution

Take the vector
names <- c("Sheldon", "Leonard", "Penny", "Amy")
and reorder it to get the following result:
[1] "Sheldon" "Amy" "Sheldon" "Amy" "Leonard" "Penny"

x <- c(1, 4, 1, 4, 2, 3)
new_order <- names[x]
new_order
## [1] "Sheldon" "Amy"     "Sheldon" "Amy"     "Leonard" "Penny"

A factor

A factor is a vector whose values (levels) carry labels. E.g., a vector contains the values 0 and 1 where 0 stands for “with behavioral problems” and 1 stands for “without behavioral problems”.

A factor is created with the factor() function.

sen <- factor(c(1, 0, 1, 0, 0, 0), levels =  c(0, 1), labels = c("No SEN", "With SEN"))
sen
## [1] With SEN No SEN   With SEN No SEN   No SEN   No SEN  
## Levels: No SEN With SEN
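
A few base R functions help to inspect a factor. The following sketch applies levels() and as.integer() to the sen factor from above and counts the cases with a given label.

```r
sen <- factor(c(1, 0, 1, 0, 0, 0), levels = c(0, 1), labels = c("No SEN", "With SEN"))

# The labels of a factor
levels(sen)
## [1] "No SEN"   "With SEN"

# Internally, the levels are numbered from 1 upwards in the order of `levels`
as.integer(sen)
## [1] 2 1 2 1 1 1

# Count the cases with a specific label
sum(sen == "With SEN")
## [1] 2
```

Note that the internal numbers (1 and 2) are not the original values (0 and 1).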

Task

Build a factor for gender with the labels male, female, non-binary. Include a vector for six fictitious gender values.

Please stop the video here! Continue after completing the task!

Task - solution

Build a factor for gender with the labels male, female, non-binary. Include a vector for six fictitious gender values.

gender <- factor(
  c(1, 3, 2, 1, 2, 1), 
  levels = 1:3, 
  labels = c("male", "female", "non-binary")
)
gender
## [1] male       non-binary female     male       female     male      
## Levels: male female non-binary

How to build a data frame

Data frames are the standard object for storing research data. They contain variables (columns) and cases (rows). A data frame is created with the data.frame() function.

# For readability, I have inserted additional line breaks and spaces
study <- data.frame(
  sen    = c(0, 1, 0, 1, 0, 1),
  gender = c("M", "M", "F", "M", "F", "F"),
  age    = c(12, 13, 11, 10, 11, 14),
  IQ     = c(90, 85, 90, 87, 99, 89)
)
study
##   sen gender age IQ
## 1   0      M  12 90
## 2   1      M  13 85
## 3   0      F  11 90
## 4   1      M  10 87
## 5   0      F  11 99
## 6   1      F  14 89
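
To get a quick overview of a data frame, you can ask for its size and its variable names. The sketch below uses the base R functions nrow(), ncol(), and names() on the study data frame.

```r
study <- data.frame(
  sen    = c(0, 1, 0, 1, 0, 1),
  gender = c("M", "M", "F", "M", "F", "F"),
  age    = c(12, 13, 11, 10, 11, 14),
  IQ     = c(90, 85, 90, 87, 99, 89)
)

# Number of cases (rows)
nrow(study)
## [1] 6

# Number of variables (columns)
ncol(study)
## [1] 4

# Names of the variables
names(study)
## [1] "sen"    "gender" "age"    "IQ"
```

The function str(study) additionally lists the data type of each variable.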

Extracting a variable from a data frame

Variables within a data frame are extracted with double square brackets.

study[["sen"]]
## [1] 0 1 0 1 0 1
study[["IQ"]]
## [1] 90 85 90 87 99 89

Note: an alternative approach is to use the $ sign: study$sen. But we will not use this approach for now.

Subsetting a data frame

Specific cases (rows) and variables (columns) are selected within square brackets: object_name[rows, columns]. Leave a position empty to select all rows or all columns.

study[5, ]
##   sen gender age IQ
## 5   0      F  11 99
study[c(2, 6), ]
##   sen gender age IQ
## 2   1      M  13 85
## 6   1      F  14 89

study[c(2, 6), "IQ"]
## [1] 85 89
study[c(2, 6), c("sen", "IQ")]
##   sen IQ
## 2   1 85
## 6   1 89

You could also use numbers to address the columns:

study[, 2]
## [1] "M" "M" "F" "M" "F" "F"
study[c(2, 6), c(1, 3)]
##   sen age
## 2   1  13
## 6   1  14
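
Rows can also be selected by a condition instead of by their numbers. The following sketch combines the double square brackets from above with a comparison; the object names older and girls are only illustrative.

```r
study <- data.frame(
  sen    = c(0, 1, 0, 1, 0, 1),
  gender = c("M", "M", "F", "M", "F", "F"),
  age    = c(12, 13, 11, 10, 11, 14),
  IQ     = c(90, 85, 90, 87, 99, 89)
)

# All cases older than 11 years
older <- study[study[["age"]] > 11, ]
older
##   sen gender age IQ
## 1   0      M  12 90
## 2   1      M  13 85
## 6   1      F  14 89

# Age and IQ of all female cases
girls <- study[study[["gender"]] == "F", c("age", "IQ")]
girls
##   age IQ
## 3  11 90
## 5  11 99
## 6  14 89
```

The comparison returns a logical vector (TRUE/FALSE) with one value per row, and only rows with TRUE are kept.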

Task

Please create a new data frame (study2) comprising the gender and age variables for the cases 1, 3, and 5 of the study data frame.

Please stop the video here! Continue after completing the task!

Task - solution

Please create a new data frame (study2) comprising the gender and age variables for the cases 1, 3, and 5 of the study data frame.

study2 <- study[c(1, 3, 5), c("gender", "age")]
study2
##   gender age
## 1      M  12
## 3      F  11
## 5      F  11