Introduction To R

Alan T. Arnholt

Last modified on August 15, 2023 14:20:00 Eastern Daylight Time

R as a calculator

> 3 + 3      # addition
[1] 6
> 10 - 4     # subtraction
[1] 6
> 6 * 7      # multiplication
[1] 42
> 120 / 10   # division
[1] 12
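
Beyond the four operations above, R also has operators for powers, integer division, and remainders, and it follows the usual order of operations. A minimal sketch (the numbers are arbitrary illustrations):

```r
2^5          # exponentiation: 32
17 %/% 5     # integer division: 3
17 %% 5      # remainder (modulo): 2
3 + 3 * 2    # multiplication is done before addition: 9
(3 + 3) * 2  # parentheses change the order: 12
```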

Comments and variable assignment

  • R uses # for comments
> # This is a comment
  • To assign a value to a variable use <-
  • The expression my_stuff <- 123 assigns the number 123 to the variable my_stuff
> my_stuff <- 123
> my_stuff
[1] 123
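
Once a value is stored, the variable can be used in later expressions, and assigning to the same name again overwrites the old value. A small sketch, where my_total is a name introduced only for this illustration:

```r
my_stuff <- 123
my_total <- my_stuff + 7    # use the stored value in an expression
my_total                    # 130
my_stuff <- my_stuff * 2    # reassigning overwrites the old value
my_stuff                    # 246
```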

More complex expressions

Find the volume of a cylinder with a radius of 1 foot and a height of 10 feet.

Recall that \(\text{Volume} = \pi \times r^2 \times h\)

> r <- 1
> h <- 10
> (volume <- pi*r^2*h)
[1] 31.41593
> # or
> pi*1^2*10
[1] 31.41593
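
If the same formula is needed for several cylinders, it can be wrapped in a function. The sketch below assumes a hypothetical helper named cylinder_volume(); it is not part of base R.

```r
# cylinder_volume() is a hypothetical helper defined here for illustration
cylinder_volume <- function(r, h) {
  pi * r^2 * h
}

cylinder_volume(r = 1, h = 10)        # the cylinder above: 31.41593
cylinder_volume(r = c(1, 2), h = 10)  # arithmetic is vectorized, so r may be a vector
```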

Basic data types

  • Numeric
  • Logical
  • Character
  • Complex
  • Raw

What is the data type?

> my_numeric <- 13
> my_logical <- TRUE
> my_character <- "some text"
> class(my_numeric)
[1] "numeric"
> class(my_logical)
[1] "logical"
> class(my_character)
[1] "character"
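
The complex and raw types listed earlier can be checked the same way. The sketch below also shows an integer, which R reports as its own class; the variable names are made up for this example.

```r
my_integer <- 13L           # the L suffix requests an integer
my_complex <- 2 + 3i
my_raw     <- as.raw(255)

class(my_integer)           # "integer"
class(my_complex)           # "complex"
class(my_raw)               # "raw"
is.numeric(my_integer)      # TRUE: integers also count as numeric
```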

Creating vectors

To create a vector use the c() function

> numeric_vector <- c(3, 5, 7, 11)
> numeric_vector
[1]  3  5  7 11
> logical_vector <- c(TRUE, FALSE, TRUE, TRUE)
> logical_vector
[1]  TRUE FALSE  TRUE  TRUE
> character_vector <- c("Alan", "Bob", "Charlie")
> character_vector
[1] "Alan"    "Bob"     "Charlie"
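
A vector created with c() holds a single data type, so mixing types makes R coerce the values to a common type. A short sketch with made-up values:

```r
mixed_num <- c(1, TRUE, FALSE)   # logicals become 1 and 0
mixed_chr <- c(1, TRUE, "a")     # everything becomes character

class(mixed_num)                 # "numeric"
class(mixed_chr)                 # "character"
mixed_chr                        # "1" "TRUE" "a"
```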

Naming a vector

> # Predicted temperature in Boone 8/21/17 - 8/25/17
> predicted <- c(65, 75, 74, 71, 70)
> predicted
[1] 65 75 74 71 70
> days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
> days
[1] "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"   
> names(predicted) <- days
> predicted
   Monday   Tuesday Wednesday  Thursday    Friday 
       65        75        74        71        70 
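
Names can also be supplied when the vector is created, and a named vector can then be indexed by name. A minimal sketch using the same temperatures:

```r
# Names set at creation instead of through names()
predicted <- c(Monday = 65, Tuesday = 75, Wednesday = 74,
               Thursday = 71, Friday = 70)

predicted["Tuesday"]               # one element, selected by name
predicted[c("Monday", "Friday")]   # several elements, selected by name
names(predicted)                   # the names themselves
```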

Vector selection

Find the days it is predicted to be above 72.

> predicted > 72
   Monday   Tuesday Wednesday  Thursday    Friday 
    FALSE      TRUE      TRUE     FALSE     FALSE 
> predicted[predicted > 72]
  Tuesday Wednesday 
       75        74 
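
To get only the day names rather than the temperatures, the same logical vector can index names(predicted). A sketch, with warm_days as a name introduced for this example:

```r
predicted <- c(Monday = 65, Tuesday = 75, Wednesday = 74,
               Thursday = 71, Friday = 70)

warm_days <- names(predicted)[predicted > 72]
warm_days                  # "Tuesday" "Wednesday"
which(predicted > 72)      # positions 2 and 3, labeled with the day names
sum(predicted > 72)        # 2, because each TRUE counts as 1
```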

Matrices

  • A matrix is a collection of the same data type (numeric, character, logical, or complex) arranged into a fixed number of rows and columns.
  • Matrices in R are stored in column-major order, so matrix() fills the values down the columns by default.
  • Use the matrix() function to construct a matrix.
> a_vector <- 1:9
> a_matrix <- matrix(a_vector, nrow = 3)
> a_matrix
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
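
The column-by-column filling can be compared with row-by-row filling, which matrix() does when byrow = TRUE. A small sketch (by_col and by_row are names chosen for this example):

```r
by_col <- matrix(1:9, nrow = 3)                 # default: fill down the columns
by_row <- matrix(1:9, nrow = 3, byrow = TRUE)   # fill across the rows

by_col[1, ]                   # first row: 1 4 7
by_row[1, ]                   # first row: 1 2 3
all(by_row == t(by_col))      # TRUE: one is the transpose of the other
dim(by_col)                   # 3 3
```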

Naming a matrix

> row_names <- c("A", "B", "C")
> col_names <- c("X", "Y", "Z")
> dimnames(a_matrix) <- list(row_names, col_names)
> a_matrix
  X Y Z
A 1 4 7
B 2 5 8
C 3 6 9
> a_matrix[2, 2] # or a_matrix["B", "Y"]
[1] 5
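
Leaving the row or the column index empty selects a whole column or row, and names work there as well. A sketch that rebuilds the same matrix with the dimnames argument of matrix():

```r
a_matrix <- matrix(1:9, nrow = 3,
                   dimnames = list(c("A", "B", "C"), c("X", "Y", "Z")))

a_matrix["B", ]               # row B: X = 2, Y = 5, Z = 8
a_matrix[, "Y"]               # column Y: A = 4, B = 5, C = 6
a_matrix[c("A", "C"), "Z"]    # A = 7, C = 9
rownames(a_matrix)            # "A" "B" "C"
```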

Factors

  • A factor is a statistical data type used to store categorical variables.
> year_group <- c("Freshman", "Junior", "Junior", "Senior", 
+                 "Sophomore")
> class(year_group)
[1] "character"
> factor_year_group <- factor(year_group)
> factor_year_group
[1] Freshman  Junior    Junior    Senior    Sophomore
Levels: Freshman Junior Senior Sophomore

Factors (continued)

> # Problem: the default levels are alphabetical, not in class order
> factor_year_group <- factor(year_group, ordered = TRUE, 
+                             levels = c("Freshman", "Sophomore", 
+                                        "Junior", "Senior"))
> factor_year_group
[1] Freshman  Junior    Junior    Senior    Sophomore
Levels: Freshman < Sophomore < Junior < Senior
> SE <- factor_year_group[4]
> SO <- factor_year_group[5]
> SE > SO
[1] TRUE
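
Internally, a factor stores integer codes that point into its levels, which is why the order of the levels matters. The sketch below uses hypothetical clothing sizes rather than the class years above:

```r
sizes <- c("medium", "small", "large", "small")   # hypothetical data
size_factor <- factor(sizes, ordered = TRUE,
                      levels = c("small", "medium", "large"))

levels(size_factor)                 # "small" "medium" "large"
as.integer(size_factor)             # 2 1 3 1: the underlying codes
table(size_factor)                  # counts per level: small 2, medium 1, large 1
size_factor[1] > size_factor[2]     # TRUE: medium ranks above small
max(size_factor)                    # large
```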

Data frame

A data frame is similar to a matrix in that it is a rectangular structure used to store information. The difference is that all elements of a matrix must be of the same mode (numeric, character, etc.), whereas the columns of a data frame may be of different modes. That is, a data frame has a two-dimensional structure with rows (experimental units) and columns (variables); all columns have the same number of rows, and the rows have unique names.

Data frame (2)

Another way to think of a data frame is as a list whose components are all vectors of equal length.

  • To create a data frame use the data.frame() function.
> nv <- c(1, 3, 6, 8)
> cv <- c("a", "v", "f", "p")
> lv <- c(TRUE, FALSE, FALSE, TRUE)
> DF1 <- data.frame(nv, cv, lv)
> DF1
  nv cv    lv
1  1  a  TRUE
2  3  v FALSE
3  6  f FALSE
4  8  p  TRUE
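
The mixed column types can be confirmed by asking for the class of each column. A minimal sketch; stringsAsFactors = FALSE is written out only so that cv stays character on R versions before 4.0.0:

```r
nv <- c(1, 3, 6, 8)
cv <- c("a", "v", "f", "p")
lv <- c(TRUE, FALSE, FALSE, TRUE)
DF1 <- data.frame(nv, cv, lv, stringsAsFactors = FALSE)

sapply(DF1, class)    # nv "numeric", cv "character", lv "logical"
dim(DF1)              # 4 3: four rows, three columns
str(DF1)              # compact summary of the structure
```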

Selecting data frame elements

> DF1$nv
[1] 1 3 6 8
> DF1$cv
[1] "a" "v" "f" "p"
> DF1$lv[3]
[1] FALSE

Selecting data frame elements (2)

> DF1[1:2, "nv"]     # rows 1 and 2 of column "nv"
[1] 1 3
> # or
> DF1[1:2, 1]
[1] 1 3
> DF1[ , "cv"]       # all rows of "cv"
[1] "a" "v" "f" "p"
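
A few more selection forms are sketched below: a whole row, a column through [[ ]], and a single column kept as a data frame with drop = FALSE. The data frame is rebuilt so the example runs on its own.

```r
DF1 <- data.frame(nv = c(1, 3, 6, 8),
                  cv = c("a", "v", "f", "p"),
                  lv = c(TRUE, FALSE, FALSE, TRUE),
                  stringsAsFactors = FALSE)

DF1[2, ]                       # the whole second row, still a data frame
DF1[["cv"]]                    # same result as DF1$cv
DF1[, "nv", drop = FALSE]      # one column, kept as a data frame
DF1[c(1, 4), c("nv", "lv")]    # rows 1 and 4 of two columns
```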

Using subset()

> subset(DF1, subset = lv == TRUE)
  nv cv   lv
1  1  a TRUE
4  8  p TRUE
> # Or
> DF1[DF1$lv == TRUE, ]   # refer to the column stored in DF1
  nv cv   lv
1  1  a TRUE
4  8  p TRUE

Using subset() (2)

> subset(DF1, subset = nv < 5)
  nv cv    lv
1  1  a  TRUE
2  3  v FALSE
> # Or
> DF1[DF1$nv < 5, ]   # refer to the column stored in DF1
  nv cv    lv
1  1  a  TRUE
2  3  v FALSE
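
Conditions can be combined with & (and) and | (or), and the select argument of subset() limits the columns that are returned. A sketch, with both_true and either as names made up for this example:

```r
DF1 <- data.frame(nv = c(1, 3, 6, 8),
                  cv = c("a", "v", "f", "p"),
                  lv = c(TRUE, FALSE, FALSE, TRUE),
                  stringsAsFactors = FALSE)

# Rows where nv < 5 AND lv is TRUE, keeping only nv and cv: row 1
both_true <- subset(DF1, subset = nv < 5 & lv, select = c(nv, cv))
both_true

# The same selection written with brackets
DF1[DF1$nv < 5 & DF1$lv, c("nv", "cv")]

# Rows where nv > 5 OR lv is TRUE: rows 1, 3, and 4
either <- subset(DF1, subset = nv > 5 | lv)
either
```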

Using order()

order() returns the indices that would arrange the elements of a vector in ascending order; indexing the vector by that result sorts it.

> num_vec <- c(13, 17, 1, 31, 45)
> order(num_vec)
[1] 3 1 2 4 5
> num_vec[order(num_vec)]
[1]  1 13 17 31 45
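
order() is easy to confuse with rank() and sort(). A short sketch comparing the three on the same vector:

```r
num_vec <- c(13, 17, 1, 31, 45)

order(num_vec)                      # 3 1 2 4 5: positions that sort the vector
rank(num_vec)                       # 2 3 1 4 5: rank of each element
sort(num_vec)                       # 1 13 17 31 45: the sorted values
order(num_vec, decreasing = TRUE)   # 5 4 2 1 3: positions for descending order
```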

Using order() to sort a data frame

> head(ChickWeight, n = 3)
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
> head(ChickWeight[order(ChickWeight$Time, ChickWeight$weight), ], 
+      n = 5)
    weight Time Chick Diet
195     39    0    18    1
293     39    0    27    2
305     39    0    28    2
317     39    0    29    2
365     39    0    33    3
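
For a descending sort, one option is to negate a numeric column inside order(); another is the decreasing argument. A sketch on the same ChickWeight data, with by_time_heaviest and heaviest as names introduced here:

```r
# Time ascending, then weight descending within each Time
by_time_heaviest <- ChickWeight[order(ChickWeight$Time, -ChickWeight$weight), ]
head(by_time_heaviest, n = 3)

# Heaviest observations overall
heaviest <- ChickWeight[order(ChickWeight$weight, decreasing = TRUE), ]
head(heaviest, n = 3)
```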

Lists

A list is an object whose elements can be of different modes (character, numeric, logical, etc.). Lists are used to unite related data that have different modes. The objects in a list can be matrices, vectors, data frames, and even other lists.

  • To create a list use the list() function.
> my_vector <- 1:5
> my_matrix <- matrix(1:9, byrow = TRUE, nrow = 3)
> my_df <- DF1
> my_list <- list(my_vector, my_matrix, my_df)

Lists (2)

> my_list
[[1]]
[1] 1 2 3 4 5

[[2]]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

[[3]]
  nv cv    lv
1  1  a  TRUE
2  3  v FALSE
3  6  f FALSE
4  8  p  TRUE
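
To inspect a list without printing every component, it may help to ask for its length and for the class of each component. A sketch with a shortened stand-in for the data frame component:

```r
my_list <- list(1:5,
                matrix(1:9, byrow = TRUE, nrow = 3),
                data.frame(nv = c(1, 3), cv = c("a", "v")))  # stand-in data frame

length(my_list)                     # 3 components
# First class of each component: "integer" "matrix" "data.frame"
component_classes <- vapply(my_list, function(x) class(x)[1], character(1))
component_classes
str(my_list, max.level = 1)         # one-line summary per component
```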

Creating a named list

> my_list2 <- list(Vector = my_vector, Matrix = my_matrix)
> my_list2
$Vector
[1] 1 2 3 4 5

$Matrix
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

Selecting elements from a list

One way to select a component is by its position in the list. For example, to “extract” the second component of my_list2, enter my_list2[[2]].

> my_list2[[2]]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

Selecting elements from a list (2)

> my_list2[["Matrix"]]
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> # or
> my_list2$Matrix
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
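
Single brackets behave differently from double brackets: [ ] returns a smaller list, while [[ ]] and $ return the component itself, which can then be indexed further. A minimal sketch:

```r
my_list2 <- list(Vector = 1:5,
                 Matrix = matrix(1:9, byrow = TRUE, nrow = 3))

class(my_list2["Matrix"])          # "list": single brackets keep the list wrapper
is.matrix(my_list2[["Matrix"]])    # TRUE: double brackets return the component
my_list2$Matrix[2, 3]              # 6: row 2, column 3 of the matrix
my_list2[["Vector"]][4]            # 4: fourth element of the vector
names(my_list2)                    # "Vector" "Matrix"
```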