Logic and types in R

Lecture 02

Dr. Colin Rundel

In R (almost) everything is a vector

Vectors

The fundamental building blocks of data in R are vectors (collections of related values, objects, data structures, etc.).

R has two types of vectors:

  • atomic vectors (vectors)

    • homogeneous collections of the same type (e.g. all true/false values, all numbers, or all character strings).
  • generic vectors (lists)

    • heterogeneous collections of any type of R object, even other lists (meaning they can have a hierarchical/tree-like structure).
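A minimal sketch of the distinction above (the variable names are illustrative and not from the lecture): c() forces its arguments to a single common type, while list() keeps each element as-is and allows nesting.

```r
# Atomic vector: every element is coerced to one common type (here character)
atomic = c(1, "a", TRUE)
typeof(atomic)
#> [1] "character"

# Generic vector (list): each element keeps its own type
generic = list(1, "a", TRUE)
sapply(generic, typeof)
#> [1] "double"    "character" "logical"

# Lists may contain other lists, giving a tree-like structure
nested = list(1, list("a", list(TRUE)))
str(nested)
```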

Atomic Vectors

R has six atomic vector types. The type of any object in R can be checked using the typeof() function; the table below also lists the corresponding mode() value.

typeof()     mode()
logical      logical
double       numeric
integer      numeric
character    character
complex      complex
raw          raw

logical - boolean values (TRUE and FALSE)

typeof(TRUE)
[1] "logical"
typeof(FALSE)
[1] "logical"
mode(TRUE)
[1] "logical"
mode(FALSE)
[1] "logical"

R will let you use T and F as shortcuts for TRUE and FALSE. This is bad practice, because T and F are ordinary global variables that can be overwritten, whereas TRUE and FALSE are reserved words.

T
[1] TRUE
T = FALSE
T
[1] FALSE

character - text strings

Either single or double quotes are fine, but the opening and closing quotes must match.

typeof("hello")
[1] "character"
typeof('world')
[1] "character"
mode("hello")
[1] "character"
mode('world')
[1] "character"

Quote characters can be included by escaping or using a non-matching quote.

"abc'123"
[1] "abc'123"
'abc"123'
[1] "abc\"123"
"abc\"123"
[1] "abc\"123"
'abc\'123'
[1] "abc'123"

Numeric types

double - floating point values (these are the default numerical type)

typeof(1.33)
[1] "double"
typeof(7)
[1] "double"
mode(1.33)
[1] "numeric"
mode(7)
[1] "numeric"

integer - integer values (literals are indicated with an L suffix)

typeof( 7L )
[1] "integer"
typeof( 1:3 )
[1] "integer"
mode( 7L )
[1] "numeric"
mode( 1:3 )
[1] "numeric"
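A short sketch, not taken from the lecture, of how the two numeric types behave under arithmetic. The results are stored in illustrative variables so they can be inspected.

```r
t_add  = typeof(7L + 1L)    # integer arithmetic stays integer
t_div  = typeof(7L / 2L)    # "/" always produces a double
t_idiv = typeof(7L %/% 2L)  # integer division keeps the integer type
t_mix  = typeof(7L * 1.5)   # mixing with a double gives a double

c(t_add, t_div, t_idiv, t_mix)
#> [1] "integer" "double"  "integer" "double"
```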

Concatenation

Atomic vectors can be grown (combined) using the combine c() function.

c(1, 2, 3)
[1] 1 2 3
c("Hello", "World!")
[1] "Hello"  "World!"
c(1, 1:10)
 [1]  1  1  2  3  4  5  6  7  8  9 10
c(1,c(2, c(3)))
[1] 1 2 3

Inspecting types

  • typeof(x) - returns a character vector (length 1) of the type of object x.

  • mode(x) - returns a character vector (length 1) of the mode of object x.

typeof(1)
[1] "double"
typeof(1L)
[1] "integer"
typeof("A")
[1] "character"
typeof(TRUE)
[1] "logical"
mode(1)
[1] "numeric"
mode(1L)
[1] "numeric"
mode("A")
[1] "character"
mode(TRUE)
[1] "logical"

Type predicates

  • is.logical(x) - returns TRUE if x has type logical.
  • is.character(x) - returns TRUE if x has type character.
  • is.double(x) - returns TRUE if x has type double.
  • is.integer(x) - returns TRUE if x has type integer.
  • is.numeric(x) - returns TRUE if x has mode numeric.
is.integer(1)
[1] FALSE
is.integer(1L)
[1] TRUE
is.integer(3:7)
[1] TRUE
is.double(1)
[1] TRUE
is.double(1L)
[1] FALSE
is.double(3:8)
[1] FALSE
is.numeric(1)
[1] TRUE
is.numeric(1L)
[1] TRUE
is.numeric(3:7)
[1] TRUE

Other useful predicates

  • is.atomic(x) - returns TRUE if x is an atomic vector.
  • is.list(x) - returns TRUE if x is a list (generic vector).
  • is.vector(x) - returns TRUE if x is either an atomic or generic vector.
is.atomic(c(1,2,3))
[1] TRUE
is.list(c(1,2,3))
[1] FALSE
is.vector(c(1,2,3))
[1] TRUE
is.atomic(list(1,2,3))
[1] FALSE
is.list(list(1,2,3))
[1] TRUE
is.vector(list(1,2,3))
[1] TRUE

Type Coercion

R is a dynamically typed language – it will automatically convert between most types without raising warnings or errors. Keep in mind that atomic vectors must always contain values of the same type.

c(1, "Hello")
[1] "1"     "Hello"
c(FALSE, 3L)
[1] 0 3
c(1.2, 3L)
[1] 1.2 3.0
c(FALSE, "Hello")
[1] "FALSE" "Hello"
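The printed values above only hint at the resulting type. As a quick check, typeof() can be applied to the same four calls (the variable name is illustrative):

```r
types = c(
  typeof(c(1, "Hello")),      # number + string
  typeof(c(FALSE, 3L)),       # logical + integer
  typeof(c(1.2, 3L)),         # double + integer
  typeof(c(FALSE, "Hello"))   # logical + string
)
types
#> [1] "character" "integer"   "double"    "character"
```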

Operator coercion

Built-in operators and functions (e.g. +, &, log(), etc.) will generally attempt to coerce values to an appropriate type for the given operation.

3.1+1L
[1] 4.1
5 + FALSE
[1] 5
log(1)
[1] 0
log(TRUE)
[1] 0
TRUE & FALSE
[1] FALSE
TRUE & 7
[1] TRUE
TRUE | FALSE
[1] TRUE
FALSE | !5
[1] FALSE

Explicit Coercion

Most of the is.*() functions we just saw have an as.*() variant which can be used for explicit coercion.

as.logical(5.2)
[1] TRUE
as.character(TRUE)
[1] "TRUE"
as.integer(pi)
[1] 3
as.numeric(FALSE)
[1] 0
as.double("7.2")
[1] 7.2
as.double("one")
[1] NA
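The last call above returns NA and also signals a warning. A minimal sketch of how that warning can be captured or silenced; the variable names are illustrative, and the message text shown assumes an English locale.

```r
# Capture the warning text instead of the NA result
msg = tryCatch(as.double("one"), warning = function(w) conditionMessage(w))
msg
#> [1] "NAs introduced by coercion"

# Silence the warning and flag which strings failed to convert
x = c("7.2", "one", "3")
failed = is.na(suppressWarnings(as.double(x)))
failed
#> [1] FALSE  TRUE FALSE
```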

Missing Values

R uses NA to represent missing values in its data structures. What may not be obvious is that there are different NAs for the different atomic types.

typeof(NA)
[1] "logical"
typeof(NA+1)
[1] "double"
typeof(NA+1L)
[1] "integer"
typeof(c(NA,""))
[1] "character"
typeof(NA_character_)
[1] "character"
typeof(NA_real_)
[1] "double"
typeof(NA_integer_)
[1] "integer"
typeof(NA_complex_)
[1] "complex"
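A brief sketch of where the typed NAs matter in practice. The plain NA is logical but is coerced like any other logical value, while a typed NA fixes the type up front. The name res is illustrative.

```r
na_types = c(typeof(c(NA, 1L)), typeof(c(NA, 1.5)), typeof(c(NA, "a")))
na_types
#> [1] "integer"   "double"    "character"

# Pre-allocate an integer result; assigning integers keeps the type
res = rep(NA_integer_, 3)
res[2] = 5L
typeof(res)
#> [1] "integer"
```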

NA “stickiness”

Because NAs represent missing values it makes sense that any calculation using them should also be missing.

1 + NA
[1] NA
1 / NA
[1] NA
NA * 5
[1] NA
sqrt(NA)
[1] NA
3^NA
[1] NA
sum(c(1, 2, 3, NA))
[1] NA

Summarizing functions (e.g. sum(), mean(), sd(), etc.) will often have a na.rm argument which will allow you to drop missing values.

sum(c(1, 2, 3, NA), na.rm = TRUE)
[1] 6
mean(c(1, 2, 3, NA), na.rm = TRUE)
[1] 2
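For intuition, na.rm = TRUE in these two calls gives the same result as dropping the missing values by hand before summarizing. A small sketch with illustrative names:

```r
x = c(1, 2, 3, NA)
observed = x[!is.na(x)]   # keep only the non-missing values

sum(observed)
#> [1] 6
mean(observed)
#> [1] 2
```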

NAs are not always sticky

A useful mental model for NAs is to consider them as an unknown value that could take any of the possible values for a type.

For numbers or characters this isn’t very helpful, but for a logical value we know that the value must either be TRUE or FALSE and we can use that when deciding what value to return.

TRUE & NA
[1] NA
FALSE & NA
[1] FALSE
TRUE | NA
[1] TRUE
FALSE | NA
[1] NA
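The mental model above can be sketched directly: try both values the NA could take and see whether the answer depends on it. The helper resolve() is hypothetical and only illustrates the reasoning; it is not how R implements these operators.

```r
# Evaluate f(x, NA) by substituting both possible logical values for NA.
# If both give the same answer, the result is known; otherwise it stays NA.
resolve = function(f, x) {
  candidates = c(f(x, TRUE), f(x, FALSE))
  if (candidates[1] == candidates[2]) candidates[1] else NA
}

resolve(`&`, TRUE)    # TRUE & TRUE is TRUE, TRUE & FALSE is FALSE: unknown
#> [1] NA
resolve(`&`, FALSE)   # FALSE either way
#> [1] FALSE
resolve(`|`, TRUE)    # TRUE either way
#> [1] TRUE
resolve(`|`, FALSE)   # depends on the missing value: unknown
#> [1] NA
```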

Other Special values (double)

These are defined as part of the IEEE floating point standard (not unique to R).

  • NaN - Not a number

  • Inf - Positive infinity

  • -Inf - Negative infinity


pi / 0
[1] Inf
0 / 0
[1] NaN
1/0 + 1/0
[1] Inf
1/0 - 1/0
[1] NaN
NaN / NA
[1] NA
NaN * NA
[1] NA

Testing for Inf and NaN

NaN and Inf don’t have the same testing issues that NAs do, but there are still convenience functions for testing for these types of values.

is.finite(Inf)
[1] FALSE
is.infinite(-Inf)
[1] TRUE
is.nan(Inf)
[1] FALSE
is.nan(-Inf)
[1] FALSE
Inf > 1
[1] TRUE
-Inf > 1
[1] FALSE
is.finite(NaN)
[1] FALSE
is.infinite(NaN)
[1] FALSE
is.nan(NaN)
[1] TRUE
is.finite(NA)
[1] FALSE
is.infinite(NA)
[1] FALSE
is.nan(NA)
[1] FALSE
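One asymmetry the calls above do not show: is.na() returns TRUE for NaN, while is.nan() returns FALSE for NA. A short sketch on a single vector (names are illustrative):

```r
x = c(1, NA, NaN, Inf)

na_flag     = is.na(x)       # TRUE for both NA and NaN
nan_flag    = is.nan(x)      # TRUE only for NaN
finite_flag = is.finite(x)   # TRUE only for ordinary numbers

na_flag
#> [1] FALSE  TRUE  TRUE FALSE
nan_flag
#> [1] FALSE FALSE  TRUE FALSE
finite_flag
#> [1]  TRUE FALSE FALSE FALSE
```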

Coercion for infinity and NaN

First, remember that Inf, -Inf, and NaN are doubles. However, their coercion behavior is not the same as that of other doubles.

as.integer(Inf)
[1] NA
as.integer(NaN)
[1] NA
as.logical(Inf)
[1] TRUE
as.logical(-Inf)
[1] TRUE
as.logical(NaN)
[1] NA
as.character(Inf)
[1] "Inf"
as.character(-Inf)
[1] "-Inf"
as.character(NaN)
[1] "NaN"

Exercise 1

Part 1

What is the type of the following vectors? Explain why they have that type.

  • c(1, NA+1L, "C")
  • c(1L / 0, NA)
  • c(1:3, 5)
  • c(3L, NaN+1L)
  • c(NA, TRUE)

Part 2

Considering only the four (common) data types, what is R’s implicit type conversion hierarchy (from highest priority to lowest priority)?

Conditionals & Control Flow

Logical (boolean) operators


Operator     Operation       Vectorized?
x | y        or              Yes
x & y        and             Yes
!x           not             Yes
x || y       or              No
x && y       and             No
xor(x, y)    exclusive or    Yes

Vectorized?

x = c(TRUE,FALSE,TRUE)
y = c(FALSE,TRUE,TRUE)
x | y
[1] TRUE TRUE TRUE
x & y
[1] FALSE FALSE  TRUE
x || y
Error in x || y: 'length = 3' in coercion to 'logical(1)'
x && y
Error in x && y: 'length = 3' in coercion to 'logical(1)'
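Besides requiring length-1 inputs, the scalar forms && and || evaluate left to right and stop as soon as the result is known (short-circuiting). A minimal sketch; the stop() calls are only there to show that the right-hand side is never evaluated.

```r
# The right-hand side is skipped, so stop() is never triggered
a = FALSE && stop("not reached")
b = TRUE  || stop("not reached")
c(a, b)
#> [1] FALSE  TRUE

# Typical use: guard a test that only makes sense for length-1 input
x = c(TRUE, FALSE, TRUE)
guarded = length(x) == 1 && x
guarded
#> [1] FALSE
```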

Vectorization and math

Almost all of the basic mathematical operations (and many other functions) in R are vectorized.

c(1, 2, 3) + c(3, 2, 1)
[1] 4 4 4
c(1, 2, 3) / c(3, 2, 1)
[1] 0.3333333 1.0000000 3.0000000
log(c(1, 3, 0))
[1] 0.000000 1.098612     -Inf
sin(c(1, 2, 3))
[1] 0.8414710 0.9092974 0.1411200

Length coercion (aka recycling)

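If the lengths of the vectors do not match, then the shorter vector has its values recycled to match the length of the longer vector. R also issues a warning when the longer length is not a multiple of the shorter one (as with x | z below).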

x = c(TRUE, FALSE, TRUE)
y = c(TRUE)
z = c(FALSE, TRUE)
x | y
[1] TRUE TRUE TRUE
x & y
[1]  TRUE FALSE  TRUE
y | z
[1] TRUE TRUE
y & z
[1] FALSE  TRUE
x | z
[1] TRUE TRUE TRUE

Length coercion and math

The same length coercion rules apply for most basic mathematical operators.

x = c(1, 2, 3)
y = c(5, 4)
z = 10L
x + x
[1] 2 4 6
x + z
[1] 11 12 13
y / z
[1] 0.5 0.4
log(x)+z
[1] 10.00000 10.69315 11.09861
x %% y
[1] 1 2 3
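When the longer length is not a multiple of the shorter one, recycling still happens but R signals a warning. A sketch of how to see both the warning and the result (variable names are illustrative):

```r
x = c(1, 2, 3)
y = c(5, 4)

# y is recycled to c(5, 4, 5); capture the warning text
msg = tryCatch(x + y, warning = function(w) conditionMessage(w))
msg

# The result is still computed once the warning is silenced
res = suppressWarnings(x + y)
res
#> [1] 6 6 8
```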

Comparison operators


Operator     Comparison                  Vectorized?
x < y        less than                   Yes
x > y        greater than                Yes
x <= y       less than or equal to       Yes
x >= y       greater than or equal to    Yes
x != y       not equal to                Yes
x == y       equal to                    Yes
x %in% y     contains                    Yes (over x)

Comparisons

x = c("A","B","C")
y = c("A")
x == y
[1]  TRUE FALSE FALSE
x != y
[1] FALSE  TRUE  TRUE
x %in% y
[1]  TRUE FALSE FALSE
y %in% x
[1] TRUE

Type coercion also applies to comparison operators, which can result in interesting behavior.

TRUE == "TRUE"
[1] TRUE
FALSE == 1
[1] FALSE
TRUE == 1
[1] TRUE
TRUE == 5
[1] FALSE
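These results follow from the same coercion rules as c(): both sides are converted to a common type before comparing. A sketch that spells out the conversion, with identical() shown as a comparison that does no coercion (variable names are illustrative):

```r
# TRUE == "TRUE": the logical is converted to character first
chr_cmp = as.character(TRUE) == "TRUE"
# TRUE == 5: the logical is converted to a number (TRUE becomes 1)
num_cmp = as.numeric(TRUE) == 5
c(chr_cmp, num_cmp)
#> [1]  TRUE FALSE

# identical() does not coerce, so values of different types never match
strict = c(identical(TRUE, "TRUE"), identical(1L, 1))
strict
#> [1] FALSE FALSE
```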

> & < with characters

While maybe somewhat unexpected, these comparison operators can also be used with character values.

"A" < "B"
[1] TRUE
"A" > "B"
[1] FALSE
"A" < "a"
[1] FALSE
"a" > "!"
[1] TRUE
"Good" < "Goodbye"
[1] TRUE
c("Alice", "Bob", "Carol") <= "B"
[1]  TRUE FALSE FALSE

Conditional Control Flow

Conditional execution of code blocks is achieved via if statements.

x = c(1, 3)
if (3 %in% x) {
  print("Contains 3!")
}
[1] "Contains 3!"
if (1 %in% x)
  print("Contains 1!")
[1] "Contains 1!"
if (5 %in% x) {
  print("Contains 5!")
}
if (5 %in% x) {
  print("Contains 5!")
} else {
  print("Does not contain 5!")
}
[1] "Does not contain 5!"

if is not vectorized

x = c(1, 3)
if (x == 1)
  print("x is 1!")
Error in if (x == 1) print("x is 1!"): the condition has length > 1
if (x == 3)
  print("x is 3!")
Error in if (x == 3) print("x is 3!"): the condition has length > 1

Collapsing logical vectors

There are a couple of helper functions for collapsing a logical vector down to a single value: any() and all().

x = c(3,4,1)
x >= 2
[1]  TRUE  TRUE FALSE
any(x >= 2)
[1] TRUE
all(x >= 2)
[1] FALSE
x <= 4
[1] TRUE TRUE TRUE
any(x <= 4)
[1] TRUE
all(x <= 4)
[1] TRUE
if (any(x == 3)) 
  print("x contains 3!")
[1] "x contains 3!"
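any() and all() answer one yes/no question about the whole vector. When a separate decision is wanted for each element instead, base R's ifelse() (not covered above) is a vectorized counterpart. A brief sketch with illustrative labels:

```r
x = c(3, 4, 1)

# One decision for the whole vector
whole = if (all(x >= 1)) "every element is at least 1" else "some element is below 1"
whole
#> [1] "every element is at least 1"

# One decision per element
each = ifelse(x >= 2, "big", "small")
each
#> [1] "big"   "big"   "small"
```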

else if and else

x = 3

if (x < 0) {
  "x is negative"
} else if (x > 0) {
  "x is positive"
} else {
  "x is zero"
}
[1] "x is positive"
x = 0

if (x < 0) {
  "x is negative"
} else if (x > 0) {
  "x is positive"
} else {
  "x is zero"
}
[1] "x is zero"

if and return

R’s if statements are expressions: they return the value of the branch that was evaluated. As a result, the two following implementations are equivalent.

x = 5
s = if (x %% 2 == 0) {
  x / 2
} else {
  3*x + 1
}
s
[1] 16
x = 5
if (x %% 2 == 0) {
  s = x / 2
} else {
  s = 3*x + 1
}
s
[1] 16
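Because if returns a value, it can serve directly as the last expression of a function. A minimal sketch; next_value() is a hypothetical helper wrapping the rule used above.

```r
# The value of the evaluated branch becomes the function's return value
next_value = function(x) {
  if (x %% 2 == 0) x / 2 else 3*x + 1
}
next_value(5)
#> [1] 16
next_value(16)
#> [1] 8

# With no else branch and a FALSE condition, the value is NULL
y = if (FALSE) 1
is.null(y)
#> [1] TRUE
```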

Exercise 2

Take a look at the function defined below. Without running it in R, what do you expect the outcome will be for each of the calls that follow it?

f = function(x) {
  # Check small prime
  if (x > 10 || x < -10) {
    stop("Input too big")
  } else if (x %in% c(2, 3, 5, 7)) {
    cat("Input is prime!\n")
  } else if (x %% 2 == 0) {
    cat("Input is even!\n")
  } else if (x %% 2 == 1) {
    cat("Input is odd!\n")
  }
}
f(1)
f(3)
f(8)
f(-1)
f(-3)
f(1:2)
f("0")
f("3")
f("zero")

Conditionals and missing values

NAs can be particularly problematic for control flow.

if (2 != NA) {
  "Here"
}
Error in if (2 != NA) {: missing value where TRUE/FALSE needed
2 != NA
[1] NA
if (all(c(1,2,NA,4) >= 1)) {
  "There"
}
Error in if (all(c(1, 2, NA, 4) >= 1)) {: missing value where TRUE/FALSE needed
all(c(1,2,NA,4) >= 1)
[1] NA
if (any(c(1,2,NA,4) >= 1)) {
  "There"
}
[1] "There"
any(c(1,2,NA,4) >= 1)
[1] TRUE
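Two possible ways to keep such a condition from erroring, sketched under the assumption that a missing comparison should either count as "not sure" or be ignored. Which one is appropriate depends on the analysis.

```r
x = c(1, 2, NA, 4)

# isTRUE() is TRUE only for a single, non-missing TRUE, so NA falls to else
cautious = if (isTRUE(all(x >= 1))) "There" else "Not sure"
cautious
#> [1] "Not sure"

# na.rm = TRUE drops the missing comparisons before collapsing
lenient = if (all(x >= 1, na.rm = TRUE)) "There" else "Not sure"
lenient
#> [1] "There"
```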

Testing for NA

To explicitly test if a value is missing it is necessary to use is.na (often along with any or all).

NA == NA
[1] NA
is.na(NA)
[1] TRUE
is.na(1)
[1] FALSE
is.na(c(1,2,3,NA))
[1] FALSE FALSE FALSE  TRUE
any(is.na(c(1,2,3,NA)))
[1] TRUE
all(is.na(c(1,2,3,NA)))
[1] FALSE
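Since is.na() returns a logical vector, it combines with other vectorized tools as well. A short sketch of a few common uses; the replacement value 0 is only an example, not a recommendation.

```r
x = c(1, 2, 3, NA)

n_missing = sum(is.na(x))     # TRUE counts as 1, so this counts the NAs
where     = which(is.na(x))   # positions of the missing values
c(n_missing, where)
#> [1] 1 4

# Replace missing values in a copy of x
filled = x
filled[is.na(filled)] = 0
filled
#> [1] 1 2 3 0
```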