Logic and types in R

Lecture 02

Dr. Colin Rundel

In R (almost) everything is a vector

Vectors

The fundamental building blocks of data in R are vectors (collections of related values, objects, data structures, etc.).

R has two types of vectors:

atomic vectors (vectors)
- homogeneous collections of the same type (e.g. all true/false values, all numbers, or all character strings).
generic vectors (lists)
- heterogeneous collections of any type of R object, even other lists (meaning they can have a hierarchical/tree-like structure).
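As a small illustration of the two kinds of vectors, the sketch below builds one of each and inspects the element types. The variable names `v` and `l` are only placeholders chosen for this example.

```r
# An atomic vector: every element has the same type
v = c(1.5, 2, 3)
typeof(v)            # "double"

# A generic vector (list): elements can differ in type and can be lists
l = list(1.5, "two", TRUE, list(4L, "five"))
typeof(l)            # "list"

# Type of each top-level element of the list
element_types = sapply(l, typeof)
element_types        # "double" "character" "logical" "list"
```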

Atomic Vectors

R has six atomic vector types. We can check the type of any object in R using the typeof() function.

`typeof()`	`mode()`
logical	logical
double	numeric
integer	numeric
character	character
complex	complex
raw	raw

`logical` - boolean values (`TRUE` and `FALSE`)

typeof(TRUE)

[1] "logical"

typeof(FALSE)

[1] "logical"

mode(TRUE)

[1] "logical"

mode(FALSE)

[1] "logical"

R will let you use T and F as shortcuts for TRUE and FALSE. This is bad practice, as these values are actually global variables that can be overwritten.

T

[1] TRUE

T = FALSE
T

[1] FALSE

`character` - text strings

Either single or double quotes are fine, but the opening and closing quotes must match.

typeof("hello")

[1] "character"

typeof('world')

[1] "character"

mode("hello")

[1] "character"

mode('world')

[1] "character"

Quote characters can be included by escaping or using a non-matching quote.

"abc'123"

[1] "abc'123"

'abc"123'

[1] "abc\"123"

"abc\"123"

[1] "abc\"123"

'abc\'123'

[1] "abc'123"

Numeric types

double - floating point values (these are the default numerical type)

typeof(1.33)

[1] "double"

typeof(7)

[1] "double"

mode(1.33)

[1] "numeric"

mode(7)

[1] "numeric"

integer - integer values (literals are indicated with an L suffix)

typeof( 7L )

[1] "integer"

typeof( 1:3 )

[1] "integer"

mode( 7L )

[1] "numeric"

mode( 1:3 )

[1] "numeric"

Concatenation

Atomic vectors can be grown (combined) using the combine function, c().

c(1, 2, 3)

[1] 1 2 3

c("Hello", "World!")

[1] "Hello"  "World!"

c(1, 1:10)

 [1]  1  1  2  3  4  5  6  7  8  9 10

c(1,c(2, c(3)))

[1] 1 2 3

Inspecting types

typeof(x) - returns a character vector (length 1) of the type of object x.
mode(x) - returns a character vector (length 1) of the mode of object x.

typeof(1)

[1] "double"

typeof(1L)

[1] "integer"

typeof("A")

[1] "character"

typeof(TRUE)

[1] "logical"

mode(1)

[1] "numeric"

mode(1L)

[1] "numeric"

mode("A")

[1] "character"

mode(TRUE)

[1] "logical"

Type predicates

is.logical(x) - returns TRUE if x has type logical.
is.character(x) - returns TRUE if x has type character.
is.double(x) - returns TRUE if x has type double.
is.integer(x) - returns TRUE if x has type integer.
is.numeric(x) - returns TRUE if x has mode numeric.

is.integer(1)

[1] FALSE

is.integer(1L)

[1] TRUE

is.integer(3:7)

[1] TRUE

is.double(1)

[1] TRUE

is.double(1L)

[1] FALSE

is.double(3:8)

[1] FALSE

is.numeric(1)

[1] TRUE

is.numeric(1L)

[1] TRUE

is.numeric(3:7)

[1] TRUE

Other useful predicates

is.atomic(x) - returns TRUE if x is an atomic vector.
is.list(x) - returns TRUE if x is a list (generic vector).
is.vector(x) - returns TRUE if x is either an atomic or generic vector.

is.atomic(c(1,2,3))

[1] TRUE

is.list(c(1,2,3))

[1] FALSE

is.vector(c(1,2,3))

[1] TRUE

is.atomic(list(1,2,3))

[1] FALSE

is.list(list(1,2,3))

[1] TRUE

is.vector(list(1,2,3))

[1] TRUE

Type Coercion

R is a dynamically typed language – it will automatically convert between most types without raising warnings or errors. Keep in mind that atomic vectors must always contain values of the same type.

c(1, "Hello")

[1] "1"     "Hello"

c(FALSE, 3L)

[1] 0 3

c(1.2, 3L)

[1] 1.2 3.0

c(FALSE, "Hello")

[1] "FALSE" "Hello"

Operator coercion

Built-in operators and functions (e.g. +, &, log(), etc.) will generally attempt to coerce values to an appropriate type for the given operation.

3.1+1L

[1] 4.1

5 + FALSE

[1] 5

log(1)

[1] 0

log(TRUE)

[1] 0

TRUE & FALSE

[1] FALSE

TRUE & 7

[1] TRUE

TRUE | FALSE

[1] TRUE

FALSE | !5

[1] FALSE
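One common use of this coercion, sketched here with made-up data, is counting or averaging a logical vector: sum() and mean() treat TRUE as 1 and FALSE as 0. The names `x` and `big` are just for this example.

```r
x = c(3, 8, 1, 9, 4)

# Comparison gives a logical vector
big = x > 3
big          # FALSE TRUE FALSE TRUE TRUE

# Logical values are coerced to 0 / 1 by arithmetic functions
sum(big)     # number of TRUE values: 3
mean(big)    # proportion of TRUE values: 0.6
```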

Explicit Coercion

Most of the is.*() functions we just saw have an as.*() variant which can be used for explicit coercion.

as.logical(5.2)

[1] TRUE

as.character(TRUE)

[1] "TRUE"

as.integer(pi)

[1] 3

as.numeric(FALSE)

[1] 0

as.double("7.2")

[1] 7.2

as.double("one")

[1] NA
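When a string cannot be parsed as a number, as.double() returns NA and also raises a warning, which the output above does not show. A minimal sketch of detecting or silencing that warning follows; the variable names are placeholders.

```r
x = c("7.2", "one")

# Detect whether the coercion raises a warning
had_warning = tryCatch(
  {
    as.double(x)
    FALSE                      # reached only if no warning was raised
  },
  warning = function(w) TRUE   # a warning was raised
)
had_warning                    # TRUE

# Silence the warning when the NA is expected and handled afterwards
y = suppressWarnings(as.double(x))
y                              # 7.2  NA
```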

Missing Values

R uses NA to represent missing values in its data structures. What may not be obvious is that there are different NAs for different atomic types.

typeof(NA)

[1] "logical"

typeof(NA+1)

[1] "double"

typeof(NA+1L)

[1] "integer"

typeof(c(NA,""))

[1] "character"

typeof(NA_character_)

[1] "character"

typeof(NA_real_)

[1] "double"

typeof(NA_integer_)

[1] "integer"

typeof(NA_complex_)

[1] "complex"

NA “stickiness”

Because NAs represent missing values it makes sense that any calculation using them should also be missing.

1 + NA

[1] NA

1 / NA

[1] NA

NA * 5

[1] NA

sqrt(NA)

[1] NA

3^NA

[1] NA

sum(c(1, 2, 3, NA))

[1] NA

Summarizing functions (e.g. sum(), mean(), sd(), etc.) will often have a na.rm argument which will allow you to drop missing values.

sum(c(1, 2, 3, NA), na.rm = TRUE)

[1] 6

mean(c(1, 2, 3, NA), na.rm = TRUE)

[1] 2

NAs are not always sticky

A useful mental model for NAs is to consider them as an unknown value that could take any of the possible values for a type.

For numbers or characters this isn’t very helpful, but for a logical value we know that the value must either be TRUE or FALSE and we can use that when deciding what value to return.

TRUE & NA

[1] NA

FALSE & NA

[1] FALSE

TRUE | NA

[1] TRUE

FALSE | NA

[1] NA
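The four results above are part of a larger pattern. As a sketch, the full truth tables for & and | over TRUE, FALSE, and NA can be built with outer(); the names `vals`, `labels`, `and_tbl`, and `or_tbl` are only for this example.

```r
vals = c(TRUE, FALSE, NA)
labels = c("TRUE", "FALSE", "NA")

# Apply & and | to every pair of values (rows = left side, columns = right side)
and_tbl = outer(vals, vals, "&")
or_tbl  = outer(vals, vals, "|")
dimnames(and_tbl) = list(labels, labels)
dimnames(or_tbl)  = list(labels, labels)

and_tbl   # NA appears only where the other value is TRUE or NA
or_tbl    # NA appears only where the other value is FALSE or NA
```

The result is NA only when the unknown value could change the answer.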

Other Special values (double)

These are defined as part of the IEEE floating point standard (not unique to R).

NaN - Not a number
Inf - Positive infinity
-Inf - Negative infinity

pi / 0

[1] Inf

0 / 0

[1] NaN

1/0 + 1/0

[1] Inf

1/0 - 1/0

[1] NaN

NaN / NA

[1] NA

NaN * NA

[1] NA

Testing for `Inf` and `NaN`

NaN and Inf don’t have the same testing issues that NAs do, but there are still convenience functions for testing for these types of values.

is.finite(Inf)

[1] FALSE

is.infinite(-Inf)

[1] TRUE

is.nan(Inf)

[1] FALSE

is.nan(-Inf)

[1] FALSE

Inf > 1

[1] TRUE

-Inf > 1

[1] FALSE

is.finite(NaN)

[1] FALSE

is.infinite(NaN)

[1] FALSE

is.nan(NaN)

[1] TRUE

is.finite(NA)

[1] FALSE

is.infinite(NA)

[1] FALSE

is.nan(NA)

[1] FALSE
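To compare these predicates side by side, the sketch below applies each of them to one vector of special values. It also includes is.na(), which is not in the examples above, to show that it reports TRUE for NaN as well as NA. The data frame `checks` is only a convenient way to line up the results.

```r
x = c(1, Inf, -Inf, NaN, NA)

# One row per value, one column per predicate
checks = data.frame(
  value    = x,
  finite   = is.finite(x),
  infinite = is.infinite(x),
  nan      = is.nan(x),
  na       = is.na(x)
)
checks
```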

Coercion for infinity and NaN

First, remember that Inf, -Inf, and NaN are doubles. However, their coercion behavior is not the same as that of other doubles.

as.integer(Inf)

[1] NA

as.integer(NaN)

[1] NA

as.logical(Inf)

[1] TRUE

as.logical(-Inf)

[1] TRUE

as.logical(NaN)

[1] NA

as.character(Inf)

[1] "Inf"

as.character(-Inf)

[1] "-Inf"

as.character(NaN)

[1] "NaN"

Exercise 1

Part 1

What is the type of the following vectors? Explain why they have that type.

c(1, NA+1L, "C")
c(1L / 0, NA)
c(1:3, 5)
c(3L, NaN+1L)
c(NA, TRUE)

Part 2

Considering only the four (common) data types, what is R’s implicit type conversion hierarchy (from highest priority to lowest priority)?


Conditionals & Control Flow

Logical (boolean) operators

Operator	Operation	Vectorized?
`x | y`	or	Yes
`x & y`	and	Yes
`!x`	not	Yes
`x || y`	or	No
`x && y`	and	No
`xor(x, y)`	exclusive or	Yes

Vectorized?

x = c(TRUE,FALSE,TRUE)
y = c(FALSE,TRUE,TRUE)

x | y

[1] TRUE TRUE TRUE

x & y

[1] FALSE FALSE  TRUE

x || y

Error in x || y: 'length = 3' in coercion to 'logical(1)'

x && y

Error in x && y: 'length = 3' in coercion to 'logical(1)'
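A sketch of how the scalar operators are typically used: first collapse each side to a single TRUE or FALSE, then combine. The example also relies on the fact that && and || short-circuit, meaning the right-hand side is not evaluated when the left-hand side already decides the result. The variable names are placeholders.

```r
x = c(TRUE, FALSE, TRUE)

# Collapse to length-1 values before using && or ||
both = any(x) && all(x)
both                                            # FALSE

# Short-circuiting: stop() is never called in either line
skipped_and = FALSE && stop("never evaluated")
skipped_or  = TRUE  || stop("never evaluated")
skipped_and                                     # FALSE
skipped_or                                      # TRUE
```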

Vectorization and math

Almost all of the basic mathematical operations (and many other functions) in R are vectorized.

c(1, 2, 3) + c(3, 2, 1)

[1] 4 4 4

c(1, 2, 3) / c(3, 2, 1)

[1] 0.3333333 1.0000000 3.0000000

log(c(1, 3, 0))

[1] 0.000000 1.098612     -Inf

sin(c(1, 2, 3))

[1] 0.8414710 0.9092974 0.1411200

Length coercion (aka recycling)

If the lengths of the vectors do not match, then the shorter vector has its values recycled to match the length of the longer vector.

x = c(TRUE, FALSE, TRUE)
y = c(TRUE)
z = c(FALSE, TRUE)

x | y

[1] TRUE TRUE TRUE

x & y

[1]  TRUE FALSE  TRUE

y | z

[1] TRUE TRUE

y & z

[1] FALSE  TRUE

x | z

[1] TRUE TRUE TRUE

Length coercion and math

The same length coercion rules apply for most basic mathematical operators.

x = c(1, 2, 3)
y = c(5, 4)
z = 10L

x + x

[1] 2 4 6

x + z

[1] 11 12 13

y / z

[1] 0.5 0.4

log(x)+z

[1] 10.00000 10.69315 11.09861

x %% y

[1] 1 2 3
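Recycling still produces a result when the longer length is not a multiple of the shorter one, as with x and y above, but R also raises a warning in that case. The sketch below checks for that warning; `warned` and `w4` are names made up for this example.

```r
x = c(1, 2, 3)
y = c(5, 4)

# Lengths 3 and 2: y is recycled to 5, 4, 5 and a warning is raised
warned = tryCatch(
  {
    x + y
    FALSE
  },
  warning = function(w) TRUE
)
warned                      # TRUE
suppressWarnings(x + y)     # 6 6 8

# Lengths 4 and 2: y is recycled to 5, 4, 5, 4 with no warning
w4 = c(1, 2, 3, 4) + y
w4                          # 6 6 8 8
```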

Comparison operators

Operator	Comparison	Vectorized?
`x < y`	less than	Yes
`x > y`	greater than	Yes
`x <= y`	less than or equal to	Yes
`x >= y`	greater than or equal to	Yes
`x != y`	not equal to	Yes
`x == y`	equal to	Yes
`x %in% y`	contains	Yes (over `x`)

Comparisons

x = c("A","B","C")
y = c("A")

x == y

[1]  TRUE FALSE FALSE

x != y

[1] FALSE  TRUE  TRUE

x %in% y

[1]  TRUE FALSE FALSE

y %in% x

[1] TRUE

Type coercion also applies to comparison operators, which can result in interesting behavior.

TRUE == "TRUE"

[1] TRUE

FALSE == 1

[1] FALSE

TRUE == 1

[1] TRUE

TRUE == 5

[1] FALSE
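These results are easier to follow once the implicit coercion is written out explicitly. The sketch below does that, and also uses identical(), which is not covered above, as a comparison that does not coerce.

```r
# logical vs character: TRUE is coerced to "TRUE"
as.character(TRUE) == "TRUE"    # TRUE

# logical vs double: TRUE is coerced to 1
as.numeric(TRUE) == 5           # FALSE

# double vs character: 1 is coerced to "1"
1 == "1"                        # TRUE
1 == "1.0"                      # FALSE, the strings "1" and "1.0" differ

# identical() compares type as well as value
identical(TRUE, 1)              # FALSE
```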

`>` & `<` with characters

While maybe somewhat unexpected, these comparison operators can be used with character values.

"A" < "B"

[1] TRUE

"A" > "B"

[1] FALSE

"A" < "a"

[1] FALSE

"a" > "!"

[1] TRUE

"Good" < "Goodbye"

[1] TRUE

c("Alice", "Bob", "Carol") <= "B"

[1]  TRUE FALSE FALSE
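String comparisons follow the collation order of the current locale, so a result such as "A" < "a" can differ between systems. As a sketch, the code below contrasts the raw character codes with a sort that uses the C locale (method = "radix" in sort()); the output of Sys.getlocale() is system dependent and is not shown.

```r
# Character codes: upper case letters come before lower case letters
utf8ToInt("A")    # 65
utf8ToInt("a")    # 97

# The collation locale that <, >, and the default sort() rely on
Sys.getlocale("LC_COLLATE")

# Radix sorting orders character vectors as in the C locale
c_sorted = sort(c("b", "A", "a", "B"), method = "radix")
c_sorted          # "A" "B" "a" "b"
```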

Conditional Control Flow

Conditional execution of code blocks is achieved via if statements.

x = c(1, 3)

if (3 %in% x) {
  print("Contains 3!")
}

[1] "Contains 3!"

if (1 %in% x)
  print("Contains 1!")

[1] "Contains 1!"

if (5 %in% x) {
  print("Contains 5!")
}

if (5 %in% x) {
  print("Contains 5!")
} else {
  print("Does not contain 5!")
}

[1] "Does not contain 5!"

`if` is not vectorized

x = c(1, 3)

if (x == 1)
  print("x is 1!")

Error in if (x == 1) print("x is 1!"): the condition has length > 1

if (x == 3)
  print("x is 3!")

Error in if (x == 3) print("x is 3!"): the condition has length > 1
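When an element-by-element choice is what is actually wanted, base R's ifelse() function is a vectorized alternative. It is not covered in the slides above, so treat this as a side note rather than part of the lecture; `labels` is a placeholder name.

```r
x = c(1, 3)

# ifelse(test, yes, no) picks a value for each element of test
labels = ifelse(x == 1, "x is 1!", "x is not 1!")
labels    # "x is 1!"     "x is not 1!"
```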

Collapsing logical vectors

There are a couple of helper functions for collapsing a logical vector down to a single value: any() and all().

x = c(3,4,1)

x >= 2

[1]  TRUE  TRUE FALSE

any(x >= 2)

[1] TRUE

all(x >= 2)

[1] FALSE

x <= 4

[1] TRUE TRUE TRUE

any(x <= 4)

[1] TRUE

all(x <= 4)

[1] TRUE

if (any(x == 3)) 
  print("x contains 3!")

[1] "x contains 3!"

`else if` and `else`

x = 3

if (x < 0) {
  "x is negative"
} else if (x > 0) {
  "x is positive"
} else {
  "x is zero"
}

[1] "x is positive"

x = 0

if (x < 0) {
  "x is negative"
} else if (x > 0) {
  "x is positive"
} else {
  "x is zero"
}

[1] "x is zero"

`if` and `return`

R’s if conditional statements return the value of the branch that was evaluated, so the following two implementations are equivalent.

x = 5

s = if (x %% 2 == 0) {
  x / 2
} else {
  3*x + 1
}
s

[1] 16

x = 5

if (x %% 2 == 0) {
  s = x / 2
} else {
  s = 3*x + 1
}
s

[1] 16
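Because if yields a value, it can also serve as the last expression of a function. The sketch below wraps the same calculation in a function; the name `collatz_step` is made up for this example.

```r
# The value of the evaluated branch becomes the function's return value
collatz_step = function(x) {
  if (x %% 2 == 0) {
    x / 2
  } else {
    3*x + 1
  }
}

collatz_step(5)     # 16
collatz_step(16)    # 8
```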

Exercise 2

Take a look at the function below. Without running it in R, what do you expect the outcome will be for each of the calls that follow it?

f = function(x) {
  # Check small prime
  if (x > 10 || x < -10) {
    stop("Input too big")
  } else if (x %in% c(2, 3, 5, 7)) {
    cat("Input is prime!\n")
  } else if (x %% 2 == 0) {
    cat("Input is even!\n")
  } else if (x %% 2 == 1) {
    cat("Input is odd!\n")
  }
}

f(1)
f(3)
f(8)
f(-1)
f(-3)
f(1:2)
f("0")
f("3")
f("zero")


Conditionals and missing values

NAs can be particularly problematic for control flow.

if (2 != NA) {
  "Here"
}

Error in if (2 != NA) {: missing value where TRUE/FALSE needed

2 != NA

[1] NA

if (all(c(1,2,NA,4) >= 1)) {
  "There"
}

Error in if (all(c(1, 2, NA, 4) >= 1)) {: missing value where TRUE/FALSE needed

all(c(1,2,NA,4) >= 1)

[1] NA

if (any(c(1,2,NA,4) >= 1)) {
  "There"
}

[1] "There"

any(c(1,2,NA,4) >= 1)

[1] TRUE
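Two ways to keep a condition from being NA are sketched below, using the na.rm argument of all() and the base function isTRUE(), which returns TRUE only for a single, non-missing TRUE. Which one is appropriate depends on how missing values should be treated, so this is one option rather than a rule.

```r
x = c(1, 2, NA, 4)

# Drop missing values before collapsing
all_known = all(x >= 1, na.rm = TRUE)
all_known                  # TRUE

# Treat an NA condition as "not TRUE"
strict = isTRUE(all(x >= 1))
strict                     # FALSE

result = if (isTRUE(2 != NA)) "Here" else "Comparison was not TRUE"
result                     # "Comparison was not TRUE"
```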

Testing for `NA`

To explicitly test if a value is missing it is necessary to use is.na (often along with any or all).

NA == NA

[1] NA

is.na(NA)

[1] TRUE

is.na(1)

[1] FALSE

is.na(c(1,2,3,NA))

[1] FALSE FALSE FALSE  TRUE

any(is.na(c(1,2,3,NA)))

[1] TRUE

all(is.na(c(1,2,3,NA)))

[1] FALSE
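Putting these pieces together, a function can test for missing values first and only then evaluate the conditions that would otherwise produce NA. This is a minimal sketch; `check_positive` and its messages are hypothetical.

```r
check_positive = function(x) {
  if (any(is.na(x))) {
    # Handle missing values before any comparison is used as a condition
    "contains missing values"
  } else if (all(x > 0)) {
    "all positive"
  } else {
    "not all positive"
  }
}

check_positive(c(1, 2, 3))       # "all positive"
check_positive(c(1, -2, 3))      # "not all positive"
check_positive(c(1, 2, 3, NA))   # "contains missing values"
```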
