Lecture 02
The fundamental building blocks of data in R are vectors (collections of related values, objects, data structures, etc.).
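R has two types of vectors:
atomic vectors (vectors) - collections in which every value has the same type (e.g. all true/false values, all numbers, or all character strings).
generic vectors (lists) - collections that can hold values of different types.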
R has six atomic vector types. We can check the type of any object in R using the typeof() function.
| typeof() | mode() |
|---|---|
| logical | logical |
| double | numeric |
| integer | numeric |
| character | character |
| complex | complex |
| raw | raw |
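The table above can be reproduced with a short sketch. The example values and the names values, types, and modes are chosen here for illustration only.

```r
# One example value for each of the six atomic types
values <- list(
  logical   = TRUE,
  double    = 1.5,
  integer   = 2L,
  character = "a",
  complex   = 1 + 2i,
  raw       = as.raw(255)
)

# Apply typeof() and mode() to each example value
types <- sapply(values, typeof)
modes <- sapply(values, mode)

# Side-by-side comparison, one row per example value
print(data.frame(typeof = types, mode = modes))
```

Note that double and integer values are the only ones for which typeof() and mode() disagree: both report a mode of numeric.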
logical - boolean values (TRUE and FALSE)
character - text strings. Either single or double quotes are fine, but the opening and closing quotes must match.
double - floating point values (these are the default numerical type)
Atomic vectors can be grown (combined) using the combine c() function.
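A minimal sketch of growing a vector with c(); the variable names x, y, and z are placeholders chosen for this example.

```r
x <- c(1, 2, 3)       # combine three doubles into one vector
y <- c(x, 4, 5)       # grow it by combining the existing vector with new values
z <- c(y, c(6, 7))    # nested c() calls are flattened into a single atomic vector

length(z)   # 7
typeof(z)   # "double"
```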
typeof(x) - returns a character vector (length 1) of the type of object x.
mode(x) - returns a character vector (length 1) of the mode of object x.
is.logical(x) - returns TRUE if x has type logical.
is.character(x) - returns TRUE if x has type character.
is.double(x) - returns TRUE if x has type double.
is.integer(x) - returns TRUE if x has type integer.
is.numeric(x) - returns TRUE if x has mode numeric.
is.atomic(x) - returns TRUE if x is an atomic vector.
is.list(x) - returns TRUE if x is a list (generic vector).
is.vector(x) - returns TRUE if x is either an atomic or generic vector (with no attributes other than names).
R is a dynamically typed language: it will automatically convert between most types without raising warnings or errors. Keep in mind that atomic vectors must always contain values of the same type.
Built-in operators and functions (e.g. +, &, log()) will generally attempt to coerce values to an appropriate type for the given operation.
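A few illustrative cases of implicit coercion, both when combining values and when applying operators. The specific values are chosen here as examples.

```r
# Combining values of different types coerces them to a single common type
typeof(c(TRUE, 1L))     # "integer"
typeof(c(1L, 2.5))      # "double"
typeof(c(2.5, "a"))     # "character"

# Operators and functions coerce their arguments as needed
TRUE + TRUE             # 2: logicals are treated as 0/1 in arithmetic
1 & 0                   # FALSE: numbers are treated as logicals by &
log(TRUE)               # 0: TRUE is coerced to 1 before the log is taken
```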
Most of the is.*() functions we just saw have an as.*() variant (e.g. as.logical(), as.integer(), as.character()) which can be used for explicit coercion.
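A short sketch of explicit coercion with the as variants; the inputs are arbitrary examples, and suppressWarnings() is used only to keep the output quiet.

```r
as.integer(3.9)           # 3: the fractional part is dropped
as.logical(c(0, 1, 5))    # FALSE TRUE TRUE: zero is FALSE, anything else is TRUE
as.character(c(1.5, 2))   # "1.5" "2"
as.numeric("3.14")        # 3.14

# A value that cannot be converted becomes NA (normally with a warning)
bad <- suppressWarnings(as.integer("abc"))
is.na(bad)                # TRUE
```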
R uses NA to represent missing values in its data structures. What may not be obvious is that there are different NAs for different atomic types.
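The typed missing-value constants built into R can be inspected with typeof(), as in this brief sketch.

```r
typeof(NA)              # "logical": a bare NA is a logical value
typeof(NA_integer_)     # "integer"
typeof(NA_real_)        # "double"
typeof(NA_character_)   # "character"

# Inside a vector, NA is coerced to match the other elements
typeof(c(1L, NA))       # "integer"
typeof(c("a", NA))      # "character"
```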
Because NAs represent missing values it makes sense that any calculation using them should also be missing.
Summarizing functions (e.g. sum(), mean(), sd()) will often have an na.rm argument which will allow you to drop missing values.
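A small sketch of how missing values propagate through calculations and how na.rm changes the result; the vector x is an arbitrary example.

```r
x <- c(1, 2, NA, 4)

1 + NA                   # NA: a calculation with a missing value is missing
sum(x)                   # NA
mean(x)                  # NA

# Dropping the missing value before summarizing
sum(x, na.rm = TRUE)     # 7
mean(x, na.rm = TRUE)    # 7 / 3
```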
A useful mental model for NAs is to consider them as an unknown value that could take any of the possible values for a type.
For numbers or characters this isn’t very helpful, but for a logical value we know that the value must either be TRUE or FALSE and we can use that when deciding what value to return.
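The reasoning above can be checked directly: when the result is the same whether the unknown value is TRUE or FALSE, R returns that result, and otherwise it returns NA.

```r
TRUE  | NA    # TRUE:  TRUE | anything is TRUE
FALSE | NA    # NA:    the result depends on the unknown value
TRUE  & NA    # NA:    the result depends on the unknown value
FALSE & NA    # FALSE: FALSE & anything is FALSE
```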
R also has three special numeric values, which are defined as part of the IEEE floating point standard (and so are not unique to R):
NaN - Not a number
Inf - Positive infinity
-Inf - Negative infinity
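A few simple calculations that produce these special values, as an illustration:

```r
 1 / 0       # Inf
-1 / 0       # -Inf
 0 / 0       # NaN
Inf - Inf    # NaN

typeof(Inf)  # "double"
typeof(NaN)  # "double"
```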
Testing for Inf and NaN
NaN and Inf don’t have the same testing issues that NAs do, but there are still convenience functions for testing for these types of values (is.finite(), is.infinite(), and is.nan()).
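A sketch of the base R testing functions applied to a vector containing each kind of special value; the vector x is made up for this example.

```r
x <- c(1, Inf, -Inf, NaN, NA)

is.finite(x)     # TRUE  FALSE FALSE FALSE FALSE
is.infinite(x)   # FALSE TRUE  TRUE  FALSE FALSE
is.nan(x)        # FALSE FALSE FALSE TRUE  FALSE
is.na(x)         # FALSE FALSE FALSE TRUE  TRUE
```

Note that is.na() is TRUE for both NaN and NA, while is.nan() is TRUE only for NaN.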
First, remember that Inf, -Inf, and NaN are doubles; however, their coercion behavior is not the same as for other doubles.
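A brief sketch of how these values behave under explicit coercion. suppressWarnings() is used here only because converting Inf to integer emits a warning.

```r
# To logical: infinities are nonzero and so TRUE, NaN becomes NA
as.logical(Inf)      # TRUE
as.logical(-Inf)     # TRUE
as.logical(NaN)      # NA

# To character: the printed names are used
as.character(Inf)    # "Inf"
as.character(NaN)    # "NaN"

# To integer: there is no integer representation, so the result is NA
int_inf <- suppressWarnings(as.integer(Inf))
int_nan <- suppressWarnings(as.integer(NaN))
c(int_inf, int_nan)  # NA NA
```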
What is the type of the following vectors? Explain why they have that type.
c(1, NA+1L, "C")
c(1L / 0, NA)
c(1:3, 5)
c(3L, NaN+1L)
c(NA, TRUE)
Considering only the four (common) data types, what is R’s implicit type conversion hierarchy (from highest priority to lowest priority)?
| Operator | Operation | Vectorized? |
|---|---|---|
| x \| y | or | Yes |
| x & y | and | Yes |
| !x | not | Yes |
| x \|\| y | or | No |
| x && y | and | No |
| xor(x, y) | exclusive or | Yes |
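A sketch contrasting the vectorized and non-vectorized operators from the table; x and y are example vectors chosen to cover all four TRUE/FALSE combinations.

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

# Vectorized operators work element by element
x | y         # TRUE  TRUE  TRUE  FALSE
x & y         # TRUE  FALSE FALSE FALSE
!x            # FALSE FALSE TRUE  TRUE
xor(x, y)     # FALSE TRUE  TRUE  FALSE

# || and && expect single values (longer inputs are an error as of R 4.3.0)
x[1] || y[2]  # TRUE
x[1] && y[2]  # FALSE
```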
Almost all of the basic mathematical operations (and many other functions) in R are vectorized.
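For instance, arithmetic operators and functions such as sqrt() apply to every element of a vector at once; x here is an arbitrary example.

```r
x <- c(1, 4, 9, 16)

sqrt(x)    # 1 2 3 4
x * 2      # 2 8 18 32
x + x      # 2 8 18 32
x^2        # 1 16 81 256
```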
If the lengths of the vectors do not match, then the shorter vector has its values recycled to match the length of the longer vector.
The same length coercion (recycling) rules apply for most basic mathematical operators.
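A minimal sketch of recycling for both logical and mathematical operators. The values are illustrative, and suppressWarnings() hides the warning R gives when the longer length is not a multiple of the shorter one.

```r
# The shorter vector is repeated to match the longer one
c(TRUE, FALSE, TRUE, FALSE) & TRUE    # TRUE FALSE TRUE FALSE
c(1, 2, 3, 4) + c(10, 20)             # 11 22 13 24
c(1, 2, 3, 4) * 2                     # 2 4 6 8

# Recycling still happens when the lengths do not divide evenly,
# but R issues a warning
partial <- suppressWarnings(c(1, 2, 3) + c(10, 20))
partial                               # 11 22 13
```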
| Operator | Comparison | Vectorized? |
|---|---|---|
| x < y | less than | Yes |
| x > y | greater than | Yes |
| x <= y | less than or equal to | Yes |
| x >= y | greater than or equal to | Yes |
| x != y | not equal to | Yes |
| x == y | equal to | Yes |
| x %in% y | contained in | Yes (over x) |
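A sketch of the comparison operators applied element by element; x and y are example vectors. Note that %in% checks each element of x against all of y, so its result always has the length of x.

```r
x <- c(1, 5, 10)
y <- c(2, 5, 8)

x < y              # TRUE  FALSE FALSE
x >= y             # FALSE TRUE  TRUE
x != y             # TRUE  FALSE TRUE
x == y             # FALSE TRUE  FALSE

x %in% y           # FALSE TRUE  FALSE
x %in% c(1, 10)    # TRUE  FALSE TRUE
```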
> & < with characters
While maybe somewhat unexpected, these comparison operators can also be used with character values.
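A small sketch of character comparison. The ordering depends on the collation rules of the current locale, so the results noted in the comments assume a typical locale in which lowercase letters sort alphabetically.

```r
"apple" < "banana"        # TRUE: "a" sorts before "b"
"b" > "a"                 # TRUE

# Comparisons are vectorized and recycle like other operators
c("a", "b", "c") < "b"    # TRUE FALSE FALSE
```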
Conditional execution of code blocks is achieved via if statements.
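if is not vectorized
The condition of an if statement must be a single logical value; since R 4.2.0, a condition of length greater than one is an error.
There are a couple of helper functions for collapsing a logical vector down to a single value: any() and all().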
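A sketch of using any() and all() to reduce a vectorized comparison to the single value that if requires. The vector x and the message strings are made up for this example.

```r
x <- c(3, 4, 1)

x >= 2         # TRUE TRUE FALSE: not usable directly as an if condition
any(x >= 2)    # TRUE
all(x >= 2)    # FALSE

if (any(x >= 2)) {
  msg_any <- "at least one value is >= 2"
} else {
  msg_any <- "no values are >= 2"
}

if (all(x >= 2)) {
  msg_all <- "all values are >= 2"
} else {
  msg_all <- "not all values are >= 2"
}

print(c(msg_any, msg_all))
```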
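else if and else
Additional conditions can be chained with else if, and a final else block handles any remaining cases.
if and return
R’s if statements return the value of the branch that was evaluated (or NULL, invisibly, when no branch is evaluated), so assigning the result of an if statement is equivalent to assigning inside each of its branches.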
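The equivalence can be sketched as follows. This is an illustrative reconstruction rather than the lecture's original code: the value of x, the arithmetic in each branch, and the names s1 and s2 are assumptions made for the example.

```r
x <- 5

# Version 1: assign inside each branch
if (x %% 2 == 0) {
  s1 <- x / 2
} else {
  s1 <- 3 * x + 1
}

# Version 2: assign the value returned by the if statement itself
s2 <- if (x %% 2 == 0) {
  x / 2
} else {
  3 * x + 1
}

identical(s1, s2)    # TRUE: both are 16

# With no else branch and a FALSE condition, the result is NULL
s3 <- if (x > 10) "big"
is.null(s3)          # TRUE
```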
Take a look at the code below on the left. Without running it in R, what do you expect the outcome will be for each call on the right?
NAs can be particularly problematic for control flow, because an if condition that evaluates to NA raises an error.
Testing for NA
To explicitly test if a value is missing it is necessary to use is.na() (often along with any() or all()).
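A minimal sketch of guarding control flow against missing values. The vector x and the status strings are hypothetical, and tryCatch() is used only to show that an NA condition fails without stopping the script.

```r
x <- c(1, NA, 3)

is.na(x)           # FALSE TRUE FALSE
any(is.na(x))      # TRUE
all(is.na(x))      # FALSE

# An NA condition is an error, captured here instead of halting
res <- tryCatch(
  if (x[2] > 0) "positive" else "not positive",
  error = function(e) "error: condition was NA"
)
res                # "error: condition was NA"

# Checking for missing values first avoids the error
status <- if (any(is.na(x))) "has missing values" else "complete"
status             # "has missing values"
```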
Sta 523 - Fall 2023