Lecture 05
R supports the creation of 2d data structures (rows and columns) of atomic vector types.
Generally these are formed via a call to matrix().
Matrices in R use column major ordering (data is stored in column order not row order).
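As a small illustration of column major ordering, the sketch below fills two matrices from the same values; the names m, m_row, col_major and row_filled are chosen here for illustration only.

```r
# Fill a 2 x 3 matrix from 1:6; by default values run down each column first.
m <- matrix(1:6, nrow = 2, ncol = 3)

# byrow = TRUE changes only the fill order, not how the data is stored.
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

# as.vector() exposes the underlying storage, which is always column ordered.
col_major  <- as.vector(m)      # 1 2 3 4 5 6
row_filled <- as.vector(m_row)  # 1 4 2 5 3 6
```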
Matrices (and arrays) are just atomic vectors with a dim attribute attached (they do not have a class attribute, but they do have implicit classes).
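One way to see this is to attach a dim attribute to a plain vector by hand. This is a minimal sketch; the variable names are illustrative, and the implicit class shown in the comment assumes R 4.0 or later.

```r
x <- 1:6
before <- attributes(x)             # NULL: a plain atomic vector

# Attaching a dim attribute is all it takes to turn x into a matrix.
dim(x) <- c(2, 3)
attrs <- attributes(x)              # a list containing only $dim

is_mat <- is.matrix(x)              # TRUE
explicit_class <- attr(x, "class")  # NULL: no class attribute is stored
implicit_class <- class(x)          # "matrix" "array" (R >= 4.0)
```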
Arrays are just an \(n\)-dimensional extension of matrices and are defined by adding the appropriate dimension sizes.
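For example, a 3-dimensional array can be sketched with array() and a dim vector of length three; the object name a is illustrative.

```r
# A 2 x 3 x 4 array: four 2 x 3 "slices", filled in column major order.
a <- array(1:24, dim = c(2, 3, 4))

dims <- dim(a)        # 2 3 4
first <- a[1, 1, 2]   # 7: the first element of the second slice
last  <- a[2, 3, 4]   # 24: the final element
```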
A data frame is how R handles heterogeneous tabular data (i.e. a table of rows and columns) and is one of the most commonly used data structures in R.
Prior to R 4.0, the default behavior of data frames was to convert character data into factors. Sometimes this was useful, but mostly it wasn’t.
This behavior is controlled via the stringsAsFactors argument to data.frame (and related functions like read.csv, read.table, etc.).
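The effect of the argument can be sketched by setting it explicitly, which gives the same result regardless of the R version's default. The data frame names below are illustrative.

```r
# Character column kept as character (the default from R 4.0 onward).
df_chr <- data.frame(x = c("a", "b"), stringsAsFactors = FALSE)

# Character column converted to a factor (the default before R 4.0).
df_fct <- data.frame(x = c("a", "b"), stringsAsFactors = TRUE)

chr_class <- class(df_chr$x)  # "character"
fct_class <- class(df_fct$x)  # "factor"
```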
When creating a data frame from different vectors, the lengths of the component vectors will be coerced to match. However, if they are not multiples of each other then there will be an error (the other forms of length coercion seen previously would only produce a warning in this case).
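A minimal sketch of both cases follows; tryCatch() is used only so the failing call does not stop the script, and the names df and res are illustrative.

```r
# Lengths 4 and 2: the shorter vector is recycled to length 4.
df <- data.frame(x = 1:4, y = c(10, 20))
recycled <- df$y   # 10 20 10 20

# Lengths 4 and 3 are not multiples of each other, so data.frame() errors.
res <- tryCatch(
  data.frame(x = 1:4, y = 1:3),
  error = function(e) "error"
)
```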
R has three subsetting operators ([, [[, and $). The behavior of these operators will depend on the object (class) they are being used with.
In general there are 6 different types of subsetting that can be performed:
Positive integer: returns elements at the given location(s).
Negative integer: excludes elements at the given location(s).
Logical value: returns elements that correspond to TRUE in the logical vector. The length of the logical vector is coerced to be the same as the vector being subsetted.
Empty / NULL: empty subsetting (e.g. x[]) returns the original vector; this is not the same as subsetting with NULL.
Zero valued: returns an empty vector (of the same type); this is the same as subsetting with NULL.
Character value (names): if the vector has names, selects elements whose names correspond to the values in the name vector.
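The six kinds of subsetting listed above can be sketched on a small named vector. The vector x and the result names here are illustrative and are not taken from the lecture.

```r
x <- c(a = 10, b = 20, c = 30, d = 40)

pos   <- x[c(1, 3)]         # positive integer: elements 1 and 3
neg   <- x[-c(1, 3)]        # negative integer: everything except 1 and 3
lgl   <- x[c(TRUE, FALSE)]  # logical: recycled to length 4, keeps a and c
empty <- x[]                # empty: the original vector
zero  <- x[0]               # zero: an empty numeric vector
null  <- x[NULL]            # NULL: also an empty numeric vector
chr   <- x[c("b", "d")]     # character: elements named b and d
```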
Subsetting with NULL or an empty (0-length) vector follows the rules for length coercion with a 0-length vector (i.e. the vector being subset gets coerced to having length 0 if the subsetting vector has length 0).
The [[ and $ operators behave as follows.
For atomic vectors, [[ subsets like [ except that it can only select a single value.
For lists, [[ subsets a single value but returns the value itself, not a list containing that value. A vector index is interpreted as nested subsetting.
$ is equivalent to [[ but it only works for name-based subsetting of named lists (it also uses partial matching for names).
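The differences between [, [[ and $ can be sketched on a small nested list. The list l, the vector v and the element names are hypothetical, chosen only for illustration.

```r
l <- list(abc = 1:3, def = list(g = "z"))

as_list  <- l["abc"]             # [ returns a list of length 1
as_value <- l[["abc"]]           # [[ returns the element itself: 1:3
nested   <- l[[c("def", "g")]]   # vector index = l[["def"]][["g"]], i.e. "z"

partial  <- l$ab                 # $ partially matches "abc": 1:3
exact    <- l[["ab"]]            # [[ matches names exactly: NULL

# On an atomic vector, [[ selects exactly one value; more than one is an error.
v <- c(1, 2, 3)
single <- v[[2]]                 # 2
multi  <- tryCatch(v[[c(1, 2)]], error = function(e) "error")
```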
Why does the following code not work?
The expression x$y gets interpreted as x[["y"]] by R. Note the inclusion of the quotes; this is not the same as the expression x[[y]].
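The code this question refers to is not reproduced here. A hypothetical example of the same pitfall, with an illustrative list x and a variable y holding a name, might look like this:

```r
# Hypothetical objects, not the lecture's original code.
x <- list(abc = 1:3)
y <- "abc"

via_brackets <- x[[y]]  # y is evaluated first, so this is x[["abc"]]: 1:3
via_dollar   <- x$y     # treated as x[["y"]]; no element is named "y": NULL
```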
Below are 100 values,
x = c(56, 3, 17, 2, 4, 9, 6, 5, 19, 5, 2, 3, 5, 0, 13, 12, 6, 31, 10, 21, 8, 4, 1, 1, 2, 5, 16, 1, 3, 8, 1,
3, 4, 8, 5, 2, 8, 6, 18, 40, 10, 20, 1, 27, 2, 11, 14, 5, 7, 0, 3, 0, 7, 0, 8, 10, 10, 12, 8, 82,
21, 3, 34, 55, 18, 2, 9, 29, 1, 4, 7, 14, 7, 1, 2, 7, 4, 74, 5, 0, 3, 13, 2, 8, 1, 6, 13, 7, 1, 10,
5, 2, 4, 4, 14, 15, 4, 17, 1, 9)
write down how you would create a subset to accomplish each of the following:
Select every third value starting at position 2 in x.
Remove all values with an odd index (e.g. 1, 3, etc.)
Remove every 4th value, but only if it is odd.
As data frames have 2 dimensions, we can subset on either the rows or the columns - the subsetting values are separated by a comma.
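A brief sketch of row and column subsetting on a small data frame follows; df and its columns are made up for illustration.

```r
df <- data.frame(
  a = 1:4,
  b = c(2.5, 3.5, 4.5, 5.5),
  c = c("w", "x", "y", "z")
)

rows   <- df[1:2, ]           # first two rows, all columns
cols   <- df[, c("a", "c")]   # all rows, columns a and c
both   <- df[df$a > 2, "b"]   # rows where a > 2, column b only: 4.5 5.5
```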
Most of the time, R’s [ subset operator is a preserving operator, in that the returned object will always have the same type/class as the object being subset.
Confusingly, when used with some classes (e.g. data frame, matrix or array) [ becomes a simplifying operator (does not preserve type) - this behavior is instead controlled by the drop argument.
drop only works when the resulting value can be represented as a 1d vector (either a list or atomic).
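The effect of drop can be sketched on a matrix and a data frame; the objects m and df below are illustrative.

```r
m <- matrix(1:6, nrow = 2)

m_simplified <- m[1, ]                # simplified to an integer vector
m_preserved  <- m[1, , drop = FALSE]  # still a 1 x 3 matrix
m_block      <- m[1:2, 1:2]           # 2 x 2 result cannot be 1d, stays a matrix

df <- data.frame(a = 1:3, b = c(1.5, 2.5, 3.5))

df_simplified <- df[, 1]                # simplified to an integer vector
df_preserved  <- df[, 1, drop = FALSE]  # still a data frame with one column
```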
Type | Simplifying | Preserving |
---|---|---|
Atomic Vector | x[[1]] | x[1] |
List | x[[1]] | x[1] |
Matrix / Array | x[[1]], x[1, ], x[, 1] | x[1, , drop=FALSE], x[, 1, drop=FALSE] |
Factor | x[1:4, drop=TRUE] | x[1:4], x[[1]] |
Data frame | x[, 1], x[[1]] | x[, 1, drop=FALSE], x[1] |
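The factor row is the least obvious entry in the table: for factors, simplifying means dropping unused levels. A minimal sketch, using an illustrative factor f:

```r
f <- factor(c("a", "b", "c", "d", "e"))

kept    <- f[1:4]               # preserving: all five levels are retained
dropped <- f[1:4, drop = TRUE]  # simplifying: the unused level "e" is removed

kept_levels    <- levels(kept)     # "a" "b" "c" "d" "e"
dropped_levels <- levels(dropped)  # "a" "b" "c" "d"
```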
Sta 523 - Fall 2023