You want to properly handle NULL, NA, or NaN values.
Sometimes your data will include NULL, NA, or NaN. These work somewhat differently from “normal” values, and may require explicit testing.
Here are some examples of comparisons with these values:
    x <- NULL
    x > 5
    # logical(0)

    y <- NA
    y > 5
    # NA

    z <- NaN
    z > 5
    # NA
Here’s how to test whether a variable has one of these values:
    is.null(x)
    # TRUE

    is.na(y)
    # TRUE

    is.nan(z)
    # TRUE
NULL is different from the other two.
NULL means that there is no value, while NA and NaN mean that there is some value, although one that is perhaps not usable. Here’s an illustration of the difference:
    # Is y null?
    is.null(y)
    # FALSE

    # Is x NA?
    is.na(x)
    # logical(0)
    # Warning message:
    # In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
In the first case, it checks if y is NULL, and the answer is no. In the second case, it tries to check if x is NA, but there is no value to be checked, so the result is an empty logical vector.
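Because is.na() on NULL returns logical(0) rather than TRUE or FALSE, using it directly as an if() condition fails with an "argument is of length zero" error. One way to guard against this is to test for NULL (or zero length) first. The sketch below assumes a hypothetical helper named is_blank, which is not part of base R, and treats a vector as blank only when every element is NA or NaN:

```r
# Hypothetical helper (not in base R): TRUE when v is NULL, has zero
# length, or consists entirely of NA/NaN values.
is_blank <- function(v) {
  # || short-circuits, so is.na() is never reached for NULL or empty input
  is.null(v) || length(v) == 0 || all(is.na(v))
}

x <- NULL
y <- NA
z <- NaN

is_blank(x)   # TRUE
is_blank(y)   # TRUE
is_blank(z)   # TRUE
is_blank(5)   # FALSE
```

Whether a partly missing vector such as c(1, NA) should count as blank depends on the use case; this sketch returns FALSE for it.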
Ignoring “bad” values in vector summary functions
If you run functions like mean() or sum() on a vector containing NA or NaN, they will return NA or NaN, which is generally unhelpful, though this will alert you to the presence of the bad value. Many of these functions take the flag na.rm, which tells them to ignore these values.
    vy <- c(1, 2, 3, NA, 5)
    # 1 2 3 NA 5
    mean(vy)
    # NA
    mean(vy, na.rm=TRUE)
    # 2.75

    vz <- c(1, 2, 3, NaN, 5)
    # 1 2 3 NaN 5
    sum(vz)
    # NaN
    sum(vz, na.rm=TRUE)
    # 11

    # NULL isn't a problem, because it doesn't exist
    vx <- c(1, 2, 3, NULL, 5)
    # 1 2 3 5
    sum(vx)
    # 11
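When a vector mixes NA and NaN, na.rm=TRUE drops both. Since na.rm hides the problem silently, it can be useful to count how many values were skipped. A small sketch, where the vector v and the name n_bad are made up for illustration:

```r
# Illustrative vector containing both kinds of "bad" value
v <- c(1, NA, NaN, 4)

# na.rm=TRUE ignores NA and NaN alike
sum(v, na.rm = TRUE)    # 5
mean(v, na.rm = TRUE)   # 2.5

# is.na() is TRUE for both NA and NaN, so summing it counts
# how many values na.rm would skip
n_bad <- sum(is.na(v))
n_bad                   # 2
```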
Removing bad values from a vector
These values can be removed from a vector by filtering using is.na() or is.nan().
    vy
    # 1 2 3 NA 5
    vy[ !is.na(vy) ]
    # 1 2 3 5

    vz
    # 1 2 3 NaN 5
    vz[ !is.nan(vz) ]
    # 1 2 3 5
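The two tests are not interchangeable: is.na() returns TRUE for both NA and NaN, while is.nan() returns TRUE only for NaN. The following sketch shows the difference on a vector that contains both; the variable names are chosen here for illustration:

```r
# Illustrative vector containing both NA and NaN
v <- c(1, NA, NaN, 4)

# Filtering with is.nan() removes NaN but leaves NA in place
only_nan_removed <- v[!is.nan(v)]
only_nan_removed   # 1 NA 4

# Filtering with is.na() removes both NA and NaN
all_removed <- v[!is.na(v)]
all_removed        # 1 4
```

If the goal is to drop every unusable value, filtering with is.na() alone is usually enough.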
There are also the infinite numerical values Inf and
-Inf, and the associated functions