Problem

You want to compare two vectors or factors, with comparisons involving NAs reported as TRUE or FALSE instead of NA.

Solution

Suppose you have this data frame with two columns, each a logical vector:

df <- data.frame( a=c(TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,NA,NA,NA),
                  b=c(TRUE,FALSE,NA,TRUE,FALSE,NA,TRUE,FALSE,NA))
df
#>       a     b
#> 1  TRUE  TRUE
#> 2  TRUE FALSE
#> 3  TRUE    NA
#> 4 FALSE  TRUE
#> 5 FALSE FALSE
#> 6 FALSE    NA
#> 7    NA  TRUE
#> 8    NA FALSE
#> 9    NA    NA

Normally, when you compare two vectors or factors containing NA values, the vector of results will have NAs wherever either of the original items was NA. Depending on your purposes, this may or may not be desirable.

df$a == df$b
#> [1]  TRUE FALSE    NA FALSE  TRUE    NA    NA    NA    NA

# The same comparison, but presented as another column in the data frame:
data.frame(df, isSame = (df$a==df$b))
#>       a     b isSame
#> 1  TRUE  TRUE   TRUE
#> 2  TRUE FALSE  FALSE
#> 3  TRUE    NA     NA
#> 4 FALSE  TRUE  FALSE
#> 5 FALSE FALSE   TRUE
#> 6 FALSE    NA     NA
#> 7    NA  TRUE     NA
#> 8    NA FALSE     NA
#> 9    NA    NA     NA
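
The default behaviour follows from NA meaning "unknown": the result of comparing an unknown value with anything is itself unknown. A minimal sketch of this, using only base R, is shown here; is.na() is the usual way to test for missingness, because it always returns TRUE or FALSE.

```r
# NA stands for an unknown value, so even NA == NA is unknown
NA == NA
#> [1] NA
TRUE == NA
#> [1] NA

# is.na() never returns NA
is.na(c(TRUE, NA, FALSE))
#> [1] FALSE  TRUE FALSE
```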

A function for comparing with NAs

This comparison function essentially treats NA as just another value. If an item is NA in both vectors, it reports TRUE for that item; if the item is NA in just one vector, it reports FALSE; all other comparisons (between non-NA items) behave the same as with ==.

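# This function returns TRUE wherever elements are the same, including NAs,
# and FALSE everywhere else.
compareNA <- function(v1, v2) {
    # TRUE where the values are equal or both are NA; NA where only one is NA
    same <- (v1 == v2) | (is.na(v1) & is.na(v2))
    # Any remaining NA means exactly one of the items was NA, so report FALSE
    same[is.na(same)] <- FALSE
    return(same)
}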
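
To see why the function works, it may help to trace the intermediate values. The sketch below repeats the body of the function step by step on two short vectors; the names eq and both_na are introduced only for this illustration. The key point is that in R, NA | TRUE is TRUE while NA | FALSE is NA, so the only NAs left after the | step are the cases where exactly one item was NA.

```r
v1 <- c(TRUE, TRUE, NA,   NA)
v2 <- c(TRUE, NA,   TRUE, NA)

eq      <- v1 == v2                # TRUE    NA    NA    NA
both_na <- is.na(v1) & is.na(v2)   # FALSE FALSE FALSE  TRUE
same    <- eq | both_na            # TRUE    NA    NA  TRUE

# Whatever is still NA had exactly one missing item: report FALSE
same[is.na(same)] <- FALSE
same
#> [1]  TRUE FALSE FALSE  TRUE
```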

Examples of the function in use

Comparing logical vectors:

compareNA(df$a, df$b)
#> [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE

# Same comparison, presented as another column
data.frame(df, isSame = compareNA(df$a,df$b))
#>       a     b isSame
#> 1  TRUE  TRUE   TRUE
#> 2  TRUE FALSE  FALSE
#> 3  TRUE    NA  FALSE
#> 4 FALSE  TRUE  FALSE
#> 5 FALSE FALSE   TRUE
#> 6 FALSE    NA  FALSE
#> 7    NA  TRUE  FALSE
#> 8    NA FALSE  FALSE
#> 9    NA    NA   TRUE
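
Nothing in the function is specific to logical vectors, so it should behave the same way with numeric and character vectors. The following is a small sketch with made-up values; compareNA is repeated from above only so that the block runs on its own.

```r
# Same definition as above, repeated so this block is self-contained
compareNA <- function(v1, v2) {
    same <- (v1 == v2) | (is.na(v1) & is.na(v2))
    same[is.na(same)] <- FALSE
    return(same)
}

# Numeric vectors
compareNA(c(1, 2, NA, NA), c(1, 3, 2, NA))
#> [1]  TRUE FALSE FALSE  TRUE

# Character vectors
compareNA(c("a", NA, "c"), c("a", NA, "d"))
#> [1]  TRUE  TRUE FALSE
```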

It also works with factors, even if the levels of the factors are in different orders:

# Create sample data frame with factors.
df1 <- data.frame(a = factor(c('x','x','x','y','y','y', NA, NA, NA)),
                  b = factor(c('x','y', NA,'x','y', NA,'x','y', NA)))

# Do the comparison
data.frame(df1, isSame = compareNA(df1$a, df1$b))
#>      a    b isSame
#> 1    x    x   TRUE
#> 2    x    y  FALSE
#> 3    x <NA>  FALSE
#> 4    y    x  FALSE
#> 5    y    y   TRUE
#> 6    y <NA>  FALSE
#> 7 <NA>    x  FALSE
#> 8 <NA>    y  FALSE
#> 9 <NA> <NA>   TRUE


# It still works if the factor levels are arranged in a different order
df1$b <- factor(df1$b, levels=c('y','x'))
data.frame(df1, isSame = compareNA(df1$a, df1$b))
#>      a    b isSame
#> 1    x    x   TRUE
#> 2    x    y  FALSE
#> 3    x <NA>  FALSE
#> 4    y    x  FALSE
#> 5    y    y   TRUE
#> 6    y <NA>  FALSE
#> 7 <NA>    x  FALSE
#> 8 <NA>    y  FALSE
#> 9 <NA> <NA>   TRUE
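
One caveat, offered as a hedged addition to the examples above: a different level order is fine, but if the two factors have different sets of levels, == on factors stops with an error, and so does compareNA. One possible workaround, sketched below with hypothetical factors f1 and f2, is to compare the labels as character vectors instead.

```r
# Same definition as above, repeated so this block is self-contained
compareNA <- function(v1, v2) {
    same <- (v1 == v2) | (is.na(v1) & is.na(v2))
    same[is.na(same)] <- FALSE
    return(same)
}

f1 <- factor(c("x", "y", NA))
f2 <- factor(c("x", "z", NA))   # levels x, z differ from levels x, y

# Comparing the factors directly raises an error; capture its message
res <- tryCatch(compareNA(f1, f2), error = function(e) conditionMessage(e))
res

# Converting to character first avoids the error
compareNA(as.character(f1), as.character(f2))
#> [1]  TRUE FALSE  TRUE
```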