Problem

You want to re-compute factor levels of all factor columns in a data frame.

Solution

Sometimes, after reading in and cleaning data, you end up with factor columns that still carry levels that no longer appear in the data.

For example, the data frame d below has a row whose x and y fields are blank. When it’s read in, the factor columns get a level "" (the empty string), which shouldn’t be part of the data.

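# stringsAsFactors = TRUE is needed in R >= 4.0.0, where it is no longer the default
d <- read.csv(header = TRUE, stringsAsFactors = TRUE, text='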
x,y,value
a,one,1
,,5
b,two,4
c,three,10
')

d
#>   x     y value
#> 1 a   one     1
#> 2             5
#> 3 b   two     4
#> 4 c three    10

str(d)
#> 'data.frame':	4 obs. of  3 variables:
#>  $ x    : Factor w/ 4 levels "","a","b","c": 2 1 3 4
#>  $ y    : Factor w/ 4 levels "","one","three",..: 2 1 4 3
#>  $ value: int  1 5 4 10
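If blank cells should be treated as missing values rather than as a category, one option (an alternative to the cleanup in this recipe, not part of it) is read.csv()'s na.strings argument. Setting it to "" turns blank entries into NA, so "" never becomes a level. A minimal sketch, assuming that treating blanks as NA is acceptable for your data:

```r
# Same data as above, but blank fields are read as NA.
# stringsAsFactors = TRUE is needed in R >= 4.0.0.
d_na <- read.csv(header = TRUE, stringsAsFactors = TRUE, na.strings = "", text='
x,y,value
a,one,1
,,5
b,two,4
c,three,10
')

# "" is no longer a level; the blank cell is NA instead
levels(d_na$x)
#> [1] "a" "b" "c"
is.na(d_na$x)
#> [1] FALSE  TRUE FALSE FALSE
```

The row is still kept; only the blank x and y values become NA, so the value column keeps its 5.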

Even after removing that row, the factors still have the empty string "" as a level:

# Remove second row
d <- d[-2,]
d
#>   x     y value
#> 1 a   one     1
#> 3 b   two     4
#> 4 c three    10

str(d)
#> 'data.frame':	3 obs. of  3 variables:
#>  $ x    : Factor w/ 4 levels "","a","b","c": 2 3 4
#>  $ y    : Factor w/ 4 levels "","one","three",..: 2 4 3
#>  $ value: int  1 4 10
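To confirm that the leftover level is unused rather than still present in the data, you can check the levels directly and count how many values match it. A minimal sketch using only base R:

```r
d <- read.csv(header = TRUE, stringsAsFactors = TRUE, text='
x,y,value
a,one,1
,,5
b,two,4
c,three,10
')
d <- d[-2, ]

# "" is still listed as a level...
"" %in% levels(d$x)
#> [1] TRUE
# ...but no remaining value uses it
sum(d$x == "")
#> [1] 0
nlevels(d$x)
#> [1] 4
```

table(d$x) shows the same thing, listing "" with a count of 0.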

With droplevels

The simplest way is to use droplevels(). When given a data frame, it drops unused levels from every factor column:

d1 <- droplevels(d)
str(d1)
#> 'data.frame':	3 obs. of  3 variables:
#>  $ x    : Factor w/ 3 levels "a","b","c": 1 2 3
#>  $ y    : Factor w/ 3 levels "one","three",..: 1 3 2
#>  $ value: int  1 4 10
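droplevels() also works on a single factor, and its data frame method takes an except argument listing columns whose levels should be left untouched. A short sketch of both uses, rebuilding the same d as above:

```r
d <- read.csv(header = TRUE, stringsAsFactors = TRUE, text='
x,y,value
a,one,1
,,5
b,two,4
c,three,10
')
d <- d[-2, ]

# Drop unused levels from one factor only
levels(droplevels(d$x))
#> [1] "a" "b" "c"

# Drop unused levels everywhere except column 2 (y)
d2 <- droplevels(d, except = 2)
nlevels(d2$x)
#> [1] 3
nlevels(d2$y)
#> [1] 4
```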

With vapply and lapply

To re-compute the levels for all factor columns, we can use vapply() with is.factor() to find out which columns are factors, and then use that information with lapply() to apply the factor() function to those columns.

# Find which columns are factors
factor_cols <- vapply(d, is.factor, logical(1))

# Apply the factor() function to those columns, and assign them back into d
d[factor_cols] <- lapply(d[factor_cols], factor)
str(d)
#> 'data.frame':	3 obs. of  3 variables:
#>  $ x    : Factor w/ 3 levels "a","b","c": 1 2 3
#>  $ y    : Factor w/ 3 levels "one","three",..: 1 3 2
#>  $ value: int  1 4 10
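If you do this often, the two steps can be wrapped in a small function. This is a sketch only; refactor_df() is a hypothetical name, not a base R function:

```r
# Hypothetical helper: re-compute levels of every factor column in df
refactor_df <- function(df) {
  # TRUE for each column that is a factor
  factor_cols <- vapply(df, is.factor, logical(1))
  # factor() on an existing factor keeps only the levels still in use
  df[factor_cols] <- lapply(df[factor_cols], factor)
  df
}

d <- read.csv(header = TRUE, stringsAsFactors = TRUE, text='
x,y,value
a,one,1
,,5
b,two,4
c,three,10
')
d <- d[-2, ]

d3 <- refactor_df(d)
nlevels(d3$x)
#> [1] 3
nlevels(d3$y)
#> [1] 3
```

vapply() is used instead of sapply() because it always returns a logical vector, even for a data frame with no columns, where sapply() would return an empty list.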

See also

For information about re-computing the levels of a factor, see ../Re-computing_the_levels_of_factor.