# Re-computing the levels of all factor columns in a data frame

## Problem

You want to re-compute factor levels of all factor columns in a data frame.

## Solution

Sometimes after reading in data and cleaning it, you will end up with factor columns that have levels that should no longer be there.

For example, `d` below has one blank row. When it’s read in, the factor columns have a level `""`, which shouldn’t be part of the data.

```
# stringsAsFactors = TRUE is needed in R 4.0.0 and later, where read.csv()
# no longer converts strings to factors by default
d <- read.csv(header = TRUE, stringsAsFactors = TRUE, text='
x,y,value
a,one,1
,,5
b,two,4
c,three,10
')
d
#>   x     y value
#> 1 a   one     1
#> 2             5
#> 3 b   two     4
#> 4 c three    10
str(d)
#> 'data.frame':    4 obs. of  3 variables:
#>  $ x    : Factor w/ 4 levels "","a","b","c": 2 1 3 4
#>  $ y    : Factor w/ 4 levels "","one","three",..: 2 1 4 3
#>  $ value: int  1 5 4 10
```

Even after removing the empty row, the factors still have the blank string `""` as a level:

```
# Remove second row
d <- d[-2, ]
d
#>   x     y value
#> 1 a   one     1
#> 3 b   two     4
#> 4 c three    10
str(d)
#> 'data.frame':    3 obs. of  3 variables:
#>  $ x    : Factor w/ 4 levels "","a","b","c": 2 3 4
#>  $ y    : Factor w/ 4 levels "","one","three",..: 2 4 3
#>  $ value: int  1 4 10
```
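To see why the leftover level matters, it can help to inspect the levels directly. The sketch below rebuilds `d` (passing `stringsAsFactors = TRUE`, an assumption needed on R 4.0.0 and later, where `read.csv()` does not create factors by default) and shows that `levels()`, `nlevels()`, and `table()` all still report the unused `""` level. The name `stale_counts` is only illustrative.

```r
# Rebuild the example data; stringsAsFactors = TRUE is assumed so that
# x and y are read as factors on R >= 4.0.0
d <- read.csv(header = TRUE, stringsAsFactors = TRUE, text = '
x,y,value
a,one,1
,,5
b,two,4
c,three,10
')
d <- d[-2, ]

# The empty string is still listed as a level even though no row uses it
levels(d$x)
nlevels(d$x)

# table() counts every level, so the unused one shows up with a count of 0
stale_counts <- table(d$x)
stale_counts
```

Unused levels like this can also surface as empty groups in summaries or as empty categories in plots, which is usually the reason to drop them.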

### With `droplevels`

The simplest way is to use the `droplevels()` function:

```
d1 <- droplevels(d)
str(d1)
#> 'data.frame':    3 obs. of  3 variables:
#>  $ x    : Factor w/ 3 levels "a","b","c": 1 2 3
#>  $ y    : Factor w/ 3 levels "one","three",..: 1 3 2
#>  $ value: int  1 4 10
```
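`droplevels()` can also be applied to a single factor, and its data frame method takes an `except` argument giving the indices of columns to leave alone. The following is a minimal sketch under the same assumptions as the example data above (`stringsAsFactors = TRUE` when reading); `x_clean` and `d_partial` are illustrative names.

```r
# Rebuild the example data and remove the blank row
d <- read.csv(header = TRUE, stringsAsFactors = TRUE, text = '
x,y,value
a,one,1
,,5
b,two,4
c,three,10
')
d <- d[-2, ]

# Drop unused levels from one factor only
x_clean <- droplevels(d$x)
levels(x_clean)

# except = 2 skips the second column (y), so only x loses its unused level
d_partial <- droplevels(d, except = 2)
levels(d_partial$x)
levels(d_partial$y)
```

Here `d_partial$y` keeps the `""` level because it was excluded, while `d_partial$x` is reduced to the levels that actually occur.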

### With `vapply` and `lapply`

To re-compute the levels for all factor columns, we can use `vapply()` with `is.factor()` to find out which of the columns are factors, and then use that information with `lapply()` to apply the `factor()` function to those columns.

```
# Find which columns are factors
factor_cols <- vapply(d, is.factor, logical(1))
# Apply the factor() function to those columns, and assign them back into d
d[factor_cols] <- lapply(d[factor_cols], factor)
str(d)
#> 'data.frame':    3 obs. of  3 variables:
#>  $ x    : Factor w/ 3 levels "a","b","c": 1 2 3
#>  $ y    : Factor w/ 3 levels "one","three",..: 1 3 2
#>  $ value: int  1 4 10
```
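If this pattern is needed in several places, the two steps could be wrapped in a small function. The sketch below uses a hypothetical helper named `refactor_cols()` (not part of base R) and assumes the data frame has at least one factor column; it returns a modified copy rather than changing its argument.

```r
# Hypothetical helper (not part of base R): rebuild every factor column
# so that only the levels actually present in the data remain
refactor_cols <- function(df) {
  factor_cols <- vapply(df, is.factor, logical(1))
  df[factor_cols] <- lapply(df[factor_cols], factor)
  df
}

# Example data; stringsAsFactors = TRUE is assumed for R >= 4.0.0
d <- read.csv(header = TRUE, stringsAsFactors = TRUE, text = '
x,y,value
a,one,1
,,5
b,two,4
c,three,10
')

# Remove the blank row, then re-compute the levels
d2 <- refactor_cols(d[-2, ])
levels(d2$x)
levels(d2$y)
```

For this data the resulting levels match what `droplevels()` produces; the explicit version is mainly useful when `factor()` needs extra arguments passed through `lapply()`.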


