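Provide a summary for each variable of the number of missings, the percentage of missings, and the cumulative sum of missings over the variables in the order they appear in the data. As the usage below shows, order defaults to FALSE, so variables are listed in their input order; set order = TRUE to order by the most missings in each variable.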

miss_var_summary(data, order = FALSE, ...)

Arguments

data

a data.frame

order

a logical indicating whether to order the result by n_miss. Defaults to FALSE, as shown in the usage above. If FALSE, variables keep the order they have in the input data.

...

extra arguments

Value

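a tibble with one row per variable, giving the number of missings (n_miss), the percentage of missings (pct_miss), and the cumulative sum of missings (n_miss_cumsum)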

Note

n_miss_cumsum is calculated as the cumulative sum of missings across the variables, taken in the order in which they appear in the data passed to the function. Ordering the result by n_miss does not change these values.
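
The note above can be illustrated with a minimal base R sketch. This is not naniar's implementation; the names summary_df and sorted_df are made up for illustration, and only base R functions and the built-in airquality data are used.

```r
# Base R sketch (an assumption-laden illustration, not naniar's code)
# of the columns that miss_var_summary() reports.

# Count missings per variable, in the order the columns appear in the data.
n_miss <- colSums(is.na(airquality))

summary_df <- data.frame(
  variable      = names(n_miss),
  n_miss        = as.integer(n_miss),
  pct_miss      = 100 * as.numeric(n_miss) / nrow(airquality),
  # The cumulative sum is accumulated in data order, before any sorting.
  n_miss_cumsum = cumsum(as.integer(n_miss)),
  stringsAsFactors = FALSE
)

# Sorting afterwards reorders the rows but leaves each variable's
# n_miss_cumsum untouched, which is why the column can look non-monotonic
# in an ordered result.
sorted_df <- summary_df[order(-summary_df$n_miss), ]

print(summary_df)
```

For airquality this gives the same n_miss and n_miss_cumsum values as the first example below, because the cumulative sum is computed before the rows are reordered.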

Examples

miss_var_summary(airquality)
#> # A tibble: 6 x 4
#>   variable n_miss pct_miss n_miss_cumsum
#>   <chr>     <int>    <dbl>         <int>
#> 1 Ozone        37    24.2             37
#> 2 Solar.R       7     4.58            44
#> 3 Wind          0     0               44
#> 4 Temp          0     0               44
#> 5 Month         0     0               44
#> 6 Day           0     0               44
miss_var_summary(oceanbuoys, order = TRUE)
#> # A tibble: 8 x 4
#>   variable   n_miss pct_miss n_miss_cumsum
#>   <chr>       <int>    <dbl>         <int>
#> 1 humidity       93   12.6             177
#> 2 air_temp_c     81   11.0              84
#> 3 sea_temp_c      3    0.408             3
#> 4 year            0    0                 0
#> 5 latitude        0    0                 0
#> 6 longitude       0    0                 0
#> 7 wind_ew         0    0               177
#> 8 wind_ns         0    0               177
# works with group_by from dplyr
library(dplyr)
airquality %>% group_by(Month) %>% miss_var_summary()
#> # A tibble: 25 x 5
#>    Month variable n_miss pct_miss n_miss_cumsum
#>    <int> <chr>     <int>    <dbl>         <int>
#>  1     5 Ozone         5     16.1             5
#>  2     5 Solar.R       4     12.9             9
#>  3     5 Wind          0      0               9
#>  4     5 Temp          0      0               9
#>  5     5 Day           0      0               9
#>  6     6 Ozone        21     70              21
#>  7     6 Solar.R       0      0              21
#>  8     6 Wind          0      0              21
#>  9     6 Temp          0      0              21
#> 10     6 Day           0      0              21
#> # ... with 15 more rows