Summarise the number and percentage of missing values in each variable. Set order = TRUE to sort the variables from most to fewest missing values.

miss_var_summary(data, order = FALSE, ...)

Arguments

data

a data.frame

order

a logical indicating whether to order the result by n_miss. TRUE orders from largest to smallest n_miss; FALSE (the default) keeps the variables in the order they appear in the data.

...

extra arguments

Value

a tibble giving the number (n_miss), percentage (pct_miss), and cumulative number (n_miss_cumsum) of missing values in each variable
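
The columns in the returned tibble can be reproduced by hand, which may help when checking results. The following is a minimal base R sketch, not naniar's implementation. It assumes pct_miss is 100 * n_miss / nrow(data) and that n_miss_cumsum is a running total of n_miss in data order, both of which agree with the airquality output in the Examples. The name manual_summary is hypothetical.

```r
# Base R sketch of the columns reported by miss_var_summary().
# Assumption: pct_miss = 100 * n_miss / nrow(data); this is not naniar's source.
n_miss <- colSums(is.na(airquality))          # count of NA values per column
pct_miss <- 100 * n_miss / nrow(airquality)   # percentage of rows that are NA

manual_summary <- data.frame(
  variable = names(n_miss),
  n_miss = as.integer(n_miss),
  pct_miss = as.numeric(pct_miss),
  n_miss_cumsum = cumsum(as.integer(n_miss)), # running total in data order
  row.names = NULL
)

# Mimic order = TRUE: variables with the most missing values first
manual_summary[order(manual_summary$n_miss, decreasing = TRUE), ]
```

Sorting only after the cumulative sum is computed is a choice made for this sketch; it keeps the running total tied to the original column order.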

Examples

miss_var_summary(airquality)
#> # A tibble: 6 x 4
#>   variable n_miss pct_miss n_miss_cumsum
#>   <chr>     <int>    <dbl>         <int>
#> 1 Ozone        37    24.2             37
#> 2 Solar.R       7     4.58            44
#> 3 Wind          0     0               44
#> 4 Temp          0     0               44
#> 5 Month         0     0               44
#> 6 Day           0     0               44
miss_var_summary(oceanbuoys, order = TRUE)
#> # A tibble: 8 x 4
#>   variable   n_miss pct_miss n_miss_cumsum
#>   <chr>       <int>    <dbl>         <int>
#> 1 humidity       93   12.6             177
#> 2 air_temp_c     81   11.0              84
#> 3 sea_temp_c      3    0.408             3
#> 4 year            0    0                 0
#> 5 latitude        0    0                 0
#> 6 longitude       0    0                 0
#> 7 wind_ew         0    0               177
#> 8 wind_ns         0    0               177
# works with group_by from dplyr
library(dplyr)
airquality %>%
  group_by(Month) %>%
  miss_var_summary()
#> # A tibble: 25 x 5
#>    Month variable n_miss pct_miss n_miss_cumsum
#>    <int> <chr>     <int>    <dbl>         <int>
#>  1     5 Ozone         5     16.1             5
#>  2     5 Solar.R       4     12.9             9
#>  3     5 Wind          0      0               9
#>  4     5 Temp          0      0               9
#>  5     5 Day           0      0               9
#>  6     6 Ozone        21     70              21
#>  7     6 Solar.R       0      0              21
#>  8     6 Wind          0      0              21
#>  9     6 Temp          0      0              21
#> 10     6 Day           0      0              21
#> # ... with 15 more rows