It is useful to find the number of missing values that occur in a single run. The function miss_var_run() returns a data frame with the columns "run_length" and "is_na", which give the length of each run and whether that run consists of missing or complete values.

miss_var_run(data, var)

Arguments

data

data.frame

var

a bare variable name

Value

A data frame with the columns "run_length" and "is_na", which give the length of each run and whether that run consists of missing or complete values.
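To make the shape of this return value concrete, here is a minimal base R sketch of the same idea, built on run-length encoding with rle(). It is an illustration only and is not presented as the function's actual implementation. The vector `x` and the name `run_table` are hypothetical.

```r
# Hypothetical toy vector containing two separate runs of missing values
x <- c(5, 3, NA, NA, NA, 8, NA, 2, 2)

# Run-length encode the missingness indicator (TRUE = missing)
runs <- rle(is.na(x))

# Arrange the result in the same two-column layout described above
run_table <- data.frame(
  run_length = runs$lengths,
  is_na = ifelse(runs$values, "missing", "complete")
)

print(run_table)
```

Each row is one uninterrupted stretch of either complete or missing values, in the order the stretches occur in the data.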

Examples

miss_var_run(pedestrian, hourly_counts)
#> # A tibble: 35 x 2
#>    run_length is_na
#>         <int> <chr>
#>  1       6628 complete
#>  2          1 missing
#>  3       5250 complete
#>  4        624 missing
#>  5       3652 complete
#>  6          1 missing
#>  7       1290 complete
#>  8        744 missing
#>  9       7420 complete
#> 10          1 missing
#> # … with 25 more rows
if (FALSE) {
  # find the number of runs missing/complete for each month
  library(dplyr)

  pedestrian %>%
    group_by(month) %>%
    miss_var_run(hourly_counts)

  library(ggplot2)

  # explore the number of missings in a given run
  miss_var_run(pedestrian, hourly_counts) %>%
    filter(is_na == "missing") %>%
    count(run_length) %>%
    ggplot(aes(x = run_length, y = n)) +
    geom_col()

  # look at the number of missing values and the run length of these.
  miss_var_run(pedestrian, hourly_counts) %>%
    ggplot(aes(x = is_na, y = run_length)) +
    geom_boxplot()

  # using group_by
  pedestrian %>%
    group_by(month) %>%
    miss_var_run(hourly_counts)
}
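The grouped examples compute runs separately within each group, so a run never spans a group boundary. That idea can be sketched in base R as follows. This is a rough illustration under assumed toy data, not the package's own method. The data frame `toy` and the name `runs_by_group` are hypothetical.

```r
# Hypothetical toy data: two groups, each with its own pattern of NAs
toy <- data.frame(
  group = c("a", "a", "a", "a", "b", "b", "b"),
  value = c(1, NA, NA, 4, NA, 6, 7)
)

# Split the values by group, then run-length encode missingness per group
runs_by_group <- lapply(split(toy$value, toy$group), function(v) {
  r <- rle(is.na(v))
  data.frame(
    run_length = r$lengths,
    is_na = ifelse(r$values, "missing", "complete")
  )
})

print(runs_by_group)
```

Because the runs are computed per group, the complete value that ends group "a" and the missing value that starts group "b" are counted as separate runs.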