It is useful to find the number of missing values that occur in a single run. The function miss_var_run() returns a dataframe with the columns "run_length" and "is_na", which give the length of each run and whether that run consists of missing or complete values.

Usage

miss_var_run(data, var)

Arguments

data

data.frame

var

a bare variable name

Value

A dataframe with the columns "run_length" (integer), the length of each run, and "is_na" (character), which is "missing" when the run consists of missing values and "complete" otherwise.
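
To make the idea of a "run" concrete, the shape of this return value can be sketched in base R with rle(). This is an illustrative sketch only and is not claimed to be how miss_var_run() is implemented; the vector `x` and the object `run_table` are made-up names for this example.

```r
# Toy vector (hypothetical data): two complete values, three missing,
# one complete, one missing
x <- c(4, 7, NA, NA, NA, 2, NA)

# rle() collapses consecutive identical values into (length, value) pairs;
# applied to is.na(x) it yields one entry per run of missing/complete values
runs <- rle(is.na(x))

# Arrange the runs in the same two-column layout described above
run_table <- data.frame(
  run_length = runs$lengths,
  is_na = ifelse(runs$values, "missing", "complete"),
  stringsAsFactors = FALSE
)

run_table
#>   run_length    is_na
#> 1          2 complete
#> 2          3  missing
#> 3          1 complete
#> 4          1  missing
```

Each row is one run, so the three consecutive NA values appear as a single "missing" row with run_length 3 rather than as three separate rows.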

Examples


miss_var_run(pedestrian, hourly_counts)
#> # A tibble: 35 × 2
#>    run_length is_na   
#>         <int> <chr>   
#>  1       6628 complete
#>  2          1 missing 
#>  3       5250 complete
#>  4        624 missing 
#>  5       3652 complete
#>  6          1 missing 
#>  7       1290 complete
#>  8        744 missing 
#>  9       7420 complete
#> 10          1 missing 
#> # ℹ 25 more rows

if (FALSE) {
# find the number of runs missing/complete for each month
library(dplyr)


pedestrian %>%
  group_by(month) %>%
  miss_var_run(hourly_counts)

library(ggplot2)

# explore the number of missings in a given run
miss_var_run(pedestrian, hourly_counts) %>%
  filter(is_na == "missing") %>%
  count(run_length) %>%
  ggplot(aes(x = run_length,
             y = n)) +
      geom_col()

# compare the distribution of run lengths for missing and complete runs
miss_var_run(pedestrian, hourly_counts) %>%
  ggplot(aes(x = is_na,
             y = run_length)) +
      geom_boxplot()

}