To summarise the missing values in a time series object, it can be useful to calculate the number of missing values in a given time period. miss_var_span takes a data.frame, a variable, and a span_every argument, and returns a data frame containing the number of missing values within each span. When the number of observations isn't a perfect multiple of the span length, the final span contains only the remaining observations. For example, the pedestrian dataset has 37,700 rows. If the span is set to 4000, there are nine full spans and 1700 rows left over for the final span. This remainder can be computed using modulo (%%): nrow(data) %% 4000. The number of observations in each span, including the shorter final one, is reported in n_in_span.
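The span arithmetic described above can be checked with a few lines of base R. This is only an illustrative sketch: the row count is the figure quoted in the text, and the variable names (n_rows, n_full_spans, remainder, n_spans) are chosen here for the example and are not part of the package.

```r
# Row count quoted above for the pedestrian dataset
n_rows <- 37700
span_every <- 4000

# Integer division gives the number of complete spans
n_full_spans <- n_rows %/% span_every    # 9

# Modulo gives the size of the final, shorter span
remainder <- n_rows %% span_every        # 1700

# Total number of spans, counting the partial one
n_spans <- ceiling(n_rows / span_every)  # 10

# Expected n_in_span values: nine spans of 4000, then one of 1700
n_in_span <- c(rep(span_every, n_full_spans), if (remainder > 0) remainder)
```

The values in n_in_span sum back to the number of rows, which is a quick sanity check when choosing a span length.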

miss_var_span(data, var, span_every)

Arguments

data

data.frame

var

bare unquoted variable name of interest.

span_every

integer describing the length of the span to be explored

Value

data frame with variables span_counter, n_miss, n_complete, prop_miss, and prop_complete, which identify each span and describe the number or proportion of missing or complete values within it. The final variable, n_in_span, states how many observations are in the span.
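To make the meaning of these columns concrete, here is a minimal base R sketch that computes the same kind of summary for a plain vector. It is an assumption-laden illustration, not the package's implementation: the helper name span_summary_sketch is hypothetical, and it works on a vector rather than on a data frame column.

```r
# Hypothetical helper: summarise missingness of a vector in fixed-length spans
span_summary_sketch <- function(x, span_every) {
  # Label each observation with its span: 1, 1, ..., 2, 2, ...
  span_id <- ceiling(seq_along(x) / span_every)

  # Count missing values and total observations within each span
  n_miss <- as.integer(tapply(is.na(x), span_id, sum))
  n_in_span <- as.integer(tapply(x, span_id, length))

  data.frame(
    span_counter  = seq_along(n_miss),
    n_miss        = n_miss,
    n_complete    = n_in_span - n_miss,
    prop_miss     = n_miss / n_in_span,
    prop_complete = 1 - n_miss / n_in_span,
    n_in_span     = n_in_span
  )
}

# Seven observations in spans of three: two full spans and a final span of one
x <- c(1, NA, 3, 4, NA, NA, 7)
res <- span_summary_sketch(x, span_every = 3)
```

In this toy example the three spans contain 1, 2, and 0 missing values, and n_in_span is 3, 3, and 1, with the last span holding the remainder.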

Examples

miss_var_span(data = pedestrian, var = hourly_counts, span_every = 168)
#> # A tibble: 225 x 6
#>    span_counter n_miss n_complete prop_miss prop_complete n_in_span
#>           <int>  <int>      <int>     <dbl>         <dbl>     <int>
#>  1            1      0        168         0             1       168
#>  2            2      0        168         0             1       168
#>  3            3      0        168         0             1       168
#>  4            4      0        168         0             1       168
#>  5            5      0        168         0             1       168
#>  6            6      0        168         0             1       168
#>  7            7      0        168         0             1       168
#>  8            8      0        168         0             1       168
#>  9            9      0        168         0             1       168
#> 10           10      0        168         0             1       168
#> # … with 215 more rows
if (FALSE) {
library(dplyr)
pedestrian %>%
  group_by(month) %>%
  miss_var_span(var = hourly_counts, span_every = 168)
}