Summarise the number of missings for a given repeating span on a variable
Source: R/miss-x-span.R
miss_var_span.Rd
To summarise the missing values in a time series object, it can be useful to
calculate the number of missing values in a given time period. miss_var_span
takes a data.frame, a variable, and a span_every argument, and returns a
dataframe containing the number of missing values within each span. When the
number of observations isn't a perfect multiple of the span length, the final
span holds only the remaining observations. For example, the pedestrian
dataset has 37,700 rows. If the span is set to 4000, there are nine full spans
and 1700 rows remaining. This remainder can be computed with the modulo
operator (%%): nrow(data) %% 4000. The size of each span, including this
shorter final one, is reported in n_in_span.
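The remainder arithmetic described above can be checked in base R without loading any package. This is a minimal sketch: the row count is the figure quoted in the text, and the names n_rows, n_full_spans, remainder, n_spans, and span_sizes are illustrative only, not part of the function's interface.

```r
# Row count quoted in the text for the pedestrian dataset
n_rows <- 37700
span_every <- 4000

# Integer division gives the number of complete spans
n_full_spans <- n_rows %/% span_every      # 9

# Modulo gives the number of rows left over for the final span
remainder <- n_rows %% span_every          # 1700

# Total spans, counting the shorter final one
n_spans <- ceiling(n_rows / span_every)    # 10

# Expected size of every span; the last entry is the remainder
span_sizes <- c(rep(span_every, n_full_spans),
                if (remainder > 0) remainder)
```

Under these assumptions, span_sizes has ten entries that sum to n_rows, and its last entry is the value one would expect to see in n_in_span for the final span.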
Arguments
- data
data.frame
- var
bare unquoted variable name of interest.
- span_every
integer describing the length of the span to be explored
Value
A dataframe with variables n_miss, n_complete, prop_miss, and prop_complete,
which describe the number or proportion of missing or complete values within
that given time span. The final variable, n_in_span, states how many
observations are in the span.
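To make the relationship between these columns concrete, here is a base R sketch that builds the same set of columns for a small made-up vector. It is an illustration only, not the package's implementation, and it assumes that prop_miss is n_miss divided by n_in_span and that prop_complete is its complement. The vector x and the intermediate names are hypothetical.

```r
# Toy data: 7 observations, 3 of them missing (made up for illustration)
x <- c(1, NA, 3, NA, NA, 6, 7)
span_every <- 3

# Label each observation with the span it falls in: 1, 1, 1, 2, 2, 2, 3
span_counter <- (seq_along(x) - 1) %/% span_every + 1

# Count missing values and total observations within each span
n_miss <- tapply(is.na(x), span_counter, sum)
n_in_span <- tapply(x, span_counter, length)

# Assemble the summary columns (assumed definitions, see note above)
result <- data.frame(
  span_counter  = as.integer(names(n_miss)),
  n_miss        = as.integer(n_miss),
  n_complete    = as.integer(n_in_span - n_miss),
  prop_miss     = as.numeric(n_miss / n_in_span),
  prop_complete = as.numeric(1 - n_miss / n_in_span),
  n_in_span     = as.integer(n_in_span)
)

print(result)
```

With seven observations and a span of three, the last span holds a single observation, so its n_in_span is 1 rather than 3.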
Examples
library(naniar)

# Summarise missingness in hourly_counts over spans of 168 rows
miss_var_span(data = pedestrian,
              var = hourly_counts,
              span_every = 168)
#> # A tibble: 225 × 6
#>    span_counter n_miss n_complete prop_miss prop_complete n_in_span
#>           <int>  <int>      <int>     <dbl>         <dbl>     <int>
#>  1            1      0        168         0             1       168
#>  2            2      0        168         0             1       168
#>  3            3      0        168         0             1       168
#>  4            4      0        168         0             1       168
#>  5            5      0        168         0             1       168
#>  6            6      0        168         0             1       168
#>  7            7      0        168         0             1       168
#>  8            8      0        168         0             1       168
#>  9            9      0        168         0             1       168
#> 10           10      0        168         0             1       168
#> # ℹ 215 more rows
if (FALSE) {
  library(dplyr)
  pedestrian %>%
    group_by(month) %>%
    miss_var_span(var = hourly_counts,
                  span_every = 168)
}