mcar_test() for Little's (1988) statistical test for missing completely at random (MCAR) data. The null hypothesis in this test is that the data are MCAR, and the test statistic is a chi-squared value. Given a high statistic value and a low p-value, we can conclude that the data are not missing completely at random. Thanks to Andrew Heiss for the PR.
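A minimal sketch of calling the test, assuming naniar is installed and using the built-in airquality data purely as an illustration (the object name mcar_result is hypothetical):

```r
library(naniar)

# Run Little's MCAR test on a data frame that contains missing values.
mcar_result <- mcar_test(airquality)

# The result is a one-row tibble holding the chi-squared statistic,
# its degrees of freedom, the p-value, and the number of missing patterns.
mcar_result

# A small p-value is evidence against the null hypothesis that data are MCAR.
mcar_result$p.value < 0.05
```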
miss_var_span() (#270) where the number of missings + number of complete values added up to more than the number of rows in the data. This was due to the remainder not being used when calculating the number of complete values.
recode_shadow() (#272) where adding the same special missing value in two subsequent operations fails.
replace_with_na when the columns provided don't exist (see #160). Thank you to michael-dewar for their help with this.
bind_shadow(). In doing so removes the functions,
new_shadow(). These were mostly used internally and it is not expected that users would have used these functions. If these were used, please file an issue and I can implement them again.
miss_case_cumsum are now exported
rowSums(is.na(x)), which was 3 times faster.
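As an illustration of the rowSums(is.na(x)) idiom mentioned above, here is a small base R sketch. The data frame x and the name n_miss_per_row are made up for the example and are not taken from naniar's internals:

```r
# A toy data frame with some missing values (hypothetical example data).
x <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, "z")
)

# is.na(x) gives a logical matrix; summing each row counts the missings per case.
n_miss_per_row <- rowSums(is.na(x))
n_miss_per_row
```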
gg_miss_fct() where a warning is given for non-explicit NA values - see #241.
geom_miss_point() ggplot2 layer can now be converted into an interactive web-based version by the ggplotly() function in the plotly package. In order for this to work, naniar now exports the geom2trace.GeomMissPoint() function (users should never need to call it directly; ggplotly() calls it for you).
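A short sketch of the conversion, assuming ggplot2, plotly, and naniar are all installed. The object names are hypothetical and airquality is used only as example data:

```r
library(ggplot2)
library(naniar)
library(plotly)

# Build a static scatterplot that also displays the missing values.
static_plot <- ggplot(airquality, aes(x = Ozone, y = Solar.R)) +
  geom_miss_point()

# Convert it to an interactive plotly widget; the geom2trace method
# exported by naniar is dispatched internally by ggplotly().
interactive_plot <- ggplotly(static_plot)
interactive_plot
```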
WORDLIST for spelling thanks to
@seealso bug (#228) (@sfirke)
This fixes two problems that were identified as part of reverse dependency checks of dplyr 0.8.0 release candidate. https://github.com/tidyverse/dplyr/blob/revdep_dplyr_0_8_0_RC/revdep/problems.md#naniar
n() must be imported or prefixed like any other function. In the PR, I’ve changed 1:n() to dplyr::row_number() as naniar seems to prefix all dplyr functions.
update_shadow was only restoring the class attributes; it has been changed so that it restores all attributes. This was causing problems when data was a grouped_df. This likely was a problem before too, but dplyr 0.8.0 is stricter about what is a grouped data frame.
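To illustrate the first fix, here is a minimal sketch of numbering rows with a prefixed dplyr::row_number() rather than 1:n(). The data frame and the column name case are hypothetical:

```r
# Hypothetical example data.
df <- data.frame(value = c(10, 20, 30))

# Every dplyr function is prefixed, so nothing needs to be imported
# or attached for this to work inside a package.
numbered <- dplyr::mutate(df, case = dplyr::row_number())
numbered
```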
Add custom label support for missings and not missings with functions
add_any_miss(). So you can now do `add_label_missings(data, missing = "custom_missing_label", complete = "custom_complete_label")`.
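A runnable version of the call above, using airquality as stand-in data. The labels are the placeholder strings from the entry, and the result name labelled is hypothetical:

```r
library(naniar)

# Label each row according to whether it contains any missing value,
# using custom labels instead of the defaults.
labelled <- add_label_missings(
  airquality,
  missing = "custom_missing_label",
  complete = "custom_complete_label"
)

# The labels are stored in a new column named any_missing.
table(labelled$any_missing)
```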
impute_median() and scoped variants
any_shade() returns a logical TRUE or FALSE depending on if there are any
is_nabular() checks if input is nabular.
Added two new vignettes, “Exploring Imputed Values”, and “Special Missing Values”
miss_case_summary now no longer provide the cumulative sum of missingness in the summaries - this summary can be added back to the data with the option
add_cumsum = TRUE. #186
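A brief sketch of the difference, assuming naniar is installed and again using airquality as example data (object names are hypothetical):

```r
library(naniar)

# Default summary: no cumulative sum column.
case_summary <- miss_case_summary(airquality)

# Opt back in to the cumulative sum of missingness.
case_summary_cumsum <- miss_case_summary(airquality, add_cumsum = TRUE)

names(case_summary)
names(case_summary_cumsum)
```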
gg_miss_upset to replace workflow of:
data %>% as_shadow_upset() %>% UpSetR::upset()
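For comparison, a sketch of the replacement call, assuming naniar and UpSetR are installed. airquality is used only as example data and upset_input is a hypothetical name:

```r
library(naniar)

# The intermediate data that the old workflow passed on to UpSetR::upset().
upset_input <- as_shadow_upset(airquality)
head(upset_input)

# The single-call replacement for the old workflow.
gg_miss_upset(airquality)
```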
recode_shadow now works! This function allows you to recode your missing values into special missing values. These special missing values are stored in the shadow part of the dataframe, which ends in
shade where appropriate throughout naniar, and also added verifiers,
which_are_shade, and removed
bind_shadow now return data of class shadow. This will feed into recode_shadow methods for flexibly adding new types of missing data.
shadow might be changed to nabble or something similar.
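A small sketch of how bind_shadow() and recode_shadow() fit together, assuming naniar is installed. The data, the -99 sentinel, and the "broken_machine" label are invented for illustration:

```r
library(naniar)

# Hypothetical data where -99 in wind signals a broken instrument.
weather <- data.frame(
  wind = c(-99, 68, 72),
  temp = c(45, NA, 25)
)

# Add the shadow columns, then recode the shadow of temp
# as a special missing value wherever wind is -99.
weather_shadow <- bind_shadow(weather)
weather_recoded <- recode_shadow(
  weather_shadow,
  temp = .where(wind == -99 ~ "broken_machine")
)
weather_recoded
```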
add_label_missings() gain arguments so you can only label according to the missingness / shadowy-ness of given variables.
which_are_shadow(), to tell you which values are shadows.
long_shadow(), which converts data in shadow/nabular form into a long format suitable for plotting. Related to #165
gg_miss_upset gets a better default presentation by ordering by the largest intersections, and also an improved error message when data with only 1 or no variables have missing values.
shadow_shift gains a more informative error message when it doesn't know the class.
common_na_string to include escape characters for "?", "*", "." so that if they are used in replacement or searching functions they don't return the wildcard results from the characters "?", "*", and ".".
miss_var_table now has final column names
pct_miss - fixes #178.
These old names will be made defunct in 0.5.0, and removed completely in 0.6.0.
impute_below has changed to be an alias of shadow_shift - that is, it operates on a single vector.
impute_below_all operates on all columns in a dataframe (as specified in #159)
gg_miss_var(airquality) now prints the ggplot - a typo meant that this did not print the plot
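A minimal sketch of the two forms, assuming a naniar version in which impute_below() accepts a plain numeric vector as described above. The vector and object names are hypothetical:

```r
library(naniar)

# Vector form: missing values are imputed with values below the observed range.
x <- c(5, NA, 8, NA, 10)
x_imputed <- impute_below(x)
x_imputed

# Data frame form: apply the same idea to every column.
airquality_imputed <- impute_below_all(airquality)
head(airquality_imputed)
```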
This is a patch release that removes
tidyselect from the package Imports, as it is unnecessary. Fixes #174