Changelog
Source: NEWS.md
naniar 1.1.0 “Prince Caspian”
CRAN release: 2024-03-05
New
- Implement impute_fixed, impute_zero, and impute_factor. Notably these do not implement “scoped variants” which were previously implemented (for example, impute_fixed_if etc.). This is in favour of using the new across workflow within dplyr, and it is easier to maintain. #261
- Add digit argument to miss_var_summary to help display %missing data correctly when there is a very small fraction of missingness. #284
- Implemented impute_mode - resolves #213.
- geom_miss_point() works with shape argument #290
- Fix bug with all_complete, which was implemented as !anyNA(x) but should be all(complete.cases(x)).
- Correctly implement any_na() (and any_miss()) and any_complete(). Rework examples to demonstrate workflow for finding complete variables.
Bug fixes
- Fix bug with shadow_long not working when gathering variables of mixed type. Fix involves specifying a value transform, which defaults to character. #314
- Implement Date, POSIXct and POSIXlt methods for impute_below() - #158
- Provide replace_na_with, a complement to replace_with_na - #129
- Fix bug with gg_miss_fct where it used a deprecated function from forcats - #342
Misc
- Use cli::cli_abort and cli::cli_warn instead of stop and warn (#326)
- Use expect_snapshot instead of expect_error (#326)
Changes
- Soft deprecated shadow_shift - #193
- Soft deprecate miss_case_cumsum() and miss_var_cumsum() - #257
naniar 1.0.0
CRAN release: 2023-02-02
Version 1.0.0 of naniar signifies that this release accompanies the publication of the associated JSS paper, doi:10.18637/jss.v105.i07. There are also a few small changes that have been implemented in this release, which are described below.
There is still a lot to do in naniar, and this release does not signify that there are no changes upcoming. Rather, it establishes that this is a stable release, and that any upcoming changes will go through a more formal deprecation process.
New
- The DOI in the CITATION is for a new JSS publication that will be registered after publication on CRAN.
- Replaced tidyr::gather with tidyr::pivot_longer - resolves #301
- Added set_n_miss and set_prop_miss functions - resolved #298
Bug Fixes
- Fix bug in gg_miss_var() where a warning appears due to a change in how to remove the legend #288.
Misc
- Removed gdtools from naniar as it is no longer needed #302.
- Added imports, vctrs and cli, which are both free dependencies as they are already used within the tidyverse packages that naniar depends on.
naniar 0.6.1 (2021/05/13) “Incandescent lightbulbs killed the Arc lamps”
CRAN release: 2021-05-14
New features
- naniar now provides mcar_test() for Little’s (1988) statistical test for missing completely at random (MCAR) data. The null hypothesis in this test is that the data is MCAR, and the test statistic is a chi-squared value. Given a high statistic value and low p-value, we can conclude data are not missing completely at random. Thanks to Andrew Heiss for the PR.
- common_na_strings gains "#N/A".
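A minimal illustration of calling mcar_test(), assuming naniar >= 0.6.1 is installed and using the built-in airquality data. The 0.05 threshold below is only a conventional choice for this sketch, not something the package prescribes.

```r
library(naniar)

# Little's MCAR test on a data frame that contains missing values.
# The result is a one-row data frame with the test statistic,
# degrees of freedom, and p-value.
mcar_result <- mcar_test(airquality)
mcar_result

# A small p-value is evidence against the null hypothesis that
# the data are missing completely at random.
reject_mcar <- mcar_result$p.value < 0.05
reject_mcar
```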
Bug fixes
- Fix bug in miss_var_span() (#270) where the number of missings + number of complete values added up to more than the number of rows in the data. This was due to the remainder not being used when calculating the number of complete values.
- Fix bug in recode_shadow() (#272) where adding the same special missing value in two subsequent operations fails.
naniar 0.6.0 (2020/08/17) “Spur of the lamp post”
CRAN release: 2020-09-02
- Provide warning for replace_with_na when columns are provided that don’t exist (see #160). Thank you to michael-dewar for their help with this.
Breaking Changes
- Drop the “nabular” and “shadow” classes (#268) used in nabular() and bind_shadow(). Doing so removes the functions as_shadow(), is_shadow(), is_nabular(), new_nabular(), and new_shadow(). These were mostly used internally and it is not expected that users would have used these functions. If these were used, please file an issue and I can implement them again.
naniar 0.5.2 (2020/06/28) “Silver Apple”
CRAN release: 2020-06-29
Minor Changes
- Improvements to code in miss_var_summary(), miss_var_table(), and prop_miss_var(), resulting in a 3-10x speedup.
naniar 0.5.0 (2020/02/20) “The End of this Story and the Beginning of all of the Others”
CRAN release: 2020-02-28
Breaking Changes
- The following functions related to calculating the proportion/percentage of missingness were made Defunct and will no longer work:
  Instead use: prop_miss_var(), prop_complete_var(), pct_miss_var(), pct_complete_var(), prop_miss_case(), prop_complete_case(), pct_miss_case(), pct_complete_case(). (see 242)
- replace_to_na() was made defunct, please use replace_with_na() instead. (see 242)
Minor changes
- miss_var_cumsum and miss_case_cumsum are now exported
- use map_dfc instead of map_df
- Fix various extra warnings and improve test coverage
Bug Fixes
- Address bug where the number of missings in a row is not calculated properly - see 238 and 232. The solution involved using rowSums(is.na(x)), which was 3 times faster.
- Resolve bug in gg_miss_fct() where warning is given for non explicit NA values - see 241.
- skip vdiffr tests on github actions
- use tibble() not data_frame()
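The rowSums(is.na(x)) approach mentioned above can be sketched in base R as follows. This only illustrates the idea and is not naniar's internal code; the data frame df and the name n_miss_per_row are made up for the example.

```r
# A small data frame with missing values in columns of mixed type
df <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, "z"),
  c = c(TRUE, FALSE, NA)
)

# is.na() on a data frame returns a logical matrix of the same shape,
# so rowSums() counts the TRUE values (the missings) in each row.
n_miss_per_row <- rowSums(is.na(df))
n_miss_per_row
```

Because is.na() and rowSums() both operate on the whole matrix at once, this avoids iterating over rows one at a time.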
naniar 0.4.2 (2019/02/15) “The Planting of The Tree”
CRAN release: 2019-02-15
Improvements
- The geom_miss_point() ggplot2 layer can now be converted into an interactive web-based version by the ggplotly() function in the plotly package. In order for this to work, naniar now exports the geom2trace.GeomMissPoint() function (users should never need to call geom2trace.GeomMissPoint() directly, as ggplotly() calls it for you).
- adds WORDLIST for spelling thanks to usethis::use_spell_check()
- fix documentation @seealso bug (#228) (@sfirke)
Dependency fixes
- Thanks to a PR (#223) from @romainfrancois:
  This fixes two problems that were identified as part of reverse dependency checks of dplyr 0.8.0 release candidate. https://github.com/tidyverse/dplyr/blob/revdep_dplyr_0_8_0_RC/revdep/problems.md#naniar
  - n() must be imported or prefixed like any other function. In the PR, I’ve changed 1:n() to dplyr::row_number() as naniar seems to prefix all dplyr functions.
  - update_shadow was only restoring the class attributes, changed so that it restores all attributes, this was causing problems when data was a grouped_df. This likely was a problem before too, but dplyr 0.8.0 is stricter about what is a grouped data frame.
naniar 0.4.1 (2018/11/20) “Aslan’s Song”
CRAN release: 2018-11-20
Minor Change
- Fixes to new_tibble #220 - Thanks to Kirill Müller.
- Refactoring the capture of arguments from rlang #218 - thanks to Lionel Henry.
naniar 0.4.0 (2018/09/10) “An Unexpected Meeting”
New Feature
- Add custom label support for missings and not missings with functions add_label_missings and add_label_shadow() and add_any_miss(). So you can now do add_label_missings(data, missing = "custom_missing_label", complete = "custom_complete_label")
- impute_median() and scoped variants
- any_shade() returns a logical TRUE or FALSE depending on if there are any shade values
- nabular() an alias for bind_shadow() to tie the nabular term into the work.
- is_nabular() checks if input is nabular.
- geom_miss_point() now gains the arguments from shadow_shift()/impute_below() for altering the amount of jitter and proportion below (prop_below).
- Added two new vignettes, “Exploring Imputed Values”, and “Special Missing Values”
- miss_var_summary and miss_case_summary now no longer provide the cumulative sum of missingness in the summaries - this summary can be added back to the data with the option add_cumsum = TRUE. #186
- Added gg_miss_upset to replace workflow of: data %>% as_shadow_upset() %>% UpSetR::upset()
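To illustrate the add_cumsum option described above, here is a short sketch that compares the summaries with and without the cumulative sum. It assumes naniar is installed and uses the built-in airquality data; the object names are only for illustration.

```r
library(naniar)

# Default summaries: no cumulative sum column
var_summary <- miss_var_summary(airquality)
case_summary <- miss_case_summary(airquality)

# Opt back in to the cumulative sum of missingness
var_summary_cumsum <- miss_var_summary(airquality, add_cumsum = TRUE)
case_summary_cumsum <- miss_case_summary(airquality, add_cumsum = TRUE)

# The cumulative sum appears as one extra column
setdiff(names(var_summary_cumsum), names(var_summary))
```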
Major Change
- recode_shadow now works! This function allows you to recode your missing values into special missing values. These special missing values are stored in the shadow part of the dataframe, which ends in _NA.
- implemented shade where appropriate throughout naniar, and also added verifiers, is_shade, are_shade, which_are_shade, and removed which_are_shadow.
- as_shadow and bind_shadow now return data of class shadow. This will feed into recode_shadow methods for flexibly adding new types of missing data.
- Note that in the future shadow might be changed to nabble or something similar.
Minor feature
- Functions add_label_shadow() and add_label_missings() gain arguments so you can only label according to the missingness / shadowy-ness of given variables.
- new function which_are_shadow(), to tell you which values are shadows.
- new function long_shadow(), which converts data in shadow/nabular form into a long format suitable for plotting. Related to #165
- Added tests for miss_scan_count
Minor Changes
- gg_miss_upset gets a better default presentation by ordering by the largest intersections, and also an improved error message when data with only 1 or no variables have missing values.
- shadow_shift gains a more informative error message when it doesn’t know the class.
- Changed common_na_string to include escape characters for "?", "*", "." so that if they are used in replacement or searching functions they don’t return the wildcard results from the characters "?", "*", and ".".
- miss_case_table and miss_var_table now have final column names pct_vars, and pct_cases instead of pct_miss - fixes #178.
Breaking Changes
- Deprecated old names of the scalar missingness summaries, in favour of a more consistent syntax #171. The old and new names are:
old_names | new_names |
---|---|
miss_case_pct | pct_miss_case |
miss_case_prop | prop_miss_case |
miss_var_pct | pct_miss_var |
miss_var_prop | prop_miss_var |
complete_case_pct | pct_complete_case |
complete_case_prop | prop_complete_case |
complete_var_pct | pct_complete_var |
complete_var_prop | prop_complete_var |
These old names will be made defunct in 0.5.0, and removed completely in 0.6.0.
- impute_below has changed to be an alias of shadow_shift - that is it operates on a single vector. impute_below_all operates on all columns in a dataframe (as specified in #159)
naniar 0.3.1 (2018/06/10) “Strawberry’s Adventure”
CRAN release: 2018-06-08
Minor Change
This is a patch release that removes tidyselect from the package Imports, as it is unnecessary. Fixes #174
naniar 0.3.0 (2018/06/06) “Digory and his Uncle Are Both in Trouble”
CRAN release: 2018-06-07
New Features
- Added all_miss() / all_na() equivalent to all(is.na(x))
- Added any_complete() equivalent to all(complete.cases(x))
- Added any_miss() equivalent to anyNA(x)
- Added common_na_numbers and finalised common_na_strings - to provide a list of commonly used NA values #168
- Added miss_var_which, to list the variable names with missings
- Added as_shadow_upset which gets the data into a format suitable for plotting as an UpSetR plot: airquality %>% as_shadow_upset() %>% UpSetR::upset()
- Added some imputation functions to assist with exploring missingness structure and visualisation:
  - impute_below performs as for shadow_shift, but performs on all columns. This means that it imputes missing values 10% below the range of the data (powered by shadow_shift), to facilitate graphical exploration of the data. Closes #145. There are also scoped variants that work for specific named columns: impute_below_at, and for columns that satisfy some predicate function: impute_below_if.
  - impute_mean, imputes the mean value, and scoped variants impute_mean_at, and impute_mean_if.
- impute_below and shadow_shift gain arguments prop_below and jitter to control the degree of shift, and also the extent of jitter.
- Added complete_{case/var}_{pct/prop}, which complement miss_{var/case}_{pct/prop} #150
- Added unbind_shadow and unbind_data as helpers to remove shadow columns from data, and data from shadows, respectively.
- Added is_shadow and are_shadow to determine if something contains a shadow column. Similar to rlang::is_na and rlang::are_na, is_shadow returns a logical vector of length 1, and are_shadow returns a logical vector of length of the number of names of a data.frame. This might be revisited at a later point (see any_shade in add_label_shadow).
- Aesthetics now map as expected in geom_miss_point(). This means you can write things like geom_miss_point(aes(colour = Month)) and it works appropriately. Fixed by Luke Smith in Pull request #144, fixing #137.
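The base R expressions that all_miss() / all_na() and any_miss() are described as equivalent to can be tried directly. This is a base R sketch of those expressions only, not the naniar implementations; the vectors and result names are made up for the example.

```r
some_missing <- c(1, NA, 3)
all_missing <- c(NA, NA, NA)

# all(is.na(x)): TRUE only when every value is missing
all_na_some <- all(is.na(some_missing))
all_na_all <- all(is.na(all_missing))

# anyNA(x): TRUE as soon as at least one value is missing
any_na_some <- anyNA(some_missing)
any_na_none <- anyNA(c(1, 2, 3))

c(all_na_some, all_na_all, any_na_some, any_na_none)
```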
Minor Changes
- miss_var_summary and miss_case_summary now use order = TRUE by default, so cases and variables with the most missings are presented in descending order. Fixes #163
- Changes for Visualisation:
  - Changed the default colours used in gg_miss_case and gg_miss_var to lorikeet purple (from ochRe package: https://github.com/ropenscilabs/ochRe)
  - gg_miss_case
    - The y axis label is now …
    - Default presentation is with order_cases = TRUE.
    - Gains a show_pct option to be consistent with gg_miss_var #153
  - gg_miss_which is rotated 90 degrees so it is easier to read variable names
  - gg_miss_fct uses a minimal theme and tilts the axis labels #118.
- imported is_na and are_na from rlang.
- Added common_na_strings, a list of common NA values #168.
- Added some detail on alternative methods for replacing with NA in the vignette “replacing values with NA”.
naniar 0.2.0 (2018/02/08) (“The First Joke and Other Matters”)
CRAN release: 2018-02-09
New Features
- Speed improvements. Thanks to the help, contributions, and discussion with Romain François and Jim Hester, naniar now has greatly improved speed for calculating the missingness in each row. These speedups should continue to improve in future releases.
- New “scoped variants” of replace_with_na, thank you to Colin Fay for his work on this:
  - replace_with_na_all replaces with NA all values across the dataframe that meet a specified condition (using the syntax ~.x == -99)
  - replace_with_na_at replaces with NA the matching values in specified variables
  - replace_with_na_if replaces with NA the matching values in those variables that satisfy some predicate function (e.g., is.character)
- added which_na - replacement for which(is.na(x))
- miss_scan_count. This makes it easier for users to search for particular occurrences of these values across their variables. #119
- n_miss_row calculates the number of missing values in each row, returning a vector. There are also 3 other functions which are similar in spirit: n_complete_row, prop_miss_row, and prop_complete_row, which return a vector of the number of complete observations, the proportion of missings in a row, and the proportion of complete observations in a row
- add_miss_cluster is a new function that calculates a cluster of missingness for each row, using hclust. This can be useful in exploratory modelling of missingness, similar to Tierney et al 2015: “doi: 10.1136/bmjopen-2014-007450” and Barnett et al. 2017: “doi: 10.1136/bmjopen-2017-017284”
- Now exported where_na - a function that returns the positions of NA values. For a dataframe it returns a matrix of row and col positions of NAs, and for a vector it returns a vector of positions of NAs. (#105)
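A small sketch of the scoped variants of replace_with_na listed above, assuming naniar is installed. The data frame dat and the sentinel value -99 are invented for the example.

```r
library(naniar)

# -99 is used here as a made-up code for "missing"
dat <- data.frame(
  x = c(1, -99, 3),
  y = c(-99, 2, -99)
)

# Every column: values equal to -99 become NA
cleaned_all <- replace_with_na_all(dat, condition = ~.x == -99)

# Only the named column x
cleaned_at <- replace_with_na_at(dat, .vars = "x", condition = ~.x == -99)

# Only columns that satisfy a predicate, here numeric columns
cleaned_if <- replace_with_na_if(dat, .predicate = is.numeric,
                                 condition = ~.x == -99)

# Positions of the missing values in one column, the base R way
which(is.na(cleaned_all$y))
```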
Minor changes
- Updated the vignette “Gallery of Missing Data Visualisations” to include the facet features and order_cases.
- bind_shadow gains an only_miss argument. When set to FALSE (the default) it will bind a dataframe with all of the variables duplicated with their shadow. Setting this to TRUE will bind only those variables that contain missing values.
- Cleaned up the visualisation of gg_miss_case to be clearer and less cluttered (#117), also added an order_cases option to order by cases.
- Added a facet argument to gg_miss_var, gg_miss_case, and gg_miss_span. This makes it easier for users to visualise these plots across the values of another variable. In the future I will consider adding facet to the other shorthand plotting functions, but at the moment these seemed to be the ones that would benefit the most from this feature.
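The effect of the only_miss argument to bind_shadow can be seen by comparing column counts. This sketch assumes naniar is installed and uses the built-in airquality data, in which only Ozone and Solar.R contain missing values; the object names are illustrative.

```r
library(naniar)

# Default (only_miss = FALSE): every variable gets a shadow column
shadow_full <- bind_shadow(airquality)

# only_miss = TRUE: shadow columns only for variables with missings
shadow_only_miss <- bind_shadow(airquality, only_miss = TRUE)

ncol(airquality)
ncol(shadow_full)
ncol(shadow_only_miss)

# The shadow columns that were added in the only_miss case
setdiff(names(shadow_only_miss), names(airquality))
```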
Bug fix
- oceanbuoys now is numeric type for year, latitude, and longitude, previously it was factor. See related issue
- Improved handling of shadow_shift when there are Inf or -Inf values (see #117)
Breaking change
- Deprecated replace_to_na in favour of replace_with_na, as it is a more natural phrase (“replace coffee to tea” vs “replace coffee with tea”). This will be made defunct in the next version.
- cast_shadow no longer works when called as cast_shadow(data). This action used to return all variables, and then shadow variables for the variables that only contained missing values. This was inconsistent with the use of cast_shadow(data, var1, var2). A new option has been added to bind_shadow that controls this - discussed below. See more details at issue 65.
- Change behaviour of cast_shadow so that the default option is to return only the variables that contain missings. This is different to bind_shadow, which binds a complete shadow matrix to the dataframe. A way to think about this is that the shadow is only cast on variables that contain missing values, whereas a bind is binding a complete shadow to the data. This may change in the future to be the default option for bind_shadow.
naniar 0.1.0 (2017/08/09) “The Founding of naniar”
CRAN release: 2017-08-09
- This is the first release of naniar onto CRAN. Updates to naniar will happen reasonably regularly after this, approximately every 1-2 months.
naniar 0.0.9.9995 (2017/08/07)
Major Change
- three new functions: miss_case_cumsum / miss_var_cumsum / replace_to_na
- two new visualisations: gg_var_cumsum & gg_case_cumsum
Minor changes
- Reviewed documentation for all functions and improved wording, grammar, and style.
- Converted roxygen to roxygen markdown
- updated vignettes and readme
- added a new vignette “naniar-visualisation”, to give a quick overview of the visualisations provided with naniar.
- changed label_missing* to label_miss to be more consistent with the rest of naniar
- Add pct and prop helpers (#78)
- removed miss_df_pct - this was literally the same as pct_miss or prop_miss.
- break larger files into smaller, more manageable files (#83)
- gg_miss_var gets a show_pct argument to show the percentage of missing values (Thanks Jennifer for the helpful feedback! :))
naniar 0.0.6.9100 (2017/03/21)
- Added prop_miss and the complement prop_complete. Where n_miss returns the number of missing values, prop_miss returns the proportion of missing values. Likewise, prop_complete returns the proportion of complete values.
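The relationship between the count and the two proportions can be sketched in base R. This shows the quantities being described rather than the naniar implementations; the vector x and the result names are made up for the example.

```r
x <- c(1, NA, 3, NA)

# Number of missing values (the quantity n_miss is described as returning)
n_missing <- sum(is.na(x))

# Proportion of missing values: the mean of a logical vector is
# the share of TRUE values
prop_missing <- mean(is.na(x))

# Proportion of complete values is the complement
prop_complete_values <- mean(!is.na(x))

c(n_missing, prop_missing, prop_complete_values)
```

The two proportions always sum to 1 for a non-empty vector, which is what makes one the complement of the other.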
Defunct functions
- As stated in 0.0.5.9000, to address Issue #38, I am moving towards the format miss_type_value/fun, because it makes more sense to me when tabbing through functions.
The left hand side functions have been made defunct in favour of the right hand side.
- percent_missing_case() --> miss_case_pct()
- percent_missing_var() --> miss_var_pct()
- percent_missing_df() --> miss_df_pct()
- summary_missing_case() --> miss_case_summary()
- summary_missing_var() --> miss_var_summary()
- table_missing_case() --> miss_case_table()
- table_missing_var() --> miss_var_table()
naniar 0.0.5.9000 (2016/01/08)
Deprecated functions
- To address Issue #38, I am moving towards the format miss_type_value/fun, because it makes more sense to me when tabbing through functions.
  - miss_* = I want to explore missing values
  - miss_case_* = I want to explore missing cases
  - miss_case_pct = I want to find the percentage of cases containing a missing value
  - miss_case_summary = I want to find the number / percentage of missings in each case
  - miss_case_table = I want a tabulation of the number / percentage of cases missing
This is more consistent and easier to reason with.
Thus, I have renamed the following functions:
- percent_missing_case() --> miss_case_pct()
- percent_missing_var() --> miss_var_pct()
- percent_missing_df() --> miss_df_pct()
- summary_missing_case() --> miss_case_summary()
- summary_missing_var() --> miss_var_summary()
- table_missing_case() --> miss_case_table()
- table_missing_var() --> miss_var_table()
These will be made defunct in the next release, 0.0.6.9000 (“The Wood Between Worlds”).
naniar 0.0.4.9000 (2016/12/31)
naniar 0.0.3.9901 (2016/12/18)
After a burst of effort on this package I have done some refactoring and thought hard about where this package is going to go. This meant that I had to make the decision to rename the package from ggmissing to naniar. The name may strike you as strange but it reflects the fact that there are many changes happening, and that we will be working on creating a nice utopia (like Narnia by CS Lewis) that makes it easier to work with missing data.
New Features (under development)
- add_n_miss and add_prop_miss are helpers that add columns to a dataframe containing the number and proportion of missing values. An example has been provided to use decision trees to explore missing data structure as in “doi: 10.1136/bmjopen-2014-007450”
- geom_miss_point() now supports transparency, thanks to @seasmith (Luke Smith)
- more shadows. These are mainly around bind_shadow and gather_shadow, which are helper functions to assist with creating
Bug fixes
- geom_missing_point() broke after the new release of ggplot2 2.2.0, but this is now fixed by ensuring that it inherits from GeomPoint, rather than just a new Geom. Thanks to Mitchell O’hara-Wild for his help with this.
- missing data summaries table_missing_var and table_missing_case also now return more sensible numbers and variable names. It is possible these function names will change in the future, as these are kind of verbose.
- semantic versioning was incorrectly entered in the DESCRIPTION file as 0.2.9000, so I changed it to 0.0.2.9000, and then to 0.0.3.9000 now to indicate the new changes, hopefully this won’t come back to bite me later. I think I accidentally did this with visdat at some point as well. Live and learn.