Data structures for missing data

Creation and Manipulation of Shadow Matrices

as_shadow()
Create shadows
as_shadow_upset()
Convert data into shadow format for doing an upset plot
bind_shadow()
Bind a shadow dataframe to original data
nabular()
Convert data into nabular form by binding shade to it
gather_shadow()
Long form representation of a shadow matrix
shade()
Create new levels of missing
shadow_long()
Reshape shadow data into a long format
unbind_shadow() unbind_data()
Unbind (remove) shadow from data, and vice versa
shadow_shift()
Shift missing values to facilitate missing data exploration/visualisation
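
As a minimal sketch of the shadow matrix idea, the example below applies as_shadow() and bind_shadow() to airquality, a dataset that ships with base R and is used here only for illustration. It assumes naniar is installed.

```r
library(naniar)

# airquality ships with base R and has missing values in Ozone and Solar.R.
shadow <- as_shadow(airquality)    # one "<variable>_NA" column per variable
nab    <- bind_shadow(airquality)  # original columns plus their shadow columns

names(shadow)
table(shadow$Ozone_NA)             # counts of "!NA" (present) and "NA" (missing)
ncol(nab)                          # twice the number of original columns
```

Keeping the shadow columns next to the data means missingness can be used like any other variable, for example as a grouping or colour variable.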

Create special missing values

Create special missing values so that they don’t get lost! See vignette("special-missing").

recode_shadow()
Add special missing values to the shadow matrix
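
A small sketch of recording a special missing value, assuming naniar is installed. The data frame and the label "broken_sensor" are hypothetical and exist only for this example.

```r
library(naniar)

# Hypothetical toy data: -99 in wind marks a known instrument fault.
df <- data.frame(
  wind = c(-99, 68, 72),
  temp = c(45, NA, 25)
)

# recode_shadow() works on data that already carries shadow columns.
dfs <- bind_shadow(df)

# Mark the shadow column temp_NA with a special missing level for the rows
# where wind is -99. The label "broken_sensor" is made up for this example.
dfs_recoded <- recode_shadow(dfs, temp = .where(wind == -99 ~ "broken_sensor"))

levels(dfs_recoded$temp_NA)
```

The new level is stored with an "NA_" prefix in the shadow column, so the reason for the missingness travels with the data.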

Visualisation

Visualise missing data

geom_miss_point()
Plot Missing Data Points
stat_miss_point()
stat_miss_point
gg_miss_case()
Plot the number of missings per case (row)
gg_miss_case_cumsum()
Plot of cumulative sum of missing for cases
gg_miss_fct()
Plot the number of missings for each variable, broken down by a factor
gg_miss_span()
Plot the number of missings in a given repeating span
gg_miss_upset()
Plot the pattern of missingness using an upset plot.
gg_miss_var()
Plot the number of missings for each variable
gg_miss_var_cumsum()
Plot of cumulative sum of missing value for each variable
gg_miss_which()
Plot which variables contain a missing value
reexports %>% is_na are_na vis_miss
Objects exported from other packages
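
The following sketch shows two of the plots listed above, assuming naniar and ggplot2 are installed. The airquality dataset from base R is used only as an example.

```r
library(naniar)
library(ggplot2)

# Number of missings for each variable.
p_var <- gg_miss_var(airquality)

# Scatterplot that keeps rows with missing values: geom_miss_point() places
# them below the observed range and colours points by missingness.
p_point <- ggplot(airquality, aes(x = Ozone, y = Solar.R)) +
  geom_miss_point()

print(p_var)
print(p_point)
```

Both objects are ordinary ggplot objects, so themes, facets, and labels can be added in the usual way.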

Numerical Summaries

Provide tidy data frame summaries of missingness

miss_var_prop() complete_var_prop() miss_var_pct() complete_var_pct() miss_case_prop() complete_case_prop() miss_case_pct() complete_case_pct()
Proportion or percentage of variables or cases containing missings or complete values
miss_case_cumsum()
Cumulative sum of the number of missings in each case
miss_case_summary()
Summarise the missingness in each case
miss_case_table()
Tabulate missings in cases.
miss_prop_summary()
Proportions of missings in data, variables, and cases.
miss_scan_count()
Search and present different kinds of missing values
miss_summary()
Collate summary measures from naniar into one tibble
miss_var_cumsum()
Cumulative sum of the number of missings in each variable
miss_var_run()
Find the number of missing and complete values in a single run
miss_var_span()
Summarise the number of missings for a given repeating span on a variable
miss_var_summary()
Summarise the missingness in each variable
miss_var_table()
Tabulate the missings in the variables
miss_var_which()
Which variables contain missing values?
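
A brief sketch of the tidy summaries, assuming naniar is installed and again using airquality from base R as example data.

```r
library(naniar)

var_summary  <- miss_var_summary(airquality)   # one row per variable
case_summary <- miss_case_summary(airquality)  # one row per case (row)
case_table   <- miss_case_table(airquality)    # cases grouped by number of missings

var_summary
```

Each function returns a data frame, so the results can be filtered, joined, or plotted like any other table.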

Handy helpers

n_var_complete() n_case_complete()
The number of variables or cases with complete values
n_var_miss() n_case_miss()
The number of variables or cases with missing values
n_complete()
Return the number of complete values
n_complete_row()
Return a vector of the number of complete values in each row
n_miss()
Return the number of missing values
n_miss_row()
Return a vector of the number of missing values in each row
prop_miss_case() prop_complete_case()
Proportion of cases that contain missing or complete values.
prop_miss_var() prop_complete_var()
Proportion of variables containing missings or complete values
prop_complete()
Return the proportion of complete values
prop_complete_row()
Return a vector of the proportion of complete values in each row
prop_miss()
Return the proportion of missing values
prop_miss_row()
Return a vector of the proportion of missing values in each row
pct_miss_case() pct_complete_case()
Percentage of cases that contain missing or complete values.
pct_miss_var() pct_complete_var()
Percentage of variables containing missings or complete values
pct_complete()
Return the percent of complete values
pct_miss()
Return the percent of missing values
any_na() any_miss() any_complete() all_na() all_miss() all_complete()
Identify if there are any or all missing or complete values
any_row_miss()
Helper function to determine whether there are any missings
is_shade() are_shade() any_shade()
Detect if this is a shade
which_are_shade()
Which variables are shades?
common_na_numbers
Common number values for NA
common_na_strings
Common string values for NA
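
The helpers return plain numbers or vectors rather than data frames. A minimal sketch, assuming naniar is installed and using airquality from base R:

```r
library(naniar)

total_miss     <- n_miss(airquality)      # total number of missing values
total_complete <- n_complete(airquality)  # total number of non-missing values
share_miss     <- prop_miss(airquality)   # proportion of values that are missing
percent_miss   <- pct_miss(airquality)    # the same quantity as a percentage
miss_per_row   <- n_miss_row(airquality)  # number of missing values in each row

total_miss
share_miss
```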

Add columns

Add missing data summaries/tool columns

add_any_miss()
Add a column describing presence of any missing values
add_label_missings()
Add a column describing if there are any missings in the dataset
add_label_shadow()
Add a column describing whether there is a shadow
add_miss_cluster()
Add a column that tells us which "missingness cluster" a row belongs to
add_n_miss()
Add column containing number of missing data values
add_prop_miss()
Add column containing proportion of missing data values
add_shadow()
Add a shadow column to dataframe
add_shadow_shift()
Add a shadow shifted column to a dataset
add_span_counter()
Add a counter variable for a span of a dataframe
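
As a sketch of the add_* pattern, the example below appends per-row missingness columns to airquality (base R data, used for illustration). It assumes naniar is installed and that no variables are selected, in which case the new columns carry an "_all" suffix.

```r
library(naniar)

# Each call returns the original data with one extra column.
aq_n    <- add_n_miss(airquality)     # adds n_miss_all: missings per row
aq_prop <- add_prop_miss(airquality)  # adds prop_miss_all: proportion per row

head(aq_n)
```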

Replacing values with NA, and NA with values

Functions to help replace certain values with NA, including scoped variants (_at, _if, _all) that take formulas for flexible approaches. See vignette("replace-with-na").

replace_with_na()
Replace values with missings
replace_with_na_all()
Replace all values with NA where a certain condition is met
replace_with_na_at()
Replace specified variables with NA where a certain condition is met
replace_with_na_if()
Replace values with NA based on some condition, for variables that meet some predicate
replace_to_na()
Replace values with missings
replace_na_with()
Replace NA value with provided value
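
A minimal sketch of replacing coded values with NA, assuming naniar is installed. The data frame and the codes -99 and "N/A" are hypothetical.

```r
library(naniar)

# Hypothetical data where -99 and "N/A" were used as missing-value codes.
df <- data.frame(
  x = c(1, -99, 3),
  y = c("a", "N/A", "c"),
  stringsAsFactors = FALSE
)

# Named list: which values to treat as missing in which variable.
df_na <- replace_with_na(df, replace = list(x = -99, y = "N/A"))

# Scoped variant: one condition, written as a formula, applied to every column.
df_all <- replace_with_na_all(df, condition = ~ .x == -99)

df_na
```

The named-list form is explicit about which variable each code belongs to, while the scoped form is shorter when the same code is used throughout a dataset.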

Imputation helpers

Simple imputation methods for exploring visualisation and missingness structure. See vignette("exploring-imputed-values") for more details.

impute_below()
Impute data with values shifted 10 percent below range.
impute_below(<numeric>)
Impute numeric values below a range for graphical exploration
impute_below_all()
Impute data with values shifted 10 percent below range.
impute_below_at()
Scoped variants of impute_below
impute_below_if()
Scoped variants of impute_below
impute_factor()
Impute a factor value into a vector with missing values
impute_fixed()
Impute a fixed value into a vector with missing values
impute_mean()
Impute the mean value into a vector with missing values
impute_median()
Impute the median value into a vector with missing values
impute_mode()
Impute the mode value into a vector with missing values
impute_zero()
Impute zero into a vector with missing values
impute_mean_all() impute_mean_at() impute_mean_if()
Scoped variants of impute_mean
impute_median_all() impute_median_at() impute_median_if()
Scoped variants of impute_median
set_prop_miss() set_n_miss()
Set a proportion or number of missing values
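
The vector-level imputation helpers can be sketched on a small, made-up numeric vector, assuming naniar is installed. These are exploratory tools and the example is not a recommendation of an imputation method.

```r
library(naniar)

# Hypothetical vector with one missing value.
x <- c(1, NA, 3, 10)

x_mean  <- impute_mean(x)   # NA replaced by the mean of the observed values
x_below <- impute_below(x)  # NA replaced by a jittered value below min(x)

x_mean
x_below
```

Because impute_below() places imputed values below the observed range, they remain visually separate from real data in a plot.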

Package title details

Details of the package naniar

naniar-package naniar
naniar: Data Structures, Summaries, and Visualisations for Missing Data

Cast Shadows

Add shadow information to the dataframe while reducing it to the variables of interest

cast_shadow()
Add a shadow column to a dataset
cast_shadow_shift()
Add a shadow and a shadow_shift column to a dataset
cast_shadow_shift_label()
Add a shadow column and a shadow shifted column to a dataset
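
A short sketch of casting a shadow for selected variables only, assuming naniar is installed and using airquality from base R:

```r
library(naniar)

# Keep only Ozone and Solar.R, plus their shadow columns.
aq_cast <- cast_shadow(airquality, Ozone, Solar.R)

names(aq_cast)
```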

Misc helpers

label_miss_1d()
Label a missing from one column
label_miss_2d()
label_miss_2d
label_missings()
Is there a missing value in the row of a dataframe?
where_na()
Which rows and cols contain missings?
which_na()
Which elements contain missings?
.where()
Split a call into two components with a useful verb name

Data Sources

For practice and example use cases in naniar

oceanbuoys
West Pacific Tropical Atmosphere Ocean Data, 1993 & 1997.
pedestrian
Pedestrian count information around Melbourne for 2016
riskfactors
The Behavioral Risk Factor Surveillance System (BRFSS) Survey Data, 2009.

Little’s MCAR test

For performing Little’s MCAR test

mcar_test()
Little's missing completely at random (MCAR) test
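
A minimal sketch of running the test, assuming naniar is installed and using airquality from base R as example data:

```r
library(naniar)

# Returns a one-row data frame with the test statistic, degrees of freedom,
# p-value, and the number of missing data patterns.
res <- mcar_test(airquality)

res
```

A small p-value is evidence against the hypothesis that the data are missing completely at random.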

ggplot2 extensions

Custom ggplot geoms built to extend ggplot for missing values

StatMissPoint
naniar-ggproto