Use Little's (1988) test statistic to assess whether data is missing
completely at random (MCAR). The null hypothesis in this test is that the
data is MCAR, and the test statistic is a chi-squared value. The example
below shows the output of mcar_test(airquality). Given the high statistic
value and low p-value, we can reject the null hypothesis and conclude that
the airquality data is not missing completely at random.
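As a rough check on how the p-value relates to the statistic, the upper tail of a chi-squared distribution can be evaluated in base R. This is only a sketch: the statistic and degrees of freedom are copied from the rounded airquality output in the Examples section, and the 0.05 threshold is an assumed convention, not something mcar_test() applies.

```r
# Values copied from the rounded output of mcar_test(airquality)
statistic <- 35.1
df <- 14

# Upper-tail probability of a chi-squared distribution with `df` degrees of freedom
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)

# Assumed significance level (a common convention, not set by mcar_test())
alpha <- 0.05
reject_mcar <- p_value < alpha

print(p_value)
print(reject_mcar)
```

Because the inputs are rounded, the result should be close to, but not exactly, the p.value column that mcar_test() reports.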
Value
A tibble::tibble() with one row and four columns:
- statistic: Chi-squared statistic for Little's test
- df: Degrees of freedom used for the chi-squared statistic
- p.value: P-value for the chi-squared statistic
- missing.patterns: Number of missing data patterns in the data
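The missing.patterns and df columns can be sanity-checked by hand with base R. The sketch below assumes the degrees of freedom follow Little's (1988) definition (the number of observed variables summed over all patterns, minus the total number of variables); it is not the code mcar_test() runs internally, and the variable names are illustrative.

```r
# Logical matrix: TRUE where a value is missing
miss <- is.na(airquality)

# Each distinct row of `miss` is one missing-data pattern
patterns <- unique(miss)
n_patterns <- nrow(patterns)

# Little's (1988) degrees of freedom: observed variables summed over
# patterns, minus the total number of variables
n_observed <- sum(!patterns)
df_little <- n_observed - ncol(airquality)

print(n_patterns)
print(df_little)
```

For airquality these two values can be compared against the missing.patterns and df columns in the Examples output.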
Note
Code is adapted from LittleMCAR() in the now-orphaned BaylorEdPsych
package: https://rdrr.io/cran/BaylorEdPsych/man/LittleMCAR.html. Some of
the code is adapted from Eric Stemmler: https://web.archive.org/web/20201120030409/https://stats-bayes.com/post/2020/08/14/r-function-for-little-s-test-for-data-missing-completely-at-random/
using maximum likelihood estimation from the norm package.
References
Little, Roderick J. A. 1988. "A Test of Missing Completely at Random for Multivariate Data with Missing Values." Journal of the American Statistical Association 83 (404): 1198-1202. doi:10.1080/01621459.1988.10478722.
Author
Andrew Heiss, andrew@andrewheiss.com
Examples
mcar_test(airquality)
#> # A tibble: 1 × 4
#>   statistic    df p.value missing.patterns
#>       <dbl> <dbl>   <dbl>            <int>
#> 1      35.1    14 0.00142                4
mcar_test(oceanbuoys)
#> # A tibble: 1 × 4
#>   statistic    df p.value missing.patterns
#>       <dbl> <dbl>   <dbl>            <int>
#> 1      747.    31       0                6
# If there are non-numeric columns, there will be a warning
mcar_test(riskfactors)
#> Warning: NAs introduced by coercion to integer range
#> # A tibble: 1 × 4
#>   statistic    df  p.value missing.patterns
#>       <dbl> <dbl>    <dbl>            <int>
#> 1     1741.  1319 3.32e-14               48