Add a column that tells us which "missingness cluster" a row belongs to

Adds a column identifying the missingness cluster that each row belongs to. For example, vis_miss(airquality, cluster = TRUE) shows some clustering of missing values in the data, but gives you no way to identify which cluster a given row falls in. Future work will incorporate the seriation package to give the user better control over the clustering.

Usage

add_miss_cluster(data, cluster_method = "mcquitty", n_clusters = 2)

Arguments

data: a data frame
cluster_method: character string naming the agglomeration method to use; the default is "mcquitty". Options are taken from the stats::hclust help file and include: "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).
n_clusters: numeric, the number of clusters you expect. Defaults to 2.
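
To make the role of the two tuning arguments concrete, here is a minimal base R sketch of how rows could be grouped by their missingness pattern. It assumes the clustering is hierarchical clustering on a 0/1 missingness matrix, which matches the stats::hclust options listed above but is not confirmed here as the package's actual implementation. The helper name sketch_miss_cluster is hypothetical and is not part of the package.

```r
# Hypothetical helper (not a package function): clusters rows by which
# values are missing, using only base R and the stats package.
sketch_miss_cluster <- function(data, cluster_method = "mcquitty", n_clusters = 2) {
  # 1 where a value is missing, 0 where it is observed
  miss_matrix <- is.na(data) * 1
  # Pairwise distances between rows, based only on their missingness pattern
  row_dist <- stats::dist(miss_matrix)
  # Agglomerative clustering using the chosen linkage (cluster_method)
  tree <- stats::hclust(row_dist, method = cluster_method)
  # Cut the tree into the requested number of groups (n_clusters)
  data$miss_cluster <- stats::cutree(tree, k = n_clusters)
  data
}

clustered <- sketch_miss_cluster(airquality, n_clusters = 2)
# Count how many rows fall in each cluster
table(clustered$miss_cluster)
```

In this sketch, cluster_method is passed straight to the method argument of stats::hclust, and n_clusters becomes the k argument of stats::cutree, so rows with identical missingness patterns always land in the same cluster.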

Examples


add_miss_cluster(airquality)
#> # A tibble: 153 × 7
#>    Ozone Solar.R  Wind  Temp Month   Day miss_cluster
#>    <int>   <int> <dbl> <int> <int> <int>        <int>
#>  1    41     190   7.4    67     5     1            1
#>  2    36     118   8      72     5     2            1
#>  3    12     149  12.6    74     5     3            1
#>  4    18     313  11.5    62     5     4            1
#>  5    NA      NA  14.3    56     5     5            2
#>  6    28      NA  14.9    66     5     6            1
#>  7    23     299   8.6    65     5     7            1
#>  8    19      99  13.8    59     5     8            1
#>  9     8      19  20.1    61     5     9            1
#> 10    NA     194   8.6    69     5    10            2
#> # ℹ 143 more rows
add_miss_cluster(airquality, n_clusters = 3)
#> # A tibble: 153 × 7
#>    Ozone Solar.R  Wind  Temp Month   Day miss_cluster
#>    <int>   <int> <dbl> <int> <int> <int>        <int>
#>  1    41     190   7.4    67     5     1            1
#>  2    36     118   8      72     5     2            1
#>  3    12     149  12.6    74     5     3            1
#>  4    18     313  11.5    62     5     4            1
#>  5    NA      NA  14.3    56     5     5            2
#>  6    28      NA  14.9    66     5     6            1
#>  7    23     299   8.6    65     5     7            1
#>  8    19      99  13.8    59     5     8            1
#>  9     8      19  20.1    61     5     9            1
#> 10    NA     194   8.6    69     5    10            3
#> # ℹ 143 more rows
add_miss_cluster(airquality, cluster_method = "ward.D", n_clusters = 3)
#> # A tibble: 153 × 7
#>    Ozone Solar.R  Wind  Temp Month   Day miss_cluster
#>    <int>   <int> <dbl> <int> <int> <int>        <int>
#>  1    41     190   7.4    67     5     1            1
#>  2    36     118   8      72     5     2            1
#>  3    12     149  12.6    74     5     3            1
#>  4    18     313  11.5    62     5     4            1
#>  5    NA      NA  14.3    56     5     5            2
#>  6    28      NA  14.9    66     5     6            2
#>  7    23     299   8.6    65     5     7            1
#>  8    19      99  13.8    59     5     8            1
#>  9     8      19  20.1    61     5     9            1
#> 10    NA     194   8.6    69     5    10            3
#> # ℹ 143 more rows
