Adds a column identifying the missingness cluster that each row belongs to. For example, vis_miss(airquality, cluster = TRUE) shows some clustering in the data, but gives you no way to identify which cluster a given row falls in. Future work will incorporate the seriation package to give the user better control over the clustering.

add_miss_cluster(data, cluster_method = "mcquitty", n_clusters = 2)

Arguments

data

a dataframe

cluster_method

character vector giving the agglomeration method to use. The default is "mcquitty". Options are taken from the stats::hclust help file and include: "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).

n_clusters

numeric, the number of clusters you expect. Defaults to 2.
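
To make the two arguments concrete, here is a minimal base R sketch of the general idea: cluster the rows of the missingness indicator matrix hierarchically, then cut the tree into the requested number of groups. It assumes Euclidean distances on a 0/1 indicator matrix, which may differ from what the package does internally, and sketch_miss_cluster is a hypothetical name used only for illustration.

```r
# Hypothetical helper for illustration only; not part of the package.
sketch_miss_cluster <- function(data, cluster_method = "mcquitty", n_clusters = 2) {
  # 0/1 matrix: 1 where a cell is missing, 0 otherwise
  miss_matrix <- is.na(data) * 1
  # Pairwise Euclidean distances between rows of the indicator matrix
  # (assumption: the package may use a different distance)
  miss_dist <- stats::dist(miss_matrix)
  # Agglomerative clustering using the chosen linkage method
  miss_tree <- stats::hclust(miss_dist, method = cluster_method)
  # Cut the tree into n_clusters groups and attach the labels as a new column
  data$miss_cluster <- as.integer(stats::cutree(miss_tree, k = n_clusters))
  data
}

clustered <- sketch_miss_cluster(airquality, cluster_method = "ward.D", n_clusters = 3)
# Count how many rows fall into each cluster
table(clustered$miss_cluster)
```

Changing cluster_method changes how rows are merged into the tree, while n_clusters only changes where the tree is cut, so the two can be varied independently.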

Examples

add_miss_cluster(airquality)
#> # A tibble: 153 x 7
#>    Ozone Solar.R  Wind  Temp Month   Day miss_cluster
#>    <int>   <int> <dbl> <int> <int> <int>        <int>
#>  1    41     190   7.4    67     5     1            1
#>  2    36     118   8      72     5     2            1
#>  3    12     149  12.6    74     5     3            1
#>  4    18     313  11.5    62     5     4            1
#>  5    NA      NA  14.3    56     5     5            2
#>  6    28      NA  14.9    66     5     6            1
#>  7    23     299   8.6    65     5     7            1
#>  8    19      99  13.8    59     5     8            1
#>  9     8      19  20.1    61     5     9            1
#> 10    NA     194   8.6    69     5    10            2
#> # ... with 143 more rows
add_miss_cluster(airquality, cluster_method = "ward.D")
#> # A tibble: 153 x 7
#>    Ozone Solar.R  Wind  Temp Month   Day miss_cluster
#>    <int>   <int> <dbl> <int> <int> <int>        <int>
#>  1    41     190   7.4    67     5     1            1
#>  2    36     118   8      72     5     2            1
#>  3    12     149  12.6    74     5     3            1
#>  4    18     313  11.5    62     5     4            1
#>  5    NA      NA  14.3    56     5     5            1
#>  6    28      NA  14.9    66     5     6            1
#>  7    23     299   8.6    65     5     7            1
#>  8    19      99  13.8    59     5     8            1
#>  9     8      19  20.1    61     5     9            1
#> 10    NA     194   8.6    69     5    10            2
#> # ... with 143 more rows
add_miss_cluster(airquality, cluster_method = "ward.D", n_clusters = 3)
#> # A tibble: 153 x 7
#>    Ozone Solar.R  Wind  Temp Month   Day miss_cluster
#>    <int>   <int> <dbl> <int> <int> <int>        <int>
#>  1    41     190   7.4    67     5     1            1
#>  2    36     118   8      72     5     2            1
#>  3    12     149  12.6    74     5     3            1
#>  4    18     313  11.5    62     5     4            1
#>  5    NA      NA  14.3    56     5     5            2
#>  6    28      NA  14.9    66     5     6            2
#>  7    23     299   8.6    65     5     7            1
#>  8    19      99  13.8    59     5     8            1
#>  9     8      19  20.1    61     5     9            1
#> 10    NA     194   8.6    69     5    10            3
#> # ... with 143 more rows
add_miss_cluster(airquality, n_clusters = 3)
#> # A tibble: 153 x 7
#>    Ozone Solar.R  Wind  Temp Month   Day miss_cluster
#>    <int>   <int> <dbl> <int> <int> <int>        <int>
#>  1    41     190   7.4    67     5     1            1
#>  2    36     118   8      72     5     2            1
#>  3    12     149  12.6    74     5     3            1
#>  4    18     313  11.5    62     5     4            1
#>  5    NA      NA  14.3    56     5     5            2
#>  6    28      NA  14.9    66     5     6            1
#>  7    23     299   8.6    65     5     7            1
#>  8    19      99  13.8    59     5     8            1
#>  9     8      19  20.1    61     5     9            1
#> 10    NA     194   8.6    69     5    10            3
#> # ... with 143 more rows