Add a column that tells us which "missingness cluster" a row belongs to
Source: R/add-cols.R
add_miss_cluster.Rd
A way to extract the missingness cluster that each row belongs to.
For example, if you use vis_miss(airquality, cluster = TRUE), you can
see some clustering in the data, but you have no way to identify
which cluster a given row falls in. Future work will incorporate the seriation package to
give the user better control over the clustering.
Arguments
- data
a data frame
- cluster_method
character, the agglomeration method to use. The default is "mcquitty". Options are taken from the
stats::hclust
help file and include: "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).
- n_clusters
numeric, the number of clusters you expect. Defaults to 2.
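To make the roles of cluster_method and n_clusters more concrete, here is a minimal base R sketch of how a missingness cluster column could be computed. It is an illustration under stated assumptions, not necessarily how the package implements it: the helper name sketch_miss_cluster is hypothetical, and the choice to cluster on a 0/1 missingness matrix with the default Euclidean distance is an assumption.

```r
# Hypothetical helper for illustration only; not part of the package.
sketch_miss_cluster <- function(data,
                                cluster_method = "mcquitty",
                                n_clusters = 2) {
  # 1 where a value is missing, 0 where it is present (assumed encoding)
  miss_matrix <- is.na(data) * 1

  # Pairwise distances between rows, based only on their missingness pattern
  miss_dist <- stats::dist(miss_matrix)

  # Agglomerative clustering; `method` accepts the options listed above
  miss_hclust <- stats::hclust(miss_dist, method = cluster_method)

  # Cut the tree into the requested number of clusters
  data$miss_cluster <- unname(stats::cutree(miss_hclust, k = n_clusters))
  data
}

result <- sketch_miss_cluster(airquality)

# Number of rows assigned to each cluster
table(result$miss_cluster)
```

Because the distances are computed only from which cells are NA, rows with the same missingness pattern have distance zero and end up in the same cluster. The integer labels themselves are arbitrary, so a label from this sketch need not match the label the package assigns.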
Examples
add_miss_cluster(airquality)
#> # A tibble: 153 × 7
#> Ozone Solar.R Wind Temp Month Day miss_cluster
#> <int> <int> <dbl> <int> <int> <int> <int>
#> 1 41 190 7.4 67 5 1 1
#> 2 36 118 8 72 5 2 1
#> 3 12 149 12.6 74 5 3 1
#> 4 18 313 11.5 62 5 4 1
#> 5 NA NA 14.3 56 5 5 2
#> 6 28 NA 14.9 66 5 6 1
#> 7 23 299 8.6 65 5 7 1
#> 8 19 99 13.8 59 5 8 1
#> 9 8 19 20.1 61 5 9 1
#> 10 NA 194 8.6 69 5 10 2
#> # ℹ 143 more rows
add_miss_cluster(airquality, n_clusters = 3)
#> # A tibble: 153 × 7
#> Ozone Solar.R Wind Temp Month Day miss_cluster
#> <int> <int> <dbl> <int> <int> <int> <int>
#> 1 41 190 7.4 67 5 1 1
#> 2 36 118 8 72 5 2 1
#> 3 12 149 12.6 74 5 3 1
#> 4 18 313 11.5 62 5 4 1
#> 5 NA NA 14.3 56 5 5 2
#> 6 28 NA 14.9 66 5 6 1
#> 7 23 299 8.6 65 5 7 1
#> 8 19 99 13.8 59 5 8 1
#> 9 8 19 20.1 61 5 9 1
#> 10 NA 194 8.6 69 5 10 3
#> # ℹ 143 more rows
add_miss_cluster(airquality, cluster_method = "ward.D", n_clusters = 3)
#> # A tibble: 153 × 7
#> Ozone Solar.R Wind Temp Month Day miss_cluster
#> <int> <int> <dbl> <int> <int> <int> <int>
#> 1 41 190 7.4 67 5 1 1
#> 2 36 118 8 72 5 2 1
#> 3 12 149 12.6 74 5 3 1
#> 4 18 313 11.5 62 5 4 1
#> 5 NA NA 14.3 56 5 5 2
#> 6 28 NA 14.9 66 5 6 2
#> 7 23 299 8.6 65 5 7 1
#> 8 19 99 13.8 59 5 8 1
#> 9 8 19 20.1 61 5 9 1
#> 10 NA 194 8.6 69 5 10 3
#> # ℹ 143 more rows