RStudio: get_dist() error message "'x' must be numeric" following clustering guide?

Question

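I'm new to R, so I've been following a guide for cluster analysis. When I get to get_dist(), I keep receiving the error Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric. When I remove the column containing <chr> data it works fine, but the problem is that I want to keep those labels, just like the "state" labels in the USArrests dataset.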

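I found a question very similar to mine here, however there were no comments or answers that were helpful for me. I've seen a few posts, such as this one, that mention trying get_dist(x$x) or as.numeric(as.character(x$x)), but I must admit this workaround doesn't make much sense to me, and I haven't managed to implement those suggestions successfully.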

I can't show my full dataset, but I can provide the output of head(), which I notice differs from head(USArrests):

library(readxl)
Mother_2_ABS_Summer_2019_clean <- read_excel("~/.../Mother_2_ABS_Summer_2019_clean.xls", 
    range = "D1:H61")
head(Mother_2_ABS_Summer_2019_clean)

...1            Audience  Genre  Structure  Proofreading
<chr>           <dbl>     <dbl>  <dbl>      <dbl>
ABS-P_29_S31    2         2      2.0        3
ABS_40_S50      3         3      3.5        3
ABS_57_S47      2         2      2.0        3
ABS_86_S48      4         3      3.0        4
ABS_143_S42     2         2      2.0        3
ABS-P_152_S49   2         1      1.0        4

head(USArrests)

            Murder  Assault  UrbanPop  Rape
            <dbl>   <int>    <int>     <dbl>
Alabama     13.2    236      58        21.2
Alaska      10.0    263      48        44.5
Arizona     8.1     294      80        31.0
Arkansas    8.8     190      50        19.5
California  9.0     276      91        40.6
Colorado    7.9     204      78        38.7

So I noticed that in USArrests the state labels are not classed as <chr>, unlike the document identifiers in my data.
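One way to check this difference, sketched with base R only: in USArrests the state names appear to be stored as row names rather than as a column, so every actual column is numeric. The variable names col_classes and all_numeric are introduced here purely for illustration.

```r
# USArrests ships with base R (datasets package).
# The state names live in the row names, not in a column.
print(head(rownames(USArrests)))

# Class of each actual column.
col_classes <- sapply(USArrests, class)
print(col_classes)

# TRUE only if every column is numeric (double or integer).
all_numeric <- all(sapply(USArrests, is.numeric))
print(all_numeric)
```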

Following the guide, I had no problems until I reached get_dist():

dat1 <- na.omit(Mother_2_ABS_Summer_2019_clean)
dat1 <- scale(dat1)

distance <- get_dist(dat1)
fviz_dist(distance, gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07"))

Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
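The colMeans() call in the message suggests the error is raised inside scale(), before get_dist() is ever reached. A minimal sketch of that behaviour follows; the data frame toy and its values are hypothetical and only mimic the structure of the imported sheet (one character label column plus numeric columns).

```r
# Hypothetical toy data: one character label column plus numeric scores.
toy <- data.frame(
  id = c("doc_a", "doc_b", "doc_c"),
  Audience = c(2, 3, 4),
  Genre = c(2, 3, 3),
  stringsAsFactors = FALSE
)

# scale() converts its input with as.matrix(); a single character column
# turns the whole matrix into character.
is_char_matrix <- is.character(as.matrix(toy))
print(is_char_matrix)

# Capture the error message instead of stopping the script.
err <- tryCatch(scale(toy), error = function(e) conditionMessage(e))
print(err)
```

Under this reading, any step that needs a numeric matrix will fail as long as the label column is still part of the data.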

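When I import only the 4 columns containing numeric data and follow the guide, everything works and I can view the clustering results. The problem is that I want to see the visualizations with the document identifiers; otherwise the results aren't very meaningful when I look at them.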

Any comments or suggestions would be greatly appreciated.

Answer 1

Untested: you could assign those labels as row names:

library(tidyverse)
Mother_2_ABS_Summer_2019_clean %>% remove_rownames %>% column_to_rownames(var = "...1")

Maybe consider renaming the first column to make the above cleaner and more likely to work. The data will then be in the same format as USArrests.
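The same idea can be sketched with base R only, so it runs without extra packages. The data frame toy, the column name id, and the values are hypothetical stand-ins for the imported sheet, and dist() is used here as a base R stand-in for get_dist().

```r
# Hypothetical toy data standing in for the imported sheet.
toy <- data.frame(
  id = c("doc_a", "doc_b", "doc_c"),
  Audience = c(2, 3, 4),
  Genre = c(2, 3, 3),
  Structure = c(2.0, 3.5, 3.0),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns and move the labels into the row names.
dat <- toy[, setdiff(names(toy), "id")]
rownames(dat) <- toy$id

# All remaining columns are numeric, so scale() works,
# and the labels are carried along as row names.
dat_scaled <- scale(dat)

# Euclidean distances; the row names become the labels of the result.
distance <- dist(dat_scaled)
print(distance)
```

With factoextra loaded, passing dat_scaled to get_dist() and then to fviz_dist() should label the plot with the document identifiers, though that step is not tested here.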

import

r

cluster-analysis

dataset

k-means