Why is my table not sorted in alphabetical order in R? (tidyverse only)
I am trying to put the 'Smoking status' categories in alphabetical order. This should be done using only the tidyverse.
Here is what I tried:
smoking_gender_disch_piv_count_ren <- smoking_gender_disch_piv_count %>%
dplyr::rename('Smoking Status' = smoking_status) %>%
dplyr::arrange('Smoking status')
smoking_gender_disch_piv_count_ren
As you can see, I do not get Current smoker first, then Ex smoker, and so on. I thought the arrange function in dplyr would do the trick, but it does not.
Here is my data:
structure(list(smoking_status = structure(1:5, .Label = c("Ex smoker",
"Current smoker", "Never smoked", "Unknown", "Non smoker - smoking history unknown"
), class = "factor"), Female = c(24.0601503759398, 9.02255639097744,
35.3383458646617, 6.01503759398496, 25.5639097744361), Male = c(34.9753694581281,
13.7931034482759, 23.6453201970443, 1.97044334975369, 25.615763546798
), NSTEMI = c(31.9078947368421, 12.5, 28.2894736842105, 3.28947368421053,
24.0131578947368), STEMI = c(18.75, 6.25, 28.125, 6.25, 40.625
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
Apart from misspelling 'Smoking Status' as 'Smoking status', you have run into two other problems.
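Variable Names vs. Strings
We use single quotes (') or double quotes (") to specify strings: 'my string' or "my string". However, to specify an (unusual) variable name (symbol) that contains spaces, we use backticks (`): `my variable`. Since typing these backticks is tedious, we typically use underscores (_) instead of spaces in variable names.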
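As a minimal sketch of that distinction (the names `my variable` and my_string are made up for illustration and do not come from the question's data), backticks evaluate a variable while quotes only produce text:

```r
# Illustrative names only; nothing here comes from the original dataset.
`my variable` <- 42          # backticks: a symbol (variable name) containing a space
my_string <- "my variable"   # quotes: a character string that merely spells that name

value_via_symbol <- `my variable`   # looks up the variable and returns its value
value_via_string <- my_string       # returns the text itself

print(value_via_symbol)
print(value_via_string)
```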
When (re)naming columns, a character string works just as well as a symbol. That is,
# ... %>%
dplyr::rename('Smoking Status' = smoking_status) # %>% ...
# |--------------|
# character string
is equivalent to
# ... %>%
dplyr::rename(`Smoking Status` = smoking_status) # %>% ...
# |--------------|
# symbol
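The equivalence can be checked on a small stand-in data frame. This is only a sketch: the two-row toy object below is an assumption for illustration, not the question's dataset.

```r
library(dplyr)

# Hypothetical toy data standing in for the real table
toy <- data.frame(smoking_status = c("Ex smoker", "Current smoker"))

# New name given as a quoted string
renamed_with_string <- toy %>% rename('Smoking Status' = smoking_status)

# New name given as a backticked symbol
renamed_with_symbol <- toy %>% rename(`Smoking Status` = smoking_status)

# Both calls should produce the same data frame
same_result <- identical(renamed_with_string, renamed_with_symbol)
print(same_result)
```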
However, when performing vectorized operations with mutate() or filter() or arrange(), any string is treated as a simple scalar character value. That is,
# ... %>%
mutate(test = 'Smoking Status') # %>% ...
# |--------------|
# character string
will not copy the `Smoking Status` column (a factor)
# A tibble: 5 x 6
... test
... <fct>
1 ... Ex smoker
2 ... Current smoker
3 ... Never smoked
4 ... Unknown
5 ... Non smoker - smoking history unknown
but will instead give you a (character) column filled with the literal string 'Smoking Status':
# A tibble: 5 x 6
... test
... <chr>
1 ... Smoking Status
2 ... Smoking Status
3 ... Smoking Status
4 ... Smoking Status
5 ... Smoking Status
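The difference is easy to reproduce on a small example. The sketch below assumes a hypothetical two-row data frame; only the column name `Smoking Status` is taken from the answer.

```r
library(dplyr)

# Hypothetical toy data; check.names = FALSE keeps the space in the column name
toy <- data.frame(
  `Smoking Status` = factor(c("Ex smoker", "Current smoker")),
  check.names = FALSE
)

# Quoted: the string is recycled into a constant character column
with_string <- toy %>% mutate(test = 'Smoking Status')

# Backticked: the existing factor column is copied
with_symbol <- toy %>% mutate(test = `Smoking Status`)

print(class(with_string$test))
print(class(with_symbol$test))
```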
Likewise, your
# ... %>%
dplyr::arrange('Smoking Status')
# |----|
# Corrected typo: 'status'.
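does not sort on the `Smoking Status` column, but rather on a (temporary) column filled with the string 'Smoking Status'. Since everything in that column is identical, no rearranging takes place at all, and the smoking_gender_disch_piv_count dataset is left unchanged.
Fix
To fix this particular problem, use: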
# ... %>%
dplyr::arrange(`Smoking Status`)
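To see the effect in isolation, here is a sketch on a hypothetical toy data frame whose `Smoking Status` column is deliberately a plain character vector, so that the factor issue does not interfere:

```r
library(dplyr)

# Hypothetical toy data with a character (not factor) column
toy <- data.frame(
  `Smoking Status` = c("Never smoked", "Ex smoker", "Current smoker"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Quoted: sorts on a constant, so the row order stays as it was
by_string <- toy %>% arrange('Smoking Status')

# Backticked: sorts on the actual column
by_symbol <- toy %>% arrange(`Smoking Status`)

print(by_string$`Smoking Status`)
print(by_symbol$`Smoking Status`)
```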
Strings vs. Factors
Even once you fix the problem above, you will still have an issue. Your `Smoking Status` column is a factor
[1] Ex smoker Current smoker Never smoked Unknown Non smoker - smoking history unknown
Levels: Ex smoker Current smoker Never smoked Unknown Non smoker - smoking history unknown
so when you sort on this column, it follows the order of the factor levels, which is clearly not alphabetical.
Fix
To sort alphabetically, use the character form of the `Smoking Status` column:
# ... %>%
dplyr::arrange(as.character(`Smoking Status`))
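The level-order behaviour can be seen with base R alone. In this sketch the three-level factor named status is a shortened, illustrative stand-in for the real column; the last step shows one possible alternative (rebuilding the factor with alphabetically sorted levels), which is not part of the answer above.

```r
# Shortened stand-in for the real column; levels are NOT in alphabetical order
status <- factor(
  c("Ex smoker", "Current smoker", "Never smoked"),
  levels = c("Ex smoker", "Current smoker", "Never smoked")
)

# Sorting the factor follows the level order
sorted_by_level <- as.character(sort(status))

# Sorting the character form is alphabetical
sorted_alphabetically <- sort(as.character(status))

# Possible alternative: keep a factor, but re-declare its levels alphabetically
releveled <- factor(status, levels = sort(levels(status)))
sorted_releveled <- as.character(sort(releveled))

print(sorted_by_level)
print(sorted_alphabetically)
print(sorted_releveled)
```

Re-levelling changes the factor itself, whereas sorting on as.character() leaves the levels untouched; which one is preferable depends on whether later steps (plots, models) rely on the original level order.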
Solution
Given your smoking_gender_disch_piv_count dataset, reproduced here
smoking_gender_disch_piv_count <-
structure(list(smoking_status = structure(1:5, .Label = c("Ex smoker", "Current smoker", "Never smoked", "Unknown", "Non smoker - smoking history unknown"), class = "factor"),
Female = c(24.0601503759398, 9.02255639097744, 35.3383458646617, 6.01503759398496, 25.5639097744361),
Male = c(34.9753694581281, 13.7931034482759, 23.6453201970443, 1.97044334975369, 25.615763546798),
NSTEMI = c(31.9078947368421, 12.5, 28.2894736842105, 3.28947368421053, 24.0131578947368),
STEMI = c(18.75, 6.25, 28.125, 6.25, 40.625)),
row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
the following dplyr workflow
smoking_gender_disch_piv_count_ren <- smoking_gender_disch_piv_count %>%
dplyr::rename(`Smoking Status` = smoking_status) %>%
dplyr::arrange(as.character(`Smoking Status`))
will give you the desired result for smoking_gender_disch_piv_count_ren
# A tibble: 5 x 5
`Smoking Status` Female Male NSTEMI STEMI
<fct> <dbl> <dbl> <dbl> <dbl>
1 Current smoker 9.02 13.8 12.5 6.25
2 Ex smoker 24.1 35.0 31.9 18.8
3 Never smoked 35.3 23.6 28.3 28.1
4 Non smoker - smoking history unknown 25.6 25.6 24.0 40.6
5 Unknown 6.02 1.97 3.29 6.25
while still preserving the factor information in `Smoking Status`.
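One way to confirm that claim is to inspect the column after sorting. The sketch below uses a hypothetical three-row toy table (the Female values are rounded, illustrative numbers) rather than the full dataset:

```r
library(dplyr)

original_levels <- c("Ex smoker", "Current smoker", "Never smoked")

# Hypothetical toy table; values are for illustration only
toy <- data.frame(
  `Smoking Status` = factor(original_levels, levels = original_levels),
  Female = c(24.1, 9.02, 35.3),
  check.names = FALSE
)

# Sort rows alphabetically without converting the column itself
sorted <- toy %>% arrange(as.character(`Smoking Status`))

print(as.character(sorted$`Smoking Status`))  # row order after sorting
print(is.factor(sorted$`Smoking Status`))     # column type is unchanged
print(levels(sorted$`Smoking Status`))        # level order is unchanged
```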