Why is my table not sorted in alphabetical order in R? (tidyverse only)

I am trying to arrange the 'Smoking status' categories in alphabetical order. This should be done using tidyverse only.

Here is what I tried:

smoking_gender_disch_piv_count_ren <- smoking_gender_disch_piv_count %>%
       dplyr::rename('Smoking Status' = smoking_status) %>%
       dplyr::arrange('Smoking status')
     smoking_gender_disch_piv_count_ren

As you can see, I do not get Current smoker first, then Ex smoker, and so on. I thought the arrange function in dplyr would do the trick, but it does not.

Here is my data:

structure(list(smoking_status = structure(1:5, .Label = c("Ex smoker", 
"Current smoker", "Never smoked", "Unknown", "Non smoker - smoking history unknown"
), class = "factor"), Female = c(24.0601503759398, 9.02255639097744, 
35.3383458646617, 6.01503759398496, 25.5639097744361), Male = c(34.9753694581281, 
13.7931034482759, 23.6453201970443, 1.97044334975369, 25.615763546798
), NSTEMI = c(31.9078947368421, 12.5, 28.2894736842105, 3.28947368421053, 
24.0131578947368), STEMI = c(18.75, 6.25, 28.125, 6.25, 40.625
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))

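Besides misspelling 'Smoking Status' as 'Smoking status', you have run into two other problems.

Variable Names vs. Strings

We use single quotes (') or double quotes (") to specify strings: 'my string' or "my string". However, to specify (unusual) variable names (symbols) that contain spaces, we use backticks (`): `my variable`. Since typing these backticks is tedious, we usually use underscores (_) rather than spaces in variable names.

When (re)naming columns, a character string works just as well as a symbol. That is,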

  # ... %>%
  dplyr::rename('Smoking Status' = smoking_status) # %>% ...
  #             |--------------|
  #             character string

is equivalent to

  # ... %>%
  dplyr::rename(`Smoking Status` = smoking_status) # %>% ...
  #             |--------------|
  #                  symbol
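
To check this equivalence yourself, here is a minimal sketch. The tibble `toy` and its values are hypothetical and are not the dataset from the question; it assumes only that dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- tibble(smoking_status = c("b", "a"), n = c(1, 2))

# New column name given as a character string
by_string <- toy %>% rename('Smoking Status' = smoking_status)

# New column name given as a backticked symbol
by_symbol <- toy %>% rename(`Smoking Status` = smoking_status)

# Both spellings should produce the same column names
identical(names(by_string), names(by_symbol))
```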

However, when performing vectorized operations with mutate(), filter(), or arrange(), any string is treated as a simple scalar character value. That is,

  # ... %>%
  mutate(test = 'Smoking Status') # %>% ...
  #             |--------------|
  #             character string

will not copy the `Smoking Status` column (a factor)

# A tibble: 5 x 6
  ... test                                
  ... <fct>                               
1 ... Ex smoker                           
2 ... Current smoker                      
3 ... Never smoked                        
4 ... Unknown                             
5 ... Non smoker - smoking history unknown

but will instead give you a (character) column filled with the literal string 'Smoking Status':

# A tibble: 5 x 6
  ... test          
  ... <chr>         
1 ... Smoking Status
2 ... Smoking Status
3 ... Smoking Status
4 ... Smoking Status
5 ... Smoking Status
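
The contrast between the two spellings can be reproduced with a small sketch. The tibble `toy` and the column names `from_string` and `from_symbol` are hypothetical names chosen for illustration.

```r
library(dplyr)

# Hypothetical toy data with a factor column whose name contains a space
toy <- tibble(`Smoking Status` = factor(c("Ex smoker", "Current smoker")))

res <- toy %>%
  mutate(
    from_string = 'Smoking Status',  # quoted: a scalar string, recycled to every row
    from_symbol = `Smoking Status`   # backticked: refers to the existing column
  )

class(res$from_string)   # character
class(res$from_symbol)   # factor
unique(res$from_string)  # only the literal string "Smoking Status"
```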

Likewise, your

  # ... %>%
  dplyr::arrange('Smoking Status')
  #                       |----|
  #      Corrected typo: 'status'.

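does not sort on the `Smoking Status` column, but rather on a (temporary) column filled with the string 'Smoking Status'. Since every value in that column is identical, no rearranging happens at all, and the rows of smoking_gender_disch_piv_count stay in their original order.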

Fix

To fix this particular problem, use:

  # ... %>%
  dplyr::arrange(`Smoking Status`)
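
As a quick check of the difference, the sketch below sorts a hypothetical character column both ways. The tibble `toy` is made up for illustration and uses a plain character column, so factor levels play no role here.

```r
library(dplyr)

# Hypothetical toy data: a character (not factor) column
toy <- tibble(`Smoking Status` = c("Never smoked", "Ex smoker", "Current smoker"))

# Quoted: sorts on a constant string, so the row order is left alone
unsorted <- toy %>% arrange('Smoking Status')

# Backticked: sorts on the actual column
sorted <- toy %>% arrange(`Smoking Status`)

identical(unsorted$`Smoking Status`, toy$`Smoking Status`)  # order unchanged
sorted$`Smoking Status`                                     # alphabetical order
```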

Strings vs. Factors

Even once you fix the above, you will still have a problem. Your Smoking Status column is a factor:

[1] Ex smoker                            Current smoker                       Never smoked                         Unknown                              Non smoker - smoking history unknown
Levels: Ex smoker Current smoker Never smoked Unknown Non smoker - smoking history unknown

So when you sort on this column, it follows the order of the factor levels, which is clearly not alphabetical.
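
The level-order behaviour can be seen in a short sketch. It reuses the five level names from the question, but the tibble `status` and the vector `lv` are hypothetical names, and the rows are deliberately reversed so that the sort has something to do.

```r
library(dplyr)

# Levels in the same order as in the question's data
lv <- c("Ex smoker", "Current smoker", "Never smoked", "Unknown",
        "Non smoker - smoking history unknown")

# Hypothetical tibble with the rows in reverse level order
status <- tibble(`Smoking Status` = factor(rev(lv), levels = lv))

# Sorting a factor orders rows by level position, not by spelling
by_levels <- status %>% arrange(`Smoking Status`)

as.character(by_levels$`Smoking Status`)  # same order as lv
```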

Fix

To sort alphabetically, use the character form of the `Smoking Status` column:

  # ... %>%
  dplyr::arrange(as.character(`Smoking Status`))

Solution

Given your smoking_gender_disch_piv_count dataset, reproduced here

smoking_gender_disch_piv_count <-
  structure(list(smoking_status = structure(1:5, .Label = c("Ex smoker", "Current smoker", "Never smoked", "Unknown", "Non smoker - smoking history unknown"), class = "factor"),
                 Female = c(24.0601503759398, 9.02255639097744, 35.3383458646617, 6.01503759398496, 25.5639097744361),
                 Male = c(34.9753694581281, 13.7931034482759, 23.6453201970443, 1.97044334975369, 25.615763546798),
                 NSTEMI = c(31.9078947368421, 12.5, 28.2894736842105, 3.28947368421053, 24.0131578947368),
                 STEMI = c(18.75, 6.25, 28.125, 6.25, 40.625)),
            row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

the following dplyr workflow

smoking_gender_disch_piv_count_ren <- smoking_gender_disch_piv_count %>%
  dplyr::rename(`Smoking Status` = smoking_status) %>%
  dplyr::arrange(as.character(`Smoking Status`))

will give you the desired result for smoking_gender_disch_piv_count_ren:

# A tibble: 5 x 5
  `Smoking Status`                     Female  Male NSTEMI STEMI
  <fct>                                 <dbl> <dbl>  <dbl> <dbl>
1 Current smoker                         9.02 13.8   12.5   6.25
2 Ex smoker                             24.1  35.0   31.9  18.8 
3 Never smoked                          35.3  23.6   28.3  28.1 
4 Non smoker - smoking history unknown  25.6  25.6   24.0  40.6 
5 Unknown                                6.02  1.97   3.29  6.25

while still retaining the factor information in `Smoking Status`.
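
To confirm that the column stays a factor with its original levels, here is a minimal sketch. It rebuilds only the `Smoking Status` column from the level names in the question; the names `lv`, `status`, and `sorted` are hypothetical.

```r
library(dplyr)

# Levels in the same order as in the question's data
lv <- c("Ex smoker", "Current smoker", "Never smoked", "Unknown",
        "Non smoker - smoking history unknown")

status <- tibble(`Smoking Status` = factor(lv, levels = lv))

# as.character() is used only as the sort key; the stored column is untouched
sorted <- status %>% arrange(as.character(`Smoking Status`))

is.factor(sorted$`Smoking Status`)              # still a factor
identical(levels(sorted$`Smoking Status`), lv)  # levels keep their original order
as.character(sorted$`Smoking Status`)           # rows are now alphabetical
```

Because only the sort key is converted, any later step that relies on the level order (for example a plot legend) still sees the original levels.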