How do you arrange similar names in a particular order by group?

I would like to arrange a list of names in a particular order.

For example, I have the following df:

structure(list(group = c("A", "A", "A", "B", "B", "B", "C", "D", 
"D", "E", "E"), order = c(1, 2, 3, 1, 2, 3, 1, 1, 2, 1, 2), name = c("Kate M. Smith", 
"Kate Marie Smith", "Kate Smith", "Ben Frederick Jones", "Ben Jones", 
"Ben F. Jones", "Charles Lane", "Renee Perez", "Renee G. Perez", 
"Henry Paul Poss", "Henry Poss")), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -11L))

   group order name               
   <chr> <dbl> <chr>              
 1 A         1 Kate M. Smith      
 2 A         2 Kate Marie Smith   
 3 A         3 Kate Smith         
 4 B         1 Ben Frederick Jones
 5 B         2 Ben Jones          
 6 B         3 Ben F. Jones       
 7 C         1 Charles Lane       
 8 D         1 Renee Perez        
 9 D         2 Renee G. Perez     
10 E         1 Henry Paul Poss    
11 E         2 Henry Poss 

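I would like to rearrange the rows within each group so that they follow the order "First Name, Last Name", then "First Name, Middle Initial, Last Name", then "First Name, Middle Name, Last Name". The end result would look like this: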

structure(list(group = c("A", "A", "A", "B", "B", "B", "C", "D", 
"D", "E", "E"), order = c(1, 2, 3, 1, 2, 3, 1, 1, 2, 1, 2), name = c("Kate Smith", 
"Kate M. Smith", "Kate Marie Smith", "Ben Jones", "Ben F. Jones", 
"Ben Frederick Jones", "Charles Lane", "Renee Perez", "Renee G. Perez", 
"Henry Poss", "Henry Paul Poss")), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -11L))

   group order name               
   <chr> <dbl> <chr>              
 1 A         1 Kate Smith         
 2 A         2 Kate M. Smith      
 3 A         3 Kate Marie Smith   
 4 B         1 Ben Jones          
 5 B         2 Ben F. Jones       
 6 B         3 Ben Frederick Jones
 7 C         1 Charles Lane       
 8 D         1 Renee Perez        
 9 D         2 Renee G. Perez     
10 E         1 Henry Poss         
11 E         2 Henry Paul Poss  

Notice that group A went from:

  1. Kate M. Smith
  2. Kate Marie Smith
  3. Kate Smith

To:

  1. Kate Smith
  2. Kate M. Smith
  3. Kate Marie Smith

I've tried using arrange, but it doesn't always seem to capture the exact order.

Any guidance would be greatly appreciated!

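We may have to do this by counting the number of words and the number of characters in arrange, and then changing the 'order' column values to row_number() after grouping by 'group'.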

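library(dplyr)
library(stringr)

df %>% 
    # sort within each group: fewer words first, then fewer characters
    arrange(group, str_count(name, "\\w+"), nchar(name)) %>%
    group_by(group) %>%
    # renumber 'order' to match the new row positions
    mutate(order = row_number()) %>%
    ungroup()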

-output

# A tibble: 11 × 3
   group order name               
   <chr> <int> <chr>              
 1 A         1 Kate Smith         
 2 A         2 Kate M. Smith      
 3 A         3 Kate Marie Smith   
 4 B         1 Ben Jones          
 5 B         2 Ben F. Jones       
 6 B         3 Ben Frederick Jones
 7 C         1 Charles Lane       
 8 D         1 Renee Perez        
 9 D         2 Renee G. Perez     
10 E         1 Henry Poss         
11 E         2 Henry Paul Poss    
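
To see why the two sort keys are enough here, the following is a minimal base R sketch that computes them for the three names in group A. The column names `n_words` and `n_chars` are illustrative and not part of the answer above, and words are counted by splitting on whitespace, which for these names is assumed to give the same counts as `str_count(name, "\\w+")`.

```r
# Base R sketch: show the two sort keys for the names in group A.
# n_words and n_chars are hypothetical helper columns for illustration.
names_a <- c("Kate M. Smith", "Kate Marie Smith", "Kate Smith")

keys <- data.frame(
  name    = names_a,
  n_words = lengths(strsplit(names_a, "\\s+")),  # whitespace-separated words
  n_chars = nchar(names_a),                      # total characters, spaces included
  stringsAsFactors = FALSE
)

# Sort by word count first, then by character count to break ties
sorted <- keys[order(keys$n_words, keys$n_chars), ]
print(sorted)
```

The two-word name sorts first on word count alone; the two three-word names tie on word count, and the character count then places the middle initial before the full middle name.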

Sorting by the number of characters in the name string within each group should give the desired result.

Using data.table:

library(data.table)
dt <- structure(list(group = c("A", "A", "A", "B", "B", "B", "C", "D", 
"D", "E", "E"), order = c(1, 2, 3, 1, 2, 3, 1, 1, 2, 1, 2), name = c("Kate M. Smith", 
"Kate Marie Smith", "Kate Smith", "Ben Frederick Jones", "Ben Jones", 
"Ben F. Jones", "Charles Lane", "Renee Perez", "Renee G. Perez", 
"Henry Paul Poss", "Henry Poss")), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -11L))
setDT(dt)

dt[order(group, nchar(name))]

Result:

    group order                name
 1:     A     3          Kate Smith
 2:     A     1       Kate M. Smith
 3:     A     2    Kate Marie Smith
 4:     B     2           Ben Jones
 5:     B     3        Ben F. Jones
 6:     B     1 Ben Frederick Jones
 7:     C     1        Charles Lane
 8:     D     1         Renee Perez
 9:     D     2      Renee G. Perez
10:     E     2          Henry Poss
11:     E     1     Henry Paul Poss
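
The rows in this result are sorted, but the order column still carries its old values (3, 1, 2 for group A). If the column should be renumbered as in the desired output, one possible sketch follows. It is written in base R rather than data.table so that it runs without extra packages, it uses only groups A and B to stay short, and the name `sorted` is an illustrative assumption.

```r
# Base R sketch: sort by group and name length, then renumber 'order' per group.
df <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  order = c(1, 2, 3, 1, 2, 3),
  name  = c("Kate M. Smith", "Kate Marie Smith", "Kate Smith",
            "Ben Frederick Jones", "Ben Jones", "Ben F. Jones"),
  stringsAsFactors = FALSE
)

# Same sort as the data.table call: by group, then by number of characters
sorted <- df[order(df$group, nchar(df$name)), ]

# Replace the stale 'order' values with a running count within each group
sorted$order <- ave(seq_len(nrow(sorted)), sorted$group, FUN = seq_along)
rownames(sorted) <- NULL
print(sorted)
```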