How do you arrange similar names in a particular order by group?
I want to arrange a list of names in a particular order.
For example, I have the following df:
structure(list(group = c("A", "A", "A", "B", "B", "B", "C", "D",
"D", "E", "E"), order = c(1, 2, 3, 1, 2, 3, 1, 1, 2, 1, 2), name = c("Kate M. Smith",
"Kate Marie Smith", "Kate Smith", "Ben Frederick Jones", "Ben Jones",
"Ben F. Jones", "Charles Lane", "Renee Perez", "Renee G. Perez",
"Henry Paul Poss", "Henry Poss")), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -11L))
   group order name
   <chr> <dbl> <chr>
 1 A         1 Kate M. Smith
 2 A         2 Kate Marie Smith
 3 A         3 Kate Smith
 4 B         1 Ben Frederick Jones
 5 B         2 Ben Jones
 6 B         3 Ben F. Jones
 7 C         1 Charles Lane
 8 D         1 Renee Perez
 9 D         2 Renee G. Perez
10 E         1 Henry Paul Poss
11 E         2 Henry Poss
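Within each group, I want to rearrange the rows in the order "first name, last name", then "first name, middle initial, last name", then "first name, middle name, last name". The final result would look like this: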
structure(list(group = c("A", "A", "A", "B", "B", "B", "C", "D",
"D", "E", "E"), order = c(1, 2, 3, 1, 2, 3, 1, 1, 2, 1, 2), name = c("Kate Smith",
"Kate M. Smith", "Kate Marie Smith", "Ben Jones", "Ben F. Jones",
"Ben Frederick Jones", "Charles Lane", "Renee Perez", "Renee G. Perez",
"Henry Poss", "Henry Paul Poss")), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -11L))
   group order name
   <chr> <dbl> <chr>
 1 A         1 Kate Smith
 2 A         2 Kate M. Smith
 3 A         3 Kate Marie Smith
 4 B         1 Ben Jones
 5 B         2 Ben F. Jones
 6 B         3 Ben Frederick Jones
 7 C         1 Charles Lane
 8 D         1 Renee Perez
 9 D         2 Renee G. Perez
10 E         1 Henry Poss
11 E         2 Henry Paul Poss
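Note that group A goes from:
- Kate M. Smith
- Kate Marie Smith
- Kate Smith
to:
- Kate Smith
- Kate M. Smith
- Kate Marie Smith
I tried using arrange, but it doesn't always seem to capture the exact order.
Any guidance would be greatly appreciated!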
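We may need to do this by sorting on the word count and the character count of 'name' inside arrange, and then, after grouping by 'group', changing the 'order' column values to row_number():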
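library(dplyr)
library(stringr)
df %>%
  # sort by group, then by word count, then by string length
  arrange(group, str_count(name, "\\w+"), nchar(name)) %>%
  group_by(group) %>%
  # renumber 'order' so it matches the new row positions
  mutate(order = row_number()) %>%
  ungroup()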
-output
# A tibble: 11 × 3
   group order name
   <chr> <int> <chr>
 1 A         1 Kate Smith
 2 A         2 Kate M. Smith
 3 A         3 Kate Marie Smith
 4 B         1 Ben Jones
 5 B         2 Ben F. Jones
 6 B         3 Ben Frederick Jones
 7 C         1 Charles Lane
 8 D         1 Renee Perez
 9 D         2 Renee G. Perez
10 E         1 Henry Poss
11 E         2 Henry Paul Poss
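Word and character counts give the right order here because each middle initial happens to be shorter than the full middle name. If that cannot be relied on (for example "Al B. Smith" and "Al Bo Smith" have the same length), one option is to rank the three name forms explicitly. The following is a minimal base R sketch under that assumption; the helper name_rank and its regular expression are hypothetical, not part of the answer above, and they only handle names with at most one middle name or single-letter initial.

```r
# Hypothetical helper: rank a name by its form.
#   1 = "First Last", 2 = "First M. Last", 3 = "First Middle Last"
name_rank <- function(x) {
  parts <- strsplit(trimws(x), "\\s+")
  vapply(parts, function(p) {
    if (length(p) <= 2) {
      1L
    } else if (length(p) == 3 && grepl("^[[:alpha:]]\\.?$", p[2])) {
      2L  # middle part is a single letter, optionally followed by a dot
    } else {
      3L  # full middle name (or any longer form this sketch does not model)
    }
  }, integer(1))
}

# Same sample data as in the question, as a plain data.frame.
df <- data.frame(
  group = c("A", "A", "A", "B", "B", "B", "C", "D", "D", "E", "E"),
  order = c(1, 2, 3, 1, 2, 3, 1, 1, 2, 1, 2),
  name  = c("Kate M. Smith", "Kate Marie Smith", "Kate Smith",
            "Ben Frederick Jones", "Ben Jones", "Ben F. Jones",
            "Charles Lane", "Renee Perez", "Renee G. Perez",
            "Henry Paul Poss", "Henry Poss"),
  stringsAsFactors = FALSE
)

# Sort by group, then by name form; nchar() only breaks remaining ties.
out <- df[order(df$group, name_rank(df$name), nchar(df$name)), ]

# Renumber 'order' within each group to follow the new row order.
out$order <- ave(seq_along(out$group), out$group, FUN = seq_along)
rownames(out) <- NULL
print(out)
```

Ranking by form first keeps the sort independent of how long a particular middle name is; the character count is kept only as a tie-breaker.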
Sorting by the number of characters in the name string within each group should give the desired result.
Using data.table:
library(data.table)
dt <- structure(list(group = c("A", "A", "A", "B", "B", "B", "C", "D",
"D", "E", "E"), order = c(1, 2, 3, 1, 2, 3, 1, 1, 2, 1, 2), name = c("Kate M. Smith",
"Kate Marie Smith", "Kate Smith", "Ben Frederick Jones", "Ben Jones",
"Ben F. Jones", "Charles Lane", "Renee Perez", "Renee G. Perez",
"Henry Paul Poss", "Henry Poss")), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -11L))
# convert to a data.table by reference
setDT(dt)
# sort by group, then by the number of characters in name
dt[order(group, nchar(name))]
Result:
    group order                name
 1:     A     3          Kate Smith
 2:     A     1       Kate M. Smith
 3:     A     2    Kate Marie Smith
 4:     B     2           Ben Jones
 5:     B     3        Ben F. Jones
 6:     B     1 Ben Frederick Jones
 7:     C     1        Charles Lane
 8:     D     1         Renee Perez
 9:     D     2      Renee G. Perez
10:     E     2          Henry Poss
11:     E     1     Henry Paul Poss
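In this result the rows are in the requested order, but the 'order' column still carries its original values (3, 1, 2 for group A), whereas the expected output in the question has it running 1, 2, 3 within each group. A minimal sketch of one way to renumber it, assuming that is wanted; the variable name res is only for illustration:

```r
library(data.table)

# Same sample data as in the question, built directly as a data.table.
dt <- data.table(
  group = c("A", "A", "A", "B", "B", "B", "C", "D", "D", "E", "E"),
  order = c(1, 2, 3, 1, 2, 3, 1, 1, 2, 1, 2),
  name  = c("Kate M. Smith", "Kate Marie Smith", "Kate Smith",
            "Ben Frederick Jones", "Ben Jones", "Ben F. Jones",
            "Charles Lane", "Renee Perez", "Renee G. Perez",
            "Henry Paul Poss", "Henry Poss")
)

# Sort by group, then by name length (shortest form first).
res <- dt[order(group, nchar(name))]

# Overwrite 'order' with the row position inside each group.
# as.numeric() keeps the column's original double type.
res[, order := as.numeric(seq_len(.N)), by = group]
print(res)
```

As with the answer above, this relies on the shorter name form always having fewer characters.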