R: combine columns of a data frame into a character string, sorted alphabetically by a specific column order
I have a data frame in which 4 columns hold the surnames and first names of 2 people:
Surname Firstname Surname2 Firstname2
1 Wolf Stefan Schmit Paul
2 Schmit Paul Wolf Stefan
3 Schmit Paul Fore Sabine
4 Fore Sabine Schmit Hans
5 Schmit Hans Wolf Stefan
6 Schmit Paul Schmit Hans
7 Bracht Armin Brecht Alwin
8 Brecht Alwin Bracht Armin
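Now I want to add a fifth column in which the two people are ordered alphabetically by surname and, if the surnames are the same, by first name. The new column should contain both people, first name followed by surname, separated by a comma, e.g.: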
Surname Firstname Surname2 Firstname2 Team
1 Wolf Stefan Schmit Paul Paul Schmit , Stefan Wolf
2 Schmit Paul Wolf Stefan Paul Schmit , Stefan Wolf
3 Schmit Paul Fore Sabine Sabine Fore , Paul Schmit
4 Fore Sabine Schmit Hans Sabine Fore , Hans Schmit
5 Schmit Hans Wolf Stefan Hans Schmit , Stefan Wolf
6 Schmit Paul Schmit Hans Hans Schmit , Paul Schmit
7 Bracht Armin Brecht Alwin Armin Bracht , Alwin Brecht
8 Brecht Alwin Bracht Armin Armin Bracht , Alwin Brecht
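I have working code based on a for loop, but I'm looking for a more efficient version that handles larger data frames and is more convenient to use, since there may be more than 2 sets of name columns...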
# Simple Code:
Surname <- c("Wolf", "Schmit", "Schmit", "Fore", "Schmit", "Schmit", "Bracht", "Brecht")
Firstname <- c("Stefan", "Paul", "Paul", "Sabine", "Hans", "Paul", "Armin", "Alwin")
Surname2 <- c("Schmit", "Wolf", "Fore", "Schmit", "Wolf", "Schmit", "Brecht", "Bracht")
Firstname2 <- c("Paul", "Stefan", "Sabine", "Hans", "Stefan", "Hans", "Alwin", "Armin")
library(reshape2)
tester <- melt(data.frame(Surname, Firstname, Surname2, Firstname2))
tester[] <- lapply(tester, as.character)
tester
namescomp <- function(data, i){
  if (data[i, "Surname"] < data[i, "Surname2"]){
    paste(data[i, "Firstname"], data[i, "Surname"], ", ", data[i, "Firstname2"], data[i, "Surname2"])
  } else if (data[i, "Surname"] > data[i, "Surname2"]){
    paste(data[i, "Firstname2"], data[i, "Surname2"], ", ", data[i, "Firstname"], data[i, "Surname"])
  } else {
    if (data[i, "Firstname"] < data[i, "Firstname2"]){
      paste(data[i, "Firstname"], data[i, "Surname"], ", ", data[i, "Firstname2"], data[i, "Surname2"])
    } else {
      paste(data[i, "Firstname2"], data[i, "Surname2"], ", ", data[i, "Firstname"], data[i, "Surname"])
    }
  }
}
for(y in 1:nrow(tester)){
  i <- y
  tester[i, "Team"] <- namescomp(tester, i)
}
tester
A tidyverse solution:
library(tibble)
library(dplyr)
library(tidyr)
library(stringr)
Surname <- c("Wolf", "Schmit", "Schmit", "Fore", "Schmit", "Schmit", "Bracht", "Brecht")
Firstname <- c("Stefan", "Paul", "Paul", "Sabine", "Hans", "Paul", "Armin", "Alwin")
Surname2 <- c("Schmit", "Wolf", "Fore", "Schmit", "Wolf", "Schmit", "Brecht", "Bracht")
Firstname2 <- c("Paul", "Stefan", "Sabine", "Hans", "Stefan", "Hans", "Alwin", "Armin")
# tibble() replaces the deprecated data_frame()
df <- tibble(Surname, Firstname, Surname2, Firstname2)
df %>%
  # create an identifier for each team
  rownames_to_column(var = 'team_id') %>%
  # split all name components into separate rows
  gather(component, value, -team_id) %>%
  # extract a person_id from the number behind first/last name. If there's no number there, use 1
  mutate(person_id = coalesce(as.numeric(str_extract(component, '[0-9]+$')), 1)) %>%
  # remove the number from the first/last name, then pivot the data.frame so that there's a row for every team x person
  mutate(component = str_replace(component, '[0-9]+$', '')) %>%
  spread(component, value) %>%
  # order by team_id (not strictly necessary), then by Surname, then by Firstname (if you want the order reversed, wrap the variable in `desc()`)
  arrange(team_id, Surname, Firstname) %>%
  # collapse Surname and Firstname into a `full_name` column
  unite(full_name, Firstname, Surname, sep = ' ') %>%
  # collapse the full names within each team into a single line, separated by commas
  group_by(team_id) %>%
  summarize(Team = paste(full_name, collapse = ', '))
This doesn't produce exactly the output you want, but you can join its result back to the original table on the row names.
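The join mentioned above could look roughly like the following. This is only a sketch: the intermediate names `df_id`, `teams` and `result` are made up for illustration, and it assumes the row names are kept as a `team_id` column on the original table so that they can serve as the join key.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(stringr)

df <- tibble(
  Surname    = c("Wolf", "Schmit", "Schmit", "Fore", "Schmit", "Schmit", "Bracht", "Brecht"),
  Firstname  = c("Stefan", "Paul", "Paul", "Sabine", "Hans", "Paul", "Armin", "Alwin"),
  Surname2   = c("Schmit", "Wolf", "Fore", "Schmit", "Wolf", "Schmit", "Brecht", "Bracht"),
  Firstname2 = c("Paul", "Stefan", "Sabine", "Hans", "Stefan", "Hans", "Alwin", "Armin")
)

# keep the row identifier on the original table (hypothetical name: df_id)
df_id <- rownames_to_column(df, var = "team_id")

# same pipeline as above, stored instead of printed (hypothetical name: teams)
teams <- df_id %>%
  gather(component, value, -team_id) %>%
  mutate(
    person_id = coalesce(as.numeric(str_extract(component, "[0-9]+$")), 1),
    component = str_replace(component, "[0-9]+$", "")
  ) %>%
  spread(component, value) %>%
  arrange(team_id, Surname, Firstname) %>%
  unite(full_name, Firstname, Surname, sep = " ") %>%
  group_by(team_id) %>%
  summarize(Team = paste(full_name, collapse = ", "))

# attach the Team column to the original rows, then drop the helper key
result <- df_id %>%
  left_join(teams, by = "team_id") %>%
  select(-team_id)

result
```

Because `left_join()` keeps the row order of its left-hand table, `result` has the original four columns in their original order plus `Team`. The separator here is `", "`, so the strings differ slightly in spacing from the ones shown in the question.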