Merge rows into one based on a condition and replace the value in one row with the value from the other in R
I have a dataset in R that looks like this:
A <- c("X", "Y", "Z", "W", "U")
B <- c("apple", "pear", "apple", "pear", "pear")
C <- c("december", "december" ,"June", "june", "march")
D <- c("Winter", "Summer" ,"Winter", "Summer", "Summer")
df <- data.frame(A,B,C,D);df
  A     B        C      D
1 X apple december Winter
2 Y  pear december Summer
3 Z apple     June Winter
4 W  pear     june Summer
5 U  pear    march Summer
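I want to merge rows by column C (combining row 1 with row 2, and row 3 with row 4), but I also want to replace the values in column B, taking column D into account. Basically, when two values in C are the same (e.g. "december"), the value of B where D is "Summer" ("pear") should always be replaced by the value of B where D is "Winter" ("apple").
I would like to end up with a data frame like this: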
  A     B        C             D
1 X apple december Winter,Summer
2 Z apple     june Winter,Summer
3 U  pear    march        Summer
When merging two rows, I really want to keep both values in column D.
Does anyone have an idea?
A data.table option
library(data.table)
setDT(df)[
,
c(
lapply(
setNames(.(A, B), c("A", "B")),
function(x) if ("Winter" %in% D) replace(x, D == "Summer", x[D == "Winter"]) else x
),
.(D = D)
),
C
][
,
lapply(.SD, function(x) toString(unique(x))),
C
][,
.SD,
.SDcols = names(df)
]
which gives
   A     B        C              D
1: X apple december Winter, Summer
2: Z apple     june Winter, Summer
3: U  pear    march         Summer
Data
> dput(df)
structure(list(A = c("X", "Y", "Z", "W", "U"), B = c("apple",
"pear", "apple", "pear", "pear"), C = c("december", "december",
"june", "june", "march"), D = c("Winter", "Summer", "Winter",
"Summer", "Summer")), class = "data.frame", row.names = c(NA,
-5L))
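Note that the `dput` above has "june" in lowercase, while the vector in the question mixes "June" and "june". A minimal sketch, assuming the intent is to treat both spellings as the same month, is to normalise the case of C before running the grouping code. It rebuilds `df` from the question's vectors so that it runs on its own:

```r
# Data exactly as posted in the question (note the mixed-case "June")
df <- data.frame(
  A = c("X", "Y", "Z", "W", "U"),
  B = c("apple", "pear", "apple", "pear", "pear"),
  C = c("december", "december", "June", "june", "march"),
  D = c("Winter", "Summer", "Winter", "Summer", "Summer"),
  stringsAsFactors = FALSE
)

# Assumption: "June" and "june" are meant to be the same group,
# so lower-case the grouping column before grouping by it
df$C <- tolower(df$C)

unique(df$C)
```

Without this step, grouping by C would keep "June" and "june" as two separate groups and rows 3 and 4 would not be merged.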
A dplyr option
library(dplyr)
library(tidyr)
df %>%
group_by(C = tolower(C)) %>%
mutate(across(c(A, B), ~ if(n_distinct(D) > 1) replace(., D %in% 'Summer', NA) else .)) %>%
fill(c(A, B)) %>%
summarise(across(c(A, B), first), D = toString(D), .groups = 'drop')
# A tibble: 3 x 4
# C A B D
#* <chr> <chr> <chr> <chr>
#1 december X apple Winter, Summer
#2 june Z apple Winter, Summer
#3 march U pear Summer
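A base R option

For comparison, here is a rough sketch of the same rule without any packages. It is not taken from either answer above. The helper name `collapse_group` is hypothetical, and the sketch assumes each group has at most one "Winter" row, whose A and B values should win, and that "June"/"june" count as the same month.

```r
# Data exactly as posted in the question
df <- data.frame(
  A = c("X", "Y", "Z", "W", "U"),
  B = c("apple", "pear", "apple", "pear", "pear"),
  C = c("december", "december", "June", "june", "march"),
  D = c("Winter", "Summer", "Winter", "Summer", "Summer"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse all rows of one C group into a single row
collapse_group <- function(g) {
  # Take A and B from the Winter row if the group has one, else from the first row
  keep <- if ("Winter" %in% g$D) which(g$D == "Winter")[1] else 1L
  data.frame(
    A = g$A[keep],
    B = g$B[keep],
    C = g$C[keep],
    D = toString(unique(g$D)),  # keep every distinct D value of the group
    stringsAsFactors = FALSE
  )
}

# Assumption: treat "June" and "june" as the same group
df$C <- tolower(df$C)

# Split by C in order of first appearance, collapse each group, and re-stack
groups <- split(df, factor(df$C, levels = unique(df$C)))
result <- do.call(rbind, lapply(groups, collapse_group))
rownames(result) <- NULL
result
```

On the question's data this should give three rows (X/apple/december, Z/apple/june, U/pear/march), with D collapsed to "Winter, Summer" for the two merged groups. Unlike the `fill()` approach, it does not depend on the Winter row coming before the Summer row within a group.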