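Replace values into previous row with a condition
I want to take the rows whose ID column does not start with 00 and append that ID value to the end of the Description column in the previous row.
The remaining values of that row should then be moved into the previous row, starting at the Name column. How can I do this in R?
Here is the source of the dummy data: https://docs.google.com/spreadsheets/d/1SbmaM8hXck-z5nsNfDMbhwijvAGPkPPBgQ_eY4JAMC8/edit?usp=sharing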
ID Year Description Name User Factor_1 Factor_2 Factor_3
0011 2016 blue colour AA James Xfac NA NA
is nice XXX XLM Yfac different Yfac NA NA
0024 2017 red colour DD Mark Zfac NA NA
is good YYY STM Lfac unique Zfac NA NA
What I want:
ID Year Description Name User Factor_1 Factor_2 Factor_3
0011 2016 blue colour is nice XXX XLM Yfac different Yfac
0024 2017 red colour is good YYY STM Lfac unique Zfac
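For anyone who wants to reproduce this without the spreadsheet, here is a minimal sketch of the sample data as a data frame. It assumes every column is read in as character (the actual column classes are not stated above), and it takes the cell boundaries from the desired output.

```r
# Sketch of the sample data; all columns are assumed to be character.
# Rows 2 and 4 are the "spilled" rows whose ID does not start with "00".
df <- data.frame(
  ID          = c("0011", "is nice", "0024", "is good"),
  Year        = c("2016", "XXX", "2017", "YYY"),
  Description = c("blue colour", "XLM", "red colour", "STM"),
  Name        = c("AA", "Yfac", "DD", "Lfac"),
  User        = c("James", "different", "Mark", "unique"),
  Factor_1    = c("Xfac", "Yfac", "Zfac", "Zfac"),
  Factor_2    = NA_character_,
  Factor_3    = NA_character_,
  stringsAsFactors = FALSE
)
```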
Use:
bools <- !substr(df$ID,1,2)=="00"
values <- df[bools,1]
df <- df[!bools,]
df$Description <- paste(df[substr(df$ID,1,2)=="00","Description"],values,sep=" ")
df
Output:
ID Year Description Name User Factor_1 Factor_2
1 0011 2016 blue colour is nice AA James Xfac NA
3 0024 2017 red colour is good DD Mark Zfac NA
Factor_3
1 NA
3 NA
Here is a dplyr solution:
library(dplyr)
df %>%
  bind_cols(df %>% rename_all(function(x) paste0(x, "_dummy"))) %>%
  mutate(
    Description = ifelse(substr(lead(ID), 1, 2) != "00",
                         paste(Description, lead(ID)), Description),
    Name = lead(Year_dummy),
    User = lead(Description_dummy),
    Factor_1 = lead(Name_dummy),
    Factor_2 = lead(User_dummy),
    Factor_3 = lead(Factor_1_dummy)
  ) %>%
  select(-ends_with("dummy")) %>%
  filter(substr(ID, 1, 2) == "00")
Output:
ID Year Description Name User Factor_1 Factor_2 Factor_3
1 0011 2016 blue colour is nice XXX XLM Yfac different Yfac
2 0024 2017 red colour is good YYY STM Lfac unique Zfac
If you are dealing with a large number of columns, a combination of dplyr and base R can do it:
library(dplyr)
df_combo <- cbind(df, df)
df$Description <- ifelse(substr(lead(df$ID), 1, 2) != "00",
                         paste(df$Description, lead(df$ID)), df$Description)
for (i in (ncol(df) + 4):ncol(df_combo)) {
  df_combo[[i]] <- lead(df_combo[[i - ncol(df) - 2]])
}
df_combo <- subset(df_combo, substr(ID, 1, 2) == "00")
df_descr <- subset(df, substr(ID, 1, 2) == "00")
df_final <- df_combo[, (ncol(df) + 1):ncol(df_combo)]
df_final$Description <- df_descr$Description
rm(df_descr, df_combo)
Output:
ID Year Description Name User Factor_1 Factor_2 Factor_3
1: 0011 2016 blue colour is nice XXX XLM Yfac different Yfac
2: 0024 2017 red colour is good YYY STM Lfac unique Zfac
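There are two parts to this: first you want to paste the descriptions together,
and then you want to shift the remaining variables, since you want "XXX" and "YYY" to end up in your Name column.
Also, in Vivek's answer every wrong row is pasted onto every "right" row. That works in your example, but it will not work if you have several right rows followed by a wrong one.
Working with booleans (TRUE/FALSE) is often fine, but in this case I think you want integer indices, because they make it easier to refer to "the previous line". That gives me this code: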
rmlines <- which(!substr(df$ID,1,2)=="00")
df$Description[rmlines-1] <- paste(df$Description[rmlines-1], df[rmlines,1], sep=" ")
df[rmlines-1, 4:8] <- df[rmlines, 2:6]
df <- df[-rmlines,]
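To illustrate the case of several right rows followed by a wrong one, here is a small sketch on made-up data (the 0035 row and its values are hypothetical). It assumes the first row is well-formed and that spilled rows never come two in a row. The `length(rmlines) > 0` check is my own addition: without it, `df[-rmlines, ]` with an empty index would drop every row.

```r
# Hypothetical data: two well-formed rows, then one spilled row, then another good row.
df <- data.frame(
  ID          = c("0011", "0024", "is good", "0035"),
  Year        = c("2016", "2017", "YYY", "2018"),
  Description = c("blue colour", "red colour", "STM", "green colour"),
  Name        = c("AA", "DD", "Lfac", "EE"),
  User        = c("James", "Mark", "unique", "Anna"),
  Factor_1    = c("Xfac", "Zfac", "Zfac", "Wfac"),
  Factor_2    = NA_character_,
  Factor_3    = NA_character_,
  stringsAsFactors = FALSE
)

# Integer positions of the rows whose ID does not start with "00".
rmlines <- which(substr(df$ID, 1, 2) != "00")

if (length(rmlines) > 0) {
  # Append the spilled ID text to the Description of the row just above it.
  df$Description[rmlines - 1] <- paste(df$Description[rmlines - 1], df$ID[rmlines])
  # Move columns 2:6 of the spilled row into columns 4:8 of the row above.
  df[rmlines - 1, 4:8] <- df[rmlines, 2:6]
  # Drop the spilled rows.
  df <- df[-rmlines, ]
}

df
```

Only the 0024 row is modified here; the 0011 and 0035 rows keep their original values, which is the behaviour the boolean version cannot give you.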
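But there is one more thing to consider: what are the classes of your columns?
When I tried it, I read everything in as character, which means the values can be moved between columns without trouble. In your data some columns may be factors or other types, so you may want to change the classes. I think the easiest way is to convert everything to character first, and afterwards convert each column (back) to the class you want it to have in the end.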
# To change everything to character:
df <- as.data.frame(lapply(df, as.character), stringsAsFactors = FALSE)
# And to assign the right classes, you need to decide case-by-case:
df$Year <- as.integer(df$Year)
df$Factor_1 <- as.factor(df$Factor_1) # Optionally provide levels
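A short sketch of why the classes matter, on a hypothetical two-row data frame where Name is a factor: assigning a value that is not one of the existing levels gives NA (with a warning), whereas converting to character first keeps the value.

```r
# Hypothetical data frame with one factor column.
df <- data.frame(
  ID   = c("0011", "0024"),
  Year = c("2016", "2017"),
  Name = factor(c("AA", "DD")),
  stringsAsFactors = FALSE
)

# Writing an unknown level into a factor produces NA (warning suppressed here).
broken <- df
suppressWarnings(broken$Name[1] <- "XXX")
is.na(broken$Name[1])

# Convert every column to character, move values around, then restore classes.
df[] <- lapply(df, as.character)
df$Name[1] <- "XXX"
df$Year <- as.integer(df$Year)
df$Name <- factor(df$Name)
str(df)
```

ID is left as character on purpose, since converting it to a number would drop the leading zeros.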