R - Fill a column with values from any of the other columns
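I have a data frame with 5 columns: 4 columns have values and 1 column is empty. I want to fill the empty column with the value from any one of the 4 columns.
Suppose this is my data frame df: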
Col1 Col2 Col3 Col4 Col5
11 11
2 2 2
23
4 4
15 15
I would like my result to look like this:
Col1 Col2 Col3 Col4 Col5
11 11 11
2 2 2 2
23 23
4 4 4
15 15 15
EDIT: I applied the answers everyone provided, but for some reason it still does not work. In case it helps, here is the dput(head(df)) of my actual data:
structure(list(Treat_One = c(" ", "5 2012", "4 2008", "4 2010",
" ", "2 2008"), Treat_Two = c("8 2010", "5 2012", "4 2008",
"4 2010", "8 2011", "2 2008"), Treat_Three = c(" ", "5 2012",
"4 2008", "4 2010", "8 2011", "2 2008"), Treat_Four = c(" ",
" ", " ", " ", " ", " ")), .Names = c("Treat_One",
"Treat_Two", "Treat_Three", "Treat_Four"), row.names = c(NA,
6L), class = "data.frame")
Edit to include str(df):
'data.frame': 209 obs. of 4 variables:
$ Treat_One : chr " " "5 2012" "4 2008" "4 2010" ...
$ Treat_Two : chr "8 2010" "5 2012" "4 2008" "4 2010" ...
$ Treat_Three: chr " " "5 2012" "4 2008" "4 2010" ...
$ Treat_Four : chr " " " " " " " " ...
You can simply type the following:
df$Col5 <- 1:5
Assigning to df$Col5 creates a Col5 column in df, and 1:5 simply fills it with the sequence 1 to 5.
Based on the new data provided by the OP, we can use trimws to remove the leading/trailing whitespace before comparing with the empty string:
df$Treat_Four <- apply(df, 1, function(x) sample(x[trimws(x) != ""], 1))
df
#  Treat_One Treat_Two Treat_Three Treat_Four
#1             8 2010                 8 2010
#2    5 2012    5 2012      5 2012     5 2012
#3    4 2008    4 2008      4 2008     4 2008
#4    4 2010    4 2010      4 2010     4 2010
#5             8 2011      8 2011     8 2011
#6    2 2008    2 2008      2 2008     2 2008
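Because every non-blank value within a row is identical in this data, the random draw is not strictly needed. The following is a minimal sketch, assuming the data frame from the question's dput and a hypothetical helper named first_non_blank (not part of the original answer), that takes the first non-blank entry deterministically and returns NA for a row that is entirely blank:

```r
# Data reconstructed from the dput(head(df)) shown in the question.
df <- data.frame(
  Treat_One   = c(" ", "5 2012", "4 2008", "4 2010", " ", "2 2008"),
  Treat_Two   = c("8 2010", "5 2012", "4 2008", "4 2010", "8 2011", "2 2008"),
  Treat_Three = c(" ", "5 2012", "4 2008", "4 2010", "8 2011", "2 2008"),
  Treat_Four  = c(" ", " ", " ", " ", " ", " "),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-blank value of a character vector,
# or NA if every entry is blank or missing.
first_non_blank <- function(x) {
  x <- trimws(x)
  x <- x[!is.na(x) & x != ""]
  if (length(x) == 0) NA_character_ else x[1]
}

# Only the source columns are scanned; the target column is excluded.
src <- c("Treat_One", "Treat_Two", "Treat_Three")
df$Treat_Four <- unname(apply(df[src], 1, first_non_blank))
df$Treat_Four
```

Unlike the sample() version, this does not raise an error when a row has no non-blank value, which may matter on the full 209-row data.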
Original answer
We can use apply row-wise and take a sample of 1 from the elements that are not equal to the empty string:
df$Col5 <- apply(df, 1, function(x) sample(x[x != ""], 1))
df
# Col1 Col2 Col3 Col4 Col5
#1 1 1 1
#2 2 2 2 2
#3 3 3
#4 4 4 4
#5 5 5 5
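If there are NA values instead of blanks, we can use the same logic. With numeric columns, select by position rather than calling sample() on the values directly: when x is a single number, sample(x, 1) draws from 1:x instead of returning x.
apply(df, 1, function(x) { v <- x[!is.na(x)]; v[sample.int(length(v), 1)] })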
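If the blanks are first converted to NA, the fill can also be done without sampling by keeping the first non-NA value from left to right. This is a minimal base-R sketch; the toy data frame and the helper name coalesce_cols are illustrative assumptions, not part of the original answer:

```r
# Toy data (assumed layout) with blank strings standing in for missing values.
df <- data.frame(
  Col1 = c("11", "", "23"),
  Col2 = c("", "2", ""),
  Col3 = c("11", "2", ""),
  stringsAsFactors = FALSE
)

# Turn blank or whitespace-only strings into NA, column by column.
df[] <- lapply(df, function(col) {
  col[trimws(col) == ""] <- NA
  col
})

# Hypothetical helper: element-wise, keep the first non-NA value across columns.
coalesce_cols <- function(cols) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), cols)
}

df$Col4 <- coalesce_cols(df[c("Col1", "Col2", "Col3")])
df$Col4
```

A row whose source columns are all NA simply stays NA in the new column.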
Try this:
df <- data.frame(col1 = c(1, NA, 3), col2 = c(1, 2, NA), col3 = c(NA, 2, 3), col4 = rep(NA, 3))
# For each row, copy the first non-NA value into col4.
# Taking the first index also works when a row has only one non-NA value.
for (i in seq_len(nrow(df))) {
  df[i, 4] <- df[i, which(!is.na(df[i, ]))[1]]
}
df
This produces:
> df <- data.frame(col1 = c(1, NA, 3), col2 = c(1, 2, NA), col3 = c(NA, 2, 3), col4 = rep(NA, 3))
> df
  col1 col2 col3 col4
1    1    1   NA   NA
2   NA    2    2   NA
3    3   NA    3   NA
> for (i in seq_len(nrow(df))) {
+   df[i, 4] <- df[i, which(!is.na(df[i, ]))[1]]
+ }
> df
  col1 col2 col3 col4
1    1    1   NA    1
2   NA    2    2    2
3    3   NA    3    3
Here is a vectorized option using max.col:
df$Treat_Four <- df[1:3][cbind(1:nrow(df), max.col(sapply(df[1:3], trimws)!='', "first"))]
df
#  Treat_One Treat_Two Treat_Three Treat_Four
#1             8 2010                 8 2010
#2    5 2012    5 2012      5 2012     5 2012
#3    4 2008    4 2008      4 2008     4 2008
#4    4 2010    4 2010      4 2010     4 2010
#5             8 2011      8 2011     8 2011
#6    2 2008    2 2008      2 2008     2 2008
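To see why this works: the comparison yields a logical matrix, max.col() with ties.method = "first" returns for each row the index of the first column holding the maximum (here, the first TRUE), and indexing with a two-column (row, column) matrix then extracts one cell per row. A small sketch on a made-up matrix, with values chosen purely for illustration:

```r
# Made-up character matrix; blank strings mark missing entries.
m <- matrix(
  c("",  "a", "a",
    "b", "b", "",
    "",  "",  "c"),
  nrow = 3, byrow = TRUE
)

# TRUE where a cell is non-blank.
non_blank <- m != ""

# Column index of the first TRUE in each row. Multiplying by 1 converts the
# logical matrix to numeric, the input type max.col() is documented for.
first_col <- max.col(non_blank * 1, ties.method = "first")
first_col

# Matrix indexing: each (row, column) pair selects exactly one cell.
picked <- m[cbind(seq_len(nrow(m)), first_col)]
picked
```

One caveat of this approach: for a row with no non-blank cell, all entries tie and "first" returns column 1, so the value picked for that row is itself blank.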