Apply a function depending on column values

Question

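I want to compute a new column of a data frame using only the elements that satisfy a given condition (age < 18). To be clear, what I want is a column holding, for each row, the mean age of the children under 18, or 0 if there are none. I tried the following code: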

df <- data.frame (A1 = c("58", "51", "5", "88", "16", "24"), 
                  A2 = c ("75", "57", "44", "2", "81", "4"),
                  A3 = c ("1" ,"51", "65", "54", "88", "12"),
                  A4 = c ("24" ,"8", "81", "32", "5", "86"),
                  D1 = c("1", "0", "0", "1", "0", "0"),
                  D2 = c ("0", "0", "0", "1", "1", "0"),
                  D3 = c ("1", "0", "1", "1", "0", "0"),
                  D4 = c ("1", "1", "1", "0", "0", "0"))

df$X_mean <- apply(df[, c ("A1", "A2", "A3", "A4")],
                                   1, function(x) mean (which(x<18)))

I also tried:

my.fun<-function(x,y){
  if(x<18){
    mean}
}

df$X_mean<-apply(df,MAR=1,FUN=my.fun,x=df[, c ("A1", "A2", "A3", "A4")] )

Or:

df[, c ("A1", "A2", "A3", "A4")] %>%  mutate_if(x<18, mean)

None of these attempts work.

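In addition, I want to create 4 columns (Cond_Di), each depending on a specific pair of columns Ai and Di (i = 1 to 4).

If Ai < 18 and Di == 1, then Cond_Di = 1, otherwise 0. This should be generated for every Ai and its corresponding Di, that is: A1 and D1 ==> Cond_D1, A2 and D2 ==> Cond_D2, and so on.

To sum up, I want to create one column that is the mean of the values under 18, plus four columns where 1 = disease in an individual under 18 and 0 = otherwise, so the output would be: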

> df
  A1 A2 A3 A4 D1 D2 D3 D4 mean_under_18 cond_D1 cond_D2 cond_D3 cond_D4
1 58 75  1 24  1  0  1  1           1.0       0       0       1       0
2 51 12 51  8  0  0  0  1          10.0       0       0       0       1
3  5 44 65 81  0  0  1  1           5.0       0       0       0       0
4 88  2 15 32  1  1  1  0          13.5       0       1       1       0
5 16 81 88  5  0  1  0  0          10.5       0       0       0       0
6 24  4 12 86  0  1  0  0           8.0       0       1       0       0

Answer 1

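I'm not sure whether this is what you want, but I think it does the job. Regarding your follow-up question: could you post a sample of the output you would like to see?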

rm(list = ls())
df <- data.frame (A1 = c (58, 51, 5,  88, 16, 24), 
                  A2 = c (75, 57, 44, 2 , 81, 4),
                  A3 = c (1 , 51, 65, 54, 88, 12),
                  A4 = c (24, 8 , 81, 32, 5 , 86),
                  D1 = c (1 , 0 , 0 , 1 , 0 , 0),
                  D2 = c (0 , 0 , 0 , 1 , 1 , 0),
                  D3 = c (1 , 0 , 1 , 1 , 0 , 0),
                  D4 = c (1 , 1 , 1 , 0 , 0 , 0))

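# Collect the mean of every column that contains at least one value below 18
returnval <- numeric(ncol(df))
count <- 1
for (i in colnames(df)) {
  for (j in seq_along(df[[i]])) {
    if (df[[i]][j] < 18) {
      # First value below 18 found: store the mean of the whole column
      returnval[count] <- mean(df[[i]])
      count <- count + 1
      break
    }
  }
}
returnval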
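
Note that this loop works per column: for each column with at least one value below 18, it stores the mean of the whole column, not a per-row mean of the under-18 values. A more compact sketch of the same idea is shown below; the names `has_under_18` and `col_means` are only illustrative, and it assumes all columns are numeric as in the data frame above.

```r
df <- data.frame(A1 = c(58, 51, 5, 88, 16, 24),
                 A2 = c(75, 57, 44, 2, 81, 4),
                 A3 = c(1, 51, 65, 54, 88, 12),
                 A4 = c(24, 8, 81, 32, 5, 86),
                 D1 = c(1, 0, 0, 1, 0, 0),
                 D2 = c(0, 0, 0, 1, 1, 0),
                 D3 = c(1, 0, 1, 1, 0, 0),
                 D4 = c(1, 1, 1, 0, 0, 0))

# TRUE for every column that has at least one value below 18
has_under_18 <- vapply(df, function(col) any(col < 18), logical(1))

# Mean of each qualifying column (the whole column, as in the loop)
col_means <- colMeans(df[, has_under_18, drop = FALSE])
col_means
```

Unlike the loop, the result is a named vector, so each mean stays attached to the column it came from.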

Answer 2

You can use the code below. Note that the first data.frame you defined differs from the result you specified; I used the first one in the code below.

df<- data.frame(A1 = c(58, 51, 5, 88, 16, 24), 
                A2 = c(75, 57, 44, 2, 81, 4),
                A3 = c(1, 51, 65, 54, 88, 12),
                A4 = c(24, 8, 81, 32, 5, 86),
                D1 = c(1, 0, 0, 1, 0, 0),
                D2 = c(0, 0, 0, 1, 1, 0),
                D3 = c(1, 0, 1, 1, 0, 0),
                D4 = c(1, 1, 1, 0, 0, 0))

df$mean_under_18 <- apply(df[, paste0("A", 1:4)], 1, function(x) {
  under_18 <- x < 18
  ifelse(any(under_18), mean(x[under_18]), 0)
})

for (i in 1:4) {
  # Get cols A and D
  col_A <- df[, paste0("A", i)]
  col_D <- df[, paste0("D", i)]
  
  # Mask for under 18 observations
  under_18 <- col_A < 18
  
  # Create cond_D column
  df[[paste0("cond_D", i)]] <- as.integer(under_18 & col_D)
}

Resulting data.frame:

> df
  A1 A2 A3 A4 D1 D2 D3 D4 mean_under_18 cond_D1 cond_D2 cond_D3 cond_D4
1 58 75  1 24  1  0  1  1           1.0       0       0       1       0
2 51 57 51  8  0  0  0  1           8.0       0       0       0       1
3  5 44 65 81  0  0  1  1           5.0       0       0       0       0
4 88  2 54 32  1  1  1  0           2.0       0       1       0       0
5 16 81 88  5  0  1  0  0          10.5       0       0       0       0
6 24  4 12 86  0  0  0  0           8.0       0       0       0       0
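
If the columns are stored as character strings, as in the question's original data, a comparison such as `x < 18` is done on strings rather than numbers, so the values may need converting first. The following is only a sketch of a vectorized base R variant under that assumption; the names `ages`, `flags`, `under_18`, `n_under`, and `cond` are illustrative, and `as.numeric` is assumed to parse every value.

```r
# Data as in the question, stored as character strings
df <- data.frame(A1 = c("58", "51", "5", "88", "16", "24"),
                 A2 = c("75", "57", "44", "2", "81", "4"),
                 A3 = c("1", "51", "65", "54", "88", "12"),
                 A4 = c("24", "8", "81", "32", "5", "86"),
                 D1 = c("1", "0", "0", "1", "0", "0"),
                 D2 = c("0", "0", "0", "1", "1", "0"),
                 D3 = c("1", "0", "1", "1", "0", "0"),
                 D4 = c("1", "1", "1", "0", "0", "0"))

# Convert every column to numeric so that comparisons are numeric
df[] <- lapply(df, as.numeric)

ages     <- as.matrix(df[, paste0("A", 1:4)])
flags    <- as.matrix(df[, paste0("D", 1:4)])
under_18 <- ages < 18

# Row-wise mean of the under-18 ages; rows without any such age get 0
n_under <- rowSums(under_18)
df$mean_under_18 <- ifelse(n_under > 0, rowSums(ages * under_18) / n_under, 0)

# cond_Di is 1 when Ai < 18 and Di == 1, otherwise 0
cond <- (under_18 & flags == 1) * 1L
colnames(cond) <- paste0("cond_D", 1:4)
df <- cbind(df, cond)
```

Working on matrices avoids the explicit loop over `i`, at the cost of assuming that the `A` and `D` columns are listed in matching order.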

