Looping through consecutive columns of a dataframe

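I am trying to loop through the columns of a data frame and store the results of a calculation in a matrix.

The scenario can be reproduced with the following example data: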

df = data.frame(replicate(10,sample(0:20,10,rep=TRUE))) # the columns to be calculated on

M1 = as.data.frame(matrix(0, nrow = 10, ncol = 10)) # a matrix to hold the results.
rownames(M1) = colnames(df)
colnames(M1) = colnames(df)

These look like this:

> df # Frame with columns of data, X1 to X10

   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1   1 19  2  6  6  5  0  2  5  10
2  16  7 14 16 16 18 11  2 18  11
3   7  6 11  4  4  1 18 11 10  16
4  20  2  4 20  4  6 10  5 16   7
5   9  8 16 19 11  2 14  7 13   7
6   5 16  6  8 20 15  5 11  4   0
7  11 16 12  8 18 20 20 20 10  14
8  17 14 10  4  3 10 13 11  5   1
9   9 20 10  5  1  7 12 10  5   6
10  8 14  3 14 20 10 17 20  9  14

> M1 # Output frame to hold results

    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
X1   0  0  0  0  0  0  0  0  0   0
X2   0  0  0  0  0  0  0  0  0   0
X3   0  0  0  0  0  0  0  0  0   0
X4   0  0  0  0  0  0  0  0  0   0
X5   0  0  0  0  0  0  0  0  0   0
X6   0  0  0  0  0  0  0  0  0   0
X7   0  0  0  0  0  0  0  0  0   0
X8   0  0  0  0  0  0  0  0  0   0
X9   0  0  0  0  0  0  0  0  0   0
X10  0  0  0  0  0  0  0  0  0   0

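From the columns of df, X1 and X2 are fed into the calculation, then X1 and X3, then X1 and X4, and so on. The loop then moves on to X2 and X3, then X2 and X4, and so on.

Each pair of columns Xn and Xm is fed into the calculation/loop, and the result should be placed at the position in the matrix corresponding to columns n x m. The calculation itself simply determines the area between Xn and Xm when they are plotted as lines. I'm not sure how to construct the loop correctly to do this: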

  # The first iteration of the calculation, column X1 and X2 (X1 and X1 would = 0)

  y = seq(1,10,1)
  f1 = approxfun(y, df[,1] - df[,2]) # takes two columns as inputs
  f2 = function(x) abs(f1(x))

  area1 = integrate(f2, 1, 10, subdivisions = 500)
  M1[2,1] = area1$value

The resulting frame will be a "half-matrix" (which is all that is needed, since the mirrored half is identical):

> M1
    X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
X1   0  0  0  0  0  0  0  0  0   0
X2   A  0  0  0  0  0  0  0  0   0
X3   A  A  0  0  0  0  0  0  0   0
X4   A  A  A  0  0  0  0  0  0   0
X5   A  A  A  A  0  0  0  0  0   0
X6   A  A  A  A  A  0  0  0  0   0
X7   A  A  A  A  A  A  0  0  0   0
X8   A  A  A  A  A  A  A  0  0   0
X9   A  A  A  A  A  A  A  A  0   0
X10  A  A  A  A  A  A  A  A  A   0

I started to build a for loop, but I had trouble using i and j to hold X1 fixed while the loop runs through X2-X10, and then move on to X2, and so on.

Thanks!

I couldn't get your function to work. So, using an arbitrary placeholder function in its place, this loop works for me:

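M = matrix(0, nrow = ncol(df), ncol = ncol(df)) # result matrix; M must exist before the loop assigns into it
area=list()  # because the actual function doesn't work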
for(i in 1:ncol(df)){
  for(j in 1:ncol(df)){
    if(i==j){M[i,i]=0;next}
    selection=df[,c(i,j)]
    #area=integrate(f2, 1, 200, subdivisions = 500)
    area$value=mean(colSums(selection)) # something random to check
    M[i,j]=area$value
    M[j,i]=area$value
  }
}
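For reference, here is a minimal sketch of the same nested loop with the question's own area calculation in place of the placeholder. It assumes the x positions are 1 to nrow(df), as in the question, and integrates only over that range, since approxfun returns NA outside it. The helper name pair_area and the seed are my own additions for illustration. The outer index j holds one column fixed while the inner index i runs over the later columns, so only the lower half is filled:

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible
df = data.frame(replicate(10, sample(0:20, 10, replace = TRUE)))

# result matrix, labelled with the column names of df
M1 = matrix(0, nrow = ncol(df), ncol = ncol(df),
            dimnames = list(colnames(df), colnames(df)))

x = seq_len(nrow(df))  # x positions 1..10, as in the question

# pair_area is a hypothetical helper wrapping the question's calculation
pair_area = function(a, b) {
  f1 = approxfun(x, a - b)         # linear interpolation of the difference
  f2 = function(t) abs(f1(t))      # absolute gap between the two lines
  integrate(f2, min(x), max(x), subdivisions = 500)$value
}

for (j in seq_len(ncol(df) - 1)) {   # column j is held fixed ...
  for (i in (j + 1):ncol(df)) {      # ... while i runs over the later columns
    M1[i, j] = pair_area(df[, i], df[, j])
  }
}
M1
```

Because the inner loop starts at j + 1, the diagonal and the upper half are never touched and stay at 0, which gives the "half-matrix" layout from the question.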

However, loops are often not the most efficient way to handle this. So you may prefer this option:

df = data.frame(replicate(10,sample(0:20,10,rep=TRUE))) # the columns to be calculated on
my.f = function(x) abs(x[,1]-x[,2])

#y = t(as.matrix(combn(ncol(df), 2L, function(y) integrate(my.f(df[y]), 1, 200, subdivisions = 500),simplify=F))) # This doesn't work, but should be close to what you want to do

y = combn(ncol(df), 2L, function(y) mean(my.f(df[y]))) # this works, but is just an example

nams = colnames(df)
M = matrix(ncol = length(nams), nrow = length(nams))
M[lower.tri(M)] = y   # fill the lower half; combn order matches column-major order
M = t(M)              # move those values to the upper half
M[lower.tri(M)] = y   # fill the lower half again so the matrix is symmetric
M = t(M)
diag(M) = 0
rownames(M) = colnames(M) = nams
M

    X1  X2   X3  X4  X5  X6  X7  X8   X9 X10
X1  0.0 8.6  6.4 8.8 7.1 6.6 7.0 4.0  7.0 3.7
X2  8.6 0.0  5.0 4.4 5.5 5.4 4.4 9.2  8.0 7.7
X3  6.4 5.0  0.0 7.2 5.9 5.8 7.6 7.0 10.4 6.5
X4  8.8 4.4  7.2 0.0 5.9 4.4 5.4 9.6  8.4 7.3
X5  7.1 5.5  5.9 5.9 0.0 7.3 5.3 9.1  8.5 8.0
X6  6.6 5.4  5.8 4.4 7.3 0.0 6.0 8.4  5.6 3.7
X7  7.0 4.4  7.6 5.4 5.3 6.0 0.0 8.8  4.4 5.7
X8  4.0 9.2  7.0 9.6 9.1 8.4 8.8 0.0  9.6 6.9
X9  7.0 8.0 10.4 8.4 8.5 5.6 4.4 9.6  0.0 5.5
X10 3.7 7.7  6.5 7.3 8.0 3.7 5.7 6.9  5.5 0.0
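If the real quantity is the area between the two plotted lines, one option that avoids integrate altogether is to add it up segment by segment. This is a swapped-in technique, not the question's integrate call: it assumes the points are one unit apart on the x axis and joined by straight lines, so each segment is either a trapezoid or, where the lines cross, two triangles. The helper name exact_area and the seed are hypothetical, a sketch rather than a definitive implementation:

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible
df = data.frame(replicate(10, sample(0:20, 10, replace = TRUE)))

# exact_area is a hypothetical helper: area under |a - b| with unit x spacing
exact_area = function(a, b) {
  d = a - b
  l = abs(head(d, -1))   # |difference| at the left end of each segment
  r = abs(tail(d, -1))   # |difference| at the right end of each segment
  crosses = head(d, -1) * tail(d, -1) < 0   # sign change inside the segment
  # no crossing: one trapezoid; crossing: two triangles meeting at the zero
  seg = ifelse(crosses, (l^2 + r^2) / (2 * (l + r)), (l + r) / 2)
  sum(seg)
}

# one value per column pair, in the order (1,2), (1,3), ..., (9,10)
areas = combn(ncol(df), 2L, function(p) exact_area(df[[p[1]]], df[[p[2]]]))

M = matrix(0, ncol(df), ncol(df), dimnames = list(colnames(df), colnames(df)))
M[lower.tri(M)] = areas   # combn order matches column-major lower-triangle order
M
```

Only the lower half is filled here, which matches the "half-matrix" requested in the question. On the same data the values should agree with the integrate-based calculation from the question to within integrate's tolerance.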