Symmetric percent change between data frames in R
I have two data frames. Calculating the percent change from t1 to t2 is easy, like this:
t1 <- data.frame("gene1" = c(1,5,10), "gene2" = c(1,1,1), "gene3" = c(5,5,20))
row.names(t1) <- c("patient1", "patient2", "patient3")
t2 <- data.frame("gene1" = c(0.5,5,20), "gene2" = c(2,4,8), "gene3" = c(2.5,20,5))
row.names(t2) <- c("patient1", "patient2", "patient3")
t3 <- (t2-t1)/t1 *100
t3
#>          gene1 gene2 gene3
#> patient1   -50   100   -50
#> patient2     0   300   300
#> patient3   100   700   -75
But what if I want a symmetric percent change, so that a change from 20 to 5 comes out as -300 rather than -75? I tried this:
t3 <- ifelse(t2 > t1, ((t2-t1)/t1) * 100, ((t2-t1)/t2) * 100)
But that gives me some weird 3x9 list.
In principle, using ifelse should work. If I reduce the complexity, it works fine:
t3 <- ifelse(t2 > t1, "a", "b")
t3
#>          gene1 gene2 gene3
#> patient1 b     a     b
#> patient2 b     a     a
#> patient3 a     a     b
Ideally, my output would be:
t3
#>          gene1 gene2 gene3
#> patient1  -100   100  -100
#> patient2     0   300   300
#> patient3   100   700  -300
How about this?
# recreate your data
t1 <- data.frame("gene1" = c(1,5,10), "gene2" = c(1,1,1), "gene3" = c(5,5,20))
row.names(t1) <- c("patient1", "patient2", "patient3")
t2 <- data.frame("gene1" = c(0.5,5,20), "gene2" = c(2,4,8), "gene3" = c(2.5,20,5))
row.names(t2) <- c("patient1", "patient2", "patient3")
t1
#>          gene1 gene2 gene3
#> patient1     1     1     5
#> patient2     5     1     5
#> patient3    10     1    20
t2
#>          gene1 gene2 gene3
#> patient1   0.5     2   2.5
#> patient2   5.0     4  20.0
#> patient3  20.0     8   5.0
# iterate over each column and compute the ifelse...
res <- lapply(seq_len(ncol(t1)), function(i) {
  x <- t2[, i]
  y <- t1[, i]
  diff <- x - y
  ifelse(x > y, diff / y, diff / x) * 100
})
# convert to data.frame and reset the names and rownames
res <- as.data.frame(res)
rownames(res) <- rownames(t1)
names(res) <- names(t1)
res
#>          gene1 gene2 gene3
#> patient1  -100   100  -100
#> patient2     0   300   300
#> patient3   100   700  -300
Created on 2020-10-14 by the reprex package (v0.3.0)
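A likely explanation for the odd result of the original ifelse call: t2 > t1 is a logical matrix, but the yes and no arguments are data frames, which are lists of columns, so ifelse recycles whole columns instead of single values. A minimal sketch of a workaround, assuming all columns are numeric, is to convert both data frames to matrices first (the names m1, m2, t3_mat and t3_df are just illustrative):

```r
# Same example data as in the question
t1 <- data.frame(gene1 = c(1, 5, 10), gene2 = c(1, 1, 1), gene3 = c(5, 5, 20),
                 row.names = c("patient1", "patient2", "patient3"))
t2 <- data.frame(gene1 = c(0.5, 5, 20), gene2 = c(2, 4, 8), gene3 = c(2.5, 20, 5),
                 row.names = c("patient1", "patient2", "patient3"))

# Assumption: every column is numeric, so as.matrix() yields numeric matrices
m1 <- as.matrix(t1)
m2 <- as.matrix(t2)

# With matrix inputs, ifelse() works element by element and keeps
# the dim and dimnames of the test matrix (m2 > m1)
t3_mat <- ifelse(m2 > m1, (m2 - m1) / m1, (m2 - m1) / m2) * 100

# Convert back to a data frame if one is needed downstream
t3_df <- as.data.frame(t3_mat)
```

This should give the same values as the lapply version, without the manual step of resetting row and column names.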
Edit
Better, and probably faster:
t3 <- (t2 - t1) / pmin(t1, t2) * 100
t3
#>          gene1 gene2 gene3
#> patient1  -100   100  -100
#> patient2     0   300   300
#> patient3   100   700  -300
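Note that pmin, similar to ifelse, works element-wise: it applies min to each pair of corresponding elements of its inputs. pmin(t1, t2) therefore returns a data.frame holding the minimum value at each position, which saves us the ifelse statement.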
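To make the difference between min and pmin concrete, here is a small sketch on plain vectors. The helper sym_pct_change is a hypothetical name used only for illustration, and it assumes all values are strictly positive:

```r
a <- c(1, 5, 20)   # "before" values
b <- c(0.5, 5, 5)  # "after" values

overall_min  <- min(a, b)   # a single number: the smallest value overall (0.5)
parallel_min <- pmin(a, b)  # one minimum per position: 0.5 5 5

# Hypothetical helper (illustrative name, not from any package):
# divide the difference by the smaller of the two values at each position
sym_pct_change <- function(from, to) {
  (to - from) / pmin(from, to) * 100
}

change <- sym_pct_change(a, b)  # -100 0 -300
```

If a value can be zero, the division yields Inf or NaN, so such cases would need separate handling.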