Calculate percentage between two data frames
The first data frame, df1, is:
df1 = data.frame('gen' = c('a', 'b', 'c', 'd'), 'mm' = c(10, 20, 30, 40), 'nn' = c(50,60,70,80))
gen mm nn
1 a 10 50
2 b 20 60
3 c 30 70
4 d 40 80
The second data frame, df2, is:
df2 = data.frame('gen' = c('x', 'y'), 'mm' = c(10,20), 'nn' = c(20,30))
gen mm nn
1 x 10 20
2 y 20 30
I want to calculate the percentage difference of each df1 value relative to every df2 value in the same column.
Expected output:
gen x.1 y.1 x.2 y.2
<chr> <dbl> <dbl> <dbl> <dbl>
1 a 0 -50 150 66.67
2 b 100 0 200 100.00
3 c 200 50 250 133.33
4 d 300 100 300 166.67
For example, the general formula is:
(df1-df2)/df2*100
Considering row a:
(10-10)/10*100 = 0 (x.1)
(10-20)/20*100 = -50 (y.1)
(50-20)/20*100 = 150 (x.2)
(50-30)/30*100 = 66.67 (y.2)
And so on for b, c and d.
Thanks!
You could use:
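The formula above is applied to every pair of a df1 row and a df2 row, one value column at a time. A minimal base-R sketch of exactly that, assuming (as in the example) that df1 and df2 share the same value columns in the same order and that the k-th value column maps to the suffix .k; the helper pct() is hypothetical:

```r
df1 <- data.frame(gen = c("a", "b", "c", "d"), mm = c(10, 20, 30, 40), nn = c(50, 60, 70, 80))
df2 <- data.frame(gen = c("x", "y"), mm = c(10, 20), nn = c(20, 30))

# Hypothetical helper: percentage difference of a relative to b
pct <- function(a, b) (a - b) / b * 100

value_cols <- setdiff(names(df1), "gen")  # "mm", "nn"

# One 4 x 2 block per value column: rows follow df1$gen, columns follow df2$gen
blocks <- lapply(seq_along(value_cols), function(k) {
  col <- value_cols[k]
  m <- outer(df1[[col]], df2[[col]], pct)       # every df1 value vs every df2 value
  colnames(m) <- paste(df2$gen, k, sep = ".")   # x.1, y.1 for mm; x.2, y.2 for nn
  m
})

result <- data.frame(gen = df1$gen, do.call(cbind, blocks))
print(result)
```

outer() requires pct() to be vectorised, which it is because it only uses arithmetic operators.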
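library(dplyr)
library(tidyr)

df1 %>%
  # long format: one row per (gen, name, value)
  pivot_longer(-gen) %>%
  # pair every df1 value with every df2 value of the same column
  # (many-to-many by design; dplyr >= 1.1.0 may warn about it)
  left_join(df2 %>% pivot_longer(-gen), by = "name") %>%
  # percentage difference; .keep = "unused" drops value.x
  mutate(value.y = (value.x - value.y) / value.y * 100, .keep = "unused") %>%
  # one column per (df2 gen, value column): x_mm, y_mm, x_nn, y_nn
  pivot_wider(names_from = c("gen.y", "name"), values_from = "value.y") %>%
  rename(gen = gen.x, x.1 = x_mm, y.1 = y_mm, x.2 = x_nn, y.2 = y_nn)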
This returns:
# A tibble: 4 x 5
gen x.1 y.1 x.2 y.2
<chr> <dbl> <dbl> <dbl> <dbl>
1 a 0 -50 150 66.7
2 b 100 0 200 100
3 c 200 50 250 133.
4 d 300 100 300 167.
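The step doing the work in this answer is left_join(..., by = "name"): since name is the only key, each of the 8 long df1 rows matches the 2 df2 rows of the same column, giving 16 rows before pivoting. A base-R sketch of the same pairing with merge(), assuming the df1 and df2 from the question; to_long() is a hypothetical helper standing in for pivot_longer(-gen):

```r
df1 <- data.frame(gen = c("a", "b", "c", "d"), mm = c(10, 20, 30, 40), nn = c(50, 60, 70, 80))
df2 <- data.frame(gen = c("x", "y"), mm = c(10, 20), nn = c(20, 30))

# Hypothetical helper: one row per (gen, name, value), like pivot_longer(-gen)
to_long <- function(d) {
  cols <- setdiff(names(d), "gen")
  data.frame(gen = rep(d$gen, times = length(cols)),
             name = rep(cols, each = nrow(d)),
             value = unlist(d[cols], use.names = FALSE))
}
long1 <- to_long(df1)  # 8 rows
long2 <- to_long(df2)  # 4 rows

# Join on the column name only; gen and value get the default .x / .y suffixes
joined <- merge(long1, long2, by = "name")
joined$pct <- (joined$value.x - joined$value.y) / joined$value.y * 100
nrow(joined)
```

merge() sorts by the key by default, so the row order differs from the dplyr pipeline, but the pairs and percentages are the same.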
Here is a data.table approach:
library(data.table)
# Convert df1 and df2 to data.table format (in place, by reference)
setDT(df1)
setDT(df2, keep.rownames = "id")
# Melt df1 and df2 to long format
df1.melt <- melt(df1, id.vars = "gen", variable.factor = FALSE)
df2.melt <- melt(df2, id.vars = c("id", "gen"), variable.factor = FALSE)
# Left join df1.melt with df2.melt on variable: every df1 value meets every df2 value
ans <- df2.melt[df1.melt, on = .(variable), allow.cartesian = TRUE]
# Create new column names: number the variables within each (i.gen, gen) pair
ans[, id2 := rowid(i.gen, gen)]
ans[, name := paste(gen, id2, sep = ".")]
# Perform calculation
ans[, new.value := 100 * (i.value - value) / value]
# Cast to wide format
dcast(ans, i.gen ~ name, value.var = "new.value")
# i.gen x.1 x.2 y.1 y.2
# 1: a 0 150 -50 66.66667
# 2: b 100 200 0 100.00000
# 3: c 200 250 50 133.33333
# 4: d 300 300 100 166.66667
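In this answer, rowid(i.gen, gen) numbers the rows within each (i.gen, gen) pair in order of appearance. Because the melted df1 table lists all mm rows before all nn rows, mm gets 1 and nn gets 2, which is where the .1 and .2 suffixes come from. A base-R sketch of the same numbering with ave(..., FUN = seq_along), run on a hypothetical stand-in for the joined table built with expand.grid() and assuming the same row order (variable outermost):

```r
# Hypothetical stand-in for the joined table: gen varies fastest, then i.gen, then variable
ans <- expand.grid(gen = c("x", "y"),
                   i.gen = c("a", "b", "c", "d"),
                   variable = c("mm", "nn"),
                   stringsAsFactors = FALSE)

# Running count within each (i.gen, gen) pair, like data.table::rowid(i.gen, gen)
ans$id2 <- ave(seq_len(nrow(ans)), ans$i.gen, ans$gen, FUN = seq_along)
ans$name <- paste(ans$gen, ans$id2, sep = ".")

table(ans$variable, ans$name)
```

The cross-tabulation shows mm only under x.1 and y.1, and nn only under x.2 and y.2.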