"recursive" 自己加入 data.table
"recursive" self join in data.table
I have a list of components with 3 columns: product, component, and quantity of the component used:
a <- structure(list(prodName = c("prod1", "prod1", "prod2", "prod3",
"prod3", "int1", "int1", "int2", "int2"), component = c("a",
"int1", "b", "b", "int2", "a", "b", "int1", "d"), qty = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L)), row.names = c(NA, -9L), class = c("data.table",
"data.frame"))
prodName component qty
1 prod1 a 1
2 prod1 int1 2
3 prod2 b 3
4 prod3 b 4
5 prod3 int2 5
6 int1 a 6
7 int1 b 7
8 int2 int1 8
9 int2 d 9
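Products starting with prod are finished products, those starting with int are intermediate products, and those named with single letters are raw materials.
I need the complete list of ingredients of the final products, with only raw materials as components. That is, I want to convert any int into raw materials.
- Intermediate products can be composed of raw materials and another intermediate product, which is why I call this "recursive".
- The number of nesting/recursion levels of an intermediate product cannot be known in advance (2 levels in this example, more than 6 in the real data).
For this example, my expected result is (I spell out how each resulting number is calculated):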
prodName |component |qty
prod1 |a |1+2*6 = 13
prod1 |b |0+2*7 = 14
prod2 |b |3
prod3 |b |4+5*8*7 = 284
prod3 |a |0+5*8*6 = 240
prod3 |d |0+5*9 = 45
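What I've done:
I solved this problem by building a very cumbersome sequence of joins with merge. While this approach works for the toy data, it's unlikely I can apply it to the real data.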
#load data.table
library(data.table)
# split the tables between products and different levels of intermediate
a1 <- a[prodName %like% "prod",]
b1 <- a[prodName %like% "int1",]
c1 <- a[prodName %like% "int2",]
# convert int2 to raw materials
d1 <- merge(c1,
b1,
by.x = "component",
by.y = "prodName",
all.x = TRUE)[
is.na(component.y),
component.y := component][
is.na(qty.y),
qty.y := 1][,
.(prodName, qty = qty.x*qty.y),
by = .(component = component.y)]
# Since int1 is already exploded into raw materials, rbind both tables:
d1 <- rbind(d1, b1)
# convert all final products into raw materials, except that the raw mats that go directly into the product won't appear:
e1 <- merge(a1,
d1,
by.x = "component",
by.y = "prodName",
all.x = TRUE)
# rbind the last calculated raw mats (those coming from intermediate products) with those coming _directly_ into the final product:
result <- rbind(e1[!is.na(qty.y),
.(prodName, qty = qty.x * qty.y),
by = .(component = component.y)],
e1[is.na(qty.y),
.(prodName, component, qty = qty.x)])[,
.(qty = sum(qty)),
keyby = .(prodName, component)]
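I'm aware that I can split the data into tables and perform joins until every intermediate product is expressed as composed only of raw materials but, as mentioned above, that is not practical given the size of the data and the number of recursion levels of the intermediate products.
Is there an easier/better way to do this kind of recursive join?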
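Here's my attempt using your data set.
It uses a while loop that checks whether any component also appears in the prodName field. The loop always needs to have the same fields, so instead of adding a column for each recursive multiplier (i.e. 5*8*7 in the end), the multipliers are folded together at each iteration. That is, 5*8*7 becomes 5*56 in the end.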
library(data.table)
a[, qty_multiplier := 1]
b <- copy(a)
while (b[component %in% prodName, .N] > 0) {
b <- b[a
, on = .(prodName = component)
, .(prodName = i.prodName
, component = ifelse(is.na(x.component), i.component, x.component)
, qty = i.qty
, qty_multiplier = ifelse(is.na(x.qty), 1, x.qty * qty_multiplier)
)
]
}
b[prodName %like% 'prod', .(qty = sum(qty * qty_multiplier)), by = .(prodName, component)]
prodName component qty
1: prod1 a 13
2: prod1 b 14
3: prod2 b 3
4: prod3 b 284
5: prod3 a 240
6: prod3 d 45
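The same iterative idea can also be written as a small function that stops once no component is itself a product. This is only a sketch under stated assumptions: the names bom, explode_bom and max_depth are hypothetical, final products are taken to be the prodName values that never occur as a component (rather than being matched on "prod"), and max_depth is an arbitrary guard against circular definitions.

```r
library(data.table)

# Toy component list from the question (bom is a hypothetical name)
bom <- data.table(
  prodName  = c("prod1", "prod1", "prod2", "prod3", "prod3", "int1", "int1", "int2", "int2"),
  component = c("a", "int1", "b", "b", "int2", "a", "b", "int1", "d"),
  qty       = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
)

explode_bom <- function(bom, max_depth = 50L) {
  # Assumption: a final product is a prodName that is never used as a component
  res <- bom[!prodName %in% component]
  depth <- 0L
  while (any(res$component %in% bom$prodName)) {
    depth <- depth + 1L
    if (depth > max_depth) stop("max_depth exceeded; the component list may be circular")
    is_inter <- res$component %in% bom$prodName
    # Replace each intermediate component by its own components,
    # multiplying the quantities along the way
    expanded <- merge(res[is_inter], bom, by.x = "component", by.y = "prodName",
                      allow.cartesian = TRUE)
    expanded <- expanded[, .(prodName, component = component.y, qty = qty.x * qty.y)]
    res <- rbind(res[!is_inter], expanded)
  }
  # Add up the direct and indirect uses of each raw material
  res[, .(qty = sum(qty)), by = .(prodName, component)]
}

out <- explode_bom(bom)
print(out)
```

Each pass removes one level of nesting, so the number of passes equals the deepest nesting level in the data rather than a number fixed in the code.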
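I think you'd be better off representing the information as a set of adjacency matrices that tell you "how much of this is made of that". You need 4 matrices, one for each possible relationship.
For example, you put the relationship between final products and intermediate products in a matrix with 3 rows and 2 columns like this: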
QPI <- matrix(0,3,2)
row.names(QPI) <- c("p1","p2","p3")
colnames(QPI) <- c("i1","i2")
QPI["p1","i1"] <- 2
QPI["p3","i2"] <- 5
i1 i2
p1 2 0
p2 0 0
p3 0 5
This tells you that you need 2 units of intermediate product i1 to make 1 unit of final product p1.
Define the other matrices similarly:
QPR <- matrix(0,3,3)
row.names(QPR) <- c("p1","p2","p3")
colnames(QPR) <- c("a","b","d")
QPR["p1","a"] <- 1
QPR["p2","b"] <- 3
QPR["p3","b"] <- 4
QIR <- matrix(0,2,3)
row.names(QIR) <- c("i1","i2")
colnames(QIR) <- c("a","b","d")
QIR["i1","a"] <- 6
QIR["i1","b"] <- 7
QIR["i2","d"] <- 9
QII <- matrix(0,2,2)
row.names(QII) <- colnames(QII) <- c("i1","i2")
# one unit of i2 needs 8 units of i1
QII["i2","i1"] <- 8
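For example, looking at QIR, we see that it takes 6 units of raw material a to make one unit of intermediate product i1.
Once you have the data in this form, you can sum over all possible ways of getting from a raw material to a final product using matrix multiplication.
You have 3 terms: you can go directly from raw to final [QPR], or from raw to intermediate to final [QPI%*%QIR], or from raw to intermediate to another intermediate to final [QPI%*%QII%*%QIR].
Your result is then given by the matrix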
result <- QPI%*%QIR + QPI%*%QII%*%QIR + QPR
I've put all the code together below. If you run it, you'll see that the result looks like this:
a b d
p1 13 14 0
p2 0 3 0
p3 240 284 45
which is exactly the expected result stated in the question:
prodName |component |qty
prod1 |a |1+2*6 = 13
prod1 |b |0+2*7 = 14
prod2 |b |3
prod3 |b |4+5*8*7 = 284
prod3 |a |0+5*8*6 = 240
prod3 |d |0+5*9 = 45
Hope this helps.
QPI <- matrix(0,3,2)
row.names(QPI) <- c("p1","p2","p3")
colnames(QPI) <- c("i1","i2")
QPI["p1","i1"] <- 2
QPI["p3","i2"] <- 5
QPR <- matrix(0,3,3)
row.names(QPR) <- c("p1","p2","p3")
colnames(QPR) <- c("a","b","d")
QPR["p1","a"] <- 1
QPR["p2","b"] <- 3
QPR["p3","b"] <- 4
QIR <- matrix(0,2,3)
row.names(QIR) <- c("i1","i2")
colnames(QIR) <- c("a","b","d")
QIR["i1","a"] <- 6
QIR["i1","b"] <- 7
QIR["i2","d"] <- 9
QII <- matrix(0,2,2)
row.names(QII) <- colnames(QII) <- c("i1","i2")
QII["i2","i1"] <- 8
result <- QPI%*%QIR + QPI%*%QII%*%QIR + QPR
print(result)
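The three-term sum covers exactly the nesting depth of this example (one intermediate product inside another). Deeper nesting would need further terms such as QPI%*%QII%*%QII%*%QIR. A minimal sketch of a depth-independent form, assuming the component list has no circular definitions, so that the powers of QII eventually become zero: the series I + QII + QII^2 + ... then equals the inverse of (I - QII), which solve() computes. The name total is introduced here only for illustration.

```r
# Matrices as defined in the answer above
QPI <- matrix(0, 3, 2, dimnames = list(c("p1", "p2", "p3"), c("i1", "i2")))
QPI["p1", "i1"] <- 2
QPI["p3", "i2"] <- 5

QPR <- matrix(0, 3, 3, dimnames = list(c("p1", "p2", "p3"), c("a", "b", "d")))
QPR["p1", "a"] <- 1
QPR["p2", "b"] <- 3
QPR["p3", "b"] <- 4

QIR <- matrix(0, 2, 3, dimnames = list(c("i1", "i2"), c("a", "b", "d")))
QIR["i1", "a"] <- 6
QIR["i1", "b"] <- 7
QIR["i2", "d"] <- 9

QII <- matrix(0, 2, 2, dimnames = list(c("i1", "i2"), c("i1", "i2")))
QII["i2", "i1"] <- 8

# Assumption: no circular definitions, so (I - QII) is invertible and
# its inverse sums intermediate-to-intermediate chains of every length
total <- QPR + QPI %*% solve(diag(nrow(QII)) - QII) %*% QIR
print(total)
```

For this example the inverse of (I - QII) is simply I + QII, so total agrees with the three-term result above.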
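Essentially, your data represents a weighted edge list of a directed graph. The code below uses the igraph library to directly calculate, for each raw component and final product, the sum over all simple paths between them of the (product) distance, i.e. the product of the edge weights along the path: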
library(igraph)
## transform edgelist into graph
graph <- graph_from_edgelist(as.matrix(a[, c(2, 1)])) %>%
set_edge_attr("weight", value = unlist(a[, 3]))
## combinations raw components -> final products
out <- expand.grid(prodname = c("prod1", "prod2", "prod3"), component = c("a", "b", "d"), stringsAsFactors = FALSE)
## calculate quantities
out$qty <- mapply(function(component, prodname) {
## all simple paths from component -> prodname
all_paths <- all_simple_paths(graph, from = component, to = prodname)
## if simple paths exist, sum over product of weights for each path
ifelse(length(all_paths) > 0,
sum(sapply(all_paths, function(path) prod(E(graph, path = path)$weight))), 0)
}, out$component, out$prodname)
out
#> prodname component qty
#> 1 prod1 a 13
#> 2 prod2 a 0
#> 3 prod3 a 240
#> 4 prod1 b 14
#> 5 prod2 b 3
#> 6 prod3 b 284
#> 7 prod1 d 0
#> 8 prod2 d 0
#> 9 prod3 d 45
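The product and raw-material names passed to expand.grid() are hard-coded above. One possible way to derive them from the graph itself, assuming edges point from component to product as in the code above: raw materials are the vertices with no incoming edges, and final products are the vertices with no outgoing edges. The names edges, g, raw_materials and final_products are hypothetical.

```r
library(igraph)

# Edge list from the question, oriented component -> product
edges <- data.frame(
  from   = c("a", "int1", "b", "b", "int2", "a", "b", "int1", "d"),
  to     = c("prod1", "prod1", "prod2", "prod3", "prod3", "int1", "int1", "int2", "int2"),
  weight = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
)

# Extra columns of the data frame (here: weight) become edge attributes
g <- graph_from_data_frame(edges, directed = TRUE)

# Raw materials are made of nothing; final products go into nothing
raw_materials  <- sort(V(g)$name[degree(g, mode = "in") == 0])
final_products <- sort(V(g)$name[degree(g, mode = "out") == 0])

# Same grid of combinations as above, without hard-coded names
out <- expand.grid(prodname = final_products, component = raw_materials,
                   stringsAsFactors = FALSE)
print(out)
```

The resulting grid can be fed to the same mapply() call as above.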