R: recursive algorithm for querying a binary tree
I have a tree called `mytree`. In R, I store it as a list:
mytree <- list(left = structure(list(y = -10, x = 10, grad = -10.5, sim_score = 110.25,
value = -10.5, criterion = "x < 15"), row.names = 1L, class = "data.frame"),
right = list(left = list(left = structure(list(y = 7, x = 20,
grad = 6.5, sim_score = 42.25, value = 6.5, criterion = "x < 22.5"), row.names = 2L, class = "data.frame"),
right = structure(list(y = 8, x = 25, grad = 7.5, sim_score = 56.25,
value = 7.5, criterion = "x >= 22.5"), row.names = 3L, class = "data.frame"),
root = list(root = structure(list(y = c(7, 8), x = c(20,
25), grad = c(6.5, 7.5), sim_score = c(98, 98), value = c(7,
7), criterion = c("x < 30", "x < 30")), row.names = 2:3, class = "data.frame"),
gain = 0.5)), right = structure(list(y = -7, x = 35,
grad = -7.5, sim_score = 56.25, value = -7.5, criterion = "x >= 30"), row.names = 4L, class = "data.frame"),
root = list(root = structure(list(y = c(7, 8, -7), x = c(20,
25, 35), grad = c(6.5, 7.5, -7.5), sim_score = c(14.0833333333333,
14.0833333333333, 14.0833333333333), value = c(2.16666666666667,
2.16666666666667, 2.16666666666667), criterion = c("x >= 15",
"x >= 15", "x >= 15")), row.names = 2:4, class = "data.frame"),
gain = 140.166666666667)), root = list(root = structure(list(
y = c(-10, 7, 8, -7), x = c(10, 20, 25, 35), grad = c(-10.5,
6.5, 7.5, -7.5), sim_score = c(4, 4, 4, 4)), row.names = c(NA,
-4L), class = "data.frame"), gain = 120.333333333333))
which prints like this:
$left
y x grad sim_score value criterion
1 -10 10 -10.5 110.25 -10.5 x < 15
$right
$right$left
$right$left$left
y x grad sim_score value criterion
2 7 20 6.5 42.25 6.5 x < 22.5
$right$left$right
y x grad sim_score value criterion
3 8 25 7.5 56.25 7.5 x >= 22.5
$right$left$root
$right$left$root$root
y x grad sim_score value criterion
2 7 20 6.5 98 7 x < 30
3 8 25 7.5 98 7 x < 30
$right$left$root$gain
[1] 0.5
$right$right
y x grad sim_score value criterion
4 -7 35 -7.5 56.25 -7.5 x >= 30
$right$root
$right$root$root
y x grad sim_score value criterion
2 7 20 6.5 14.08333 2.166667 x >= 15
3 8 25 7.5 14.08333 2.166667 x >= 15
4 -7 35 -7.5 14.08333 2.166667 x >= 15
$right$root$gain
[1] 140.1667
$root
$root$root
y x grad sim_score
1 -10 10 -10.5 4
2 7 20 6.5 4
3 8 25 7.5 4
4 -7 35 -7.5 4
$root$gain
[1] 120.3333
The splits are stored under `criterion` and the leaf values under `value`.
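Because internal nodes are plain lists with `left`, `right` and `root` elements while leaves are one-row data frames, the tree can be walked recursively by checking `is.data.frame()`. A minimal sketch, using a hypothetical helper `collect_leaves()`, that lists every leaf's `criterion` and `value` from left to right. It assumes exactly that layout and ignores the `root` summaries:

```r
# Hypothetical helper: list every leaf of a tree stored in the
# left/right/root list layout, in left-to-right order.
collect_leaves <- function(node) {
  # Base case: a leaf is a one-row data frame carrying `criterion` and `value`.
  if (is.data.frame(node)) {
    return(data.frame(criterion = node$criterion[1],
                      value = node$value[1],
                      stringsAsFactors = FALSE))
  }
  # Recursive case: only `left` and `right` hold children;
  # `root` is the node's own summary and is skipped.
  rbind(collect_leaves(node$left), collect_leaves(node$right))
}
```

On the `mytree` above, this should list the four leaves `x < 15` (-10.5), `x < 22.5` (6.5), `x >= 22.5` (7.5) and `x >= 30` (-7.5).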
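Given a new data point, `x = 5`, I want to query `mytree` to see which leaf node this instance falls into. For `x = 5`, my function should output the value `-10.5`, because `5 < 15`. Similarly, if `x = 25`, it should end up in the leaf with value `7.5`. Here are more examples of what I want my `pred_tree` function to output: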
newdata <- data.frame(x = c(5, 19, 18, 30))
> pred_tree(tree = mytree, newdata = newdata)
[1] -10.5   6.5   6.5  -7.5
Here is what I have so far:
pred_tree <- function(tree, newdata){
for(i in length(tree)){
# Check if this is a leaf
if(length(tree[[i]]) == 1){
# Check criterion
if(eval(parse(text=tree[[i]]$criterion))){
# Return value of leaf
return(tree[[i]]$value[1])
}
}else if(length(tree[[i]]) > 1){
for(j in 1:length(tree[[i]])){
if(length(tree[[i]][[j]]) == 1){
# Check criterion
if(eval(parse(text=tree[[i]][[j]]$criterion))){
# Return value of leaf
return(tree[[i]][[j]]$value[1])
}
}
}
}
}
}
pred_tree(mytree, newdata = newdata)
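Unfortunately, this function does not return the correct output. It is also rather clumsy and could be very slow if I have many queries to run. I suspect a recursive algorithm makes more sense than nested for loops. Can anyone point me in the right direction?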
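A few details in the loop above likely contribute to the wrong output. `for(i in length(tree))` iterates over a single number (the last index, i.e. `root`), not over every child. `length()` of a data frame counts columns, so a six-column leaf never satisfies `length(...) == 1`. Finally, `eval(parse(...))` is called without binding `x` to the query value. A minimal demonstration on toy objects (not the real `mytree`):

```r
# A leaf shaped like the ones in mytree: one row, six columns
leaf <- data.frame(y = -10, x = 10, grad = -10.5, sim_score = 110.25,
                   value = -10.5, criterion = "x < 15",
                   stringsAsFactors = FALSE)

# length() of a data frame counts columns, so a leaf is never "length 1"
length(leaf)   # 6
nrow(leaf)     # 1

# `for (i in length(tree))` loops over one number: the last index
tree <- list(left = 1, right = 2, root = 3)
visited <- integer(0)
for (i in length(tree)) visited <- c(visited, i)
visited        # 3

# seq_along() yields every index instead
seq_along(tree)  # 1 2 3

# A criterion string needs an explicit value of x to be evaluated against
eval(parse(text = leaf$criterion), list(x = 5))  # TRUE
```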
@@@@@@@@@@@@@ EDIT @@@@@@@@@@@@@
mytree3 <- list(left = list(left = structure(list(y = -10, x = 10, grad = 0,
sim_score = 0, value = 0, criterion = "x < 15"), row.names = 1L, class = "data.frame"),
right = structure(list(y = 7, x = 20, grad = -0.5, sim_score = 0.25,
value = -0.5, criterion = "x >= 15"), row.names = 2L, class = "data.frame"),
root = list(root = structure(list(y = c(-10, 7), x = c(10,
20), grad = c(0, -0.5), sim_score = c(0.125, 0.125), value = c(-0.25,
-0.25), criterion = c("x < 22.5", "x < 22.5")), row.names = 1:2, class = "data.frame"),
gain = 0.125)), right = list(left = structure(list(y = 8,
x = 25, grad = 0.5, sim_score = 0.25, value = 0.5, criterion = "x < 30"), row.names = 3L, class = "data.frame"),
right = structure(list(y = -7, x = 35, grad = 0, sim_score = 0,
value = 0, criterion = "x >= 30"), row.names = 4L, class = "data.frame"),
root = list(root = structure(list(y = c(8, -7), x = c(25,
35), grad = c(0.5, 0), sim_score = c(0.125, 0.125), value = c(0.25,
0.25), criterion = c("x >= 22.5", "x >= 22.5")), row.names = 3:4, class = "data.frame"),
gain = 0.125)), root = list(root = structure(list(y = c(-10,
7, 8, -7), x = c(10, 20, 25, 35), grad = c(0, -0.5, 0.5, 0),
sim_score = c(0, 0, 0, 0), value = c(0, 0, 0, 0)), row.names = c(NA,
-4L), class = "data.frame"), gain = 0.25))
Running the following does not give the correct output:
pred_tree(tree = mytree3, newdata = newdata)
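The `pred_tree` answer in this thread swaps `left` and `right` so that a leaf comes first, which assumes at least one child of every node is a leaf. In `mytree3`, both children of the top node are subtrees. A sketch that instead decides left versus right at every node, assuming (from both example trees) that a leaf stores its condition in `criterion` and that a subtree stores the condition for entering it as the first entry of `root$root$criterion`. The names `entry_condition`, `pred_one` and `pred_tree2` are hypothetical:

```r
# Hypothetical helper: the condition that routes a point INTO `node`.
# Assumption: a leaf (data frame) stores it in `criterion`, while a
# subtree stores it as the first entry of root$root$criterion.
entry_condition <- function(node) {
  if (is.data.frame(node)) node$criterion[1] else node$root$root$criterion[1]
}

# Recursive lookup for a single x value
pred_one <- function(x, node) {
  # Base case: a leaf is a data frame, so return its leaf value
  if (is.data.frame(node)) return(node$value[1])
  # Evaluate only the left child's condition, with x bound to the query value
  goes_left <- eval(parse(text = entry_condition(node$left)), list(x = x))
  if (goes_left) pred_one(x, node$left) else pred_one(x, node$right)
}

# Apply the lookup to every row of newdata; vapply checks each result is one number
pred_tree2 <- function(tree, newdata) {
  vapply(newdata$x, pred_one, numeric(1), node = tree)
}
```

Under that convention, tracing by hand with `newdata <- data.frame(x = c(5, 19, 18, 30))` gives `-10.5, 6.5, 6.5, -7.5` for `mytree` and `0, -0.5, -0.5, 0` for `mytree3`. Only the left child's condition is ever evaluated and the right branch is taken otherwise, so the `x >= ...` strings are never parsed.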
A simple recursion you could use:
.pred <- function(x, tree)
{
  # A leaf arrives as a data frame: wrap it in a list so tree[[1]] is the leaf
  if(is.data.frame(tree)) tree <- list(tree)
  # If the first child is a subtree, swap left and right so a leaf comes first
  # (this assumes at least one child of each node is a leaf)
  if(!is.data.frame(tree[[1]])) tree <- tree[c(2, 1, 3)]
  # If x meets the leaf's criterion, return its value; otherwise descend into the sibling
  if (eval(parse(text=tree[[1]][["criterion"]]),list(x = x))) tree[[1]][["value"]][1]
  else .pred(x, tree[[2]])
}
pred_tree <- function(tree, newdata)
{
  cbind(newdata,pred = Vectorize(.pred,"x")(x= newdata$x,tree))
}
Now you can call your function:
pred_tree(mytree,data.frame(x=c(1,10,15,18,19,22,23,25,29,30,33,35,100)))
x pred
1 1 -10.5
2 10 -10.5
3 15 6.5
4 18 6.5
5 19 6.5
6 22 6.5
7 23 7.5
8 25 7.5
9 29 7.5
10 30 -7.5
11 33 -7.5
12 35 -7.5
13 100 -7.5
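The `Vectorize(.pred, "x")` call in the answer builds a wrapper that maps `.pred` over each element of `x` while passing `tree` through unchanged. A minimal sketch on a toy one-split function (hypothetical `classify`, standing in for `.pred`) showing that `vapply()` produces the same vector, with the extra check that every result is a single number:

```r
# Toy stand-in for .pred: a one-split "tree" given as a threshold
classify <- function(x, threshold) if (x < threshold) -10.5 else 6.5

xs <- c(5, 19, 18, 30)

# Vectorize() maps over the named argument "x"; threshold is passed as-is
via_vectorize <- Vectorize(classify, "x")(x = xs, threshold = 15)

# vapply() runs the same loop but requires each result to be one number
via_vapply <- vapply(xs, classify, numeric(1), threshold = 15)

identical(via_vectorize, via_vapply)  # TRUE
```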