Using dplyr's case_when function across two datasets
I have the following two sample datasets and want to use the case_when function on a combination of both:
> game_data <- data.frame(player = c(1,1,1,2,2,2,3,3,3), level = c(1,2,3,1,2,3,1,2,3), score=c(0,150,170,80,100,110,75,100,0))
> game_data
  player level score
1      1     1     0
2      1     2   150
3      1     3   170
4      2     1    80
5      2     2   100
6      2     3   110
7      3     1    75
8      3     2   100
9      3     3     0
> range_data <- data.frame(level = c(1,2,3), Point1 = c(20,70,140), Point2 = c(40,80,180), Point3 = c(60,90,220))
> range_data
  level Point1 Point2 Point3
1     1     20     40     60
2     2     70     80     90
3     3    140    180    220
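I now want to use the ranges between the points in the second dataset to create a new variable in game_data, based on the range each score falls into.
For example, if player 1 scores 150 on level 2, the new variable PointRange should show "Range4", because the score is above 90.
I tried the following, but it does not work: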
result <- game_data %>%
mutate(PointRange = case_when(level == range_data$level & score < range_data$point1 ~ "Range1",
level == range_data$level & score >= range_data$point1 & score < data$point2 ~ "Range2",
level == range_data$level & score >= range_data$point2 & score <= data$point3 ~ "Range3",
level == range_data$level & score >= range_data$point3 ~ "Range4"))
How can I do this? Thanks in advance!
Since you are matching on the level column, you can simply inner_join on that column and then work from a single data frame.
In case_when, the arguments are evaluated in order, so you must proceed from the most specific to the most general.
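library(dplyr)

game_data %>%
  # attach each level's Point1..Point3 thresholds to every score row
  inner_join(range_data, by = "level") %>%
  # check the highest threshold first; the first TRUE condition wins
  mutate(PointRange = case_when(score >= Point3 ~ "Range4",
                                score >= Point2 ~ "Range3",
                                score >= Point1 ~ "Range2",
                                score <  Point1 ~ "Range1")) %>%
  # drop the helper threshold columns
  select(-Point1, -Point2, -Point3)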
#   player level score PointRange
# 1      1     1     0     Range1
# 2      1     2   150     Range4
# 3      1     3   170     Range2
# 4      2     1    80     Range4
# 5      2     2   100     Range4
# 6      2     3   110     Range1
# 7      3     1    75     Range4
# 8      3     2   100     Range4
# 9      3     3     0     Range1
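To illustrate why the order of the conditions matters, here is a minimal sketch on a standalone vector. The scores and the thresholds 20, 40 and 90 are made up for illustration and are not taken from range_data; the variable names ordered and misordered are likewise hypothetical.

```r
library(dplyr)

# Hypothetical scores and thresholds, chosen only for illustration.
scores <- c(10, 50, 95)

# Conditions ordered from most specific to most general: each score
# takes the label of the first condition that is TRUE.
ordered <- case_when(
  scores >= 90 ~ "Range4",
  scores >= 40 ~ "Range3",
  scores >= 20 ~ "Range2",
  TRUE         ~ "Range1"
)

# Same conditions with the most general one first: it already matches
# every score of 20 or more, so the later conditions are never reached.
misordered <- case_when(
  scores >= 20 ~ "Range2",
  scores >= 40 ~ "Range3",
  scores >= 90 ~ "Range4",
  TRUE         ~ "Range1"
)

print(ordered)     # "Range1" "Range3" "Range4"
print(misordered)  # "Range1" "Range2" "Range2"
```

In the misordered version, 50 and 95 both end up as "Range2", which is the kind of silent mislabeling that checking the highest threshold first avoids.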