How to look up a single value in two different datasets that have two columns in common (based on a condition)
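I have two different datasets as follows:
I need to fill in the call volume, put volume, and total volume columns in dataset #2 if and only if the ID and date columns match in both datasets. I separate calls, puts, and totals based on the value in column 3 of dataset #1 (C for calls, P for puts, T for total).
I am running this code, but it does not work (only the call example is shown; the same rule applies to puts and totals).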
dataset2$call_volume <- if(dataset1$optiontype== "C")
{ dataset1$volume [ match (
interaction(dataset2$ID,dataset2$date),
interaction(dataset1$ID,dataset1$date)
)]}
Does anyone have a suggestion on how to fix the code? Many thanks!
> dput(dataset1)
structure(list(ID = c(44652, 44652, 44652, 56266, 56266, 56266,
44652, 44652, 44652, 56266, 56266, 56266), date = c("1997/01/02",
"1997/01/02", "1997/01/02", "1997/01/02", "1997/01/02", "1997/01/02",
"1997/01/03", "1997/01/03", "1997/01/03", "1997/01/03", "1997/01/03",
"1997/01/03"), `option type (C,P,T: for calls, puts, and total)` = c("C",
"P", "T", "C", "P", "T", "C", "P", "T", "C", "P", "T"), volume = c(34,
250, 284, 30, 0, 30, 1443, 211, 1654, 4490, 826, 5316)), row.names = c(NA,
-12L), class = c("tbl_df", "tbl", "data.frame"))
> dput(dataset2)
structure(list(ID = c(44652, 44652, 44652, 56266, 56266, 56266
), date = c("1997/01/02", "1997/01/03", "1997/01/04", "1997/01/02",
"1997/01/03", "1997/01/04"), `call volume` = c(NA, NA, NA, NA,
NA, NA), `put volume` = c(NA, NA, NA, NA, NA, NA), `total volume` = c(NA,
NA, NA, NA, NA, NA)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
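Update: both datasets also contain many other columns that are completely different from each other; the only columns they have in common are the ones shown in the picture below and in the datasets above.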
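I think this is an XY problem. I think what you are actually trying to do is reshape dataset1 to wide format in order to fill in dataset2. After that you can left_join the two frames.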
library(tidyr)
library(dplyr)
names(dataset1)[3] <- "option_type"
dataset2 %>%
  dplyr::select(-`call volume`, -`put volume`, -`total volume`) %>%
  left_join(dataset1 %>%
              tidyr::pivot_wider(names_from = "option_type", values_from = "volume") %>%
              rename("Call Volume" = C, "Put Volume" = P, "Total Volume" = `T`),
            by = c("ID", "date"))
#> # A tibble: 6 x 5
#> ID date `Call Volume` `Put Volume` `Total Volume`
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 44652 1997/01/02 34 250 284
#> 2 44652 1997/01/03 1443 211 1654
#> 3 44652 1997/01/04 NA NA NA
#> 4 56266 1997/01/02 30 0 30
#> 5 56266 1997/01/03 4490 826 5316
#> 6 56266 1997/01/04 NA NA NA
Created on 2020-10-07 by the reprex package (v0.3.0)
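For completeness, here is a base R sketch of why the match() attempt in the question fails and how it might be repaired. As far as I can tell there are two problems: `if` expects a single TRUE/FALSE rather than a whole column, and `dataset1$optiontype` is not the actual name of the third column. The idea below is to filter dataset1 to one option type first and only then match on the (ID, date) key. The toy data frames mirror the question's data; the column name `option_type` and the helper `lookup_volume` are my own assumptions, not part of the original code.

```r
# Toy data mirroring the question (third column renamed to option_type for brevity).
dataset1 <- data.frame(
  ID = rep(c(44652, 56266, 44652, 56266), each = 3),
  date = rep(c("1997/01/02", "1997/01/03"), each = 6),
  option_type = rep(c("C", "P", "T"), times = 4),
  volume = c(34, 250, 284, 30, 0, 30, 1443, 211, 1654, 4490, 826, 5316)
)
dataset2 <- data.frame(
  ID = rep(c(44652, 56266), each = 3),
  date = rep(c("1997/01/02", "1997/01/03", "1997/01/04"), times = 2)
)

# Hypothetical helper: look up the volume of one option type by (ID, date).
lookup_volume <- function(type) {
  src <- dataset1[dataset1$option_type == type, ]  # keep one option type only
  key_src <- paste(src$ID, src$date)               # composite key in dataset1
  key_dst <- paste(dataset2$ID, dataset2$date)     # composite key in dataset2
  src$volume[match(key_dst, key_src)]              # NA where there is no match
}

dataset2$call_volume  <- lookup_volume("C")
dataset2$put_volume   <- lookup_volume("P")
dataset2$total_volume <- lookup_volume("T")
```

This keeps every row and column of dataset2 untouched and only adds the three volume columns, which may be convenient if you would rather not reshape anything.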
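If I understand correctly, you want dataset2 to contain the values from dataset1 where the rows match, and NA where they do not.
If so, you need to use left_join.
If not, please update your question with the desired output.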
library(tidyverse)
d1 <- structure(list(ID = c(44652, 44652, 44652, 56266, 56266, 56266,
44652, 44652, 44652, 56266, 56266, 56266), date = c("1997/01/02",
"1997/01/02", "1997/01/02", "1997/01/02", "1997/01/02", "1997/01/02",
"1997/01/03", "1997/01/03", "1997/01/03", "1997/01/03", "1997/01/03",
"1997/01/03"), `option type (C,P,T: for calls, puts, and total)` = c("C",
"P", "T", "C", "P", "T", "C", "P", "T", "C", "P", "T"), volume = c(34,
250, 284, 30, 0, 30, 1443, 211, 1654, 4490, 826, 5316)), row.names = c(NA,
-12L), class = c("tbl_df", "tbl", "data.frame"))
d2 <- structure(list(ID = c(44652, 44652, 44652, 56266, 56266, 56266
), date = c("1997/01/02", "1997/01/03", "1997/01/04", "1997/01/02",
"1997/01/03", "1997/01/04"), `call volume` = c(NA, NA, NA, NA,
NA, NA), `put volume` = c(NA, NA, NA, NA, NA, NA), `total volume` = c(NA,
NA, NA, NA, NA, NA)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
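# Reshape d1 from long to wide: one row per (ID, date), one column per option type
d1_wider <- d1 %>%
  pivot_wider(names_from = `option type (C,P,T: for calls, puts, and total)`, values_from = volume) %>%
  rename(`call volume` = `C`, `put volume` = `P`, `total volume` = `T`)
# Keep only the key columns of d2, then join on the columns the two frames share
d2 %>%
  select(ID, date) %>%
  left_join(d1_wider)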
#> Joining, by = c("ID", "date")
#> # A tibble: 6 x 5
#> ID date `call volume` `put volume` `total volume`
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 44652 1997/01/02 34 250 284
#> 2 44652 1997/01/03 1443 211 1654
#> 3 44652 1997/01/04 NA NA NA
#> 4 56266 1997/01/02 30 0 30
#> 5 56266 1997/01/03 4490 826 5316
#> 6 56266 1997/01/04 NA NA NA
Created on 2020-10-07 by the reprex package (v0.3.0)
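Regarding the update about the many other columns: if dataset1 carries extra columns, pivot_wider() treats them as identifiers by default, so you may no longer get one row per ID and date. A minimal sketch of one way around that, assuming you only need the three volumes, is to name the keys explicitly with `id_cols` and then join onto dataset2 with all of its own columns kept. The columns `extra1` and `extra2` and the shortened toy data are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Toy data; extra1 and extra2 stand in for the unrelated columns (hypothetical).
dataset1 <- tibble(
  ID = rep(c(44652, 56266), each = 3),
  date = "1997/01/02",
  option_type = rep(c("C", "P", "T"), times = 2),
  volume = c(34, 250, 284, 30, 0, 30),
  extra1 = letters[1:6]
)
dataset2 <- tibble(
  ID = c(44652, 44652, 56266),
  date = c("1997/01/02", "1997/01/04", "1997/01/02"),
  extra2 = c(TRUE, FALSE, TRUE)
)

# id_cols restricts the identifying columns to the join keys, so extra1 is dropped.
wide <- dataset1 %>%
  pivot_wider(
    id_cols = c(ID, date),
    names_from = option_type,
    values_from = volume
  )

# dataset2 keeps its own columns; unmatched (ID, date) pairs get NA volumes.
result <- left_join(dataset2, wide, by = c("ID", "date"))
```

The new columns come out named C, P, and T here; a rename() as in the code above would give them the longer names.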