Merge two data frames, but only include variables where there are no NAs
I have two data frames that I want to merge.

df1:
Date Company Return
1988-09-30 BELSHIPS 0.087
1988-10-31 BELSHIPS 0.021
1988-11-30 BELSHIPS 0.015
1988-12-30 BELSHIPS -0.048
1988-09-30 GOODTECH 0.114
1988-10-31 GOODTECH 0.074
1988-11-30 GOODTECH NA
1988-12-30 GOODTECH NA
1988-09-30 LABOREMUS -0.014
1988-10-31 LABOREMUS 0.024
1988-11-30 LABOREMUS 0.017
1988-12-30 LABOREMUS 0.021
df2:
Company
BELSHIPS
BIK BOK
FARSTAD SHIPPING
GOODTECH
GYLDENDAL
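I want to merge the two data frames by Company, but I only want to include companies that have no NA in Return. The new data frame should therefore look like this:

df3: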
Date Company Return
1988-09-30 BELSHIPS 0.087
1988-10-31 BELSHIPS 0.021
1988-11-30 BELSHIPS 0.015
1988-12-30 BELSHIPS -0.048
Only BELSHIPS is included, because GOODTECH has NAs in Return and LABOREMUS is not in df2.
I have tried df3 <- merge(df2, df1[!is.na(df1$Return), ], by = "Company"), but this does not work because it only drops the rows with NA, not the whole company.
Any suggestions on how to solve this?
Base R solution:
# Select companies that have NA
# You can also use unique on this
foo <- df1$Company[is.na(df1$Return)]
# Subset data frame where Company is within df2 and doesn't have NA
subset(df1, Company %in% df2$Company & !Company %in% foo)
# Date Company Return
# 1 1988-09-30 BELSHIPS 0.087
# 2 1988-10-31 BELSHIPS 0.021
# 3 1988-11-30 BELSHIPS 0.015
# 4 1988-12-30 BELSHIPS -0.048
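For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's two data frames by hand and applies the same subset() idea. The names has_na and df3 are only illustrative, and Date is kept as a plain character column for simplicity.

```r
# The question's data, typed in by hand (Date left as character)
df1 <- data.frame(
  Date = rep(c("1988-09-30", "1988-10-31", "1988-11-30", "1988-12-30"), times = 3),
  Company = rep(c("BELSHIPS", "GOODTECH", "LABOREMUS"), each = 4),
  Return = c(0.087, 0.021, 0.015, -0.048,
             0.114, 0.074, NA, NA,
             -0.014, 0.024, 0.017, 0.021),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Company = c("BELSHIPS", "BIK BOK", "FARSTAD SHIPPING", "GOODTECH", "GYLDENDAL"),
  stringsAsFactors = FALSE
)

# Companies with at least one missing Return
has_na <- unique(df1$Company[is.na(df1$Return)])

# Keep rows whose company is listed in df2 and has no missing Return
df3 <- subset(df1, Company %in% df2$Company & !(Company %in% has_na))
print(df3)
```

Because df2 holds only the Company column, filtering df1 by membership gives the same rows a merge would, without needing a join.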
Test data:
df2 = data.frame(Company = c('BELSHIPS','GOODTECH'))
df1 = data.frame(Company = c('BELSHIPS','BELSHIPS','BELSHIPS','GOODTECH','GOODTECH','GOODTECH','LABOREMUS','LABOREMUS','LABOREMUS'),Return = c(1,2,3,1,NA,NA,3,4,5) )
Use which() and unique() to get the companies that have rows with NA:
df3 <- merge(df2, df1[!df1$Company %in% unique(df1[which(is.na(df1$Return)), 'Company']), ], by = 'Company')
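One thing to watch when excluding a set of companies: the test has to be set membership (%in%), not !=. With more than one company containing NA, != recycles the vector element-wise and lets some rows through. A small sketch with made-up data (the companies A, B, C and the names bad, keep_wrong, keep_right are hypothetical):

```r
# Hypothetical data: two companies (B and C) have a missing Return
df1 <- data.frame(
  Company = c("A", "A", "B", "B", "C", "C"),
  Return = c(1, 2, NA, 4, 5, NA),
  stringsAsFactors = FALSE
)
bad <- unique(df1$Company[is.na(df1$Return)])  # "B" "C"

# != recycles bad along the column (B, C, B, C, ...), so each row is
# compared against only one of the two values
keep_wrong <- df1$Company != bad

# %in% checks every row against all values in bad
keep_right <- !(df1$Company %in% bad)

print(df1[keep_wrong, ])  # still contains rows for B and C
print(df1[keep_right, ])  # only company A
```

With the test data above only GOODTECH has NAs, so both forms happen to agree there.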
You can also use dplyr:
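library(dplyr)

# Join df1 onto the companies in df2, then keep only companies with no missing Return
df2 %>%
  left_join(df1, by = "Company") %>%
  group_by(Company) %>%
  filter(sum(is.na(Return)) == 0)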
This gives you:
# A tibble: 4 x 3
# Groups: Company [1]
Company Date Return
<chr> <fctr> <dbl>
1 BELSHIPS 1988-09-30 0.087
2 BELSHIPS 1988-10-31 0.021
3 BELSHIPS 1988-11-30 0.015
4 BELSHIPS 1988-12-30 -0.048
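Simply merge, then call na.omit() on the merged data frame. Note that na.omit() drops only the rows that contain NA, so any complete rows of a company such as GOODTECH remain.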
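To see what na.omit() does after a merge, here is a small sketch on the test data from earlier in the thread, rebuilt so it runs on its own. The ave() step is not from any of the answers; it is just one possible way to extend the merge-first idea so that whole companies are dropped, and the names merged, rowwise, any_na and whole_company are illustrative.

```r
df2 <- data.frame(Company = c("BELSHIPS", "GOODTECH"), stringsAsFactors = FALSE)
df1 <- data.frame(
  Company = rep(c("BELSHIPS", "GOODTECH", "LABOREMUS"), each = 3),
  Return = c(1, 2, 3, 1, NA, NA, 3, 4, 5),
  stringsAsFactors = FALSE
)

merged <- merge(df2, df1, by = "Company")

# na.omit() removes only the rows that contain NA,
# so GOODTECH keeps its one complete row
rowwise <- na.omit(merged)

# Flag every row of a company that has at least one NA, then drop them all
any_na <- ave(is.na(merged$Return), merged$Company, FUN = any)
whole_company <- merged[!any_na, ]

print(rowwise)
print(whole_company)
```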