R missing values: two data sets share the same ID, CUSIP and DATE; one is complete and the other has NA. How can I fill the NA from the complete data set?
There are two data sets, A and B, as follows:
A
id CUSIP name day
01 00256 ALEX 20170101
02 00259 BEAR 20170101
03 00258 CAT 20170101
B
id CUSIP name day
01 00256 NA 20170101
06 00259 BEAR 20170106
09 00258 CAT 20170109
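Data set B contains an NA, but for the same CUSIP the name column in data set A is not NA; it is ALEX.
How can I use the data in A to fill in B for the same CUSIP, so that the result looks like this: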
B
id CUSIP name day
01 00256 ALEX 20170101
06 00259 BEAR 20170106
09 00258 CAT 20170109
Given the data above:
# Note: numeric literals such as 00256 are stored as 256, so leading zeros are dropped
A <- data.frame(id = c(1, 2, 3), CUSIP = c(256, 259, 258),
                name = c("ALEX", "BEAR", "CAT"),
                day = c("2017-01-01", "2017-01-01", "2017-01-01"),
                stringsAsFactors = FALSE)
B <- data.frame(id = c(1, 6, 9), CUSIP = c(256, 259, 258),
                name = c(NA, "BEAR", "CAT"),
                day = c("2017-01-01", "2017-01-06", "2017-01-09"),
                stringsAsFactors = FALSE)
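To fill B using A (values are paired by row position, so this relies on A and B listing the CUSIPs in the same order):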
dplyr::coalesce(B,A)
id CUSIP name day
1 1 256 ALEX 2017-01-01
2 6 259 BEAR 2017-01-06
3 9 258 CAT 2017-01-09
Alternatively, in base R, look up each missing name in A by CUSIP:
B$name <- ifelse(is.na(B$name), A$name[match(B$CUSIP, A$CUSIP)], B$name)
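If more than one column may contain NA, or the rows of A and B are not in the same order, the same lookup idea can be wrapped in a small function. The following is only a minimal sketch in base R: `fill_from()` and its arguments are hypothetical names invented for this example, and CUSIP is stored as character here (an assumption, made so that the leading zeros survive).

```r
# Example data from the question, with CUSIP kept as character (assumption)
A <- data.frame(
  id = c(1, 2, 3),
  CUSIP = c("00256", "00259", "00258"),
  name = c("ALEX", "BEAR", "CAT"),
  day = c("2017-01-01", "2017-01-01", "2017-01-01"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  id = c(1, 6, 9),
  CUSIP = c("00256", "00259", "00258"),
  name = c(NA, "BEAR", "CAT"),
  day = c("2017-01-01", "2017-01-06", "2017-01-09"),
  stringsAsFactors = FALSE
)

# fill_from() is a hypothetical helper, not part of any package.
# For each column in `cols`, NA values in `target` are replaced by the value
# from the row of `source` that has the same `key`.
fill_from <- function(target, source, key, cols) {
  # Position in `source` of each target key (NA when the key is absent)
  idx <- match(target[[key]], source[[key]])
  for (col in cols) {
    # Only touch cells that are NA and have a matching row in `source`
    missing <- is.na(target[[col]]) & !is.na(idx)
    target[[col]][missing] <- source[[col]][idx[missing]]
  }
  target
}

B_filled <- fill_from(B, A, key = "CUSIP", cols = "name")
print(B_filled)
```

Because the rows are paired through `match()` on the key rather than by position, the result should not depend on the row order of A, and values in B that are already present are left untouched.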