Search for a value in multiple data frames and store the results
I think this is a complicated problem, so I'll try to make it as clear as possible.
I have 3 data frames, for example:
NS_3<-as.data.frame(cbind(c("3","3","3","3","3"),c("341007","325001","324003","524302","346002")))
NS_4<-as.data.frame(cbind(c("4","4","4","4","4","4","4"),c("341007","270001","270001","521009","346001","524302","335104")))
NS_15<-as.data.frame(cbind(c("15","15","15","15","15"),c("301001","301001","316104","344003","291003")))
names(NS_3)<-c("NS", "Pred FAILCODE TEST")
names(NS_4)<-c("NS", "Pred FAILCODE TEST")
names(NS_15)<-c("NS", "Pred FAILCODE TEST")
[image: the three data frames]
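What I want to do is:
1) Check whether the data frames NS_4 and NS_15 contain the value in each row of NS_3$`Pred FAILCODE TEST`.
2) If the value is present in one of those data frames, count and store all the `Pred FAILCODE TEST` values of that data frame, except the value that was found.
For example:
For the first `Pred FAILCODE TEST` value in NS_3, check whether 341007 is present in NS_4 and NS_15.
Since this check is TRUE for NS_4, it should count the frequency of all NS_4$`Pred FAILCODE TEST` values except the value in question (i.e. 341007).
So the result of the first loop should be: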
[image: results for the first loop, 341007]
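The second and third values of NS_3$`Pred FAILCODE TEST`, 325001 and 324003, do not appear in any other data frame, so they should be skipped.
For the fourth value, 524302, the result should look like this: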
FAILCODES 524302
341007 1
270001 2
521009 1
346001 1
335104 1
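Once the loop has finished with the NS_3$`Pred FAILCODE TEST` values, it should do the same for the NS_4$`Pred FAILCODE TEST` values, searching for them in NS_3 and NS_15. After NS_4 is done, it should do the same for NS_15, searching for the NS_15$`Pred FAILCODE TEST` values in NS_3 and NS_4.
I believe this needs nested for-loops to go through every row of every data frame. Also, dflist<-list(df1=NS_3,df2=NS_4,df3=NS_15) might be helpful for these loops.
In reality I have around 70 different data frames, with around 50 different `Pred FAILCODE TEST` values to check in each data frame.
I hope this is clear; please let me know if you need any more information!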
I think this will do it:
#your code
NS_3<-as.data.frame(cbind(c("3","3","3","3","3"),c("341007","325001","324003","524302","346002")))
NS_4<-as.data.frame(cbind(c("4","4","4","4","4","4","4"),c("341007","270001","270001","521009","346001","524302","335104")))
NS_15<-as.data.frame(cbind(c("15","15","15","15","15"),c("301001","301001","316104","344003","291003")))
names(NS_3)<-c("NS", "Pred FAILCODE TEST")
names(NS_4)<-c("NS", "Pred FAILCODE TEST")
names(NS_15)<-c("NS", "Pred FAILCODE TEST")
#Make a vector of your Tables suffixes
df_index <- c(3,4,15)
#Essentially rbind() all of the tables in your df_index:
#mget() fetches NS_3, NS_4 and NS_15 by name and do.call() passes them to rbind()
input <- do.call(rbind, mget(paste0("NS_", df_index)))
require(dplyr)
require(magrittr)
#convert from factor to numeric
input$`Pred FAILCODE TEST` <- as.numeric(as.character(input$`Pred FAILCODE TEST`))
input$NS <- as.numeric(as.character(input$NS))
#make a compressed table of frequencies (one row per NS / fail code pair)
compressTBL <- input %>%
  group_by(NS, `Pred FAILCODE TEST`) %>%
  summarize(n = n())
#little function to look up each record: returns the NS of every other
#table that contains FailCode
Lookup <- function(NS, FailCode){
  input$NS[input$`Pred FAILCODE TEST` == FailCode & input$NS != NS]
}
#the output, a list; each column is a row in your input table
#(%in% also copes with fail codes found in none or in several other tables)
output <- sapply(X = 1:nrow(input),
                 FUN = function(x){
                   compressTBL[compressTBL$NS %in% Lookup(input$NS[x], input$`Pred FAILCODE TEST`[x]),]
                 })
#The only records with values are 1,4,6,11
output
#same as what you got in your loop, except that the row for 524302 itself
#is still included; filter it out if you need to exclude it
as.data.frame(output[,4]) #4th record 524302
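If you want the counts to leave out the searched value, as in the expected table in the question, a base R version over a named list might look roughly like the sketch below. It is only an illustration under my own assumptions: the helpers make_df and count_co_occurring, the list names, and the "<code> in <frame>" result keys are made up here and are not part of the code above. It also checks each distinct code once rather than once per row.

```r
col <- "Pred FAILCODE TEST"

# Hypothetical helper: build one data frame shaped like those in the question.
make_df <- function(ns, codes) {
  df <- data.frame(NS = ns, codes, stringsAsFactors = FALSE)
  names(df) <- c("NS", col)
  df
}

# Named list of the sample data frames (the names are an assumption).
dflist <- list(
  NS_3  = make_df("3",  c("341007", "325001", "324003", "524302", "346002")),
  NS_4  = make_df("4",  c("341007", "270001", "270001", "521009", "346001",
                          "524302", "335104")),
  NS_15 = make_df("15", c("301001", "301001", "316104", "344003", "291003"))
)

# For one source data frame, look up each of its distinct codes in every other
# data frame; where a code is found, tabulate that other frame's codes with
# the searched code removed.
count_co_occurring <- function(src_name, dflist) {
  others <- dflist[setdiff(names(dflist), src_name)]
  res <- list()
  for (code in unique(dflist[[src_name]][[col]])) {
    for (other_name in names(others)) {
      other_codes <- others[[other_name]][[col]]
      if (code %in% other_codes) {
        key <- paste(code, "in", other_name)
        res[[key]] <- table(other_codes[other_codes != code])
      }
    }
  }
  res
}

# Run the search with every data frame as the source in turn.
results <- lapply(setNames(names(dflist), names(dflist)),
                  count_co_occurring, dflist = dflist)

# Frequencies for 524302 (from NS_3) as found in NS_4.
results$NS_3[["524302 in NS_4"]]
```

With the sample data this last table counts 270001 twice and 341007, 521009, 346001 and 335104 once each, which matches the expected result in the question. For around 70 data frames it is probably easier to read them straight into a named list than to keep 70 separate variables.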