Search for a certain value in multiple data frames and store the records

I think this is a complex question, so I will try to make it as easy to understand as possible.

I have 3 data frames, for example:

NS_3<-as.data.frame(cbind(c("3","3","3","3","3"),c("341007","325001","324003","524302","346002")))
NS_4<-as.data.frame(cbind(c("4","4","4","4","4","4","4"),c("341007","270001","270001","521009","346001","524302","335104")))

NS_15<-as.data.frame(cbind(c("15","15","15","15","15"),c("301001","301001","316104","344003","291003")))

names(NS_3)<-c("NS", "Pred FAILCODE TEST")
names(NS_4)<-c("NS", "Pred FAILCODE TEST")
names(NS_15)<-c("NS", "Pred FAILCODE TEST")

[Image: the three data frames]

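What I want to do is:

1) Check whether the data frames NS_4 and NS_15 contain the value in each row of NS_3$`Pred FAILCODE TEST`.

2) If the value exists in one of those data frames, count and store all the values of that data frame's Pred FAILCODE TEST column, except the value that was found.

For example: for the first Pred FAILCODE TEST value in NS_3, check whether 341007 exists in NS_4 or NS_15.

Since this check is TRUE for NS_4, it should then count the frequency of every NS_4$`Pred FAILCODE TEST` value except the value in question (i.e. 341007).

So the result of the first loop should be: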

[Image: results for the first loop, 341007]

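The second and third values of NS_3$`Pred FAILCODE TEST` should be skipped, because neither 325001 nor 324003 appears in any of the other data frames.

For the fourth value, 524302, the result should look like this: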

FAILCODES 524302
341007    1
270001    2
521009    1
346001    1
335104    1
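
The single check described above can be sketched in base R. This is only an illustration under stated assumptions: count_other_codes is a hypothetical helper name, NS_4 is rebuilt here with character columns so the block runs on its own, and table() sorts the codes alphabetically rather than in order of appearance.

```r
# NS_4 rebuilt from the question's data (columns kept as character)
NS_4 <- data.frame(
  NS = "4",
  `Pred FAILCODE TEST` = c("341007", "270001", "270001", "521009",
                           "346001", "524302", "335104"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: frequency of every code in `df` except `value`.
# Returns NULL when `value` does not occur in `df` at all.
count_other_codes <- function(value, df, col = "Pred FAILCODE TEST") {
  codes <- as.character(df[[col]])
  if (!value %in% codes) return(NULL)
  table(codes[codes != value], dnn = "FAILCODES")
}

res <- count_other_codes("524302", NS_4)
print(res)
```

Returning NULL for a value that is not found makes it easy to skip cases such as 325001 and 324003 later on.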

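Once the loop has finished with the NS_3$`Pred FAILCODE TEST` values, it should do the same for the NS_4$`Pred FAILCODE TEST` values, searching for them in NS_3 and NS_15. After NS_4, it should do the same for NS_15, searching for the NS_15$`Pred FAILCODE TEST` values in NS_3 and NS_4.

I believe this needs nested for-loops to go through every row of every data frame. Also, dflist<-list(df1=NS_3,df2=NS_4,df3=NS_15) might be helpful for these loops.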
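
A rough sketch of how such nested loops over a named list could look, in base R. The helper make_ns, the list names (NS_3, NS_4, NS_15 instead of df1, df2, df3) and the "source|value|target" result keys are assumptions made for this illustration only. Duplicated values in a source data frame are checked once, since they would produce the same result.

```r
# Hypothetical constructor so the example is self-contained
make_ns <- function(ns, codes) {
  data.frame(NS = ns, `Pred FAILCODE TEST` = codes,
             check.names = FALSE, stringsAsFactors = FALSE)
}

dflist <- list(
  NS_3  = make_ns("3",  c("341007", "325001", "324003", "524302", "346002")),
  NS_4  = make_ns("4",  c("341007", "270001", "270001", "521009",
                          "346001", "524302", "335104")),
  NS_15 = make_ns("15", c("301001", "301001", "316104", "344003", "291003"))
)
col <- "Pred FAILCODE TEST"

results <- list()
for (src in names(dflist)) {                        # data frame supplying the values
  for (value in unique(dflist[[src]][[col]])) {     # each distinct value to search for
    for (tgt in setdiff(names(dflist), src)) {      # every other data frame
      codes <- dflist[[tgt]][[col]]
      if (value %in% codes) {
        key <- paste(src, value, tgt, sep = "|")
        # frequencies of the target's codes, excluding the value that was found
        results[[key]] <- table(codes[codes != value])
      }
    }
  }
}

names(results)
```

Each element of results is a frequency table, so results[["NS_3|524302|NS_4"]] corresponds to the table shown for 524302.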

In reality I have around 70 different data frames and 50 different Pred FAILCODE TEST values to check in each data frame.
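
With that many data frames, typing the list by hand gets tedious. One possible way to collect them by name is sketched below, assuming they all follow the NS_<number> naming pattern. The environment workspace, the shortened data and the column name code are stand-ins for this example; in an interactive session globalenv() would take the place of workspace.

```r
# A separate environment stands in for the workspace so the example is
# self-contained; the data frames are shortened versions of the question's.
workspace <- new.env()
assign("NS_3",  data.frame(NS = "3",  code = c("341007", "524302")), envir = workspace)
assign("NS_4",  data.frame(NS = "4",  code = c("341007", "270001")), envir = workspace)
assign("NS_15", data.frame(NS = "15", code = c("301001", "316104")), envir = workspace)
assign("other", 1:3, envir = workspace)  # unrelated object that must be ignored

# Find every object called NS_<number> and fetch them as a named list
ns_names <- ls(envir = workspace, pattern = "^NS_[0-9]+$")
dflist <- mget(ns_names, envir = workspace)

# Order by the numeric suffix rather than alphabetically (NS_3, NS_4, NS_15)
dflist <- dflist[order(as.numeric(sub("^NS_", "", names(dflist))))]
names(dflist)
```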

I hope this is clear; if you need more information, please let me know!

I think this does it:

#your code
NS_3<-as.data.frame(cbind(c("3","3","3","3","3"),c("341007","325001","324003","524302","346002")))
NS_4<-as.data.frame(cbind(c("4","4","4","4","4","4","4"),c("341007","270001","270001","521009","346001","524302","335104")))
NS_15<-as.data.frame(cbind(c("15","15","15","15","15"),c("301001","301001","316104","344003","291003")))

names(NS_3)<-c("NS", "Pred FAILCODE TEST")
names(NS_4)<-c("NS", "Pred FAILCODE TEST")
names(NS_15)<-c("NS", "Pred FAILCODE TEST")

#Make a vector of your tables' suffixes
df_index <- c(3,4,15)

#Essentially rbind() all of the tables in your df_index
#mget() fetches the data frames by name and do.call() passes them all to rbind()
input <- do.call(rbind, unname(mget(paste0("NS_", df_index))))

library(dplyr)
library(magrittr)

#convert from factor to numeric
input$`Pred FAILCODE TEST` <- as.numeric(as.character(input$`Pred FAILCODE TEST`))
input$NS <- as.numeric(as.character(input$NS))

#make a compressed table of frequencies
input %>% group_by(NS, `Pred FAILCODE TEST`) %>% 
summarize(n=n()) -> compressTBL

#little function: for a given fail code, return the NS of every other table that contains it
Lookup <- function(NS, FailCode){
  input$NS[input$`Pred FAILCODE TEST` == FailCode & input$NS != NS]
}

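#the output, a list matrix; each column corresponds to one row of your input table
#%in% (rather than ==) copes with Lookup() returning zero or several NS values
output <- sapply(X = 1:nrow(input), 
   FUN = function(x){
   compressTBL[compressTBL$NS %in% Lookup(input$NS[x], input$`Pred FAILCODE TEST`[x]),]
   })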

#The only records with values are 1,4,6,11
output

#same as what you got in your loop
as.data.frame(output[,4]) #4th record 524302
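
Note that the table for the 4th record still lists the searched code itself (524302) alongside the other NS 4 codes. If it should be dropped, as the question asks, one possible variation is sketched below in base R only. The names freq and others_without_match and the simplified column name code are assumptions for this illustration, and the stacked data is re-entered by hand so the block runs on its own.

```r
# Stacked copy of the question's three data frames, with a simplified column name
input <- data.frame(
  NS = rep(c(3, 4, 15), times = c(5, 7, 5)),
  code = c(341007, 325001, 324003, 524302, 346002,
           341007, 270001, 270001, 521009, 346001, 524302, 335104,
           301001, 301001, 316104, 344003, 291003)
)

# Frequency of each (NS, code) pair, playing the role of compressTBL
freq <- aggregate(n ~ NS + code, data = cbind(input, n = 1), FUN = sum)

# Hypothetical helper: for row `i` of `input`, return the frequencies of the
# other tables that contain the same code, without the row for the code itself
others_without_match <- function(i) {
  hit_ns <- unique(input$NS[input$code == input$code[i] &
                            input$NS != input$NS[i]])
  freq[freq$NS %in% hit_ns & freq$code != input$code[i], ]
}

res4 <- others_without_match(4)  # 4th record: NS 3, code 524302
res4
```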