Conditional Filtering using Devel Version of Dplyr's Scoped Filter
library(dplyr) # devel version, soon-to-be released 0.6
library(tidyr)
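Below is a simple dataset. The Q1Sat-Q3Sat variables are satisfaction ratings, and the Q1Used-Q3Used variables record whether the respondent used the item they are rating. Each Sat/Used pair was asked together in the survey. The real dataset contains at least 50 Sat variables and 50 Used variables.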
Q1Sat<-c("Neutral","Neutral","VSat","Sat","Neutral","Sat","VDis","Sat","Sat","VSat")
Q2Sat<-c("Neutral","VSat","Dis","Dis","VDis","Sat","Sat","VSat","Neutral","Dis")
Q3Sat<-c("Sat","Sat","Diss","Neutral","VSat","VDis","Sat","Sat","Sat","Neutral")
Q3Used<-c("Yes","No","Yes","Yes","Yes","Yes","Yes","Yes","Yes","No")
Q2Used<-c("Yes","Yes","Yes","Yes","No","No","Yes","Yes","Yes","Yes")
Q1Used<-c("Yes","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes")
House<-c("Yes","No","Unsure","Yes","Yes","No","Unsure","Unsure","Yes","Yes")
Test<-data_frame(Q1Sat,Q2Sat,Q3Sat,Q1Used,Q2Used,Q3Used,House)
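I want to use the code below to reshape the data into a table of percentages. However, I need to filter the Q1Used to Q3Used variables to keep only "Yes", and the House variable to keep only "Yes". As mentioned, Q1Sat is linked to Q1Used, so Q1Sat should be included only when Q1Used is "Yes" and House is "Yes". I need to do the same for Q2Sat and Q3Sat.
I'm stuck on how to do this, though. I tried the scoped filter in the devel version of dplyr, but I'm not sure how to use it with several variables: Q1Used:Q3Used, plus House.
So how do I add a House filter (dropping rows where House != "Yes") to the scoped filter in the code below?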
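Test %>%
  filter_at(vars(Q1Used:Q3Used), all_vars(. == "Yes")) %>%  # Q1Used:Q35Used in the real data
  select(Q1Sat:Q3Sat) %>%
  gather() %>%
  count(key, value) %>%
  group_by(key) %>%  # percentages within each question
  mutate(perc = round(n / sum(n), 2)) %>%
  ungroup() %>%
  select(-n) %>%
  spread(value, perc)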
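A minimal sketch of adding House to the scoped filter itself, assuming the sample data above and a dplyr version that still provides filter_at() (superseded, but not removed, in dplyr 1.x). Because the condition is the same for every column, House can be listed inside vars() next to the Used columns. The if_all() form is the equivalent for dplyr 1.0.4 and later.

```r
library(dplyr)

# Sample data from the question (tibble() replaces the deprecated data_frame())
Test <- tibble(
  Q1Sat  = c("Neutral","Neutral","VSat","Sat","Neutral","Sat","VDis","Sat","Sat","VSat"),
  Q2Sat  = c("Neutral","VSat","Dis","Dis","VDis","Sat","Sat","VSat","Neutral","Dis"),
  Q3Sat  = c("Sat","Sat","Diss","Neutral","VSat","VDis","Sat","Sat","Sat","Neutral"),
  Q1Used = c("Yes","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes"),
  Q2Used = c("Yes","Yes","Yes","Yes","No","No","Yes","Yes","Yes","Yes"),
  Q3Used = c("Yes","No","Yes","Yes","Yes","Yes","Yes","Yes","Yes","No"),
  House  = c("Yes","No","Unsure","Yes","Yes","No","Unsure","Unsure","Yes","Yes")
)

# House goes inside vars() alongside the Used columns, since every
# selected column must satisfy the same condition (== "Yes").
kept_at <- Test %>%
  filter_at(vars(Q1Used:Q3Used, House), all_vars(. == "Yes"))

# Equivalent form for dplyr >= 1.0.4, where if_all() supersedes filter_at()
kept_if_all <- Test %>%
  filter(if_all(c(Q1Used:Q3Used, House), ~ .x == "Yes"))
```

On the sample data only the first respondent survives, because all_vars() requires every QnUsed to be "Yes" at once. That is row-wise filtering, so it cannot express the per-question link between QnSat and QnUsed described in the question, which is why the answer below flags individual responses instead of dropping rows.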
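A solution without the devel version. The general idea is to recode unwanted values instead of filtering rows: each answer gets a weight of 1 if it should count and 0 otherwise, so every question keeps its own base.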
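# Long format: one row per (question, respondent) satisfaction answer
sat = Test %>% select(Q1Sat:Q3Sat, House) %>%
  gather(key_sat, Sat, -House)
# Matching Used answers; relies on QnUsed being in the same order as QnSat
used = Test %>% select(Q1Used:Q3Used) %>%
  gather(key_used, Used)
cbind(used, sat) %>%
  group_by(key_sat) %>%
  mutate(
    # 1 if the answer counts (item used and House == "Yes"), else 0
    value = ((Used != "No") & (House == "Yes")) * 1,
    # number of counted answers for this question
    base = sum(value)
  ) %>%
  group_by(key_sat, Sat) %>%
  summarise(perc = sum(value)/sum(base[1])) %>%
  spread(Sat,perc)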
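The recode-to-NA idea can also be written literally, which keeps the question's original gather()/count() pipeline. This is a sketch, assuming every satisfaction column is named QnSat with a matching QnUsed (true of the sample data; the stem-matching regex is an assumption about the real 50-question file). Each QnSat is set to NA unless its QnUsed and House are both "Yes", and gather(na.rm = TRUE) then drops those answers.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
Test <- tibble(
  Q1Sat  = c("Neutral","Neutral","VSat","Sat","Neutral","Sat","VDis","Sat","Sat","VSat"),
  Q2Sat  = c("Neutral","VSat","Dis","Dis","VDis","Sat","Sat","VSat","Neutral","Dis"),
  Q3Sat  = c("Sat","Sat","Diss","Neutral","VSat","VDis","Sat","Sat","Sat","Neutral"),
  Q1Used = c("Yes","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes"),
  Q2Used = c("Yes","Yes","Yes","Yes","No","No","Yes","Yes","Yes","Yes"),
  Q3Used = c("Yes","No","Yes","Yes","Yes","Yes","Yes","Yes","Yes","No"),
  House  = c("Yes","No","Unsure","Yes","Yes","No","Unsure","Unsure","Yes","Yes")
)

# Question stems ("Q1", "Q2", ...) taken from the column names,
# so nothing has to be listed by hand for 50 questions.
stems <- sub("Sat$", "", grep("^Q\\d+Sat$", names(Test), value = TRUE))

# Recode QnSat to NA unless its paired QnUsed and House are both "Yes"
recoded <- Test
for (q in stems) {
  sat  <- paste0(q, "Sat")
  used <- paste0(q, "Used")
  keep <- recoded[[used]] == "Yes" & recoded$House == "Yes"
  recoded[[sat]][!keep] <- NA
}

# The question's pipeline: drop the NAs, then take percentages per question
result <- recoded %>%
  select(ends_with("Sat")) %>%
  gather(key, value, na.rm = TRUE) %>%
  count(key, value) %>%
  group_by(key) %>%
  mutate(perc = round(n / sum(n), 2)) %>%
  ungroup() %>%
  select(-n) %>%
  spread(value, perc)
```

On the sample data this gives the same non-zero proportions as the weighting version above, for example Q3Sat: Neutral 0.25, Sat 0.5, VSat 0.25. The difference is in empty cells: the weighting version reports 0 for a category that occurs in a question but has no counted answers, whereas this version leaves it NA.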