Grouped barplot for pre-post binary data

I would like to create a grouped bar plot for data arranged as follows:

       sx1pre sx1post sx2pre sx2post   
1         1     1       1       0
2         1     0       1       0  
3         0     1       1       0 
4         1     0       0       1
5         1     0       1       0
6         1     0       1       0 

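I want to compare, in a single chart, the frequency of "pre" and "post" for each sx (1 or 2). That is, I want to show graphically the percentage of all patients who presented a symptom (sx) before surgery (pre) next to the percentage of patients who presented the same symptom after surgery (post). Thank you.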

Reading it again, I think I understand what you are trying to achieve. I assume you already have the data in R?

library(ggplot2) #needed for the plot further down
df=read.delim("temp.csv") #data is now in df (read.delim expects tab-separated columns)

frequencies=data.frame(lapply(df,FUN=function(x){100*sum(x)/length(x)})) #percentage (0-100) of patients with a 1 in each column

frequencies=data.frame(t(frequencies)) #make long form of data frame

names(frequencies)="percentage" #rename column
frequencies$category=row.names(frequencies) #get "proper" metadata
frequencies$timepoint=ifelse(grepl("pre",frequencies$category),"pre","post") #get timepoint
frequencies$intervention=ifelse(grepl("sx1",frequencies$category),"sx1","sx2") #get symptom (sx1 or sx2)

#plot
ggplot(frequencies,aes(x=intervention,y=percentage,fill=timepoint))+
  geom_col(position=position_dodge())
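
For anyone who wants to try this without a file on disk, here is a minimal sketch that types in the six example rows from the question and computes the same summary with `colMeans()`. The column name `symptom` and the factor ordering of `timepoint` are my own choices, not something from the question; setting the levels only makes the "pre" bar appear before the "post" bar, since a plain character column would be sorted alphabetically.

```r
# Example data copied from the question (6 patients, 2 symptoms, pre/post)
df <- data.frame(
  sx1pre  = c(1, 1, 0, 1, 1, 1),
  sx1post = c(1, 0, 1, 0, 0, 0),
  sx2pre  = c(1, 1, 1, 0, 1, 1),
  sx2post = c(0, 0, 0, 1, 0, 0)
)

# Percentage of patients with a 1 in each column
pct <- 100 * colMeans(df)

# One row per column of df, with the symptom and timepoint parsed from the name
frequencies <- data.frame(
  category   = names(pct),
  percentage = unname(pct),
  symptom    = sub("(pre|post)$", "", names(pct)),   # hypothetical column name
  timepoint  = factor(sub("^sx[0-9]+", "", names(pct)),
                      levels = c("pre", "post")),    # "pre" drawn before "post"
  stringsAsFactors = FALSE
)
print(frequencies)

# Build the plot only if ggplot2 is installed; call print(p) to display it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(frequencies, aes(x = symptom, y = percentage, fill = timepoint)) +
    geom_col(position = position_dodge())
}
```

The regular expressions assume that every column name has the form `sx<number>pre` or `sx<number>post`, as in the example data.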

Regarding the disease conditions, it might be easier to use something like this:

new_names_after_comment=c('PAIN.PO','DYSPNEA.PO','PAIN.FU','DYSPNEA.FU')
frequencies$category_new=new_names_after_comment #just add as a new column (order must match the rows of frequencies)

library(tidyr)
frequencies=frequencies %>% 
    separate(category_new,into=c("Disease","Timepoint"),sep="\\.",remove = FALSE) #"\\." is a literal dot; keep category_new

#plot after comment

ggplot(frequencies, aes(x=Disease,y=percentage,fill=Timepoint))+
  geom_col(position = position_dodge())
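
In case tidyr is not available, the name-splitting step can also be sketched in base R. This is only an illustration using the four names from above; the data frame name `labels` is made up for the example. It also shows why the dot must be treated literally: as a regular expression, a bare "." matches every character.

```r
new_names <- c("PAIN.PO", "DYSPNEA.PO", "PAIN.FU", "DYSPNEA.FU")

# fixed = TRUE splits on a literal dot instead of interpreting "." as a regex
parts <- do.call(rbind, strsplit(new_names, ".", fixed = TRUE))

# 'labels' is a hypothetical name; columns mirror the Disease/Timepoint split
labels <- data.frame(
  category_new = new_names,
  Disease      = parts[, 1],
  Timepoint    = parts[, 2],
  stringsAsFactors = FALSE
)
print(labels)

# Without fixed = TRUE, "." matches any character and only empty strings remain
broken <- strsplit("PAIN.PO", ".")[[1]]
print(broken)
```

This relies on every name containing exactly one dot; names with a different number of dots would make `rbind()` recycle values and produce wrong rows.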