Creating a contingency table

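I currently have a data frame that records which gene clusters occur in which genomes. It is stored as a well-formed tab-delimited file and looks essentially like this (example):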

Gene Cluster     Genome
-----------------------------
GCF3372      Streptomyces_hygroscopicus
GCF3450      Streptomyces_sp_Hm1069
GCF3371      Streptomyces_sp_MBT13
GCF3371      Streptomyces_xiamenensis

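From this data frame I would like to create a presence/absence table (a contingency table) with values 0 and 1, depending on whether a given gene cluster is absent from or present in a genome. The idea is to measure how often specific gene clusters occur across genomes, so I want a presence/absence table on which I can run statistical analyses.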

x <- data.frame(gc = c('GCF3372','GCF3450','GCF3371','GCF3371','GCF3371'), 
                strain = c('Streptomyces_hygroscopicus', 'Streptomyces_sp_Hm1069', 
                           'Streptomyces_sp_MBT13', 'Streptomyces_xiamenensis','Streptomyces_hygroscopicus'))
dput(head(x[, c(1,2)]))

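Here is one way to compute a contingency table from two categorical variables. For illustration I will use Sex and Height (these appear to be structurally similar to the two variables in your data frame x):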

Data:

set.seed(300)
df <- data.frame(
  Height = sample(c("tall", "very tall", "small", "very small"), 20, replace = TRUE),
  Sex = sample(c("m", "f"), 20, replace = TRUE)
)
df
       Height Sex
1   very tall   f
2   very tall   m
3   very tall   m
4        tall   f
5  very small   m
6        tall   f
7        tall   m
8  very small   f
9       small   f
10       tall   m
11 very small   f
12       tall   m
13 very small   m
14      small   f
15 very small   m
16      small   m
17 very small   m
18 very small   m
19       tall   f
20       tall   m

First, as mentioned in the comments, use table():

# Tabulate the data
tbl <- table(df$Sex, df$Height); tbl
    small tall very small very tall
  f     2    3          2         1
  m     1    4          5         2
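
Applied to the data frame x from the question, the same table() call could be sketched as follows. This is a minimal sketch, not a definitive solution: the names counts_gc and presence are my own, and I am assuming that any count above zero should simply be recorded as 1 (present).

```r
# Data frame from the question (gene cluster / genome pairs)
x <- data.frame(
  gc = c("GCF3372", "GCF3450", "GCF3371", "GCF3371", "GCF3371"),
  strain = c("Streptomyces_hygroscopicus", "Streptomyces_sp_Hm1069",
             "Streptomyces_sp_MBT13", "Streptomyces_xiamenensis",
             "Streptomyces_hygroscopicus")
)

# Cross-tabulate genomes (rows) against gene clusters (columns)
counts_gc <- table(x$strain, x$gc)

# Drop the "table" class to get a plain integer matrix,
# then collapse every positive count to 1 (assumption: >0 means present)
presence <- unclass(counts_gc)
presence[presence > 0] <- 1L
presence
```

Putting genomes in rows and gene clusters in columns is only a choice; swapping the two arguments of table() gives the transposed layout.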

Then you can define the first row of tbl as a new vector female and the second row as male:

female <- tbl[1,]
male <- tbl[2,]

Finally, row-bind the two vectors into a matrix counts, which is your contingency table:

counts <- rbind(female, male)
counts
       small tall very small very tall
female     2    3          2         1
male       1    4          5         2
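
As a possible shortcut, the rows can be relabelled in place instead of being extracted and bound together again. The sketch below rebuilds tbl from the counts printed above so that it runs on its own; counts2 is a name I made up for the comparison.

```r
# Counts copied from the table shown above (filled column by column)
tbl <- as.table(matrix(
  c(2, 1, 3, 4, 2, 5, 1, 2), nrow = 2,
  dimnames = list(c("f", "m"),
                  c("small", "tall", "very small", "very tall"))
))

# Drop the "table" class and rename the rows directly
counts2 <- unclass(tbl)
rownames(counts2) <- c("female", "male")
counts2
```

The result holds the same numbers as counts, so which route to take is mostly a matter of taste.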

Based on the contingency table you can then run your test, for example a chi-squared test:

test <- chisq.test(counts); test

    Pearson's Chi-squared test

data:  counts
X-squared = 1.3492, df = 3, p-value = 0.7175
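
With counts this small, chisq.test() warns that the chi-squared approximation may be incorrect, because several expected cell counts fall below 5. A minimal sketch of how one might check this and, as an alternative I am suggesting rather than something from the steps above, fall back on Fisher's exact test:

```r
# Contingency table rebuilt from the counts shown above
counts <- rbind(
  female = c(small = 2, tall = 3, `very small` = 2, `very tall` = 1),
  male   = c(small = 1, tall = 4, `very small` = 5, `very tall` = 2)
)

# Expected counts under independence (warning suppressed here on purpose)
chi <- suppressWarnings(chisq.test(counts))
expected <- chi$expected
expected

# Fisher's exact test does not rely on the large-sample approximation
fisher_res <- fisher.test(counts)
fisher_res
```

Whether an exact test is the better choice depends on your real data; for a large presence/absence matrix the expected counts may well be big enough for the chi-squared test.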