How do I perform a t test in R comparing values in the same row of a data frame?

I have the following data frame:

  Gene    WT1    WT2     WT3    KO1    KO2   KO3
  <chr> <dbl>  <dbl>   <dbl>  <dbl>  <dbl> <dbl>
1 BIG2  -4.46 -5.25  -5.01   -4.59  -3.47  -5.16
2 CAVN1 -4.71 -4.78  -4.69   -4.53  -4.62  -5.14
3 HVM03 -5.31 -5.63  -3.98   -0.418 -0.194 -4.21
4 DYN1  -2.09 -0.292 -0.0488 -5.13  -5.90  -4.96
5 ACSA   4.62  4.42   4.62   -5.32  -3.83  -4.08

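I want to run a t.test on each row, comparing the 3 WT values with the 3 KO values, and add a new column containing the p-values at the end of the data frame. Please let me know if you can help.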

You can apply t.test row-wise and extract the p.value.

library(dplyr)

# Run t.test() once per row: c_across() collects that row's WT and KO values
df %>%
  rowwise() %>%
  mutate(p.value = t.test(c_across(starts_with('WT')),
                          c_across(starts_with('KO')))$p.value) %>%
  ungroup()

#  Gene    WT1    WT2     WT3    KO1    KO2   KO3 p.value
#  <chr> <dbl>  <dbl>   <dbl>  <dbl>  <dbl> <dbl>   <dbl>
#1 BIG2  -4.46 -5.25  -5.01   -4.59  -3.47  -5.16 0.433  
#2 CAVN1 -4.71 -4.78  -4.69   -4.53  -4.62  -5.14 0.866  
#3 HVM03 -5.31 -5.63  -3.98   -0.418 -0.194 -4.21 0.109  
#4 DYN1  -2.09 -0.292 -0.0488 -5.13  -5.9   -4.96 0.00971
#5 ACSA   4.62  4.42   4.62   -5.32  -3.83  -4.08 0.00222
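
To see what the row-wise call computes, here is a minimal sketch for a single gene (ACSA, row 5), with the values typed in by hand. Note that t.test uses the Welch test (var.equal = FALSE) by default, so the p-values above assume unequal variances. The names wt, ko, p_welch and p_pooled are only for illustration, and the pooled-variance call is shown just for comparison, not as a recommendation.

```r
# Values for gene ACSA (row 5 of the data frame in the question)
wt <- c(4.62, 4.42, 4.62)
ko <- c(-5.32, -3.83, -4.08)

# Default call: two-sided Welch two-sample t-test (var.equal = FALSE)
p_welch <- t.test(wt, ko)$p.value

# Student's t-test with pooled variance, for comparison only
p_pooled <- t.test(wt, ko, var.equal = TRUE)$p.value

print(p_welch)   # should match the 0.00222 shown for ACSA above
print(p_pooled)
```

If the samples were paired (for example WT1 with KO1), t.test(wt, ko, paired = TRUE) would be the relevant call instead; the question does not say whether that applies here.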

A base R option using startsWith + mapply:

WT <- data.frame(t(df[startsWith(names(df), "WT")]))
KO <- data.frame(t(df[startsWith(names(df), "KO")]))
df$p.value <- mapply(function(x, y) t.test(x, y)$p.value, WT, KO)

This gives

> df
   Gene   WT1    WT2     WT3    KO1    KO2   KO3     p.value
1  BIG2 -4.46 -5.250 -5.0100 -4.590 -3.470 -5.16 0.432649677
2 CAVN1 -4.71 -4.780 -4.6900 -4.530 -4.620 -5.14 0.865600809
3 HVM03 -5.31 -5.630 -3.9800 -0.418 -0.194 -4.21 0.108804979
4  DYN1 -2.09 -0.292 -0.0488 -5.130 -5.900 -4.96 0.009712383
5  ACSA  4.62  4.420  4.6200 -5.320 -3.830 -4.08 0.002216407
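
The transposition in the mapply approach is there because mapply walks over the columns of a data frame, so each gene has to become a column first. The sketch below rebuilds the data from the dput output and checks the result against a plain loop over row indices. The names wt_cols, ko_cols, p_mapply and p_rows are assumptions made for this example.

```r
# Data rebuilt from the dput() output in this answer
df <- data.frame(
  Gene = c("BIG2", "CAVN1", "HVM03", "DYN1", "ACSA"),
  WT1 = c(-4.46, -4.71, -5.31, -2.09, 4.62),
  WT2 = c(-5.25, -4.78, -5.63, -0.292, 4.42),
  WT3 = c(-5.01, -4.69, -3.98, -0.0488, 4.62),
  KO1 = c(-4.59, -4.53, -0.418, -5.13, -5.32),
  KO2 = c(-3.47, -4.62, -0.194, -5.9, -3.83),
  KO3 = c(-5.16, -5.14, -4.21, -4.96, -4.08)
)

# Logical selectors for the two groups of columns
wt_cols <- startsWith(names(df), "WT")
ko_cols <- startsWith(names(df), "KO")

# After transposing, each gene (a row of df) is one column
WT <- data.frame(t(df[wt_cols]))
KO <- data.frame(t(df[ko_cols]))
print(dim(WT))  # replicates x genes

# mapply pairs column i of WT with column i of KO
p_mapply <- mapply(function(x, y) t.test(x, y)$p.value, WT, KO)

# Same computation as an explicit loop over row indices
p_rows <- vapply(seq_len(nrow(df)), function(i) {
  t.test(unlist(df[i, wt_cols]), unlist(df[i, ko_cols]))$p.value
}, numeric(1))

print(all.equal(unname(p_mapply), p_rows))
```

The vapply version skips the transposition at the cost of indexing the data frame once per row, which may be slower on large tables.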

Data

> dput(df)
structure(list(Gene = c("BIG2", "CAVN1", "HVM03", "DYN1", "ACSA"
), WT1 = c(-4.46, -4.71, -5.31, -2.09, 4.62), WT2 = c(-5.25,
-4.78, -5.63, -0.292, 4.42), WT3 = c(-5.01, -4.69, -3.98, -0.0488,
4.62), KO1 = c(-4.59, -4.53, -0.418, -5.13, -5.32), KO2 = c(-3.47,
-4.62, -0.194, -5.9, -3.83), KO3 = c(-5.16, -5.14, -4.21, -4.96,
-4.08)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5"))