Getting the total counts in each sample of a huge single-cell data frame
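I have a huge metadata file with 79 columns and 78687 rows. This metadata comes from our cancer experiment results.
I am using dplyr to query the cell counts for each sample in this metadata.
I have 16 samples.
I need to find the cell counts in each sample for each condition (Tumor, Normal, or MSI_Status).
I did it one combination at a time, like this: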
dim(meta %>% filter(Condition == "Tumor" & MSI_Status=="MSS" & Location =="Left" & orig.ident == "B_cac10"));
# 689 24
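I'm sure there is a smarter way to do this. How can I loop over this to get all the answers in one go?
P.S.: I am a biologist, and my knowledge of loops and coding is very limited.
Edit 1:
Reproducible example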
df <- data.frame(Condition = c("Normal","Normal","Normal","Tumor","Tumor","Tumor"),
MSI_Status = c("High", "High", "High", "Low", "Low", "Low"),
Location = c("Lungs", "Lungs", "Lungs", "Kidney", "Kidney", "Liver"),
Clusters = c(1,2,4,2,2,6),
orig.ident = c("B-cac10","B-cac11","T-cac15","B-cac15","B-cac19","T-cac22"))
My code:
df %>% filter(Condition == "Tumor" & MSI_Status == "Low" & Location == "Kidney" & orig.ident == "B-cac15")
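Expected result:
The count for each orig.ident should be given under Condition == "Tumor", MSI_Status == "Low", and Location == "Kidney".
Thank you very much for your help. Stay safe.
Dave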
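You can use the dplyr function filter to subset your data based on your conditions. You can then use the dplyr function count to count the number of rows for each unique value of orig.ident. As mentioned in the comments, you can optionally set name = "Freq" within this function. I chose to use the rename function instead, to be as explicit as possible, since you are new to R.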
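For comparison, here is a minimal sketch of the name option mentioned above, using the example data from the question. It assumes a dplyr version recent enough that count accepts a name argument, and note that the new column name has to be passed as a quoted string. The object name freq is only illustrative.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  Condition  = c("Normal", "Normal", "Normal", "Tumor", "Tumor", "Tumor"),
  MSI_Status = c("High", "High", "High", "Low", "Low", "Low"),
  Location   = c("Lungs", "Lungs", "Lungs", "Kidney", "Kidney", "Liver"),
  Clusters   = c(1, 2, 4, 2, 2, 6),
  orig.ident = c("B-cac10", "B-cac11", "T-cac15", "B-cac15", "B-cac19", "T-cac22")
)

# Subset to the condition of interest, then count rows per sample.
# name = "Freq" names the count column directly, so no rename step is needed.
freq <- df %>%
  filter(Condition == "Tumor", MSI_Status == "Low", Location == "Kidney") %>%
  count(orig.ident, name = "Freq")

freq
```

This should give the same two rows (B-cac15 and B-cac19, one cell each) as the filter, count, rename pipeline.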
Data
df <- data.frame(
  Condition  = c("Normal", "Normal", "Normal", "Tumor", "Tumor", "Tumor"),
  MSI_Status = c("High", "High", "High", "Low", "Low", "Low"),
  Location   = c("Lungs", "Lungs", "Lungs", "Kidney", "Kidney", "Liver"),
  Clusters   = c(1, 2, 4, 2, 2, 6),
  orig.ident = c("B-cac10", "B-cac11", "T-cac15", "B-cac15", "B-cac19", "T-cac22")
)
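Code
library(dplyr)
df %>%
  # keep only the rows that match the condition of interest
  filter(Condition == "Tumor" &
           MSI_Status == "Low" &
           Location == "Kidney") %>%
  # count the remaining rows (cells) per sample
  count(orig.ident) %>%
  # give the count column a more descriptive name
  rename(Freq = n)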
#>   orig.ident Freq
#> 1    B-cac15    1
#> 2    B-cac19    1
Created on 2020-09-05 by the reprex package (v0.3.0)
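Since the question asks for every condition in one go, one possible extension (not part of the answer above) is to skip filter and pass all of the grouping columns to count. The sketch below uses the example data; all_counts is an illustrative name, and on the real data you would presumably replace df with your meta object.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  Condition  = c("Normal", "Normal", "Normal", "Tumor", "Tumor", "Tumor"),
  MSI_Status = c("High", "High", "High", "Low", "Low", "Low"),
  Location   = c("Lungs", "Lungs", "Lungs", "Kidney", "Kidney", "Liver"),
  Clusters   = c(1, 2, 4, 2, 2, 6),
  orig.ident = c("B-cac10", "B-cac11", "T-cac15", "B-cac15", "B-cac19", "T-cac22")
)

# One row per observed combination of sample, condition, MSI status and
# location, with the number of cells (rows) in that combination.
all_counts <- df %>%
  count(orig.ident, Condition, MSI_Status, Location, name = "Freq")

all_counts
```

Only combinations that actually occur in the data appear in the result, so the Freq column sums to the total number of rows. Any single combination can still be pulled out afterwards with filter.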