以向量为参数的 RSQLite 参数化查询
RSQLite parametrised query with vectors as parameters
我是 SQL
及其语法的新手,我无法理解如何在 R
中使用 [= 将多个值(例如向量或列表)传递给参数化查询中的单个参数16=]。
我有一个双表数据库 (myTCGA
),其中的数据来自 RNASeq 数据。第一个 (tcga_P
) 包含某些基因样本的表达值 (FPKM
),而 (tcgaMeta
) 包含这些样本的元数据信息。
#tcga_P
FPKM Sample Tissue GeneName
5550 0.0633 TCGA-AB-2803-03A Acute_Myeloid_Leukemia PLEKHN1
5551 0.2390 TCGA-AB-2805-03A Acute_Myeloid_Leukemia PLEKHN1
5552 0.0253 TCGA-AB-2806-03A Acute_Myeloid_Leukemia PLEKHN1
5553 0.0385 TCGA-AB-2807-03A Acute_Myeloid_Leukemia PLEKHN1
5554 0.0326 TCGA-AB-2808-03A Acute_Myeloid_Leukemia PLEKHN1
5555 0.2836 TCGA-AB-2810-03A Acute_Myeloid_Leukemia PLEKHN1
# tcgaMeta (only few columns)
SampleIndex SampleID SubjectID Tumor.Type Sample.Type
1 0 TCGA-01-0628-11A TCGA-01-0628 OV Solid Tissue Normal
2 1 TCGA-01-0630-11A TCGA-01-0630 OV Solid Tissue Normal
3 2 TCGA-01-0631-11A TCGA-01-0631 OV Solid Tissue Normal
我只想从 tcga_P
中提取属于特定组的样本(例如所有肺部样本)的表达值。为此,我编写了一个如下所示的查询
library(DBI)
library(RSQLite)
library(data.table)
myGene <- "PLEKHN1"
myTissue <- "lung"
myCancer <- "Lung Adenocarcinoma"
selectedSamples <- dbGetQuery(myTCGA,
"SELECT A.*
FROM tcga_P A
WHERE A.GeneName = $gene AND
A.Sample in (SELECT B.SampleID FROM tcgaMeta B
WHERE B.Tissue = $tissue AND
B.`Disease.TCGA.` = $cancer )
",param = list(gene=myGene,tissue=myTissue,cancer=myCancer))
# from long to wide
selectedSamplesWide <- dcast(selectedSamples,GeneName~Sample, value.var = "FPKM",fun.aggregate = sum)
此查询正确 returns 仅当我当时提供一个基因时才查询值,但如果我尝试从多个基因中提取值则失败 (myGene <- c("PLEKHN1","PSMD12"))
Error in result_bind(res@ptr, params) :
Parameter 2 does not have length 2.
我可以循环 (sapply
) 载体中的基因,使 sql 一次调用一个基因并将结果绑定在一起,但我想在sql呼唤。
到目前为止,我尝试使用
WHERE A.GeneName IN ($gene)
WHERE A.GeneName IN (SELECT C.GeneName FROM $gene C)
我还尝试将 myGene
转换为 data.frame
并将基因视为一列。说什么都没用是多余的。
我错过了什么?参数如何传递给 param = list()
?
在 SQL 中,具有相等性的 WHERE
子句 =
需要一个值,您试图将表达式计算为两个值。但是,WHERE
和 IN
允许多个值:
WHERE A.GeneName IN ('PLEKHN1', 'PSMD12', ...)
对于无限数量的值,考虑使用 paste
+ collapse
动态创建准备好的语句并使用 setNames
和 as.list
绑定参数值:
myGene <- c("PLEKHN1", "PSMD12")
myTissue <- "lung"
myCancer <- "Lung Adenocarcinoma"
myPlaceHolders <- paste0("$gene", seq_along(myGene))
sql <- paste0("SELECT A.*
FROM tcga_P A
WHERE A.GeneName IN (", paste(myPlaceHolders, collapse=", "), ")
AND A.Sample in (SELECT B.SampleID
FROM tcgaMeta B
WHERE B.Tissue = $tissue
AND B.`Disease.TCGA.` = $cancer)
")
myGeneParams <- as.list(setNames(myGene, gsub("\$", "", myPlaceHolders)))
paramList <- c(myGeneParams, tissue=myTissue, cancer=myCancer)
selectedSamples <- dbGetQuery(myTCGA, sql, param = myParamList)
Rextester demo (对于prepared statement和参数输出不查询运行)
我是 SQL
及其语法的新手,我无法理解如何在 R
中使用 [= 将多个值(例如向量或列表)传递给参数化查询中的单个参数16=]。
我有一个双表数据库 (myTCGA
),其中的数据来自 RNASeq 数据。第一个 (tcga_P
) 包含某些基因样本的表达值 (FPKM
),而 (tcgaMeta
) 包含这些样本的元数据信息。
#tcga_P
FPKM Sample Tissue GeneName
5550 0.0633 TCGA-AB-2803-03A Acute_Myeloid_Leukemia PLEKHN1
5551 0.2390 TCGA-AB-2805-03A Acute_Myeloid_Leukemia PLEKHN1
5552 0.0253 TCGA-AB-2806-03A Acute_Myeloid_Leukemia PLEKHN1
5553 0.0385 TCGA-AB-2807-03A Acute_Myeloid_Leukemia PLEKHN1
5554 0.0326 TCGA-AB-2808-03A Acute_Myeloid_Leukemia PLEKHN1
5555 0.2836 TCGA-AB-2810-03A Acute_Myeloid_Leukemia PLEKHN1
# tcgaMeta (only few columns)
SampleIndex SampleID SubjectID Tumor.Type Sample.Type
1 0 TCGA-01-0628-11A TCGA-01-0628 OV Solid Tissue Normal
2 1 TCGA-01-0630-11A TCGA-01-0630 OV Solid Tissue Normal
3 2 TCGA-01-0631-11A TCGA-01-0631 OV Solid Tissue Normal
我只想从 tcga_P
中提取属于特定组的样本(例如所有肺部样本)的表达值。为此,我编写了一个如下所示的查询
library(DBI)
library(RSQLite)
library(data.table)
myGene <- "PLEKHN1"
myTissue <- "lung"
myCancer <- "Lung Adenocarcinoma"
selectedSamples <- dbGetQuery(myTCGA,
"SELECT A.*
FROM tcga_P A
WHERE A.GeneName = $gene AND
A.Sample in (SELECT B.SampleID FROM tcgaMeta B
WHERE B.Tissue = $tissue AND
B.`Disease.TCGA.` = $cancer )
",param = list(gene=myGene,tissue=myTissue,cancer=myCancer))
# from long to wide
selectedSamplesWide <- dcast(selectedSamples,GeneName~Sample, value.var = "FPKM",fun.aggregate = sum)
此查询正确 returns 仅当我当时提供一个基因时才查询值,但如果我尝试从多个基因中提取值则失败 (myGene <- c("PLEKHN1","PSMD12"))
Error in result_bind(res@ptr, params) :
Parameter 2 does not have length 2.
我可以循环 (sapply
) 载体中的基因,使 sql 一次调用一个基因并将结果绑定在一起,但我想在sql呼唤。
到目前为止,我尝试使用
WHERE A.GeneName IN ($gene)
WHERE A.GeneName IN (SELECT C.GeneName FROM $gene C)
我还尝试将 myGene
转换为 data.frame
并将基因视为一列。说什么都没用是多余的。
我错过了什么?参数如何传递给 param = list()
?
在 SQL 中,具有相等性的 WHERE
子句 =
需要一个值,您试图将表达式计算为两个值。但是,WHERE
和 IN
允许多个值:
WHERE A.GeneName IN ('PLEKHN1', 'PSMD12', ...)
对于无限数量的值,考虑使用 paste
+ collapse
动态创建准备好的语句并使用 setNames
和 as.list
绑定参数值:
myGene <- c("PLEKHN1", "PSMD12")
myTissue <- "lung"
myCancer <- "Lung Adenocarcinoma"
myPlaceHolders <- paste0("$gene", seq_along(myGene))
sql <- paste0("SELECT A.*
FROM tcga_P A
WHERE A.GeneName IN (", paste(myPlaceHolders, collapse=", "), ")
AND A.Sample in (SELECT B.SampleID
FROM tcgaMeta B
WHERE B.Tissue = $tissue
AND B.`Disease.TCGA.` = $cancer)
")
myGeneParams <- as.list(setNames(myGene, gsub("\$", "", myPlaceHolders)))
paramList <- c(myGeneParams, tissue=myTissue, cancer=myCancer)
selectedSamples <- dbGetQuery(myTCGA, sql, param = myParamList)
Rextester demo (对于prepared statement和参数输出不查询运行)