SQL query to extract metadata based on two conditions in second dataset
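I have two datasets in the following format:
m - acc | sra_study | bioproject |
tax - acc | tax_id | total_count
Each row of m represents one biological sample (acc). The tax table records which organisms (tax_id) were found in each biological sample (acc) and how many times they were observed (total_count).
The tax_id and total_count columns hold integers; the remaining columns are strings.
I want to filter the rows of m based on whether, for a given sample (acc), tax_id=9606 occurs with total_count > 10000000 and tax_id=2 occurs with total_count > 1000000.
I tried to do this with the following SQL query: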
SELECT m.acc, m.sra_study, m.bioproject
FROM `nih-sra-datastore.sra.metadata` as m,
`nih-sra-datastore.sra_tax_analysis_tool.tax_analysis` as tax
WHERE m.acc = tax.acc
AND (tax.tax_id = 9606 AND tax.total_count > 10000000)
AND (tax.tax_id = 2 AND tax.total_count > 1000000)
However, the query does not return any results. I suspect this is because there is a problem with the syntax of my SQL query.
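Switch to explicit JOIN syntax.
Combine the two WHERE clause conditions with OR instead of AND. Each row of tax holds a single tax_id, so no single row can satisfy both conditions at once.
If you want an m.acc that has both tax_id 9606 and 2 (on different rows), do a GROUP BY: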
SELECT m.acc, m.sra_study, m.bioproject
FROM `nih-sra-datastore.sra.metadata` as m
JOIN `nih-sra-datastore.sra_tax_analysis_tool.tax_analysis` as tax
ON m.acc = tax.acc
WHERE (tax.tax_id = 9606 AND tax.total_count > 10000000)
OR (tax.tax_id = 2 AND tax.total_count > 1000000)
GROUP BY m.acc, m.sra_study, m.bioproject
HAVING COUNT(DISTINCT tax.tax_id) = 2
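To see the difference on a small scale, here is a minimal sketch that runs both queries against an in-memory SQLite database instead of BigQuery. The table names m and tax, the accessions S1 to S3, and all counts are made-up toy values, not data from nih-sra-datastore; only the query logic is taken from the thread.

```python
import sqlite3

# Toy stand-ins for the two tables (hypothetical data, not real SRA records).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE m (acc TEXT, sra_study TEXT, bioproject TEXT);
CREATE TABLE tax (acc TEXT, tax_id INTEGER, total_count INTEGER);
INSERT INTO m VALUES
  ('S1', 'STUDY1', 'PROJ1'),
  ('S2', 'STUDY2', 'PROJ2'),
  ('S3', 'STUDY3', 'PROJ3');
INSERT INTO tax VALUES
  ('S1', 9606, 20000000), ('S1', 2, 2000000),  -- passes both thresholds
  ('S2', 9606, 20000000),                      -- has no tax_id = 2 row
  ('S3', 9606, 20000000), ('S3', 2, 500);      -- tax_id = 2 count too low
""")

# The original attempt: both conditions must hold on the same tax row.
and_query = """
SELECT m.acc, m.sra_study, m.bioproject
FROM m JOIN tax ON m.acc = tax.acc
WHERE (tax.tax_id = 9606 AND tax.total_count > 10000000)
  AND (tax.tax_id = 2 AND tax.total_count > 1000000)
"""

# OR keeps rows matching either condition; HAVING then requires that
# both tax_id values survived the filter for the same sample.
having_query = """
SELECT m.acc, m.sra_study, m.bioproject
FROM m JOIN tax ON m.acc = tax.acc
WHERE (tax.tax_id = 9606 AND tax.total_count > 10000000)
   OR (tax.tax_id = 2 AND tax.total_count > 1000000)
GROUP BY m.acc, m.sra_study, m.bioproject
HAVING COUNT(DISTINCT tax.tax_id) = 2
"""

and_rows = conn.execute(and_query).fetchall()
both_rows = conn.execute(having_query).fetchall()
print(and_rows)   # empty: no row has tax_id 9606 and 2 at the same time
print(both_rows)  # only the sample that passes both thresholds
```

In this toy data only S1 is returned by the GROUP BY version, because S2 lacks a tax_id = 2 row and S3 falls below the total_count threshold for it.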
You have specified seemingly mutually exclusive conditions in the WHERE clause:
tax.tax_id = 9606 .. AND .. tax.tax_id = 2
Modify it like this:
SELECT m.acc, m.sra_study, m.bioproject
FROM `nih-sra-datastore.sra.metadata` as m, `nih-sra-datastore.sra_tax_analysis_tool.tax_analysis` as tax
WHERE m.acc=tax.acc
AND ((tax.tax_id=9606 AND tax.total_count > 10000000)
OR
(tax.tax_id=2 AND tax.total_count>1000000)
)
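One thing worth checking with this variant is what it returns when a sample meets only one of the two conditions. The sketch below runs the same OR filter on an in-memory SQLite database; the tables m and tax, the accessions S1 to S3, and the counts are hypothetical toy values, chosen only to illustrate the behaviour.

```python
import sqlite3

# Toy stand-ins for the two tables (hypothetical data, not real SRA records).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE m (acc TEXT, sra_study TEXT, bioproject TEXT);
CREATE TABLE tax (acc TEXT, tax_id INTEGER, total_count INTEGER);
INSERT INTO m VALUES
  ('S1', 'STUDY1', 'PROJ1'),
  ('S2', 'STUDY2', 'PROJ2'),
  ('S3', 'STUDY3', 'PROJ3');
INSERT INTO tax VALUES
  ('S1', 9606, 20000000), ('S1', 2, 2000000),  -- passes both thresholds
  ('S2', 9606, 20000000),                      -- has no tax_id = 2 row
  ('S3', 9606, 20000000), ('S3', 2, 500);      -- tax_id = 2 count too low
""")

# Same shape as the query above: implicit join plus an OR of the two conditions.
or_query = """
SELECT m.acc, m.sra_study, m.bioproject
FROM m, tax
WHERE m.acc = tax.acc
  AND ((tax.tax_id = 9606 AND tax.total_count > 10000000)
       OR
       (tax.tax_id = 2 AND tax.total_count > 1000000))
"""

or_rows = sorted(conn.execute(or_query).fetchall())
matched_accs = sorted({row[0] for row in or_rows})
print(or_rows)       # one output row per matching tax row, so S1 appears twice
print(matched_accs)  # S2 and S3 are included although they meet only one condition
```

So the OR filter on its own selects samples that satisfy either condition and can repeat a sample once per matching tax row. If both conditions must hold for the same acc, a grouping step such as GROUP BY with HAVING COUNT(DISTINCT tax.tax_id) = 2 is still needed.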