SQL query to extract metadata based on two conditions in second dataset

I have two datasets in the following format:

m   - acc | sra_study | bioproject
tax - acc | tax_id | total_count

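Each row of m represents one biological sample (acc). The tax table records which organisms (tax_id) were found in each biological sample (acc) and how many times they were observed (total_count). The tax_id and total_count columns hold integers; the remaining columns are strings.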

I want to filter the rows of m down to the samples (acc) for which tax_id = 9606 occurs with total_count > 10000000 and tax_id = 2 occurs with total_count > 1000000.

I tried to do this with the following SQL query:

SELECT m.acc, m.sra_study, m.bioproject
FROM `nih-sra-datastore.sra.metadata` as m,
     `nih-sra-datastore.sra_tax_analysis_tool.tax_analysis` as tax
WHERE m.acc = tax.acc
  AND (tax.tax_id = 9606 AND tax.total_count > 10000000)
  AND (tax.tax_id = 2 AND tax.total_count > 1000000)   

However, the query does not return any results. I suspect this is because something is wrong with the syntax of my SQL query.

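Switch to explicit JOIN syntax.

Combine the two WHERE clause conditions with OR instead of AND.

If you want each m.acc that has both tax_id 9606 and tax_id 2 (on different rows), add a GROUP BY: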

SELECT m.acc, m.sra_study, m.bioproject
FROM `nih-sra-datastore.sra.metadata` as m
JOIN `nih-sra-datastore.sra_tax_analysis_tool.tax_analysis` as tax
    ON m.acc = tax.acc
WHERE (tax.tax_id = 9606 AND tax.total_count > 10000000)
   OR (tax.tax_id = 2    AND tax.total_count > 1000000)   
GROUP BY m.acc, m.sra_study, m.bioproject
HAVING COUNT(DISTINCT tax.tax_id) = 2
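
The grouping approach can be checked on a small made-up dataset. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for BigQuery; the short table names m and tax, the sample accessions, and the counts are all invented for illustration and are not taken from the real NIH tables.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery; every row below is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE m   (acc TEXT, sra_study TEXT, bioproject TEXT);
    CREATE TABLE tax (acc TEXT, tax_id INTEGER, total_count INTEGER);

    INSERT INTO m VALUES
        ('S1', 'SRP1', 'PRJ1'),   -- passes both thresholds
        ('S2', 'SRP2', 'PRJ2'),   -- has tax_id 9606 only
        ('S3', 'SRP3', 'PRJ3');   -- has both tax_ids, but tax_id 2 count is too low

    INSERT INTO tax VALUES
        ('S1', 9606, 20000000), ('S1', 2, 5000000),
        ('S2', 9606, 30000000),
        ('S3', 9606, 15000000), ('S3', 2, 10);
""")

# Same shape as the query above, with the toy table names.
QUERY = """
    SELECT m.acc, m.sra_study, m.bioproject
    FROM m
    JOIN tax ON m.acc = tax.acc
    WHERE (tax.tax_id = 9606 AND tax.total_count > 10000000)
       OR (tax.tax_id = 2    AND tax.total_count > 1000000)
    GROUP BY m.acc, m.sra_study, m.bioproject
    HAVING COUNT(DISTINCT tax.tax_id) = 2
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Only S1 survives in this toy data: the WHERE clause keeps the tax rows that pass their own threshold, and HAVING then requires that both tax_id values remain for the same sample.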

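You have specified mutually exclusive conditions in the WHERE clause: tax.tax_id = 9606 .. AND .. tax.tax_id = 2. The WHERE clause is evaluated against one joined row at a time, and a single row cannot have tax_id equal to both 9606 and 2, so nothing is returned.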

Change it to something like:

SELECT m.acc, m.sra_study, m.bioproject
FROM `nih-sra-datastore.sra.metadata` as m, `nih-sra-datastore.sra_tax_analysis_tool.tax_analysis` as tax
WHERE m.acc=tax.acc 
  AND ((tax.tax_id=9606 AND tax.total_count > 10000000) 
       OR 
       (tax.tax_id=2 AND tax.total_count>1000000)
      )
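
The difference between the AND and OR forms of the WHERE clause can be seen on a toy dataset. This is a minimal sketch using Python's sqlite3 module with an in-memory database in place of BigQuery; the table names m and tax, the accessions, and the counts are invented for illustration.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery; every row below is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE m   (acc TEXT, sra_study TEXT, bioproject TEXT);
    CREATE TABLE tax (acc TEXT, tax_id INTEGER, total_count INTEGER);

    INSERT INTO m VALUES ('S1', 'SRP1', 'PRJ1'), ('S2', 'SRP2', 'PRJ2');

    INSERT INTO tax VALUES
        ('S1', 9606, 20000000), ('S1', 2, 5000000),  -- meets both conditions
        ('S2', 9606, 30000000);                      -- meets only the first
""")

BASE = """
    SELECT m.acc
    FROM m, tax
    WHERE m.acc = tax.acc
      AND ({cond})
    ORDER BY m.acc
"""
COND_9606 = "(tax.tax_id = 9606 AND tax.total_count > 10000000)"
COND_2 = "(tax.tax_id = 2 AND tax.total_count > 1000000)"

# AND: no single row has tax_id = 9606 and tax_id = 2 at once.
and_rows = conn.execute(BASE.format(cond=f"{COND_9606} AND {COND_2}")).fetchall()
# OR: every tax row that meets either condition produces an output row.
or_rows = conn.execute(BASE.format(cond=f"{COND_9606} OR {COND_2}")).fetchall()

print(and_rows)
print(or_rows)
```

In this toy data the AND form returns nothing, while the OR form returns S1 twice (once per matching tax row) and also returns S2, which meets only one of the two conditions. Restricting the result to samples that meet both conditions would still need a further step such as grouping by sample.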