Calculate frequency for each row
I'm trying to calculate the frequency of my elements for each row. Let me explain:
I select from a table that contains fields such as "pos, chr, ref, alt, id_disease".
From these I need to extract the frequency of my ref/alt pairs, i.e.:
num_occurrences(ref='A' and alt='C') / total number of rows
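To pin the definition down before writing any SQL, here is the same formula in plain Python over a few hypothetical rows (the sample values are made up for illustration). Every pair is divided by the same denominator, the total row count, so the frequencies of all pairs add up to 1.

```python
from collections import Counter

# Hypothetical sample rows: (chr, pos, ref, alt, id_disease)
rows = [
    ("chr1", 152, "A", "C", 15),
    ("chr1", 200, "A", "C", 30),
    ("chr3", 487, "T", "T", 74),
    ("chr11", 487, "G", "A", 69),
]

# Denominator: the total number of rows in the table, shared by every pair.
total = len(rows)

# Count each (ref, alt) pair, normalising case, then divide by the shared total.
pair_counts = Counter((ref.upper(), alt.upper()) for _, _, ref, alt, _ in rows)
freq = {pair: n / total for pair, n in pair_counts.items()}

print(freq[("A", "C")])  # 0.5
```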
With this query I'm getting close to my objective, but it doesn't calculate the frequency correctly: it always returns a constant.
SELECT pos, chr, upper(ref||' '||alt) AS refalt, id_disease AS lvl15, t1.tot_var, t1.freq
FROM varianti
JOIN ( SELECT count(*) AS tot_var,(count(*)::numeric / sum(count(*)) over ()) as freq
FROM varianti)t1 ON TRUE
WHERE length(ref)=1 AND length(alt)=1 AND chr similar to 'chr[\d X Y]*'
I just want to retrieve the data like this:
chr pos refalt lvl15 freq tot_var
1 120 AT 15 0.3 1000
1 150 CG 30 0.01 1000
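tot_var = the total number of rows I need to count (it can't be 1, I'm counting every row!)
Both ref and alt can take the values A, T, C, G in any combination: AA, AT, TA, TC, CT, etc.
What's missing from my code?
Let me know if you need more information.
Sample rows from varianti: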
chr pos ref alt id_disease
chr1 152 A C 15
chr3 487 T T 74
This is the output of my query:
pos chr refalt lvl15 tot_var freq
124338543 chr11 G A 69 1 0.000000677833751782702767
124338595 chr11 C T 28 1 0.000000677833751782702767
124361862 chr11 C . 53 1 0.000000677833751782702767
124361899 chr11 T A 20 1 0.000000677833751782702767
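A likely reason for the constant column: as written, the derived table t1 has no GROUP BY, so count(*) collapses the whole table into a single row, and JOIN ... ON TRUE attaches that one row to every row of varianti. The SQLite sketch below uses hypothetical sample rows and makes two assumptions to run outside PostgreSQL: `::numeric` becomes `CAST(... AS REAL)`, and the nested `sum(count(*)) OVER ()` is split into an equivalent subquery. It shows that t1 holds exactly one row, so every variant receives the same freq.

```python
import sqlite3

# Hypothetical sample rows: (chr, pos, ref, alt, id_disease)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE varianti (chr TEXT, pos INTEGER, ref TEXT, alt TEXT, id_disease INTEGER)"
)
conn.executemany(
    "INSERT INTO varianti VALUES (?, ?, ?, ?, ?)",
    [
        ("chr1", 152, "A", "C", 15),
        ("chr1", 200, "A", "C", 30),
        ("chr3", 487, "T", "T", 74),
        ("chr11", 487, "G", "A", 69),
    ],
)

# Equivalent of the question's derived table t1. With no GROUP BY the inner
# count(*) yields one row, and a window sum over one row equals that row's value.
T1 = """
    SELECT tot_var, CAST(tot_var AS REAL) / SUM(tot_var) OVER () AS freq
    FROM (SELECT count(*) AS tot_var FROM varianti)
"""
t1_rows = conn.execute(T1).fetchall()
print(t1_rows)  # [(4, 1.0)]

# CROSS JOIN stands in for JOIN ... ON TRUE: the single t1 row is repeated
# for every variant, so every variant gets the same freq.
joined = conn.execute(
    f"SELECT v.pos, t1.freq FROM varianti v CROSS JOIN ({T1}) t1"
).fetchall()
print({f for _, f in joined})  # {1.0}
```

Window functions require SQLite 3.25 or later.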
Based on the information you provided:
SELECT chr, pos,
upper(ref||' '||alt) AS refalt, id_disease AS lvl15,
(SUM(CASE WHEN ref = 'A' AND alt = 'C' THEN 1 ELSE 0 END) OVER ())::numeric / COUNT(*) OVER () AS freq,
COUNT(*) OVER () AS tot_var
FROM varianti
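A runnable SQLite sketch of the conditional-aggregation idea above, over hypothetical sample rows. Window functions (`OVER ()`) let the aggregates sit next to per-row columns without a GROUP BY; `* 1.0` forces floating-point division in SQLite, where PostgreSQL would use a `::numeric` cast instead. Because the CASE hard-codes a single pair (A, C), every row gets the same value: the table-wide frequency of that one pair.

```python
import sqlite3

# Hypothetical sample rows: (chr, pos, ref, alt, id_disease)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE varianti (chr TEXT, pos INTEGER, ref TEXT, alt TEXT, id_disease INTEGER)"
)
conn.executemany(
    "INSERT INTO varianti VALUES (?, ?, ?, ?, ?)",
    [
        ("chr1", 152, "A", "C", 15),
        ("chr1", 200, "A", "C", 30),
        ("chr3", 487, "T", "T", 74),
        ("chr11", 487, "G", "A", 69),
    ],
)

# OVER () makes SUM and COUNT span the whole table while keeping one output row per variant.
rows = conn.execute(
    """
    SELECT chr, pos,
           upper(ref || ' ' || alt) AS refalt, id_disease AS lvl15,
           SUM(CASE WHEN ref = 'A' AND alt = 'C' THEN 1 ELSE 0 END) OVER () * 1.0
               / COUNT(*) OVER () AS freq,
           COUNT(*) OVER () AS tot_var
    FROM varianti
    ORDER BY chr, pos
    """
).fetchall()
for row in rows:
    print(row)  # freq is 0.5 and tot_var is 4 on every row
```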
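I'm still not sure what 'tot_var' is supposed to be. It would help to have a real sample of the data together with the expected output for that sample.
Edit 1: get the frequency of each ref/alt pair in the dataset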
SELECT upper(ref||' '||alt) AS refalt,
COUNT(*)::numeric / (SELECT COUNT(*) FROM varianti) AS freq
FROM varianti
GROUP BY refalt
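The per-pair query can be checked in SQLite against hypothetical sample rows. The essential detail is the denominator: each group's count is divided by the total row count (a scalar subquery), whereas dividing by the group's own COUNT(*) would return 1 for every pair. SQLite uses `* 1.0` here in place of PostgreSQL's `::numeric` cast.

```python
import sqlite3

# Hypothetical sample rows: (chr, pos, ref, alt, id_disease)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE varianti (chr TEXT, pos INTEGER, ref TEXT, alt TEXT, id_disease INTEGER)"
)
conn.executemany(
    "INSERT INTO varianti VALUES (?, ?, ?, ?, ?)",
    [
        ("chr1", 152, "A", "C", 15),
        ("chr1", 200, "A", "C", 30),
        ("chr3", 487, "T", "T", 74),
        ("chr11", 487, "G", "A", 69),
    ],
)

# One row per (ref, alt) pair: rows in the group divided by rows in the whole table.
rows = conn.execute(
    """
    SELECT upper(ref || ' ' || alt) AS refalt,
           COUNT(*) * 1.0 / (SELECT COUNT(*) FROM varianti) AS freq
    FROM varianti
    GROUP BY refalt
    ORDER BY refalt
    """
).fetchall()
print(rows)  # [('A C', 0.5), ('G A', 0.25), ('T T', 0.25)]
```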
Edit 2: updated the query as requested
SELECT v.chr, v.pos,
upper(v.ref||' '||v.alt) AS refalt, v.id_disease AS lvl15,
refalt_table.freq,
(SELECT COUNT(*) FROM varianti) AS tot_var
FROM varianti v
JOIN
( SELECT upper(ref||' '||alt) AS refalt,
COUNT(*)::numeric / (SELECT COUNT(*) FROM varianti) AS freq
FROM varianti
GROUP BY refalt
) refalt_table ON refalt_table.refalt = upper(v.ref||' '||v.alt)
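As an alternative to the join (a swapped-in technique, not what this edit uses), window functions can attach both the per-pair frequency and the total row count to every row in a single pass; the question's own query already uses `OVER ()`. The sketch below runs in SQLite 3.25+ with hypothetical sample rows; in PostgreSQL the `* 1.0` would normally be a `::numeric` cast.

```python
import sqlite3

# Hypothetical sample rows: (chr, pos, ref, alt, id_disease)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE varianti (chr TEXT, pos INTEGER, ref TEXT, alt TEXT, id_disease INTEGER)"
)
conn.executemany(
    "INSERT INTO varianti VALUES (?, ?, ?, ?, ?)",
    [
        ("chr1", 152, "A", "C", 15),
        ("chr1", 200, "A", "C", 30),
        ("chr3", 487, "T", "T", 74),
        ("chr11", 487, "G", "A", 69),
    ],
)

# PARTITION BY counts rows sharing this row's (ref, alt) pair;
# OVER () counts every row in the table. No join or GROUP BY is needed.
rows = conn.execute(
    """
    SELECT chr, pos,
           upper(ref || ' ' || alt) AS refalt,
           id_disease AS lvl15,
           COUNT(*) OVER (PARTITION BY upper(ref), upper(alt)) * 1.0
               / COUNT(*) OVER () AS freq,
           COUNT(*) OVER () AS tot_var
    FROM varianti
    ORDER BY chr, pos
    """
).fetchall()
for row in rows:
    print(row)
```

Each row keeps its own chr, pos and lvl15 while freq varies by pair, which matches the layout of the desired output in the question.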
Edit 3: updated the query to fix the errors
SELECT v.chr, v.pos, upper(v.ref||' '||v.alt) AS refalt, v.id_disease AS lvl15, refalt_table.freq AS freq, (SELECT COUNT(*) FROM varianti tot WHERE tot.pos = v.pos) AS tot_var
FROM varianti v
LEFT JOIN
( SELECT upper(ref) AS ref, upper(alt) AS alt,
COUNT(*)::numeric / (SELECT COUNT(*) FROM varianti) AS freq
FROM varianti
GROUP BY upper(ref), upper(alt)
) refalt_table ON refalt_table.ref = upper(v.ref) AND refalt_table.alt = upper(v.alt)
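A SQLite run of this edit's structure over hypothetical sample rows, useful for checking the two columns separately. In this edit tot_var counts the rows sharing the same pos (the answerer's interpretation, not necessarily the asker's), so it can differ between rows. The sketch groups and joins on upper-cased values so mixed-case input cannot produce unmatched rows, and uses `* 1.0` in place of PostgreSQL's `::numeric` to avoid integer division.

```python
import sqlite3

# Hypothetical sample rows: (chr, pos, ref, alt, id_disease); pos 487 appears twice.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE varianti (chr TEXT, pos INTEGER, ref TEXT, alt TEXT, id_disease INTEGER)"
)
conn.executemany(
    "INSERT INTO varianti VALUES (?, ?, ?, ?, ?)",
    [
        ("chr1", 152, "A", "C", 15),
        ("chr1", 200, "A", "C", 30),
        ("chr3", 487, "T", "T", 74),
        ("chr11", 487, "G", "A", 69),
    ],
)

rows = conn.execute(
    """
    SELECT v.chr, v.pos, upper(v.ref || ' ' || v.alt) AS refalt, v.id_disease AS lvl15,
           refalt_table.freq AS freq,
           -- correlated subquery: how many rows share this row's pos
           (SELECT COUNT(*) FROM varianti tot WHERE tot.pos = v.pos) AS tot_var
    FROM varianti v
    LEFT JOIN (
        -- per-pair frequency against the total row count
        SELECT upper(ref) AS ref, upper(alt) AS alt,
               COUNT(*) * 1.0 / (SELECT COUNT(*) FROM varianti) AS freq
        FROM varianti
        GROUP BY upper(ref), upper(alt)
    ) refalt_table ON refalt_table.ref = upper(v.ref) AND refalt_table.alt = upper(v.alt)
    ORDER BY v.chr, v.pos
    """
).fetchall()
for row in rows:
    print(row)
```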