Aggregate SQL functions with irregular timestamps
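I have a table that contains timestamps and river flow values. Sometimes there are several records in a short period, and sometimes there are none.
How can I calculate the average flow and the total flow between two dates?
It is acceptable to assume the value changes linearly between two points, so perhaps some kind of weighted average. If there is a least-squares regression algorithm or something similar that gives more accurate results, that would be nice too.
Edit: here is some made-up data for a single day, for illustration. If possible, I'd like to do better than the plain average of 146, because the flow stayed high for most of the day and the true average is probably above 200.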
10/12/15 12:00 AM 100
10/12/15 12:01 AM 102
10/12/15 12:02 AM 104
10/12/15 12:03 AM 106
10/12/15 12:04 AM 200
10/12/15 10:00 PM 204
10/12/15 11:00 PM 208
Average 146
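The sketch below is an illustration only. It is written in Python rather than SQL and is not part of the original post. It applies the question's linear-interpolation assumption to the seven example readings: neighbouring readings are joined by straight lines, and the area under them is integrated with the trapezoidal rule. It assumes flow is a rate per second, so the area is a total volume, and dividing by the elapsed time gives a time-weighted average. The function names are hypothetical.

```python
from datetime import datetime

# Illustrative readings from the question (10/12/15) as (timestamp, flow).
READINGS = [
    (datetime(2015, 10, 12, 0, 0), 100),
    (datetime(2015, 10, 12, 0, 1), 102),
    (datetime(2015, 10, 12, 0, 2), 104),
    (datetime(2015, 10, 12, 0, 3), 106),
    (datetime(2015, 10, 12, 0, 4), 200),
    (datetime(2015, 10, 12, 22, 0), 204),
    (datetime(2015, 10, 12, 23, 0), 208),
]


def simple_mean(readings):
    """Unweighted mean of the flow values (what a plain AVG() returns)."""
    return sum(flow for _, flow in readings) / len(readings)


def trapezoid_total(readings):
    """Area under the piecewise-linear flow curve (needs >= 2 readings).

    Each interval contributes (f0 + f1) / 2 * dt, i.e. the flow is assumed
    to change linearly between consecutive readings.
    """
    rows = sorted(readings)
    total = 0.0
    for (t0, f0), (t1, f1) in zip(rows, rows[1:]):
        total += (f0 + f1) / 2 * (t1 - t0).total_seconds()
    return total


def trapezoid_average(readings):
    """Time-weighted mean over the span from the first to the last reading."""
    rows = sorted(readings)
    span = (rows[-1][0] - rows[0][0]).total_seconds()
    return trapezoid_total(rows) / span


print(round(simple_mean(READINGS), 2))        # 146.29
print(round(trapezoid_average(READINGS), 2))  # 201.92
```

On this data the unweighted mean is about 146.29, while the time-weighted mean comes out near 201.92. That matches the expectation that the real average is above 200. The result only covers the span between the first and last readings (00:00 to 23:00). Extending it to the whole day would need an extra assumption about the flow after 23:00.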
Something along these lines should generally point you in the right direction:
SELECT AVG(dayFlowRate) AS avgFlowRate, SUM(dayFlow) AS totalFlow
FROM (
    SELECT DATE(theTS) AS theDate,
           AVG(flowRate) AS dayFlowRate,
           -- mean rate for the day times the number of seconds in a full day
           AVG(flowRate) * (24*60*60) AS dayFlow
    FROM theTable
    WHERE theTS BETWEEN [beginTS] AND [endTS]
    GROUP BY theDate
) AS dayQ
;
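However, it assumes a full day for every date through that 24*60*60 multiplier (written out that way only for clarity, by the way). If you need more precision, you will want to look at the MIN and MAX aggregates and the TIME_TO_SEC function.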
I think this next one may be a bit more accurate:
SELECT AVG(dayFlowRate) AS avgFlowRate, SUM(dayFlow) AS totalFlow
FROM (
    SELECT DATE(theTS) AS theDate,
           AVG(flowRate) AS dayFlowRate,
           -- weight each day by the span its readings actually cover,
           -- clipped to the requested range
           AVG(flowRate)
             * ( TIME_TO_SEC(LEAST(MAX(theTS), [endTS]))
               - TIME_TO_SEC(GREATEST(MIN(theTS), [beginTS]))
               ) AS dayFlow
    FROM theTable
    WHERE theTS BETWEEN [beginTS] AND [endTS]
    GROUP BY theDate
) AS dayQ
;
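Edit: or maybe not. If a day's measurements were taken at 11 AM and 1 PM, then dayFlow would only cover two hours of flow, even for a day in the middle of a multi-day range.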
This one should be the best:
SELECT AVG(dayFlowRate) AS avgFlowRate, SUM(dayFlow) AS totalFlow
FROM (
    SELECT DATE(theTS) AS theDate,
           AVG(flowRate) AS dayFlowRate,
           -- a full day, except the first and last days of the range, which
           -- are cut off at [beginTS] and [endTS]; MIN(theTS) keeps the query
           -- valid under ONLY_FULL_GROUP_BY
           AVG(flowRate)
             * ( IF(DATE(MIN(theTS)) = DATE([endTS]), TIME_TO_SEC([endTS]), (24*60*60))
               - IF(DATE(MIN(theTS)) = DATE([beginTS]), TIME_TO_SEC([beginTS]), 0)
               ) AS dayFlow
    FROM theTable
    WHERE theTS BETWEEN [beginTS] AND [endTS]
    GROUP BY theDate
) AS dayQ
;
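The two IF(...) expressions decide how many seconds each day contributes to the total. Below is a hypothetical Python rendering of that logic. The names time_to_sec, covered_seconds and daily_flow are illustrative, and flow is again assumed to be a per-second rate. It shows how this version avoids the two-hour problem from the edit above: a day in the middle of the range counts in full, no matter when its readings were taken.

```python
from collections import defaultdict
from datetime import date, datetime

SECONDS_PER_DAY = 24 * 60 * 60


def time_to_sec(ts):
    """Seconds since midnight, like MySQL's TIME_TO_SEC() on a DATETIME."""
    return ts.hour * 3600 + ts.minute * 60 + ts.second


def covered_seconds(day, begin_ts, end_ts):
    """Seconds of `day` the query counts, mirroring its two IF(...) terms."""
    upper = time_to_sec(end_ts) if day == end_ts.date() else SECONDS_PER_DAY
    lower = time_to_sec(begin_ts) if day == begin_ts.date() else 0
    return upper - lower


def daily_flow(readings, begin_ts, end_ts):
    """Return (average of daily mean rates, total flow).

    Assumes at least one reading falls inside [begin_ts, end_ts].
    """
    by_day = defaultdict(list)
    for ts, flow in readings:
        if begin_ts <= ts <= end_ts:        # WHERE theTS BETWEEN ...
            by_day[ts.date()].append(flow)  # GROUP BY theDate
    day_rates = {day: sum(f) / len(f) for day, f in by_day.items()}
    total = sum(rate * covered_seconds(day, begin_ts, end_ts)
                for day, rate in day_rates.items())
    return sum(day_rates.values()) / len(day_rates), total


# Hypothetical readings: the middle day only has readings at 11 AM and 1 PM.
begin_ts = datetime(2015, 10, 11, 12, 0)
end_ts = datetime(2015, 10, 13, 6, 0)
readings = [
    (datetime(2015, 10, 11, 18, 0), 10),
    (datetime(2015, 10, 12, 11, 0), 20),
    (datetime(2015, 10, 12, 13, 0), 40),
    (datetime(2015, 10, 13, 3, 0), 30),
]
print(covered_seconds(date(2015, 10, 12), begin_ts, end_ts))  # 86400
print(daily_flow(readings, begin_ts, end_ts))
```

Like the SQL, this still uses the plain average of each day's readings. Within a single day it therefore has the same bias as the 146 figure in the question. It only fixes how many seconds each day is counted for.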
You need to take a weighted average, and for that you need each row's next timestamp:
select rf.*,
       -- the earliest timestamp strictly after this row's timestamp
       (select rf2.timestamp
        from riverflow rf2
        where rf2.timestamp > rf.timestamp
        order by rf2.timestamp asc
        limit 1
       ) as nextTimestamp
from riverflow rf;
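If the rows are processed in application code instead, the same next-timestamp pairing can be done by sorting once and zipping each row with its successor. Here is a minimal Python sketch, with a hypothetical function name. It assumes timestamps are unique. With duplicate timestamps, the SQL subquery's strict `>` gives both rows the same successor, while this pairing would not.

```python
from datetime import datetime


def with_next_timestamp(readings):
    """Attach the following reading's timestamp to each (timestamp, flow) pair.

    Rows are ordered by time and each one is paired with the row after it.
    The last row has no successor, so its next timestamp is None (NULL in
    the SQL version).
    """
    rows = sorted(readings)
    next_stamps = [ts for ts, _ in rows[1:]] + [None]
    return [(ts, flow, nxt) for (ts, flow), nxt in zip(rows, next_stamps)]


readings = [
    (datetime(2015, 10, 12, 0, 4), 200),
    (datetime(2015, 10, 12, 0, 0), 100),
    (datetime(2015, 10, 12, 22, 0), 204),
]
for row in with_next_timestamp(readings):
    print(row)
```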
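Next comes the weighted average. I don't know how you want to handle measurement periods that don't line up with a given day, so instead we'll just use the values and report the first and last observation times: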
select min(timestamp) as start, max(timestamp) as end,
       -- each reading weighted by the seconds until the next reading,
       -- divided by the total observed span
       sum(riverflow * timestampdiff(second, timestamp, nextTimestamp)) /
       timestampdiff(second, min(timestamp), max(timestamp)) as avgRiverflow
from (select rf.*,
             (select rf2.timestamp
              from riverflow rf2
              where rf2.timestamp > rf.timestamp
                and rf2.timestamp < $date2
              order by rf2.timestamp asc
              limit 1
             ) as nextTimestamp
      from riverflow rf
      where timestamp >= $date1 and timestamp < $date2
     ) t;
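This query weights each reading by the time until the next one. In effect it treats the flow as a step function that holds each value until the next reading, rather than the linear interpolation mentioned in the question. The sketch below checks it end to end by running an SQLite translation of the query against the example data with Python's standard sqlite3 module. SQLite has no TIMESTAMPDIFF, so differences are taken with strftime('%s', ...). The column names ts and flow are assumed stand-ins for timestamp and riverflow, and the date bounds are passed as the named parameters :date1 and :date2.

```python
import sqlite3

# Example readings from the question (10/12/15), stored as ISO-8601 text so
# that string comparison orders them chronologically.
READINGS = [
    ("2015-10-12 00:00:00", 100),
    ("2015-10-12 00:01:00", 102),
    ("2015-10-12 00:02:00", 104),
    ("2015-10-12 00:03:00", 106),
    ("2015-10-12 00:04:00", 200),
    ("2015-10-12 22:00:00", 204),
    ("2015-10-12 23:00:00", 208),
]

# SQLite version of the answer's query. The inner subquery is bounded by
# :date2, so the last in-range row gets a NULL next_ts and SUM() skips it.
QUERY = """
SELECT MIN(ts) AS start_ts,
       MAX(ts) AS end_ts,
       SUM(flow * (CAST(strftime('%s', next_ts) AS INTEGER)
                   - CAST(strftime('%s', ts) AS INTEGER))) * 1.0
       / (CAST(strftime('%s', MAX(ts)) AS INTEGER)
          - CAST(strftime('%s', MIN(ts)) AS INTEGER)) AS avg_flow
FROM (
    SELECT rf.ts, rf.flow,
           (SELECT rf2.ts
              FROM riverflow rf2
             WHERE rf2.ts > rf.ts AND rf2.ts < :date2
             ORDER BY rf2.ts ASC
             LIMIT 1) AS next_ts
      FROM riverflow rf
     WHERE rf.ts >= :date1 AND rf.ts < :date2
) AS t
"""


def weighted_average(date1, date2):
    """Run the query on a fresh in-memory database; returns (start, end, avg)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE riverflow (ts TEXT PRIMARY KEY, flow REAL)")
    conn.executemany("INSERT INTO riverflow VALUES (?, ?)", READINGS)
    row = conn.execute(QUERY, {"date1": date1, "date2": date2}).fetchone()
    conn.close()
    return row


start, end, avg = weighted_average("2015-10-12 00:00:00", "2015-10-13 00:00:00")
print(start, end, round(avg, 2))
```

For the example day this gives roughly 199.89, well above the unweighted average of 146 from the question. Bounding the subquery by $date2 matters whenever the range ends before the last stored reading. Without it, the last in-range row would be weighted up to a reading outside the range, while the denominator stops at max(timestamp).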