InfluxDB Continuous Query running on entire time series data

Question

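If my interpretation of the documentation here (InfluxDB Downsampling) is correct, when we downsample data with a Continuous Query that runs every 30 minutes, each run covers only the previous 30 minutes of data.

Relevant part of the documentation: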

Use the CREATE CONTINUOUS QUERY statement to generate a CQ:

CREATE CONTINUOUS QUERY "cq_30m" ON "food_data" BEGIN
  SELECT mean("website") AS "mean_website", mean("phone") AS "mean_phone"
  INTO "a_year"."downsampled_orders"
  FROM "orders"
  GROUP BY time(30m)
END

That query creates a CQ called cq_30m in the database food_data. cq_30m tells InfluxDB to calculate the 30-minute average of the two fields website and phone in the measurement orders and in the DEFAULT RP two_hours. It also tells InfluxDB to write those results to the measurement downsampled_orders in the retention policy a_year with the field keys mean_website and mean_phone. InfluxDB will run this query every 30 minutes for the previous 30 minutes.

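When I create the Continuous Query, it actually runs on the entire dataset rather than on the previous 30 minutes. My question is: does this happen only the first time, after which it runs on the previous 30 minutes of data instead of the entire dataset?

I understand that the query itself uses GROUP BY time(30m), which means it will return all the data grouped into 30-minute buckets, but does the same apply when it runs as a Continuous Query? If so, should I include a filter so that the Continuous Query processes only the last 30 minutes of data?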

Answer 1

What you have described is the expected functionality.

Schedule and coverage

Continuous queries operate on real-time data. They use the local server’s timestamp, the GROUP BY time() interval, and InfluxDB database’s preset time boundaries to determine when to execute and what time range to cover in the query.

CQs execute at the same interval as the cq_query’s GROUP BY time() interval, and they run at the start of the InfluxDB database’s preset time boundaries. If the GROUP BY time() interval is one hour, the CQ executes at the start of every hour.

When the CQ executes, it runs a single query for the time range between now() and now() minus the GROUP BY time() interval. If the GROUP BY time() interval is one hour and the current time is 17:00, the query’s time range is between 16:00 and 16:59.999999999.
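The quoted rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not InfluxDB's implementation: the function name `cq_window` is made up for this example, and it assumes that the "preset time boundaries" are interval multiples counted from the Unix epoch in UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumption: interval buckets are aligned to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def cq_window(now: datetime, interval: timedelta) -> Tuple[datetime, datetime]:
    """Return the half-open range [start, end) that a CQ run at `now` would cover.

    Hypothetical helper for illustration; it is not part of any InfluxDB client.
    """
    # Snap `now` down to the most recent interval boundary.
    end = EPOCH + ((now - EPOCH) // interval) * interval
    # The run covers exactly one interval ending at that boundary.
    return end - interval, end


if __name__ == "__main__":
    run_time = datetime(2024, 1, 1, 17, 0, tzinfo=timezone.utc)
    start, end = cq_window(run_time, timedelta(hours=1))
    print(start.isoformat(), "to", end.isoformat(), "(end exclusive)")
```

With a one-hour interval and a run at 17:00, the range starts at 16:00 and ends just before 17:00, which matches the 16:00 to 16:59.999999999 example in the quoted documentation.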

So it should only process the last 30 minutes.

Good point about the first run.

I did manage to find a snippet in the old documentation:

Backfilling Data

In the event that the source time series already has data in it when you create a new downsampled continuous query, InfluxDB will go back in time and calculate the values for all intervals up to the present. The continuous query will then continue running in the background for all current and future intervals.
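To make "all intervals up to the present" concrete, here is a rough Python sketch that lists the buckets such a backfill would have to compute. It only illustrates the behavior described in the v0.8 snippet above. The helper `backfill_buckets` is hypothetical, and the epoch-aligned bucket boundaries are an assumption, not something taken from InfluxDB's source.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# Assumption: interval buckets are aligned to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def backfill_buckets(
    earliest: datetime, now: datetime, interval: timedelta
) -> List[Tuple[datetime, datetime]]:
    """List every complete [start, end) bucket from the earliest point up to `now`.

    Hypothetical helper for illustration only.
    """
    # Bucket that contains the earliest existing data point.
    start = EPOCH + ((earliest - EPOCH) // interval) * interval
    # Most recent boundary at or before `now`; buckets after it are incomplete.
    last_end = EPOCH + ((now - EPOCH) // interval) * interval
    buckets = []
    while start < last_end:
        buckets.append((start, start + interval))
        start += interval
    return buckets


if __name__ == "__main__":
    earliest = datetime(2024, 1, 1, 15, 10, tzinfo=timezone.utc)
    now = datetime(2024, 1, 1, 17, 0, tzinfo=timezone.utc)
    for bucket_start, bucket_end in backfill_buckets(earliest, now, timedelta(minutes=30)):
        print(bucket_start.isoformat(), "to", bucket_end.isoformat())
```

A run that touches every one of these buckets would look like the query running over the whole dataset, whereas each later scheduled run would cover only the newest bucket.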

https://influxdbcom.readthedocs.io/en/latest/content/docs/v0.8/api/continuous_queries/#backfilling-data

This could explain the behavior you are seeing.

time-series

influxdb

grafana
