InfluxDB Continuous Query running on entire time series data
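If my interpretation of the documentation provided here (InfluxDB Downsampling) is correct, when we downsample data with a Continuous Query that runs every 30 minutes, it processes only the previous 30 minutes of data.
The relevant part of the documentation: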
Use the CREATE CONTINUOUS QUERY statement to generate a CQ:
CREATE CONTINUOUS QUERY "cq_30m" ON "food_data" BEGIN
SELECT mean("website") AS "mean_website",mean("phone") AS "mean_phone"
INTO "a_year"."downsampled_orders"
FROM "orders"
GROUP BY time(30m)
END
That query creates a CQ called cq_30m in the database food_data.
cq_30m tells InfluxDB to calculate the 30-minute average of the two
fields website and phone in the measurement orders and in the DEFAULT
RP two_hours. It also tells InfluxDB to write those results to the
measurement downsampled_orders in the retention policy a_year with the
field keys mean_website and mean_phone. InfluxDB will run this query
every 30 minutes for the previous 30 minutes.
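When I created the Continuous Query, it actually ran over the entire dataset rather than only the previous 30 minutes. My question is: does this happen only the first time, after which it runs on the previous 30 minutes of data instead of the entire dataset?
I understand that the query itself uses GROUP BY time(30m), which means it returns all the data grouped into 30-minute buckets, but does that also apply to the Continuous Query? If so, should I include a filter in the Continuous Query so that it processes only the last 30 minutes of data?
What you have described is the expected functionality.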
Schedule and coverage
Continuous queries operate on real-time data. They use the local server’s timestamp, the GROUP BY time() interval, and InfluxDB database’s preset time boundaries to determine when to execute and what time range to cover in the query.
CQs execute at the same interval as the cq_query’s GROUP BY time() interval, and they run at the start of the InfluxDB database’s preset time boundaries. If the GROUP BY time() interval is one hour, the CQ executes at the start of every hour.
When the CQ executes, it runs a single query for the time range between now() and now() minus the GROUP BY time() interval. If the GROUP BY time() interval is one hour and the current time is 17:00, the query’s time range is between 16:00 and 16:59.999999999.
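To make the quoted window arithmetic concrete, here is a minimal Python sketch of how that time range could be derived. It is only an illustration and not InfluxDB's actual code: the function name cq_window is hypothetical, and it assumes the preset time boundaries are aligned to the Unix epoch.

```python
from datetime import datetime, timedelta, timezone

# Assumption: preset time boundaries are multiples of the interval since the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def cq_window(now, interval):
    """Return (start, end) of the range one CQ execution would cover.

    `end` is exclusive, so a one-hour interval at 17:00 yields 16:00 up to,
    but not including, 17:00 (i.e. 16:00 to 16:59.999999999).
    """
    # Snap `now` down to the most recent interval boundary.
    end = EPOCH + ((now - EPOCH) // interval) * interval
    return end - interval, end


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 17, 0, tzinfo=timezone.utc)
    start, end = cq_window(now, timedelta(hours=1))
    print(start.isoformat(), "to", end.isoformat(), "(end exclusive)")
```

With GROUP BY time(30m), the same calculation gives a 30-minute window ending at the most recent half-hour boundary.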
So it should only process the last 30 minutes.
Good point about the first run.
I did manage to find a snippet in the old documentation:
Backfilling Data
In the event that the source time series already has data in it when you create a new downsampled continuous query, InfluxDB will go back in time and calculate the values for all intervals up to the present. The continuous query will then continue running in the background for all current and future intervals.
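As a rough illustration of what the backfill described in that snippet would amount to, the Python sketch below lists every complete interval between the first existing point and the present. It is not InfluxDB's implementation: the function name backfill_windows is hypothetical, and epoch-aligned interval boundaries are an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumption: interval boundaries are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def backfill_windows(first_point, now, interval):
    """List (start, end) pairs, end exclusive, for every complete interval
    from the one containing `first_point` up to `now`."""
    # Snap the first data point down to its interval boundary.
    start = EPOCH + ((first_point - EPOCH) // interval) * interval
    windows = []
    while start + interval <= now:
        windows.append((start, start + interval))
        start += interval
    return windows


if __name__ == "__main__":
    first = datetime(2024, 1, 1, 15, 10, tzinfo=timezone.utc)
    now = datetime(2024, 1, 1, 17, 0, tzinfo=timezone.utc)
    for start, end in backfill_windows(first, now, timedelta(minutes=30)):
        print(start.isoformat(), "to", end.isoformat())
```

After a one-time pass like this, only the newest window would need computing at each run, which matches the "previous 30 minutes" behaviour the documentation describes.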
That could explain the behaviour you observed.