Google BigQuery aggregate OHLC data over time window
There is a time-series transaction history stored in Google BigQuery.
# Transaction history scheme
exchange_id INTEGER REQUIRED
from_id INTEGER REQUIRED
to_id INTEGER REQUIRED
price FLOAT REQUIRED
size FLOAT REQUIRED
ts TIMESTAMP REQUIRED
is_sell BOOLEAN NULLABLE
_PARTITIONTIME TIMESTAMP NULLABLE
exchange_id - the platform the trade took place on
from_id - base symbol
to_id - quote symbol
price - trade price
size - trade amount
I need to aggregate the data into OHLC bars over 30-second intervals for each exchange_id, from_id, to_id combination. How can I do this in BigQuery?
# Required OHLC aggregated data scheme
ts TIMESTAMP REQUIRED
exchange_id INTEGER REQUIRED
from_id INTEGER REQUIRED
to_id INTEGER REQUIRED
open FLOAT REQUIRED
high FLOAT REQUIRED
low FLOAT REQUIRED
close FLOAT REQUIRED
volume FLOAT REQUIRED
_PARTITIONTIME TIMESTAMP NULLABLE
open - the first price in the interval
high - the highest price in the interval
low - the lowest price in the interval
close - the last price in the interval
volume - the sum of all trade sizes within the interval
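For reference, this target schema could be created with DDL along the following lines (just a sketch: the project, dataset, and table names are placeholders, and ingestion-time partitioning is assumed because of the _PARTITIONTIME column):
CREATE TABLE IF NOT EXISTS `yourproject.yourdataset.ohlc_30s` (
  ts TIMESTAMP NOT NULL,       -- start of the 30-second interval
  exchange_id INT64 NOT NULL,
  from_id INT64 NOT NULL,
  to_id INT64 NOT NULL,
  open FLOAT64 NOT NULL,
  high FLOAT64 NOT NULL,
  low FLOAT64 NOT NULL,
  close FLOAT64 NOT NULL,
  volume FLOAT64 NOT NULL
)
PARTITION BY _PARTITIONDATE;   -- ingestion-time partitioning exposes the _PARTITIONTIME pseudo-column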
The most promising ideas were:
SELECT
TIMESTAMP_SECONDS(
UNIX_SECONDS(ts) -
60 * 1000000
) AS time,
exchange_id,
from_id,
to_id,
MIN(price) as low,
MAX(price) as high,
SUM(size) as volume
FROM
`table`
GROUP BY
time, exchange_id, from_id, to_id
ORDER BY
time
And this one:
SELECT
exchange_id,from_id,to_id,
MAX(price) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as high,
MIN(price) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as low,
SUM(size) OVER (PARTITION BY exchange_id,from_id,to_id ORDER BY ts RANGE BETWEEN 60 * 1000000 PRECEDING AND CURRENT ROW) as volume,
FROM [table];
# returns:
1 1 4445 3808 9.0E-8 9.0E-8 300000.0
2 1 4445 3808 9.0E-8 9.0E-8 300000.0
3 1 4445 3808 9.0E-8 9.0E-8 300000.0
...
14 1 4445 3808 9.0E-8 9.0E-8 865939.3721800799
15 1 4445 3808 9.0E-8 9.0E-8 865939.3721800799
16 1 4445 3808 9.0E-8 9.0E-8 865939.3721800799
But none of this works. It seems I am missing something important about sliding windows in BigQuery.
The following works with BigQuery Standard SQL:
#standardsql
SELECT
exchange_id,
from_id,
to_id,
TIMESTAMP_SECONDS(DIV(UNIX_SECONDS(ts), 30) * 30) time,
ARRAY_AGG(price ORDER BY ts LIMIT 1)[SAFE_OFFSET(0)] open,
MAX(price) high,
MIN(price) low,
ARRAY_AGG(price ORDER BY ts DESC LIMIT 1)[SAFE_OFFSET(0)] close,
SUM(size) volume
FROM `yourproject.yourdataset.yourtable`
GROUP BY 1, 2, 3, 4
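As a side note, if the bucket size needs to change later, the same query can be parameterized with a scripting variable; a minimal sketch, assuming the same table (the variable name interval_seconds is just for illustration):
DECLARE interval_seconds INT64 DEFAULT 30;  -- bucket size in seconds, adjust as needed

SELECT
  exchange_id,
  from_id,
  to_id,
  -- floor the epoch seconds down to the nearest multiple of interval_seconds
  TIMESTAMP_SECONDS(DIV(UNIX_SECONDS(ts), interval_seconds) * interval_seconds) AS time,
  ARRAY_AGG(price ORDER BY ts LIMIT 1)[SAFE_OFFSET(0)] AS open,
  MAX(price) AS high,
  MIN(price) AS low,
  ARRAY_AGG(price ORDER BY ts DESC LIMIT 1)[SAFE_OFFSET(0)] AS close,
  SUM(size) AS volume
FROM `yourproject.yourdataset.yourtable`
GROUP BY 1, 2, 3, 4
ORDER BY time;
For example, with a 30-second bucket a trade at 12:00:41 lands in the 12:00:30 bucket, because DIV(UNIX_SECONDS(ts), 30) * 30 floors the epoch seconds to the previous multiple of 30.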
Found an elegant way to aggregate over predefined date_parts (docs). It is very useful when you need to roll up by weeks starting on Monday, or by months.
DATETIME_TRUNC supports the following date parts:
MICROSECOND
MILLISECOND
SECOND
MINUTE
HOUR
DAY
WEEK
WEEK(<WEEKDAY>)
MONTH
QUARTER
YEAR
You can use it for aggregation like this:
#standardsql
SELECT
TIMESTAMP(DATETIME_TRUNC(DATETIME(timestamp), DAY)) as timestamp,
ARRAY_AGG(open ORDER BY timestamp LIMIT 1)[SAFE_OFFSET(0)] open,
MAX(high) high,
MIN(low) low,
ARRAY_AGG(close ORDER BY timestamp DESC LIMIT 1)[SAFE_OFFSET(0)] close,
SUM(volume) volume
FROM `hcmc-project.test_bitfinex.BTC_USD__1h`
GROUP BY timestamp
ORDER BY timestamp ASC
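For completeness, the same truncation idea also works directly on the raw transaction table from the question via TIMESTAMP_TRUNC (the TIMESTAMP counterpart of DATETIME_TRUNC); a sketch, assuming the transaction schema above and a placeholder table path, producing daily candles per exchange_id, from_id, to_id:
#standardsql
SELECT
  exchange_id,
  from_id,
  to_id,
  TIMESTAMP_TRUNC(ts, DAY) AS time,  -- start of the day the trade falls into
  ARRAY_AGG(price ORDER BY ts LIMIT 1)[SAFE_OFFSET(0)] AS open,
  MAX(price) AS high,
  MIN(price) AS low,
  ARRAY_AGG(price ORDER BY ts DESC LIMIT 1)[SAFE_OFFSET(0)] AS close,
  SUM(size) AS volume
FROM `yourproject.yourdataset.yourtable`
GROUP BY 1, 2, 3, 4
ORDER BY time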