如何在子查询之间填写日期?
How to fill dates in between in subquery?
我有 table 购买:
player_id | date | registration_date | price
pl1 | 2019-01-21 | 2019-01-20 | 20
pl1 | 2019-01-23 | 2019-01-20 | 10
pl1 | 2019-01-24 | 2019-01-20 | 15
在 'date' 上使用 groupArray 并在 'price' 上使用 arrayCumSum 并使用 ArrayJoin 进行计算后,我得到了 table 以及每天的累计总和:
player_id | date | registration_date | sum_price
pl1 | 2019-01-21 | 2019-01-20 | 20
pl1 | 2019-01-23 | 2019-01-20 | 30
pl1 | 2019-01-24 | 2019-01-20 | 45
但我需要添加从注册到今天(今天是'2019-01-25'
)遗漏的日期:
player_id | date | registration_date | sum_price
pl1 | 2019-01-20 | 2019-01-20 | 0
pl1 | 2019-01-21 | 2019-01-20 | 20
pl1 | 2019-01-22 | 2019-01-20 | 20
pl1 | 2019-01-23 | 2019-01-20 | 30
pl1 | 2019-01-24 | 2019-01-20 | 45
pl1 | 2019-01-25 | 2019-01-20 | 45
我该怎么做?
试试这个:
SELECT player_id, result.1 as date, registrationDate as registration_date, result.2 as sum_price
FROM
(
SELECT
player_id,
groupArray((date, price)) AS purchases,
min(registration_date) AS registrationDate,
arrayMap(x -> registrationDate + x, range(toUInt32(toDate('2019-01-25') - registrationDate + 1))) dates,
arrayFilter(x -> arrayFirstIndex(p -> p.1 = x, purchases) = 0, dates) AS missed_dates,
arrayMap(x -> (x, 0), missed_dates) AS dummy_purchases,
arraySort(x -> x.1, arrayConcat(purchases, dummy_purchases)) all_purchases,
arrayCumSum(x -> x.2, all_purchases) cum_prices,
arrayMap(index -> (all_purchases[index].1, cum_prices[index]), arrayEnumerate(all_purchases)) flat_result,
arrayJoin(flat_result) result
FROM test.purchases01
GROUP BY player_id
)
/* result
┌─player_id─┬───────date─┬─registration_date─┬─sum_price─┐
│ pl1 │ 2019-01-20 │ 2019-01-20 │ 0 │
│ pl1 │ 2019-01-21 │ 2019-01-20 │ 20 │
│ pl1 │ 2019-01-22 │ 2019-01-20 │ 20 │
│ pl1 │ 2019-01-23 │ 2019-01-20 │ 30 │
│ pl1 │ 2019-01-24 │ 2019-01-20 │ 45 │
│ pl1 │ 2019-01-25 │ 2019-01-20 │ 45 │
└───────────┴────────────┴───────────────────┴───────────┘
*/
/* Prepare test data */
CREATE TABLE test.purchases01
(
`player_id` String,
`date` Date,
`registration_date` Date,
`price` int
)
ENGINE = Memory;
INSERT INTO test.purchases01
VALUES ('pl1', '2019-01-21', '2019-01-20', 20),
('pl1', '2019-01-23', '2019-01-20', 10),
('pl1', '2019-01-24', '2019-01-20', 15);
我有 table 购买:
player_id | date | registration_date | price
pl1 | 2019-01-21 | 2019-01-20 | 20
pl1 | 2019-01-23 | 2019-01-20 | 10
pl1 | 2019-01-24 | 2019-01-20 | 15
在 'date' 上使用 groupArray 并在 'price' 上使用 arrayCumSum 并使用 ArrayJoin 进行计算后,我得到了 table 以及每天的累计总和:
player_id | date | registration_date | sum_price
pl1 | 2019-01-21 | 2019-01-20 | 20
pl1 | 2019-01-23 | 2019-01-20 | 30
pl1 | 2019-01-24 | 2019-01-20 | 45
但我需要添加从注册到今天(今天是'2019-01-25'
)遗漏的日期:
player_id | date | registration_date | sum_price
pl1 | 2019-01-20 | 2019-01-20 | 0
pl1 | 2019-01-21 | 2019-01-20 | 20
pl1 | 2019-01-22 | 2019-01-20 | 20
pl1 | 2019-01-23 | 2019-01-20 | 30
pl1 | 2019-01-24 | 2019-01-20 | 45
pl1 | 2019-01-25 | 2019-01-20 | 45
我该怎么做?
试试这个:
SELECT player_id, result.1 as date, registrationDate as registration_date, result.2 as sum_price
FROM
(
SELECT
player_id,
groupArray((date, price)) AS purchases,
min(registration_date) AS registrationDate,
arrayMap(x -> registrationDate + x, range(toUInt32(toDate('2019-01-25') - registrationDate + 1))) dates,
arrayFilter(x -> arrayFirstIndex(p -> p.1 = x, purchases) = 0, dates) AS missed_dates,
arrayMap(x -> (x, 0), missed_dates) AS dummy_purchases,
arraySort(x -> x.1, arrayConcat(purchases, dummy_purchases)) all_purchases,
arrayCumSum(x -> x.2, all_purchases) cum_prices,
arrayMap(index -> (all_purchases[index].1, cum_prices[index]), arrayEnumerate(all_purchases)) flat_result,
arrayJoin(flat_result) result
FROM test.purchases01
GROUP BY player_id
)
/* result
┌─player_id─┬───────date─┬─registration_date─┬─sum_price─┐
│ pl1 │ 2019-01-20 │ 2019-01-20 │ 0 │
│ pl1 │ 2019-01-21 │ 2019-01-20 │ 20 │
│ pl1 │ 2019-01-22 │ 2019-01-20 │ 20 │
│ pl1 │ 2019-01-23 │ 2019-01-20 │ 30 │
│ pl1 │ 2019-01-24 │ 2019-01-20 │ 45 │
│ pl1 │ 2019-01-25 │ 2019-01-20 │ 45 │
└───────────┴────────────┴───────────────────┴───────────┘
*/
/* Prepare test data */
CREATE TABLE test.purchases01
(
`player_id` String,
`date` Date,
`registration_date` Date,
`price` int
)
ENGINE = Memory;
INSERT INTO test.purchases01
VALUES ('pl1', '2019-01-21', '2019-01-20', 20),
('pl1', '2019-01-23', '2019-01-20', 10),
('pl1', '2019-01-24', '2019-01-20', 15);