RFM analysis with PostgreSQL
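I am trying to build an RFM analysis with a PostgreSQL query, but I have not managed to finish the part of the query for the Recency dimension.
The query is inspired by this article:
https://cooldata.wordpress.com/2014/03/25/an-all-sql-way-to-automate-rfm-scoring/
The criteria for the Recency dimension are:
- last order within 2 months = 5
- last order within 4 months = 4
- last order within 6 months = 3
- last order within 8 months = 2
- last order within 10 months = 1
Here is the query I have been trying to complete: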
WITH rfm AS
(SELECT email,
SUM((total_incl_tax)) AS cash,
MAX(decode(order_order.order_date, 2016-01-01, 5, 2016-02-01, 4, 2016-03-01, 3, 2016-04-01, 2, 201605-01, 1)) AS recency,
COUNT(DISTINCT(order_date)) AS frequency
FROM order_order
GROUP BY email)
SELECT rfm.email,
CASE
WHEN rfm.cash >= 2000000 THEN 5
WHEN rfm.cash > 1500000 THEN 4
WHEN rfm.cash > 1000000 THEN 3
WHEN rfm.cash > 500000 THEN 2
WHEN rfm.frequency > 4 THEN 5
WHEN rfm.frequency = 4 THEN 4
WHEN rfm.frequency = 3 THEN 3
WHEN rfm.frequency = 2 THEN 2
WHEN rfm.frequency = 1 THEN 1
else 1
END + rfm.frequency AS rfm_score
--+ Five_years.recency
FROM rfm
GROUP BY rfm.email, rfm.cash,rfm.frequency
ORDER BY rfm.email
The error is:
ERROR: function decode(timestamp with time zone, integer, integer, integer, integer, integer, integer, integer, integer, integer, integer) does not exist Hint: No function matches the given name and argument types. You might need to add explicit type casts. Position: 186
I think the error comes from this line:
MAX(decode(order_order.order_date, 2016-01-01, 5, 2016-02-01, 4, 2016-03-01, 3, 2016-04-01, 2, 2016-05-01, 1)) AS recency
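Is there any suggestion for changing the faulty line so that it follows the Recency criteria? Thanks.
Postgres has no Oracle-style decode() function (its built-in decode(text, text) only converts encoded text to bytea). You can replace it with another CASE expression: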
WITH rfm AS
(
SELECT email,
SUM((total_incl_tax)) AS cash,
MAX(
CASE
WHEN order_order.order_date = '2016-01-01' THEN 5
WHEN order_order.order_date = '2016-02-01' THEN 4
WHEN order_order.order_date = '2016-03-01' THEN 3
WHEN order_order.order_date = '2016-04-01' THEN 2
WHEN order_order.order_date = '2016-05-01' THEN 1
END
) as recency,
COUNT(DISTINCT(order_date)) AS frequency
FROM order_order
GROUP BY email
)
SELECT rfm.email,
CASE
WHEN rfm.cash >= 2000000 THEN 5
WHEN rfm.cash > 1500000 THEN 4
WHEN rfm.cash > 1000000 THEN 3
WHEN rfm.cash > 500000 THEN 2
WHEN rfm.frequency > 4 THEN 5
WHEN rfm.frequency = 4 THEN 4
WHEN rfm.frequency = 3 THEN 3
WHEN rfm.frequency = 2 THEN 2
WHEN rfm.frequency = 1 THEN 1
else 1
END + rfm.frequency + rfm.recency AS rfm_score
FROM rfm
GROUP BY rfm.email, rfm.cash, rfm.frequency, rfm.recency
ORDER BY rfm.email
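Note that the CASE above only matches orders placed exactly on those five dates, and it returns NULL for any other date. To follow the criteria from the question (last order within 2, 4, 6, 8 or 10 months), one option is to take each customer's most recent order and compare it against intervals counted back from now(). The following is only a sketch: the table and column names come from the question, while the last_order CTE, the last_order_date alias and the ELSE 0 fallback for orders older than 10 months are assumptions, since the question does not say how such customers should be scored.

```sql
-- Sketch: score recency from the age of each customer's latest order.
-- Assumes order_order(email, order_date timestamptz, ...) as in the question.
WITH last_order AS (
    SELECT email,
           MAX(order_date) AS last_order_date
    FROM order_order
    GROUP BY email
)
SELECT email,
       CASE
           WHEN last_order_date >= now() - interval '2 months'  THEN 5
           WHEN last_order_date >= now() - interval '4 months'  THEN 4
           WHEN last_order_date >= now() - interval '6 months'  THEN 3
           WHEN last_order_date >= now() - interval '8 months'  THEN 2
           WHEN last_order_date >= now() - interval '10 months' THEN 1
           ELSE 0  -- assumption: no score is defined for older orders
       END AS recency
FROM last_order
ORDER BY email;
```

Because the branches are evaluated top to bottom, each customer gets the highest score whose window contains the latest order. To score against a fixed reporting date instead of the current time, replace now() with a timestamp literal.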
Further reading: Decode equivalent in postgres
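A separate point that may or may not matter for your use case: in the queries above a single CASE covers both cash and frequency, so the frequency branches are only reached when cash is 500000 or less, and the raw frequency count is then added on top. If the intent is to score each dimension on its own, a sketch along the following lines might be closer. The thresholds are copied from the question; the scored CTE and the aliases monetary_score, frequency_score and mf_score are made up for this example.

```sql
-- Sketch: score monetary value and frequency in separate CASE expressions.
-- Thresholds come from the question; CTE and alias names are hypothetical.
WITH rfm AS (
    SELECT email,
           SUM(total_incl_tax)        AS cash,
           COUNT(DISTINCT order_date) AS frequency
    FROM order_order
    GROUP BY email
),
scored AS (
    SELECT email,
           CASE
               WHEN cash >= 2000000 THEN 5
               WHEN cash >  1500000 THEN 4
               WHEN cash >  1000000 THEN 3
               WHEN cash >   500000 THEN 2
               ELSE 1
           END AS monetary_score,
           CASE
               WHEN frequency > 4 THEN 5
               WHEN frequency = 4 THEN 4
               WHEN frequency = 3 THEN 3
               WHEN frequency = 2 THEN 2
               ELSE 1
           END AS frequency_score
    FROM rfm
)
SELECT email,
       monetary_score,
       frequency_score,
       monetary_score + frequency_score AS mf_score
FROM scored
ORDER BY email;
```

A recency score can be added as a third term of the sum once it is computed per customer in the same way.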