Impala: find_in_set vs IN performance
Can anyone tell me which performs better, find_in_set() or IN()?
SELECT a.data_date,
       lower(substr(a.cookie_id, -3, 1)) cookie_type,
       CASE WHEN find_in_set(lower(substr(a.cookie_id, -3, 1)), '2,3,5,6,8,b,c,d') > 0 THEN 'A' ELSE 'B' END 'AB',
       COUNT(a.cookie_id)
FROM dw.dw_cookie_dau_visit a
WHERE a.data_date = '20181102'
  AND a.site_id = 600
  AND lower(substr(a.cookie_id, -1, 1)) NOT IN ('e','f')
  AND lower(substr(a.cookie_id, -3, 1)) IN ('0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f')
GROUP BY a.data_date, cookie_type, AB;
SELECT a.data_date,
       lower(substr(a.cookie_id, -3, 1)) cookie_type,
       CASE WHEN lower(substr(a.cookie_id, -3, 1)) IN ('2', '3', '5', '6', '8', 'b', 'c', 'd') THEN 'A' ELSE 'B' END 'AB',
       COUNT(a.cookie_id)
FROM dw.dw_cookie_dau_visit a
WHERE a.data_date = '20181102'
  AND a.site_id = 600
  AND lower(substr(a.cookie_id, -1, 1)) NOT IN ('e','f')
  AND lower(substr(a.cookie_id, -3, 1)) IN ('0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f')
GROUP BY a.data_date, cookie_type, AB;
Which one should I choose?
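They don't do the same thing. The second version should be:
(CASE WHEN lower(substr(a.cookie_id, -3, 1)) IN ('2', '3', '5', '6', '8', 'b', 'c', 'd') THEN 'A' ELSE 'B' END) AS AB,
In my opinion, this is the better way to write the logic, because it uses the SQL operator that exists for exactly this purpose.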
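To see how the two predicates differ in what they return, they can be compared on literals. This is only a minimal sketch, assuming Impala's documented behaviour that find_in_set() returns the 1-based position of the match in the comma-separated string and 0 when there is no match; the column aliases are made up for illustration.

```sql
-- find_in_set() yields a position (0 means "not found"), which is why the
-- original query needs "> 0"; IN yields a boolean directly.
SELECT find_in_set('b', '2,3,5,6,8,b,c,d')          AS pos_found,    -- positive position
       find_in_set('a', '2,3,5,6,8,b,c,d')          AS pos_missing,  -- 0 when absent
       'b' IN ('2','3','5','6','8','b','c','d')     AS in_found,
       'a' IN ('2','3','5','6','8','b','c','d')     AS in_missing;
```

Note that the IN list has to be separate literals; a single string such as '2,3,5,6,8,b,c,d' inside IN would be compared as one value.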
As for performance, it doesn't matter. The performance of the query depends far more on the FROM and GROUP BY clauses than on a CASE expression in the SELECT.
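If you would rather check this on your own cluster than take it on faith, one option is to compare the plans of the two variants. The following is a sketch only, assuming it is run in impala-shell against the table from the question; the aliases ab and cnt are illustrative, and the WHERE clause is shortened.

```sql
-- Plan for the IN variant. Repeat with the find_in_set() variant and compare:
-- the scan and aggregation steps, which tend to dominate the cost, should look alike.
EXPLAIN
SELECT a.data_date,
       lower(substr(a.cookie_id, -3, 1)) AS cookie_type,
       CASE WHEN lower(substr(a.cookie_id, -3, 1)) IN ('2','3','5','6','8','b','c','d')
            THEN 'A' ELSE 'B' END AS ab,
       COUNT(a.cookie_id) AS cnt
FROM dw.dw_cookie_dau_visit a
WHERE a.data_date = '20181102'
  AND a.site_id = 600
GROUP BY a.data_date, cookie_type, ab;

-- After actually running each variant in impala-shell, the shell commands
-- SUMMARY; and PROFILE; report per-operator timings for the most recent query.
```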