How to write a KSQL query to find distinct region IDs for a particular user

Question

Hello, I have created a stream with the following fields:

"account_id VARCHAR, user_id VARCHAR, src_ip VARCHAR, country_code VARCHAR, message VARCHAR"

I can now create a table that, within a given tumbling window, contains only the rows matching a particular account_id, as shown below:

CREATE TABLE  221_console_failure AS \
      SELECT user_id, country_code \ 
      FROM my_stream \
      WINDOW TUMBLING (SIZE 600 SECONDS) \
      WHERE account_id = '4894833322'

Is there any way to determine whether the same user has logged in from different country_code values within 10 minutes?

My country_code field contains values such as IN, US, and SG.

Answer 1

KSQL does not yet support COUNT(DISTINCT), which is what you would need in order to run this:

SELECT USER_ID, COUNT(DISTINCT COUNTRY_CODE) \
  FROM USER_EVENTS WINDOW TUMBLING (SIZE 10 MINUTES) \
GROUP BY USER_ID \
HAVING COUNT(DISTINCT COUNTRY_CODE) > 1;

If this feature would be useful to you, feel free to vote or comment on https://github.com/confluentinc/ksql/issues/506.

Answer 2

For your use case, you can use HISTOGRAM as a workaround until KSQL provides a DISTINCT function.

HISTOGRAM(col1) (input type:STREAM/TABLE): Return a map containing the distinct String values of col1 mapped to the number of times each one occurs for the given window. This version limits the number of distinct values which can be counted to 1000, beyond which any additional entries are ignored.

CREATE TABLE 221_console_failure AS \
      SELECT user_id, \
      HISTOGRAM(country_code) AS region, COUNT(*) \
      FROM my_stream \
      WINDOW TUMBLING (SIZE 600 SECONDS) \
      WHERE account_id = '4894833322' \
      GROUP BY user_id;

Output at consumer: b'{"USER_ID":"4894833322","REGION":{"SG":2,"IN":3},"KSQL_COL_2":5}'

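Now you only need to check whether the REGION map has more than one entry, because its keys are the distinct country_code values seen in the window.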
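
That check can be done on the consumer side. Here is a minimal sketch in Python, assuming each message arrives as JSON-encoded bytes shaped like the sample output above; the helper name has_multiple_regions is hypothetical and not part of KSQL:

```python
import json


def has_multiple_regions(raw_message: bytes) -> bool:
    """Return True if the REGION histogram holds more than one distinct key."""
    record = json.loads(raw_message)
    # HISTOGRAM yields a map of value -> count, so the number of keys
    # equals the number of distinct country codes seen in the window.
    region = record.get("REGION") or {}
    return len(region) > 1


# Sample message copied from the consumer output shown above.
sample = b'{"USER_ID":"4894833322","REGION":{"SG":2,"IN":3},"KSQL_COL_2":5}'
print(has_multiple_regions(sample))
```

A record whose REGION map has a single key (for example only "SG") would return False, meaning every login in that window came from the same country code.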

If you have longitude and latitude, you could also try the scalar function GEO_DISTANCE(lat1, lon1, lat2, lon2, unit).
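
To illustrate the idea behind a distance-based check, here is a rough Python sketch of a great-circle (haversine) distance between two login locations. It assumes coordinates in decimal degrees and only approximates the kind of value a function like GEO_DISTANCE would return; the names haversine_km and is_suspicious, the Earth radius constant, and the 500 km threshold are all hypothetical choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for this sketch


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_suspicious(prev, curr, threshold_km: float = 500.0) -> bool:
    """Flag two (lat, lon) logins that are farther apart than the threshold."""
    return haversine_km(*prev, *curr) > threshold_km
```

For example, is_suspicious((1.35, 103.82), (28.61, 77.21)) compares a login near Singapore with one near Delhi; the threshold should be tuned to how far a user could plausibly travel within the window.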

ksqldb