Remove duplicates based on timestamp (Kusto query)

Question

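I have two tables in Kusto, shown below. I'm trying to join them on name/username, but I want to keep the rows from the second table even when there is no match in the first table. I also want to remove duplicates from the second table when the username and email are the same, based on the timestamp (in that case I'd keep the most recent information, i.e. the row with the latest timestamp).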

Table 1

Name | pets | color  | city
A    | A1   | blue   | NYC
A    | A2   | blue   | NYC
A    | A3   | blue   | NYC
B    | B1   | red    | Boston
C    | C1   | yellow | Miami
C    | C2   | yellow | Miami

Table 2

username | email          | school   | timestamp
A        | a@whatever.com | schoolA  | 10pm
B        | b@whatever.com | schoolB1 | 10pm
B        | b@whatever.com | schoolB2 | 11pm
C        | c@whatever.com | schoolC  | 9pm
D        | d@whatever.com | schoolD  | 11pm
E        | e@whatever.com | schoolE  | 10pm

Desired result

name | pets | color  | city   | email          | school   | timestamp
A    | A1   | blue   | NYC    | a@whatever.com | schoolA  | 10pm
A    | A2   | blue   | NYC    | a@whatever.com | schoolA  | 10pm
A    | A3   | blue   | NYC    | a@whatever.com | schoolA  | 10pm
B    | B1   | red    | Boston | b@whatever.com | schoolB2 | 11pm
C    | C1   | yellow | Miami  | c@whatever.com | schoolC  | 9pm
C    | C2   | yellow | Miami  | c@whatever.com | schoolC  | 9pm
D    |      |        |        | d@whatever.com | schoolD  | 11pm
E    |      |        |        | e@whatever.com | schoolE  | 10pm

Answer 1

If I understood correctly, the following query does what you want.

It uses:

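arg_max() (aggregation function): "remove duplicates from the second table based on timestamp if the username and email are the same (in that case I'd keep the information from the most recent, i.e. latest, timestamp)"
Right outer-join flavor: "keep the rows of the second table even if there isn't a match in the first table"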
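To make the deduplication step concrete, here is a minimal plain-Python model of what `summarize arg_max(timestamp, *) by username, email` does to the sample T2 rows: for each (username, email) pair it keeps the whole row with the largest timestamp. This is a sketch for illustration only, not Kusto code; `latest_per_key` is a hypothetical helper name, and the tie-breaking rule (first row seen wins) is an assumption of the sketch, not something Kusto guarantees.

```python
from datetime import datetime

# Sample T2 rows from the question: (username, email, school, timestamp).
T2 = [
    ("A", "a@whatever.com", "schoolA",  datetime(2020, 11, 24, 22, 0)),
    ("B", "b@whatever.com", "schoolB1", datetime(2020, 11, 24, 22, 0)),
    ("B", "b@whatever.com", "schoolB2", datetime(2020, 11, 24, 23, 0)),
    ("C", "c@whatever.com", "schoolC",  datetime(2020, 11, 24, 21, 0)),
    ("D", "d@whatever.com", "schoolD",  datetime(2020, 11, 24, 23, 0)),
    ("E", "e@whatever.com", "schoolE",  datetime(2020, 11, 24, 22, 0)),
]

def latest_per_key(rows):
    """Hypothetical helper mimicking `arg_max(timestamp, *) by username, email`:
    keep, per (username, email), the entire row with the largest timestamp."""
    best = {}
    for row in rows:
        username, email, _school, ts = row
        key = (username, email)
        # Strict '>' keeps the first row seen when timestamps tie; do not
        # assume Kusto resolves ties the same way.
        if key not in best or ts > best[key][3]:
            best[key] = row
    return list(best.values())

deduped = latest_per_key(T2)
print([r[2] for r in deduped])  # ['schoolA', 'schoolB2', 'schoolC', 'schoolD', 'schoolE']
```

Only user B has two rows, so only B changes: the 11pm row (`schoolB2`) survives and the 10pm row (`schoolB1`) is dropped.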

let T1 = datatable(name:string, pets:string, color:string, city:string)
[
    "A", "A1", "blue",   "NYC",
    "A", "A2", "blue",   "NYC",
    "A", "A3", "blue",   "NYC",
    "B", "B1", "red",    "Boston",
    "C", "C1", "yellow", "Miami",
    "C", "C2", "yellow", "Miami",
]
;
let T2 = datatable(username:string, email:string, school:string, timestamp:datetime)
[
    "A", "a@whatever.com", "schoolA",  datetime(2020-11-24 22:00),
    "B", "b@whatever.com", "schoolB1", datetime(2020-11-24 22:00),
    "B", "b@whatever.com", "schoolB2", datetime(2020-11-24 23:00),
    "C", "c@whatever.com", "schoolC",  datetime(2020-11-24 21:00),
    "D", "d@whatever.com", "schoolD",  datetime(2020-11-24 23:00),
    "E", "e@whatever.com", "schoolE",  datetime(2020-11-24 22:00),
]
;
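T1
| join kind=rightouter (
    T2
    // Keep only the most recent row per (username, email)
    | summarize arg_max(timestamp, *) by username, email
) on $left.name == $right.username
// rightouter also keeps T2 rows with no match in T1 (D, E), so take the name from username
| project name = username, pets, color, city, email, school, timestamp
| order by name asc, pets asc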

| name | pets | color  | city   | email          | school   | timestamp                   |
|------|------|--------|--------|----------------|----------|-----------------------------|
| A    | A1   | blue   | NYC    | a@whatever.com | schoolA  | 2020-11-24 22:00:00.0000000 |
| A    | A2   | blue   | NYC    | a@whatever.com | schoolA  | 2020-11-24 22:00:00.0000000 |
| A    | A3   | blue   | NYC    | a@whatever.com | schoolA  | 2020-11-24 22:00:00.0000000 |
| B    | B1   | red    | Boston | b@whatever.com | schoolB2 | 2020-11-24 23:00:00.0000000 |
| C    | C1   | yellow | Miami  | c@whatever.com | schoolC  | 2020-11-24 21:00:00.0000000 |
| C    | C2   | yellow | Miami  | c@whatever.com | schoolC  | 2020-11-24 21:00:00.0000000 |
| D    |      |        |        | d@whatever.com | schoolD  | 2020-11-24 23:00:00.0000000 |
| E    |      |        |        | e@whatever.com | schoolE  | 2020-11-24 22:00:00.0000000 |
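The join step can be modeled the same way. The sketch below is a plain-Python approximation, not Kusto code: it assumes T2 has already been reduced to one row per (username, email) by the `arg_max` step, uses a hypothetical helper `right_outer_join`, and represents timestamps as plain strings for brevity. Every T2 row is kept; it is repeated once per matching T1 row, and when there is no match the T1 columns are left empty, as in the result above.

```python
# Sample T1 rows from the question: (name, pets, color, city).
T1 = [
    ("A", "A1", "blue", "NYC"),
    ("A", "A2", "blue", "NYC"),
    ("A", "A3", "blue", "NYC"),
    ("B", "B1", "red", "Boston"),
    ("C", "C1", "yellow", "Miami"),
    ("C", "C2", "yellow", "Miami"),
]

# T2 after deduplication (assumed output of the arg_max step):
# (username, email, school, timestamp), timestamps simplified to strings.
T2_latest = [
    ("A", "a@whatever.com", "schoolA", "2020-11-24 22:00"),
    ("B", "b@whatever.com", "schoolB2", "2020-11-24 23:00"),
    ("C", "c@whatever.com", "schoolC", "2020-11-24 21:00"),
    ("D", "d@whatever.com", "schoolD", "2020-11-24 23:00"),
    ("E", "e@whatever.com", "schoolE", "2020-11-24 22:00"),
]

def right_outer_join(left, right):
    """Hypothetical helper approximating
    `left | join kind=rightouter (right) on $left.name == $right.username`
    followed by `project name = username, ...` and `order by name, pets`."""
    out = []
    for username, email, school, ts in right:
        matches = [row for row in left if row[0] == username]
        if matches:
            # One output row per matching T1 row.
            for _name, pets, color, city in matches:
                out.append((username, pets, color, city, email, school, ts))
        else:
            # No match in T1: keep the T2 row, leave T1's columns empty.
            out.append((username, "", "", "", email, school, ts))
    out.sort(key=lambda r: (r[0], r[1]))  # order by name asc, pets asc
    return out

result = right_outer_join(T1, T2_latest)
print(result[3])  # ('B', 'B1', 'red', 'Boston', 'b@whatever.com', 'schoolB2', '2020-11-24 23:00')
```

Taking `name` from `username` rather than from T1 is what makes D and E show a name even though they have no T1 row.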

Tags: database, relational-database, kql, azure-data-explorer