Remove duplicates based on timestamp in a Kusto query

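I have two tables in Kusto, like the ones below. I'm trying to join them on name/username, keeping the rows from the second table even when the first table has no match. I also want to remove duplicates from the second table based on timestamp when the username and email are the same (in that case I keep the most recent information, i.e. the row with the latest timestamp).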

Table 1

Name | pets | color | city
A    | A1   | blue  | NYC
A    | A2   | blue  | NYC
A    | A3   | blue  | NYC
B    | B1   | red   | Boston
C    | C1   | yellow| Miami
C    | C2   | yellow| Miami

Table 2

username | email          | school   | timestamp
A        | a@whatever.com | schoolA  | 10pm
B        | b@whatever.com | schoolB1 | 10pm
B        | b@whatever.com | schoolB2 | 11pm
C        | c@whatever.com | schoolC  | 9pm
D        | d@whatever.com | schoolD  | 11pm
E        | e@whatever.com | schoolE  | 10pm

Desired result

name | pets | color  | city  | email          | school   | timestamp
A    | A1   | blue   | NYC   | a@whatever.com | schoolA  | 10pm
A    | A2   | blue   | NYC   | a@whatever.com | schoolA  | 10pm
A    | A3   | blue   | NYC   | a@whatever.com | schoolA  | 10pm
B    | B1   | red    | Boston| b@whatever.com | schoolB2 | 11pm
C    | C1   | yellow | Miami | c@whatever.com | schoolC  | 9pm
C    | C2   | yellow | Miami | c@whatever.com | schoolC  | 9pm
D    |      |        |       | d@whatever.com | schoolD  | 11pm
E    |      |        |       | e@whatever.com | schoolE  | 10pm

If I understand the question correctly, the query below should work.

It uses:

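  • arg_max() (aggregation function), to cover "remove duplicates from the second table based on timestamp if the username and email are the same (in which case I keep the most recent information, i.e. the latest timestamp)"
  • The right outer join flavor, to cover "keep the rows from the second table even if the first table has no match"
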
let T1 = datatable(name:string, pets:string, color:string, city:string)
[
    "A", "A1", "blue",   "NYC",
    "A", "A2", "blue",   "NYC",
    "A", "A3", "blue",   "NYC",
    "B", "B1", "red",    "Boston",
    "C", "C1", "yellow", "Miami",
    "C", "C2", "yellow", "Miami",
]
;
let T2 = datatable(username:string, email:string, school:string, timestamp:datetime)
[
    "A", "a@whatever.com", "schoolA",  datetime(2020-11-24 22:00),
    "B", "b@whatever.com", "schoolB1", datetime(2020-11-24 22:00),
    "B", "b@whatever.com", "schoolB2", datetime(2020-11-24 23:00),
    "C", "c@whatever.com", "schoolC",  datetime(2020-11-24 21:00),
    "D", "d@whatever.com", "schoolD",  datetime(2020-11-24 23:00),
    "E", "e@whatever.com", "schoolE",  datetime(2020-11-24 22:00),
]
;
T1
| join kind=rightouter (
    T2
    | summarize arg_max(timestamp, *) by username, email
) on $left.name == $right.username
| project name = username, pets, color, city, email, school, timestamp
| order by name asc, pets asc

| name | pets | color  | city   | email          | school   | timestamp                   |
|------|------|--------|--------|----------------|----------|-----------------------------|
| A    | A1   | blue   | NYC    | a@whatever.com | schoolA  | 2020-11-24 22:00:00.0000000 |
| A    | A2   | blue   | NYC    | a@whatever.com | schoolA  | 2020-11-24 22:00:00.0000000 |
| A    | A3   | blue   | NYC    | a@whatever.com | schoolA  | 2020-11-24 22:00:00.0000000 |
| B    | B1   | red    | Boston | b@whatever.com | schoolB2 | 2020-11-24 23:00:00.0000000 |
| C    | C1   | yellow | Miami  | c@whatever.com | schoolC  | 2020-11-24 21:00:00.0000000 |
| C    | C2   | yellow | Miami  | c@whatever.com | schoolC  | 2020-11-24 21:00:00.0000000 |
| D    |      |        |        | d@whatever.com | schoolD  | 2020-11-24 23:00:00.0000000 |
| E    |      |        |        | e@whatever.com | schoolE  | 2020-11-24 22:00:00.0000000 |
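
To check the de-duplication step on its own, the arg_max() subquery can be run against the sample T2 before any join is involved. This is a minimal sketch that assumes the same sample data as the query above; the final `order by` is only there to make the output easier to read.

```kusto
// Sample data copied from T2 in the query above
let T2 = datatable(username:string, email:string, school:string, timestamp:datetime)
[
    "A", "a@whatever.com", "schoolA",  datetime(2020-11-24 22:00),
    "B", "b@whatever.com", "schoolB1", datetime(2020-11-24 22:00),
    "B", "b@whatever.com", "schoolB2", datetime(2020-11-24 23:00),
    "C", "c@whatever.com", "schoolC",  datetime(2020-11-24 21:00),
    "D", "d@whatever.com", "schoolD",  datetime(2020-11-24 23:00),
    "E", "e@whatever.com", "schoolE",  datetime(2020-11-24 22:00),
]
;
T2
// For each (username, email) pair, keep the row with the latest timestamp
| summarize arg_max(timestamp, *) by username, email
| order by username asc
```

With this sample, the two rows for B should collapse into one (schoolB2, 23:00), and every other user keeps its single row. arg_max() returns one row per group, so if two rows for the same username and email tie on the latest timestamp, only one of them is kept.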