Querying number of children nodes for each node in Hive

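I'm trying to come up with the best HiveQL query to get a list of rows in which one column contains the number of (direct) child nodes that node has. The data is hierarchical, so it looks like this: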

| ID | Some other column | ParentID |
+-----------------------------------+
| 1  | XXXXXXXXXX x X X  | NULL     |
| 2  | XXXXXXXXXX x X X  | 1        |
| 3  | XXXXXXXXXX x X X  | 2        |
| 4  | XXXXXXXXXX x X X  | 1        |

I'm trying to query it to produce output like this:

| ID | Some other column | child count |
+--------------------------------------+
| 1  | XXXXXXXXXX x X X  | 2           |
| 2  | XXXXXXXXXX x X X  | 1           |
| 3  | XXXXXXXXXX x X X  | 0           |
| 4  | XXXXXXXXXX x X X  | 0           |

Try something like this with a LEFT JOIN:

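-- Count children per parent in a subquery, then attach the counts to every node.
SELECT a.id,
       -- Nodes with no children have no match in b, so default to 0.
       -- Hive quotes identifiers with backticks; double quotes denote string literals.
       COALESCE(b.child_count, 0) AS `child count`
FROM   mytable a
       LEFT JOIN (SELECT parentid,
                         COUNT(*) AS child_count
                  FROM   mytable
                  GROUP  BY parentid) b
              ON a.id = b.parentid;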
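
An alternative that may be worth trying is a plain self-join with GROUP BY, which also carries the extra column through to the output. This is only a sketch: the column name some_other_column is a placeholder for the "Some other column" in the question, and mytable is the table name assumed in the query above.

```sql
-- Sketch: self-join each node (p) to its direct children (c).
-- some_other_column is a hypothetical name for the question's "Some other column".
SELECT p.id,
       p.some_other_column,
       -- COUNT(c.id) ignores the NULLs produced by the LEFT JOIN,
       -- so nodes without children get 0 without needing COALESCE.
       COUNT(c.id) AS child_count
FROM   mytable p
       LEFT JOIN mytable c
              ON c.parentid = p.id
GROUP  BY p.id,
          p.some_other_column;
```

With the four sample rows from the question, this should give counts of 2, 1, 0 and 0 for IDs 1 through 4. Note that COUNT(*) would not work here, because it would count the single unmatched row of a childless node as 1.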