How do I tag the lowest-level units within an organization structure (represented by several LEVEL_ID columns)?

Question

I have a dataset that lists the departments of a government, broken down into successive levels of sub-department. It looks like this:

LEVEL1ID   LEVEL2ID   LEVEL3ID   LEVEL4ID   LEVEL5ID   YEAR   DESCRIP_E
0          0          0          0          0          2019   Government of X
5          0          0          0          0          2019   Department of Oceans
5          200        0          0          0          2019   Coast Guard
5          200        300        0          0          2019   Coast Guard HQ
5          200        300        400        0          2019   CG HQ - Business Mgmt
5          200        300        401        0          2019   CG HQ - IT
5          200        300        402        0          2019   CG HQ - Vessels
5          200        301        0          0          2019   CG Training
5          200        301        405        0          2019   CG Training - Employees
5          200        301        406        0          2019   CG Training - Students
5          200        302        0          0          2019   CG North
5          200        303        0          0          2019   CG East
5          200        303        407        0          2019   CG East - Shore-Based Personnel
5          200        303        407        500        2019   CG East - Business Mgmt
5          200        303        407        501        2019   CG East - Operations
0          0          0          0          0          2018   Government of X
5          0          0          0          0          2018   Department of Oceans
5          200        0          0          0          2018   Coast Guard
5          200        300        0          0          2018   Coast Guard HQ
5          200        300        400        0          2018   CG HQ - Business Mgmt
(and so on)

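I'd like to create a new binary variable that flags the rows representing the lowest-level organizational units within a given year. That is, I want my dataset to look like this: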

LEVEL1ID   LEVEL2ID   LEVEL3ID   LEVEL4ID   LEVEL5ID   YEAR   UNIQUE     DESCRIP_E
0          0          0          0          0          2019   No         Government of X
5          0          0          0          0          2019   No         Department of Oceans
5          200        0          0          0          2019   No         Coast Guard
5          200        300        0          0          2019   No         Coast Guard HQ
5          200        300        400        0          2019   Yes        CG HQ - Business Mgmt
5          200        300        401        0          2019   Yes        CG HQ - IT
5          200        300        402        0          2019   Yes        CG HQ - Vessels
5          200        301        0          0          2019   No         CG Training
5          200        301        405        0          2019   Yes        CG Training - Employees
5          200        301        406        0          2019   Yes        CG Training - Students
5          200        302        0          0          2019   Yes        CG North
5          200        303        0          0          2019   No         CG East
5          200        303        407        0          2019   No         CG East - Shore-Based Personnel
5          200        303        407        500        2019   Yes        CG East - Business Mgmt
5          200        303        407        501        2019   Yes        CG East - Operations
0          0          0          0          0          2018   No         Government of X
5          0          0          0          0          2018   No         Department of Oceans
5          200        0          0          0          2018   No         Coast Guard
5          200        300        0          0          2018   No         Coast Guard HQ
5          200        300        400        0          2018   Yes        CG HQ - Business Mgmt
(and so on)

How can I do this in R (or Excel)?

Answer 1

I think this should work:

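# TRUE where a value occurs exactly once within its YEAR
# (assumes IDs are not reused under different parents within a level)
is_unique = function(x, year) ave(seq_along(x), x, year, FUN = length) == 1

df$UNIQUE = with(df, ifelse(
  is_unique(LEVEL5ID, YEAR) |
    (is_unique(LEVEL4ID, YEAR) & LEVEL5ID == 0) |
    (is_unique(LEVEL3ID, YEAR) & LEVEL4ID == 0) |
    (is_unique(LEVEL2ID, YEAR) & LEVEL3ID == 0) |
    # LEVEL1ID != 0 keeps the all-zero root row from being flagged
    (is_unique(LEVEL1ID, YEAR) & LEVEL1ID != 0 & LEVEL2ID == 0),
  "Yes", "No"
))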

If you need to generalize this to an arbitrary number of levels we can do that too, but writing it out seems fine for just 5 levels.
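
For what it's worth, here is a rough sketch of how a version for any number of levels might look. It takes a different route from the code above: instead of checking whether an ID is unique, it flags a row as "Yes" when no other row in the same year sits directly beneath it. The names `flag_lowest_units`, `level_cols`, `year_col`, `make_key` and `demo` are made up for illustration, and it assumes that 0 means "no unit at this level" and that every intermediate unit has its own row, as in the sample data.

```r
# Sketch only: flag rows that have no child row within the same year.
# Assumes 0 = "no unit at this level" and that every parent unit has its own row.
flag_lowest_units <- function(df, level_cols, year_col = "YEAR") {
  ids  <- as.matrix(df[level_cols])
  year <- df[[year_col]]

  # Depth = position of the deepest non-zero level (0 for the all-zero root row)
  depth <- apply(ids != 0, 1, function(nz) max(0L, which(nz)))

  # Key built from the year plus the first d level IDs of row i
  make_key <- function(i, d) paste(c(year[i], ids[i, seq_len(d)]), collapse = "/")

  rows       <- seq_len(nrow(df))
  own_key    <- mapply(make_key, rows, depth)
  parent_key <- mapply(make_key, rows, pmax(depth - 1L, 0L))
  parent_key[depth == 0] <- NA  # the root row has no parent

  # A row is a parent if some other row names it as its parent
  ifelse(own_key %in% parent_key, "No", "Yes")
}

# Small made-up subset of the question's data
demo <- read.table(header = TRUE, text = "
LEVEL1ID LEVEL2ID LEVEL3ID LEVEL4ID LEVEL5ID YEAR
0 0   0   0   0   2019
5 0   0   0   0   2019
5 200 0   0   0   2019
5 200 300 0   0   2019
5 200 300 400 0   2019
5 200 302 0   0   2019
5 200 303 0   0   2019
5 200 303 407 0   2019
5 200 303 407 500 2019
0 0   0   0   0   2018
5 0   0   0   0   2018
")

demo$UNIQUE <- flag_lowest_units(demo, level_cols = paste0("LEVEL", 1:5, "ID"))
print(demo)
```

In this toy subset, department 5 comes out as "No" in 2019 (it has children) but "Yes" in 2018 (it has none there), which shows the year-by-year behaviour. Because it compares whole paths rather than single IDs, this variant should also cope with an ID being reused under different parents, though I have only tried it on data shaped like the example.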

r

data-cleaning