How do I tag the lowest-level units within an organization structure (represented by several LEVEL_ID columns)?
I have a dataset that lists the departments of a government at several sub-department levels. It looks like this:
LEVEL1ID  LEVEL2ID  LEVEL3ID  LEVEL4ID  LEVEL5ID  YEAR  DESCRIP_E
0         0         0         0         0         2019  Government of X
5         0         0         0         0         2019  Department of Oceans
5         200       0         0         0         2019  Coast Guard
5         200       300       0         0         2019  Coast Guard HQ
5         200       300       400       0         2019  CG HQ - Business Mgmt
5         200       300       401       0         2019  CG HQ - IT
5         200       300       402       0         2019  CG HQ - Vessels
5         200       301       0         0         2019  CG Training
5         200       301       405       0         2019  CG Training - Employees
5         200       301       406       0         2019  CG Training - Students
5         200       302       0         0         2019  CG North
5         200       303       0         0         2019  CG East
5         200       303       407       0         2019  CG East - Shore-Based Personnel
5         200       303       407       500       2019  CG East - Business Mgmt
5         200       303       407       501       2019  CG East - Operations
0         0         0         0         0         2018  Government of X
5         0         0         0         0         2018  Department of Oceans
5         200       0         0         0         2018  Coast Guard
5         200       300       0         0         2018  Coast Guard HQ
5         200       300       400       0         2018  CG HQ - Business Mgmt
(and so on)
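I want to create a new binary variable that flags the rows representing the lowest-level organizational units within a given year. That is, I would like my dataset to look like this: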
LEVEL1ID  LEVEL2ID  LEVEL3ID  LEVEL4ID  LEVEL5ID  YEAR  UNIQUE  DESCRIP_E
0         0         0         0         0         2019  No      Government of X
5         0         0         0         0         2019  No      Department of Oceans
5         200       0         0         0         2019  No      Coast Guard
5         200       300       0         0         2019  No      Coast Guard HQ
5         200       300       400       0         2019  Yes     CG HQ - Business Mgmt
5         200       300       401       0         2019  Yes     CG HQ - IT
5         200       300       402       0         2019  Yes     CG HQ - Vessels
5         200       301       0         0         2019  No      CG Training
5         200       301       405       0         2019  Yes     CG Training - Employees
5         200       301       406       0         2019  Yes     CG Training - Students
5         200       302       0         0         2019  Yes     CG North
5         200       303       0         0         2019  No      CG East
5         200       303       407       0         2019  No      CG East - Shore-Based Personnel
5         200       303       407       500       2019  Yes     CG East - Business Mgmt
5         200       303       407       501       2019  Yes     CG East - Operations
0         0         0         0         0         2018  No      Government of X
5         0         0         0         0         2018  No      Department of Oceans
5         200       0         0         0         2018  No      Coast Guard
5         200       300       0         0         2018  No      Coast Guard HQ
5         200       300       400       0         2018  Yes     CG HQ - Business Mgmt
(and so on)
How can I do this in R (or Excel)?
I think this should work:
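# TRUE where a value occurs exactly once within its YEAR
is_unique = function(x, year) ave(seq_along(x), year, x, FUN = length) == 1
# Assumes an ID is never reused under a different parent within a year
df$UNIQUE = with(df, ifelse(
  is_unique(LEVEL5ID, YEAR) |
    (is_unique(LEVEL4ID, YEAR) & LEVEL5ID == 0) |
    (is_unique(LEVEL3ID, YEAR) & LEVEL4ID == 0) |
    (is_unique(LEVEL2ID, YEAR) & LEVEL3ID == 0) |
    # LEVEL1ID != 0 keeps the all-zero root row ("Government of X") at "No"
    (is_unique(LEVEL1ID, YEAR) & LEVEL2ID == 0 & LEVEL1ID != 0),
  "Yes", "No"
))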
If you need to generalize this to an arbitrary number of levels, we can do that too, but writing it out seems fine for just 5 levels.
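For what it's worth, here is one way the generalization might look. This is only a sketch: `flag_leaves` is a made-up helper name, and it assumes that unused levels are coded as 0, that zeros only appear after the levels a row actually uses, and that the all-zero root row should always be "No" (as in the desired output). Instead of checking whether an ID is unique, it checks whether any deeper row in the same year shares the row's full ID prefix, so it should not depend on IDs being unique across parents.

```r
# Flag rows that have no child rows within the same year.
# Hypothetical helper; assumes unused levels are 0 and zeros only trail the used levels.
flag_leaves <- function(df, level_cols, year_col = "YEAR") {
  lv <- as.matrix(df[level_cols])
  depth <- unname(rowSums(lv != 0))   # number of levels each row actually uses
  has_child <- logical(nrow(df))
  for (d in seq_len(length(level_cols) - 1)) {
    # Key = year plus the first d level IDs, e.g. "2019_5_200"
    parts <- c(list(df[[year_col]]),
               unname(as.list(as.data.frame(lv[, seq_len(d), drop = FALSE]))))
    key <- do.call(paste, c(parts, sep = "_"))
    at_d <- depth == d
    # A row at depth d is a parent if some deeper row shares its key
    has_child[at_d] <- key[at_d] %in% key[depth > d]
  }
  # The all-zero root row (depth 0) is never flagged
  ifelse(depth > 0 & !has_child, "Yes", "No")
}

# Small subset of the example data
df <- data.frame(
  LEVEL1ID = c(0, 5, 5, 5, 5, 5),
  LEVEL2ID = c(0, 0, 200, 200, 200, 200),
  LEVEL3ID = c(0, 0, 0, 300, 300, 302),
  LEVEL4ID = c(0, 0, 0, 0, 400, 0),
  LEVEL5ID = 0,
  YEAR = 2019
)
df$UNIQUE <- flag_leaves(df, paste0("LEVEL", 1:5, "ID"))
```

To use it with more levels, pass a longer vector of column names as `level_cols`; nothing else in the function is tied to 5 levels.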