Mutual exclusiveness in multiple observations (panel data) in Stata

Question

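I have multiple observations per id with different exposures, and I am using Stata/MP 16.1. I want to group exposure by id according to whether the exposures are mutually exclusive. Please see the data example.

The desired variable is groups, which I created by hand. Since the dataset contains more than 100,000 observations, how can I create the desired variable groups with code?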

* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str1 exposure long groups
1 "." 2
1 "a" 2
1 "a" 2
2 "a" 1
2 "." 1
2 "b" 1
2 "c" 1
3 "a" 1
3 "c" 1
3 "c" 1
4 "b" 3
4 "b" 3
4 "b" 3
end
label values groups groups
label def groups 1 "not mutually exclusive", modify
label def groups 2 "only a", modify
label def groups 3 "only b", modify

Answer 1

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id str1 exposure long groups
    1 "a" 2
    1 "a" 2
    1 "a" 2
    2 "a" 1
    2 "a" 1
    2 "b" 1
    2 "c" 1
    3 "a" 1
    3 "c" 1
    3 "c" 1
    4 "b" 3
    4 "b" 3
    4 "b" 3
    end
    label values groups groups
    label def groups 1 "not mutually exclusive", modify
    label def groups 2 "only a", modify
    label def groups 3 "only b", modify
    
    bysort id (exposure) : gen wanted = cond(exposure[1] != exposure[_N], 1, cond(exposure[1] == "a", 2, cond(exposure[1] == "b", 3, .)))
    label val wanted groups 

    assert wanted == groups

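The logic is:

If there are different values within an id, assign 1.

Otherwise, the values are all the same; so

if the first value is a, assign 2 (equivalently, all values are a);

if the first value is b, assign 3 (equivalently, all values are b);

otherwise, assign missing. Judging by your example there should not be any such cases, but checking is a good idea.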

Naturally, you could break that down into shorter statements:

bysort id (exposure) : gen wanted = 1 if exposure[1] != exposure[_N] 
by id: replace wanted = 2 if wanted == . & exposure[1] == "a" 
by id: replace wanted = 3 if wanted == . & exposure[1] == "b"
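
The logic above ends with a branch that assigns missing, so it is worth checking whether anything fell through. What follows is only a minimal sketch on a small made-up dataset: the exposure value "d" for id 3 is hypothetical and is there purely to trigger the missing case.

```stata
clear
input float id str1 exposure
1 "a"
1 "a"
2 "a"
2 "b"
3 "d"
3 "d"
end

* same classification as above: 1 = mixed, 2 = only a, 3 = only b, missing otherwise
bysort id (exposure) : gen wanted = cond(exposure[1] != exposure[_N], 1, cond(exposure[1] == "a", 2, cond(exposure[1] == "b", 3, .)))

* how many observations were left unclassified?
count if missing(wanted)

* which ids and exposures are they?
list id exposure if missing(wanted), sepby(id)
```

If the count is not zero, the listing shows which exposure values still need a group of their own.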

EDIT Here is some technique for the more complicated set-up. Note that Stata does not attach any special meaning to the string ".".

* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str1 exposure long groups
1 "." 2
1 "a" 2
1 "a" 2
2 "a" 1
2 "." 1
2 "b" 1
2 "c" 1
3 "a" 1
3 "c" 1
3 "c" 1
4 "b" 3
4 "b" 3
4 "b" 3
end

label values groups groups
label def groups 1 "not mutually exclusive", modify
label def groups 2 "only a", modify
label def groups 3 "only b", modify
label def groups 4 "only c", modify

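* flag observations with a real exposure; "." is treated here as no information
gen OK = exposure != "."
sort OK id exposure 
* classify within each id using only the flagged observations
by OK id: gen wanted = 1 if OK & exposure[1] != exposure[_N] 
by OK id: replace wanted = 2 if wanted == . & OK & exposure[1] == "a"
by OK id: replace wanted = 3 if wanted == . & OK & exposure[1] == "b"
by OK id: replace wanted = 4 if wanted == . & OK & exposure[1] == "c"

* "." sorts before letters, so the last observation in an id is a classified one
* whenever the id has any exposure other than "."; copy its value to the whole id
bysort id (exposure OK) : replace wanted = wanted[_N]
drop OK 
label val wanted groups 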

list, sepby(id)

     +-----------------------------------------------------------------+
     | id   exposure                   groups                   wanted |
     |-----------------------------------------------------------------|
  1. |  1          .                   only a                   only a |
  2. |  1          a                   only a                   only a |
  3. |  1          a                   only a                   only a |
     |-----------------------------------------------------------------|
  4. |  2          .   not mutually exclusive   not mutually exclusive |
  5. |  2          a   not mutually exclusive   not mutually exclusive |
  6. |  2          b   not mutually exclusive   not mutually exclusive |
  7. |  2          c   not mutually exclusive   not mutually exclusive |
     |-----------------------------------------------------------------|
  8. |  3          a   not mutually exclusive   not mutually exclusive |
  9. |  3          c   not mutually exclusive   not mutually exclusive |
 10. |  3          c   not mutually exclusive   not mutually exclusive |
     |-----------------------------------------------------------------|
 11. |  4          b                   only b                   only b |
 12. |  4          b                   only b                   only b |
 13. |  4          b                   only b                   only b |
     +-----------------------------------------------------------------+
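
The final bysort id (exposure OK) step relies on how Stata treats the string ".". As a small sketch with made-up values (the variables is_missing and is_dot are introduced here only for illustration): in a string variable only the empty string "" counts as missing, "." is an ordinary value, and it sorts before letters.

```stata
clear
input str1 exposure
"a"
"."
""
end

* for string variables only the empty string counts as missing
gen byte is_missing = missing(exposure)
gen byte is_dot = exposure == "."

* string sort order: "" first, then ".", then letters
sort exposure
list
```

That sort order is why wanted[_N] picks up a classified observation in any id that has at least one exposure other than ".".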
