Mutual exclusiveness in multiple observations (panel data) in Stata
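I have multiple observations with different exposures, and I am using Stata/MP 16.1. I want to group exposure by id according to whether the exposures are mutually exclusive. Please see the data example.
The desired variable is groups, which I created by hand. Since the dataset contains >100,000 observations, how can I create the desired variable groups with code?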
* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str1 exposure long groups
1 "." 2
1 "a" 2
1 "a" 2
2 "a" 1
2 "." 1
2 "b" 1
2 "c" 1
3 "a" 1
3 "c" 1
3 "c" 1
4 "b" 3
4 "b" 3
4 "b" 3
end
label values groups groups
label def groups 1 "not mutually exclusive", modify
label def groups 2 "only a", modify
label def groups 3 "only b", modify
* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str1 exposure long groups
1 "a" 2
1 "a" 2
1 "a" 2
2 "a" 1
2 "a" 1
2 "b" 1
2 "c" 1
3 "a" 1
3 "c" 1
3 "c" 1
4 "b" 3
4 "b" 3
4 "b" 3
end
label values groups groups
label def groups 1 "not mutually exclusive", modify
label def groups 2 "only a", modify
label def groups 3 "only b", modify
bysort id (exposure) : gen wanted = cond(exposure[1] != exposure[_N], 1, cond(exposure[1] == "a", 2, cond(exposure[1] == "b", 3, .)))
label val wanted groups
assert wanted == groups
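The logic is
if there are different values within an id, then assign 1
otherwise, the values are all the same; so
if the first value is a, then assign 2 (equivalently, all values are a)
if the first value is b, then assign 3 (equivalently, all values are b)
otherwise assign missing: there shouldn't be any such cases according to your example, but it's a good idea to check.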
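The check suggested in the last point could be sketched as follows. The small dataset here is made up for illustration (id 3 has only "c", a value the rule above does not cover); only the cond() line is taken from the answer.

```stata
* Sketch: look for observations the rule did not classify
clear
input float id str1 exposure
1 "a"
1 "a"
2 "a"
2 "b"
3 "c"
3 "c"
end

bysort id (exposure) : gen wanted = cond(exposure[1] != exposure[_N], 1, cond(exposure[1] == "a", 2, cond(exposure[1] == "b", 3, .)))

* id 3 has a single exposure that is neither a nor b, so wanted stays missing
count if missing(wanted)
list id exposure if missing(wanted), sepby(id)
```

Any id that shows up in the listing has a single exposure value for which no code has been defined yet.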
Naturally, you could break that down into shorter statements:
bysort id (exposure) : gen wanted = 1 if exposure[1] != exposure[_N]
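* guard with wanted == . so that ids already coded 1 are not overwritten
by id: replace wanted = 2 if wanted == . & exposure[1] == "a"
by id: replace wanted = 3 if wanted == . & exposure[1] == "b"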
EDIT Here is some technique for a more complicated set-up. Note that Stata does not attach any special meaning to "." in string variables.
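The remark about "." can be verified with missing(), which for string arguments counts only the empty string as missing. A minimal illustration, separate from the solution itself:

```stata
* the string "." is an ordinary one-character string: displays 0
display missing(".")
* only the empty string counts as a missing string value: displays 1
display missing("")
* numeric . is the system missing value: displays 1
display missing(.)
```

That is why the code for this set-up has to single out "." explicitly.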
* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str1 exposure long groups
1 "." 2
1 "a" 2
1 "a" 2
2 "a" 1
2 "." 1
2 "b" 1
2 "c" 1
3 "a" 1
3 "c" 1
3 "c" 1
4 "b" 3
4 "b" 3
4 "b" 3
end
label values groups groups
label def groups 1 "not mutually exclusive", modify
label def groups 2 "only a", modify
label def groups 3 "only b", modify
label def groups 4 "only c", modify
gen OK = exposure != "."
sort OK id exposure
by OK id: gen wanted = 1 if OK & exposure[1] != exposure[_N]
by OK id: replace wanted = 2 if wanted == . & OK & exposure[1] == "a"
by OK id: replace wanted = 3 if wanted == . & OK & exposure[1] == "b"
by OK id: replace wanted = 4 if wanted == . & OK & exposure[1] == "c"
bysort id (exposure OK) : replace wanted = wanted[_N]
drop OK
label val wanted groups
list, sepby(id)
     +-----------------------------------------------------------------+
     | id   exposure                   groups                   wanted |
     |-----------------------------------------------------------------|
  1. |  1          .                   only a                   only a |
  2. |  1          a                   only a                   only a |
  3. |  1          a                   only a                   only a |
     |-----------------------------------------------------------------|
  4. |  2          .   not mutually exclusive   not mutually exclusive |
  5. |  2          a   not mutually exclusive   not mutually exclusive |
  6. |  2          b   not mutually exclusive   not mutually exclusive |
  7. |  2          c   not mutually exclusive   not mutually exclusive |
     |-----------------------------------------------------------------|
  8. |  3          a   not mutually exclusive   not mutually exclusive |
  9. |  3          c   not mutually exclusive   not mutually exclusive |
 10. |  3          c   not mutually exclusive   not mutually exclusive |
     |-----------------------------------------------------------------|
 11. |  4          b                   only b                   only b |
 12. |  4          b                   only b                   only b |
 13. |  4          b                   only b                   only b |
     +-----------------------------------------------------------------+
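If there are many different exposure values, writing one replace per value gets tedious. A possible alternative, offered only as a sketch and not as part of the answer above, is to count the distinct real exposures within each id and then pick up the single value where the count is 1. The variable names tag, ndistinct and only are made up for this illustration.

```stata
clear
input float id str1 exposure
1 "."
1 "a"
1 "a"
2 "a"
2 "."
2 "b"
2 "c"
3 "a"
3 "c"
3 "c"
4 "b"
4 "b"
4 "b"
end

* flag real exposures, i.e. anything other than "."
gen byte OK = exposure != "."

* tag the first observation of each distinct real exposure within id
bysort id OK exposure : gen byte tag = OK & _n == 1

* number of distinct real exposures per id
by id : egen ndistinct = total(tag)

* for ids with exactly one real exposure, copy that value to all observations
* (sorting on OK first puts any "." observations ahead of the real ones)
bysort id (OK exposure) : gen only = exposure[_N] if ndistinct == 1

list id exposure ndistinct only, sepby(id)
```

Here ndistinct > 1 corresponds to "not mutually exclusive", and only holds the exposure for the remaining ids, so it could be turned into a labelled numeric variable with encode if needed.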