Multiple choices in a choice data set
The original data contain information on consumerid and the cars they purchased.
clear
input consumerid str8 car purchase
6 American 1
6 Japanese 0
6 European 0
7 American 0
7 Japanese 0
7 European 1
7 Korean 1
end
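Since this is purchase data, the data set needs to be expanded to describe the full choice set of cars the consumer faced at each purchase. The final data set should look like the one shown in the Stata manual (www.stata.com/manuals/cm.pdf, p. 97, "Example 4: Multiple choices per case").
I have written some code (shown below) that gets me almost where I need to be, but I cannot produce a single purchase=1 value for each consumerid-carnumber combination (that is, because of the expansion, the purchase values are duplicated).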
egen sumpurchase = total(purchase), by(consumerid)
expand sumpurchase
bysort consumerid car (purchase): gen carnumber = _n
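You can use reshape to get all consumerid/car combinations for each purchased car. This example assumes that the sort order in the original data set defines which car is carnumber 1, carnumber 2, and so on.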
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte consumerid str8 car byte purchase
6 "American" 1
6 "Japanese" 0
6 "European" 0
7 "American" 0
7 "Japanese" 0
7 "European" 1
7 "Korean" 1
end
// Generate carnumber
bys consumerid: gen carnumber = cond(purchase != 0, sum(purchase), 0)
// To wide
reshape wide purchase, i(consumerid car) j(carnumber)
// Keep purchased cars only
drop purchase0
// Back to long
reshape long
// Drop if no cars purchased for consumerid/carnumber
bysort consumerid carnumber (purchase) : drop if missing(purchase[1])
// Replace missing with 0 for non-purchased cars
mvencode purchase, mv(0)
// Sort and see results
sort consumerid carnumber car
list, sepby(consumerid carnumber) abbr(14)
Results:
. list, sepby(consumerid carnumber) abbr(14)
     +----------------------------------------------+
     | consumerid        car   carnumber   purchase |
     |----------------------------------------------|
  1. |          6   American           1          1 |
  2. |          6   European           1          0 |
  3. |          6   Japanese           1          0 |
     |----------------------------------------------|
  4. |          7   American           1          0 |
  5. |          7   European           1          1 |
  6. |          7   Japanese           1          0 |
  7. |          7     Korean           1          0 |
     |----------------------------------------------|
  8. |          7   American           2          0 |
  9. |          7   European           2          0 |
 10. |          7   Japanese           2          0 |
 11. |          7     Korean           2          1 |
     +----------------------------------------------+
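For comparison, here is a minimal sketch that stays closer to the expand attempt in the question. It is not taken from the Stata manual. The helper variables obs, which, and npurchase are names made up for this illustration. The idea is to record which purchase each purchased car represents before expanding, and then to set purchase to 1 only in the copy whose carnumber matches. It starts from the same input data as above and assumes purchase is coded 0/1.

```stata
// Remember the original row order so that ties are broken reproducibly
gen long obs = _n

// Index of each purchase within consumer (1, 2, ...); 0 for cars not bought
bysort consumerid (obs): gen byte which = cond(purchase == 1, sum(purchase), 0)

// Number of purchases per consumer = number of choice sets needed
egen byte npurchase = total(purchase), by(consumerid)

// One copy of every car per purchase
expand npurchase

// Number the copies of each original row: this is the choice-set index
bysort consumerid obs: gen byte carnumber = _n

// A car is the chosen one only in the choice set matching its purchase index
replace purchase = (which == carnumber)

drop obs which npurchase
sort consumerid carnumber car
list, sepby(consumerid carnumber) abbr(14)
```

One difference from the reshape approach: expand treats a count below 1 as 1, so a consumer with no purchases at all would keep a single choice set with purchase equal to 0 everywhere. If such consumers should be removed, drop them with a condition on npurchase before that variable is dropped.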