Stata

Question

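I am trying to create a dataset that lists all combinations of rows within each ID, ignoring the order inside a combination (i.e., Apple -> Orange is the same as Orange -> Apple). I started from a related question, which gets me all combinations but keeps both orderings as separate rows. How can I modify this code to get the desired output? Or should I use a different approach entirely, and if so, which one?

This is the code I have so far, based on the linked question: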

clear all

input id str10 fruit 
1 "Apple"
1 "Banana"
1 "Orange"
2 "Orange"
2 "Apple"
3 "Pear"
3 "Kiwi"
3 "Apple"
3 "Lemon"
end

tempfile t1
save `t1'
clear

use `t1'
rename fruit f2
keep id f2
joinby id using `t1'
order id fruit f2
sort id  fruit f2

drop if fruit==f2

list, sepby(id)

The output I want is:

ID  fruit   f2
1   Apple   Banana
1   Apple   Orange
1   Banana  Orange
2   Orange  Apple
3   Pear    Kiwi
3   Pear    Apple
3   Pear    Lemon
3   Kiwi    Apple
3   Kiwi    Lemon
3   Apple   Lemon

Answer 1

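After joinby, you can generate a helper variable that holds fruit and f2 together in sorted order, so that the same combination always gets the same value regardless of which fruit came first. You can then use duplicates drop on this variable and id to remove the duplicates.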

clear 
input id str10 fruit 
1 "Apple"
1 "Banana"
1 "Orange"
2 "Orange"
2 "Apple"
3 "Pear"
3 "Kiwi"
3 "Apple"
3 "Lemon"
end

tempfile t1
save `t1'

rename fruit f2
joinby id using `t1'
order id fruit f2
sort id  fruit f2

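* drop pairs of a fruit with itself
drop if fruit==f2
* order-independent key: the alphabetically smaller fruit always comes first
gen combination = cond(fruit < f2, fruit + " " + f2, f2 + " " + fruit)
* keep one row per id and key
duplicates drop id combination, force
drop combination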

list, sepby(id)

     +----------------------+
     | id    fruit       f2 |
     |----------------------|
  1. |  1    Apple   Banana |
  2. |  1    Apple   Orange |
  3. |  1   Banana   Orange |
     |----------------------|
  4. |  2    Apple   Orange |
     |----------------------|
  5. |  3    Apple     Kiwi |
  6. |  3    Apple    Lemon |
  7. |  3    Apple     Pear |
  8. |  3     Kiwi    Lemon |
  9. |  3     Kiwi     Pear |
 10. |  3    Lemon     Pear |
     +----------------------+
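
Note that the listing above orders each pair alphabetically, whereas the output requested in the question keeps the fruits in their original row order (for example, Orange / Apple for id 2). A minimal sketch of one way to get that, assuming the original row order is what should decide which fruit comes first: record the row number before the join and keep only pairs whose first fruit appeared earlier. The variable names seq and seq2 are hypothetical helpers introduced here, not part of the original code.

```stata
clear
input id str10 fruit
1 "Apple"
1 "Banana"
1 "Orange"
2 "Orange"
2 "Apple"
3 "Pear"
3 "Kiwi"
3 "Apple"
3 "Lemon"
end

* remember the original row order (seq is a hypothetical helper variable)
gen long seq = _n

tempfile t1
save `t1'

* the copy in memory supplies the second fruit of each pair
rename (fruit seq) (f2 seq2)
joinby id using `t1'

* keep each unordered pair once: first fruit must come from the earlier row
* (this also drops self-pairs, where seq == seq2)
keep if seq < seq2

sort id seq seq2
drop seq seq2
order id fruit f2

list, sepby(id)
```

If alphabetical order within each pair is acceptable and no fruit repeats within an id, a single `keep if fruit < f2` after joinby should have the same effect as the drop / gen / duplicates drop steps in the answer.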

Stata - Generate unique combinations ignoring order