Comparisons across multiple rows in Stata (household dataset)

Question

I am working with a household dataset, and my data look like this:

 input id   id_family   mother_id   male
        1           2          12      0
        2           2          13      1
        3           3          15      1
        4           3          17      0
        5           3           4      0
 end

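What I want to do is identify the mother in each family. A mother is a family member whose id equals the mother_id of another member of the same family. In the example above, for the family with id_family=3, individual 5 has mother_id=4, which makes individual 4 her mother.

I created a family size variable that tells me how many members each family has. I also created a rank variable for each member within the family. For families of three, I have the following code, which works: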

bysort id_family: gen family_size=_N
bysort id_family: gen rank=_n

gen mother=. 
bysort id_family: replace mother=1 if male==0 & rank==1 & family_size==3 & (id[_n]==mother_id[_n+1] | id[_n]==mother_id[_n+2])
bysort id_family: replace mother=1 if male==0 & rank==2 & family_size==3 & (id[_n]==mother_id[_n-1] | id[_n]==mother_id[_n+1])
bysort id_family: replace mother=1 if male==0 & rank==3 & family_size==3 & (id[_n]==mother_id[_n-1] | id[_n]==mother_id[_n-2])

What I get is:

id  id_family   mother_id   male    family_size rank    mother  
1        2          12       0           2        1       .      
2        2          13       1           2        2       .      
3        3          15       1           3        1       .      
4        3          17       0           3        2       1      
5        3           4       0           3        3       .

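In my real dataset, however, I have to identify the mother in families of size 4 and above (up to 9), which makes this approach very inefficient (there are too many row elements to compare "manually").

How would you do this in a cleaner way? Would you use permutations to index the rows? Or would you use a for loop?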

Answer 1

Here is an approach that uses merge.

// create sample data
clear
input id id_family mother_id male
        1           2          12      0
        2           2          13      1
        3           3          15      1
        4           3          17      0
        5           3           4      0
end
save families, replace
clear

// do the job
use families
drop id male
rename mother_id id
sort id_family id
duplicates drop
list, clean abbreviate(10)
save mothers, replace
use families, clear
merge 1:1 id_family id using mothers, keep(master match)
generate byte is_mother = _merge==3
list, clean abbreviate(10)

The second list produces

       id   id_family   mother_id   male            _merge   is_mother  
  1.    1           2          12      0   master only (1)           0  
  2.    2           2          13      1   master only (1)           0  
  3.    3           3          15      1   master only (1)           0  
  4.    4           3          17      0       matched (3)           1  
  5.    5           3           4      0   master only (1)           0

I kept _merge only for illustration purposes.
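
Since the question also asks about a for loop, here is a minimal sketch of a loop-based alternative that avoids the intermediate files. It is not part of the merge approach above. It assumes id is unique within a family and relies on the fact that, under by:, an explicit subscript such as mother_id[`k'] counts observations within the by-group. The local macro maxsize is a name made up for this sketch, and the male==0 condition is carried over from the question and can be dropped.

```stata
// sample data from the question
clear
input id id_family mother_id male
        1           2          12      0
        2           2          13      1
        3           3          15      1
        4           3          17      0
        5           3           4      0
end

// the largest family size sets the number of loop passes
bysort id_family (id): generate family_size = _N
summarize family_size, meanonly
local maxsize = r(max)    // hypothetical local name

// pass k compares each member's id with the mother_id of the k-th
// member of the same family; for families smaller than k the
// subscript returns missing, so nothing is flagged
generate byte is_mother = 0
forvalues k = 1/`maxsize' {
    by id_family: replace is_mother = 1 if male == 0 & !missing(id) & id == mother_id[`k']
}
list, clean abbreviate(10)
```

On the sample data this flags only individual 4, the same result as the merge. The loop makes one pass over the data per possible position, so it should stay cheap for families of up to 9 members, and it works for any family size without writing out each pairwise comparison by hand.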
