Collapse on the sum of one variable but the mean of another

Question

I want to collapse the following dataset into a dataset at the Firm_1 by Firm_2 by year level:

clear

input str32 Firm_1 str32 Firm_2    year number_employees str32 blah1  str32 blah2  returns 
           "Rathon"    "Hass"      2010      4000               hey    hello        40
           "Rathon"    "Hass"      2010      6000               hey    hello        20
           "Rathon"    "Hass"      2012     12000               money    fame       10
           "Rathon"    "Broq"      2012     12000               dime     bunk       50
           "Birlar"    "Goth"      2008      1000               shop     ladder     30
           "Birlar"    "Goth"      2008      7000               shop     ladder     70
end

I want the final dataset to be reduced so that each observation represents one combination of Firm_1, Firm_2 and year. It should look like this:

           Firm_1       Firm_2    year number_employees  blah1    blah2    returns
           "Rathon"    "Hass"      2010     10000          hey     hello     30 
           "Rathon"    "Hass"      2012     12000         money    fame      10
           "Rathon"    "Broq"      2012     12000         dime     bunk      50
           "Birlar"    "Goth"      2008      8000         shop     ladder    50

However, when I use collapse as follows:

collapse (sum) number_employees, by ( Firm_1 Firm_2 year)

the command drops the variables blah1 and blah2. Is there a way to keep them? In addition, returns should be averaged when the observations are collapsed, not summed as number_employees is.

Answer 1

This works for your example:

collapse (sum) number_employees (mean) returns , by(Firm_1 Firm_2 blah* year) 

list 

     +--------------------------------------------------------------+
     | Firm_1   Firm_2   year   blah1    blah2   number~s   returns |
     |--------------------------------------------------------------|
  1. | Birlar     Goth   2008    shop   ladder       8000        50 |
  2. | Rathon     Broq   2012    dime     bunk      12000        50 |
  3. | Rathon     Hass   2010     hey    hello      10000        30 |
  4. | Rathon     Hass   2012   money     fame      12000        10 |
     +--------------------------------------------------------------+

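You can add variables that are constant within each group to the by() option, so long as that gives the one-to-one correspondence you want. More obviously, the documented syntax allows different statistics to be calculated for the same or different variables.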
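A minimal sketch of both points, assuming the example data from the question. The two bysort ... assert lines are an added check, not part of the answer above: after sorting within each Firm_1 Firm_2 year group, a variable is constant in that group exactly when its first and last values match, and assert stops with an error otherwise. The names total_employees and mean_employees are hypothetical, chosen only to show the (stat) newvar=varname form applying two statistics to the same variable.

```stata
clear
input str32 Firm_1 str32 Firm_2 year number_employees str32 blah1 str32 blah2 returns
"Rathon" "Hass" 2010  4000 hey   hello  40
"Rathon" "Hass" 2010  6000 hey   hello  20
"Rathon" "Hass" 2012 12000 money fame   10
"Rathon" "Broq" 2012 12000 dime  bunk   50
"Birlar" "Goth" 2008  1000 shop  ladder 30
"Birlar" "Goth" 2008  7000 shop  ladder 70
end

* Added check: blah1 and blah2 must be constant within each Firm_1 Firm_2 year group
* before they are put in by(); assert stops with an error if they are not
bysort Firm_1 Firm_2 year (blah1): assert blah1[1] == blah1[_N]
bysort Firm_1 Firm_2 year (blah2): assert blah2[1] == blah2[_N]

* Two statistics for the same variable (hypothetical names total_employees and
* mean_employees), plus the mean of returns
collapse (sum) total_employees=number_employees (mean) mean_employees=number_employees returns, by(Firm_1 Firm_2 year blah1 blah2)

list, noobs
```

If the assert lines fail on real data, blah1 or blah2 varies within a group and putting it in by() would split that group into several observations.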
