Stata cumulative sum with duplicate records

clear 
input input record1 record2   value       str8 sdate
  1      1       0        2         "1/1/2010"
  2      1       0        2         "1/1/2010"
  3      1       0        3         "1/3/2010"
  4      1       0        3        "1/3/2010"
  5      1       0        3        "1/3/2010"
  6      0       1        -3        "1/5/2010"
  7      0       1        -3         "1/5/2010"
  8      1       0        2        "1/5/2010"
  9      0       1        1         "1/7/2010" 
 end 
 gen date = daily(sdate, "MDY") 
 format date %td 

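In the MWE above, there is a variable recordi for each individual in my data, equal to 1 if that individual participates in the value on that row. For each individual I want to create a variable that cumulatively sums that day's value with the final value from the day before, giving the following output: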

input record1 record2   value   date          record1dailysum    record2dailysum
  1      1       0        2     1/1/2010            2                .
  2      1       0        2     1/1/2010            2                .
  3      1       0        3     1/3/2010            5                .
  4      1       0        3     1/3/2010            5                .
  5      1       0        3     1/3/2010            5                .
  6      0       1        -3    1/5/2010            .               -3
  7      0       1        -3    1/5/2010            .               -3
  8      1       0        2     1/5/2010            7                .
  9      0       1        1     1/7/2010            .               -2

I have many records, so I use a loop to create these values. This is how I tried to create recordidailysum:

qui forval i = 1/2
    by date: egen record`i'dailysum = value + value[_n-1] if record`i' == 1
}

Lastly, I would like to shift the values down by one date, so that for record1 the values for 1/3/2010 would be the values now in 1/1/2010, etc.

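Collapsing the data to create unique records by date and record and then merging back is not an option (or at least it is a last resort, because this is a huge and messy dataset).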

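There are a few errors in your code:

  • The { is missing at the end of the forvalues line.
  • by date says that you want to treat every unique date as a group. You don't actually want this. You want to sort date and then run your code for each record[i] (given your data structure).
  • The (possibly) more usual approach would be a single field named record that equals 1, 2, ..., in which case you would just write bys record (date): ... (more on this later).
  • Note that it is the sum() function of generate (not egen) that gives a running sum. egen also expects one of its own functions on the right-hand side, so egen ... = value + value[_n-1] is not valid syntax.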
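
To illustrate the last point, here is a minimal sketch on made-up data contrasting generate's sum() with egen's total(). The variables id, t and x are hypothetical and not part of the question's dataset.

```stata
clear
input id t x
1 1  2
1 2  3
1 3  2
2 1 -3
2 2  1
end

// running (cumulative) sum within id, accumulated in order of t
bysort id (t): generate runsum = sum(x)

// egen's total() instead returns one group total, repeated on every row
bysort id: egen grptotal = total(x)

list, sepby(id) noobs
```

Here runsum changes from row to row within each id, while grptotal is constant within each id, which is why generate with sum() is the tool for a cumulative sum.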

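On the request itself: it is not clear what you gain by having duplicate observations in your data, nor what you gain by repeating the cumulative sums across those observations. Why not just duplicates drop [varlist]? Or, if you need to retain all observations, I think it may be more useful to tag the unique observations.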
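
Before choosing between dropping and tagging, it may help to look at how many duplicates there actually are. The following is a rough sketch on the question's data; the variable name ndup is an assumption introduced here for illustration.

```stata
clear
input input record1 record2 value str8 sdate
1 1 0  2 "1/1/2010"
2 1 0  2 "1/1/2010"
3 1 0  3 "1/3/2010"
4 1 0  3 "1/3/2010"
5 1 0  3 "1/3/2010"
6 0 1 -3 "1/5/2010"
7 0 1 -3 "1/5/2010"
8 1 0  2 "1/5/2010"
9 0 1  1 "1/7/2010"
end

// summarize how many copies exist of each combination
duplicates report record1 record2 value sdate

// ndup (hypothetical name) = number of other observations identical on these variables
duplicates tag record1 record2 value sdate, generate(ndup)

// tag = 1 for exactly one observation per distinct combination, 0 otherwise
egen tag = tag(record1 record2 value sdate)

list, noobs
```

Unlike duplicates drop, neither duplicates tag nor egen's tag() removes any observations, so both are safe to run on the full dataset first.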

Finally, the two requests "I want to create a variable for each individual that cumulatively sums that days value with the final value from the day before. Leaving the following output." and "Lastly, I would like to shift the values down by one date so for record1 the values for 1/3/2010 would be the values now in 1/1/2010, etc." do not agree with each other: the output you show is not shifted.

One solution that maintains your structure:

clear 
input input record1 record2   value       str8 sdate
  1      1       0        2         "1/1/2010"
  2      1       0        2         "1/1/2010"
  3      1       0        3         "1/3/2010"
  4      1       0        3        "1/3/2010"
  5      1       0        3        "1/3/2010"
  6      0       1        -3        "1/5/2010"
  7      0       1        -3         "1/5/2010"
  8      1       0        2        "1/5/2010"
  9      0       1        1         "1/7/2010" 
end 

// tag unique obs (consider instead duplicates drop record1 record2 value sdate, force)
egen tag = tag(record1 record2 value sdate)

// generate Stata date
gen date = daily(sdate, "MDY") 
format date %td 

// fixed loop
sort date
forval i = 1/2 {
    gen record`i'dailysum = sum(value) if record`i' == 1 & tag == 1
}

// if you must have duplicated sums, you can replace by group
forvalues i = 1/2 {
    clonevar record`i'dailysum2 = record`i'dailysum
    bys record`i' value date (record`i'dailysum2): replace record`i'dailysum2 = record`i'dailysum2[1]
}
sort record2 date record1
li, sepby(record1) noobs

Resulting in:

  +------------------------------------------------------------------------------------------------------------+
  | input   record1   record2   value      sdate   tag        date   record..   record..   record..   record.. |
  |------------------------------------------------------------------------------------------------------------|
  |     2         1         0       2   1/1/2010     0   01jan2010          .          .          2          . |
  |     1         1         0       2   1/1/2010     1   01jan2010          2          .          2          . |
  |     3         1         0       3   1/3/2010     1   03jan2010          5          .          5          . |
  |     5         1         0       3   1/3/2010     0   03jan2010          .          .          5          . |
  |     4         1         0       3   1/3/2010     0   03jan2010          .          .          5          . |
  |     8         1         0       2   1/5/2010     1   05jan2010          7          .          7          . |
  |------------------------------------------------------------------------------------------------------------|
  |     6         0         1      -3   1/5/2010     1   05jan2010          .         -3          .         -3 |
  |     7         0         1      -3   1/5/2010     0   05jan2010          .          .          .         -3 |
  |     9         0         1       1   1/7/2010     1   07jan2010          .         -2          .         -2 |
  +------------------------------------------------------------------------------------------------------------+

However, if this were my project, I would definitely look into something like this:

// AN ALTERNATIVE APPROACH

clear 
input input record1 record2   value       str8 sdate
  1      1       0        2         "1/1/2010"
  2      1       0        2         "1/1/2010"
  3      1       0        3         "1/3/2010"
  4      1       0        3        "1/3/2010"
  5      1       0        3        "1/3/2010"
  6      0       1        -3        "1/5/2010"
  7      0       1        -3         "1/5/2010"
  8      1       0        2        "1/5/2010"
  9      0       1        1         "1/7/2010" 
end 

// recode record
gen record = .
forvalues i = 1/2 {
    replace record = `i' if record`i' == 1
}
drop record?

gen date = daily(sdate, "MDY") 
format date %td 

// drop duplicates
duplicates drop record value date , force

// gen daily sum by record (loop not required due to single variable structure)
bysort record (date): gen dailysum = sum(value)

li, sepby(record) noobs

Yielding:

  +----------------------------------------------------------+
  | input   value      sdate   record        date   dailysum |
  |----------------------------------------------------------|
  |     1       2   1/1/2010        1   01jan2010          2 |
  |     3       3   1/3/2010        1   03jan2010          5 |
  |     8       2   1/5/2010        1   05jan2010          7 |
  |----------------------------------------------------------|
  |     6      -3   1/5/2010        2   05jan2010         -3 |
  |     9       1   1/7/2010        2   07jan2010         -2 |
  +----------------------------------------------------------+

In the second example, shifting the values down by one date is a simple task:

// shift the values down by one date
bysort record (date): gen dailysum2 = dailysum[_n-1]

In the first example, the following should work:

forvalues i = 1/2 {
    bys tag record`i' (date): gen record`i'dailysumshift = record`i'dailysum[_n-1] if tag == 1
}