Stata cumulative sum with duplicate records
clear
input input record1 record2 value str8 sdate
1 1 0 2 "1/1/2010"
2 1 0 2 "1/1/2010"
3 1 0 3 "1/3/2010"
4 1 0 3 "1/3/2010"
5 1 0 3 "1/3/2010"
6 0 1 -3 "1/5/2010"
7 0 1 -3 "1/5/2010"
8 1 0 2 "1/5/2010"
9 0 1 1 "1/7/2010"
end
gen date = daily(sdate, "MDY")
format date %td
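The MWE I have contains a variable recordi for each individual in my data, which is 1 if they were involved in that value. I want to create a variable for each individual that cumulatively sums that day's value with the final value from the day before, leaving the following output.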
input record1 record2 value date record1dailysum record2dailysum
1 1 0 2 1/1/2010 2 .
2 1 0 2 1/1/2010 2 .
3 1 0 3 1/3/2010 5 .
4 1 0 3 1/3/2010 5 .
5 1 0 3 1/3/2010 5 .
6 0 1 -3 1/5/2010 . -3
7 0 1 -3 1/5/2010 . -3
8 1 0 2 1/5/2010 7 .
9 0 1 1 1/7/2010 . -2
I have many records, so I am using a loop to create these values. This is my attempt at creating recordidailysum:
qui forval i = 1/2
by date: egen record`i'dailysum = value + value[_n-1] if record`i' == 1
}
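Lastly, I would like to shift the values down by one date, so for record1 the values for 1/3/2010 would be the values now in 1/1/2010, etc.
Collapsing the data to create unique records by date and record and then merging back is not an option (or at least it is a last resort, as this is a large and messy dataset).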
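There are a few errors in your code:
- The forvalues line is missing { at the end.
- by date tells Stata to treat each unique date as a group. You don't actually want that. You want to sort date and then run your code for each record[i] (given your data structure).
- A (probably) more usual approach would be a single field called record that equals 1, 2, ..., in which case you would simply code bys record (date): ... (more on this later).
- Note that the sum() function of generate (not egen) gives a running sum.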
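To make the last point concrete, here is a minimal sketch on made-up data. The variable names id, order, runsum and grptotal are illustrative and do not come from the question. It contrasts the running sum from generate's sum() with the group total from egen's total().

```stata
clear
input id value
1 2
1 3
1 2
2 -3
2 1
end
* remember the input order so the running sum is reproducible within id
gen long order = _n
* generate's sum() accumulates row by row within each by-group
* runsum is 2, 5, 7 for id 1 and -3, -2 for id 2
bysort id (order): gen runsum = sum(value)
* egen's total() writes the same group total on every row
* grptotal is 7 for every row of id 1 and -2 for every row of id 2
bysort id: egen grptotal = total(value)
list, sepby(id) noobs
```

Note that generate's sum() treats missing values as zero, so a missing value leaves the running sum unchanged rather than turning it missing.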
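Regarding the request: it is not clear what is gained by keeping duplicate observations in the data, or by repeating the cumulative sum across them. Why not just duplicates drop [varlist]? Alternatively, if you need to keep all observations, I think tagging the unique observations would be more useful.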
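Before dropping or tagging anything, it may help to look at how many duplicates there are. The following is only a sketch on the example data from the question; ndup and first are hypothetical variable names chosen for illustration.

```stata
clear
input input record1 record2 value str8 sdate
1 1 0 2 "1/1/2010"
2 1 0 2 "1/1/2010"
3 1 0 3 "1/3/2010"
4 1 0 3 "1/3/2010"
5 1 0 3 "1/3/2010"
6 0 1 -3 "1/5/2010"
7 0 1 -3 "1/5/2010"
8 1 0 2 "1/5/2010"
9 0 1 1 "1/7/2010"
end
* summarize how many copies exist of each combination of these variables
duplicates report record1 record2 value sdate
* ndup = number of other observations identical on these variables (0 = unique)
duplicates tag record1 record2 value sdate, generate(ndup)
* first = 1 for exactly one observation per distinct combination, 0 otherwise
egen byte first = tag(record1 record2 value sdate)
list record1 record2 value sdate ndup first, sepby(sdate) noobs
```

Restricting later calculations to first == 1 keeps every observation in the dataset while counting each distinct record only once.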
Finally, "I want to create a variable for each individual that cumulatively sums that days value with the final value from the day before. Leaving the following output." is at odds with "Lastly, I would like to shift the values down by one date so for record1 the values for 1/3/2010 would be the values now in 1/1/2010, etc."
One solution, maintaining your structure:
clear
input input record1 record2 value str8 sdate
1 1 0 2 "1/1/2010"
2 1 0 2 "1/1/2010"
3 1 0 3 "1/3/2010"
4 1 0 3 "1/3/2010"
5 1 0 3 "1/3/2010"
6 0 1 -3 "1/5/2010"
7 0 1 -3 "1/5/2010"
8 1 0 2 "1/5/2010"
9 0 1 1 "1/7/2010"
end
// tag unique obs (consider instead duplicates drop record1 record2 value sdate, force)
egen tag = tag(record1 record2 value sdate)
// generate Stata date
gen date = daily(sdate, "MDY")
format date %td
// fixed loop
sort date
forval i = 1/2 {
    gen record`i'dailysum = sum(value) if record`i' == 1 & tag == 1
}
// if you must have duplicated sums, you can replace by group
forvalues i = 1/2 {
    clonevar record`i'dailysum2 = record`i'dailysum
    bys record`i' value date (record`i'dailysum2): replace record`i'dailysum2 = record`i'dailysum2[1]
}
sort record2 date record1 date
li, sepby(record1) noobs
Result:
+------------------------------------------------------------------------------------------------------------+
| input   record1   record2   value      sdate   tag        date   record..   record..   record..   record.. |
|------------------------------------------------------------------------------------------------------------|
|     2         1         0       2   1/1/2010     0   01jan2010          .          .          2          . |
|     1         1         0       2   1/1/2010     1   01jan2010          2          .          2          . |
|     3         1         0       3   1/3/2010     1   03jan2010          5          .          5          . |
|     5         1         0       3   1/3/2010     0   03jan2010          .          .          5          . |
|     4         1         0       3   1/3/2010     0   03jan2010          .          .          5          . |
|     8         1         0       2   1/5/2010     1   05jan2010          7          .          7          . |
|------------------------------------------------------------------------------------------------------------|
|     6         0         1      -3   1/5/2010     1   05jan2010          .         -3          .         -3 |
|     7         0         1      -3   1/5/2010     0   05jan2010          .          .          .         -3 |
|     9         0         1       1   1/7/2010     1   07jan2010          .         -2          .         -2 |
+------------------------------------------------------------------------------------------------------------+
However, if this were my project, I would definitely look into something like this:
// AN ALTERNATIVE APPROACH
clear
input input record1 record2 value str8 sdate
1 1 0 2 "1/1/2010"
2 1 0 2 "1/1/2010"
3 1 0 3 "1/3/2010"
4 1 0 3 "1/3/2010"
5 1 0 3 "1/3/2010"
6 0 1 -3 "1/5/2010"
7 0 1 -3 "1/5/2010"
8 1 0 2 "1/5/2010"
9 0 1 1 "1/7/2010"
end
// recode record
gen record = .
forvalues i = 1/2 {
    replace record = `i' if record`i' == 1
}
drop record?
gen date = daily(sdate, "MDY")
format date %td
// drop duplicates
duplicates drop record value date , force
// gen daily sum by record (loop not required due to single variable structure)
bysort record (date): gen dailysum = sum(value)
li, sepby(record) noobs
yielding
+----------------------------------------------------------+
| input   value      sdate   record        date   dailysum |
|----------------------------------------------------------|
|     1       2   1/1/2010        1   01jan2010          2 |
|     3       3   1/3/2010        1   03jan2010          5 |
|     8       2   1/5/2010        1   05jan2010          7 |
|----------------------------------------------------------|
|     6      -3   1/5/2010        2   05jan2010         -3 |
|     9       1   1/7/2010        2   07jan2010         -2 |
+----------------------------------------------------------+
In the second example, shifting the values down by one date is a simple task:
// shift the values down by one date
bysort record (date): gen dailysum2 = dailysum[_n-1]
In the first example, the following should work:
forvalues i = 1/2 {
    bys tag record`i' (date): gen record`i'dailysumshift = record`i'dailysum[_n-1] if tag == 1
}