Separate columns indicating outcomes for two different dates into two different rows
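I think the title is a bit hard to understand, so here is a proper example of what I have and what I want.
I have many observations that record an individual ID and some outcomes (such as wage and hours) before and after a treatment: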
ID Wage_{t} Wage_{t-1} Hours_{t} Hours_{t-1} Establishment
Brain 34563 34563 45 43 X1
Lucke 2545 2356 35 36 E3
Jasmine 26789 1345 42 44 E3
Leila 1000 1234 38 39 E3
Sophie 35421 23453 50 57 Y6
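I want to split the before and after observations into separate rows and mark them with a dummy variable that takes the value 1 if the observation is from after the treatment (and 0 otherwise):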
ID Wage Hours Establishment After_dummy
Brain 34563 43 X1 0
Brain 34563 45 X1 1
Lucke 2356 36 E3 0
Lucke 2545 35 E3 1
Jasmine 1345 44 E3 0
Jasmine 26789 42 E3 1
Leila 1234 39 E3 0
Leila 1000 38 E3 1
Sophie 23453 57 Y6 0
Sophie 35421 50 Y6 1
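The column names containing curly braces and hyphens should be changed. You can also put the desired "dummy" value into the column names themselves, which makes it easier to reshape the data into long format, for example with pivot_longer.
Here, Wage_t_1 stands for Wage_{t}, which has an after_dummy value of 1 (and Wage_t_0 stands for Wage_{t-1}, with an after_dummy value of 0).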
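library(tidyverse)
# Rename positionally; the suffix after "_t_" is the after_dummy value
# (Wage_{t} -> Wage_t_1, Wage_{t-1} -> Wage_t_0, and the same for Hours)
names(df) <- c("ID", "Wage_t_1","Wage_t_0", "Hours_t_1", "Hours_t_0", "Establishment")
# ".value" keeps Wage and Hours as separate columns; the captured digit goes to after_dummy
pivot_longer(df,
             cols = -c(ID, Establishment),
             names_to = c(".value", "after_dummy"),
             names_pattern = "(Wage|Hours)_t_(\\d+)")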
Output
ID Establishment after_dummy Wage Hours
<chr> <chr> <chr> <int> <int>
1 Brain X1 1 34563 45
2 Brain X1 0 34563 43
3 Lucke E3 1 2545 35
4 Lucke E3 0 2356 36
5 Jasmine E3 1 26789 42
6 Jasmine E3 0 1345 44
7 Leila E3 1 1000 38
8 Leila E3 0 1234 39
9 Sophie Y6 1 35421 50
10 Sophie Y6 0 23453 57
Data
df <- structure(list(ID = c("Brain", "Lucke", "Jasmine", "Leila", "Sophie"
), Wage_t_1 = c(34563L, 2545L, 26789L, 1000L, 35421L), Wage_t_0 = c(34563L,
2356L, 1345L, 1234L, 23453L), Hours_t_1 = c(45L, 35L, 42L, 38L,
50L), Hours_t_0 = c(43L, 36L, 44L, 39L, 57L), Establishment = c("X1",
"E3", "E3", "E3", "Y6")), class = "data.frame", row.names = c(NA,
-5L))
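If you would rather not rename the columns by hand, the renaming can also be done from the original names. This is a minimal sketch, assuming the columns are literally named Wage_{t}, Wage_{t-1}, Hours_{t} and Hours_{t-1}, and tidyr 1.1.0 or later (for names_transform). The object names df_raw and long are illustrative only. It also turns after_dummy into an integer and reorders rows and columns to match the layout asked for in the question (before row first, then after).

```r
library(tidyverse)

# Hypothetical input: the question's data with its original column names.
# check.names = FALSE keeps the braces and hyphen in the names.
df_raw <- data.frame(
  ID = c("Brain", "Lucke", "Jasmine", "Leila", "Sophie"),
  `Wage_{t}` = c(34563L, 2545L, 26789L, 1000L, 35421L),
  `Wage_{t-1}` = c(34563L, 2356L, 1345L, 1234L, 23453L),
  `Hours_{t}` = c(45L, 35L, 42L, 38L, 50L),
  `Hours_{t-1}` = c(43L, 36L, 44L, 39L, 57L),
  Establishment = c("X1", "E3", "E3", "E3", "Y6"),
  check.names = FALSE
)

# Encode the dummy in the names: "_{t-1}" (before) -> "_t_0", "_{t}" (after) -> "_t_1".
# fixed = TRUE matches the braces literally instead of as regex syntax.
nm <- names(df_raw)
nm <- sub("_{t-1}", "_t_0", nm, fixed = TRUE)
nm <- sub("_{t}", "_t_1", nm, fixed = TRUE)
names(df_raw) <- nm

long <- pivot_longer(
  df_raw,
  cols = -c(ID, Establishment),
  names_to = c(".value", "after_dummy"),
  names_pattern = "(Wage|Hours)_t_(\\d+)",
  names_transform = list(after_dummy = as.integer)  # "0"/"1" -> 0L/1L
)

# Keep the original order of IDs and put the before row (0) first within each ID
long <- arrange(long, match(ID, unique(ID)), after_dummy)
long <- select(long, ID, Wage, Hours, Establishment, after_dummy)
long
```

Using fixed = TRUE in sub() avoids having to escape the braces, and converting after_dummy with names_transform gives an integer column instead of the <chr> column seen in the output above.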