Separate columns indicating outcomes for two different dates into two different rows

I realize the title is hard to follow, so here is a proper example of what I have and what I want.

I have many observations that record individual IDs and some outcomes (such as wage and hours) before and after a treatment:

ID         Wage_{t}  Wage_{t-1}   Hours_{t}   Hours_{t-1}  Establishment
Brain      34563     34563        45          43            X1
Lucke      2545      2356         35          36            E3
Jasmine    26789     1345         42          44            E3
Leila      1000      1234         38          39            E3
Sophie     35421     23453        50          57            Y6

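I want to put the before and after observations in separate rows and mark them with a before/after dummy variable that takes the value 1 if the observation occurs after the treatment and 0 otherwise:
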
ID         Wage    Hours         Establishment   After_dummy
Brain      34563   43            X1              0
Brain      34563   45            X1              1
Lucke      2356    36            E3              0
Lucke      2545    35            E3              1
Jasmine    1345    44            E3              0
Jasmine    26789   42            E3              1
Leila      1234    39            E3              0
Leila      1000    38            E3              1
Sophie     23453   57            Y6              0
Sophie     35421   50            Y6              1


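You should rename the columns that contain braces and hyphens. While doing so, you can also encode the desired dummy value in the column names. That makes it much easier to reshape the data to long format, for example with pivot_longer.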

Here, Wage_t_1 stands for Wage_{t}, whose after_dummy value is 1, and Wage_t_0 stands for Wage_{t-1}, whose after_dummy value is 0.
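
With many outcome columns, typing the new names by hand gets tedious. As a minimal sketch, the renaming could instead be derived from the original names with fixed-string substitution. This assumes the raw names really are of the form `Wage_{t}` and `Wage_{t-1}` (for example, because the data was read with `check.names = FALSE`); the vectors `old` and `new` are illustrative names, not part of the original answer.

```r
# Hypothetical original column names, as shown in the question
old <- c("ID", "Wage_{t}", "Wage_{t-1}", "Hours_{t}", "Hours_{t-1}", "Establishment")

# fixed = TRUE treats the pattern as a literal string, so the braces
# and the hyphen need no regex escaping
new <- sub("_{t-1}", "_t_0", old, fixed = TRUE)  # before treatment -> dummy 0
new <- sub("_{t}",   "_t_1", new, fixed = TRUE)  # after treatment  -> dummy 1

print(new)
```

The result can then be assigned with `names(df) <- new` in place of the hand-written vector.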

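library(tidyverse)

# Rename so that every outcome column has the form <variable>_t_<after_dummy>
names(df) <- c("ID", "Wage_t_1","Wage_t_0", "Hours_t_1", "Hours_t_0", "Establishment")

# ".value" turns the first capture group (Wage or Hours) into output columns;
# the second capture group (0 or 1) becomes the after_dummy column
pivot_longer(df,
             cols = -c(ID, Establishment),
             names_to = c(".value", "after_dummy"),
             names_pattern = "(Wage|Hours)_t_(\\d+)")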

Output

   ID      Establishment after_dummy  Wage Hours
   <chr>   <chr>         <chr>       <int> <int>
 1 Brain   X1            1           34563    45
 2 Brain   X1            0           34563    43
 3 Lucke   E3            1            2545    35
 4 Lucke   E3            0            2356    36
 5 Jasmine E3            1           26789    42
 6 Jasmine E3            0            1345    44
 7 Leila   E3            1            1000    38
 8 Leila   E3            0            1234    39
 9 Sophie  Y6            1           35421    50
10 Sophie  Y6            0           23453    57
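
In the output above, after_dummy is a character column and each person's "after" row comes first, whereas the desired table in the question has a numeric dummy with the "before" row first. A possible variation, sketched under the assumption that tidyr 1.1.0 or later is installed (for `names_transform`), converts the dummy to integer and reorders the rows. The object name `long` is only illustrative.

```r
library(tidyr)
library(dplyr)

# Same data as in the Data section, with the columns already renamed
df <- data.frame(
  ID = c("Brain", "Lucke", "Jasmine", "Leila", "Sophie"),
  Wage_t_1 = c(34563L, 2545L, 26789L, 1000L, 35421L),
  Wage_t_0 = c(34563L, 2356L, 1345L, 1234L, 23453L),
  Hours_t_1 = c(45L, 35L, 42L, 38L, 50L),
  Hours_t_0 = c(43L, 36L, 44L, 39L, 57L),
  Establishment = c("X1", "E3", "E3", "E3", "Y6")
)

long <- df %>%
  pivot_longer(
    cols = -c(ID, Establishment),
    names_to = c(".value", "after_dummy"),
    names_pattern = "(Wage|Hours)_t_(\\d+)",
    # convert the captured "0"/"1" strings to integers
    names_transform = list(after_dummy = as.integer)
  ) %>%
  # keep the original order of IDs, and put before (0) ahead of after (1)
  arrange(factor(ID, levels = unique(ID)), after_dummy)

print(long)
```

Sorting on a factor built from `unique(ID)` preserves the order in which individuals appear in the data instead of sorting them alphabetically.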

Data

df <- structure(list(ID = c("Brain", "Lucke", "Jasmine", "Leila", "Sophie"
), Wage_t_1 = c(34563L, 2545L, 26789L, 1000L, 35421L), Wage_t_0 = c(34563L, 
2356L, 1345L, 1234L, 23453L), Hours_t_1 = c(45L, 35L, 42L, 38L, 
50L), Hours_t_0 = c(43L, 36L, 44L, 39L, 57L), Establishment = c("X1", 
"E3", "E3", "E3", "Y6")), class = "data.frame", row.names = c(NA, 
-5L))