Rearrange longitudinal data set to life table

Question

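I have a table with the residence and occupation of several people. I want to find out whether people in certain occupations are more likely to relocate than others. The longitudinal data looks like this: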

library(tidyverse)    
id <- c(rep(1, 6), rep(2, 6), rep(3, 6))
year <- c(rep(1990:1995, 3))
occupation <- c(rep("Barrister", 6), rep("Telephone salesman", 3), rep("Baker", 3), rep("Janitor", 2), rep("Builder", 4))
residence <- c(rep("London", 2), rep("Manchester", 2), rep("Glasgow", 2), rep("London", 6), rep("Liverpool", 4), rep ("Luton", 2))

df <- tibble(id, year, occupation, residence)

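I would like to rearrange the table into a life table format. In addition, I would like to create two new variables: a dummy variable indicating whether the individual relocated after x years (= event occurred) or did not relocate after x years (= event is right-censored), and, if a person changed occupation, a variable holding the occupation they had previously. I would like the table to look like this: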

id2 <- c(rep(1, 3), rep(2, 2), rep(3, 3))         
years <- c(2, 2, 2, 3, 3, 2, 2, 2)
occupation2 <- c(rep("Barrister", 3), rep("Telephone salesman", 1), rep("Baker", 1), rep("Janitor", 1), rep("Builder", 2))
residence2 <- c(rep("London", 1), rep("Manchester", 1), rep("Glasgow", 1), rep("London", 2), rep("Liverpool", 2), rep ("Luton", 1))
relocated <- c(1,1,0,0,0,0,1,0)
experience <- c(rep(NA, 3), rep(NA, 1), rep("Telephone salesman", 1), rep(NA, 1), rep("Janitor", 2))

life.table <- tibble(id2, years, occupation2, residence2, relocated, experience)

I am not at all sure how to achieve this, so any suggestions would be much appreciated!

Answer 1

Perhaps this helps:

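library(dplyr)
n <- 2  # threshold compared against n3 (rows remaining for the person)
df %>%
    group_by(id) %>%
    # n1: running year index, n2: rows per person,
    # n3: rows remaining after the current one, n4: distinct residences
    mutate(n1 = cumsum(c(1, diff(year))), n2 = n(), n3 = n2 - n1,
           n4 = n_distinct(residence)) %>%
    # add occupation and residence as grouping variables; factor levels
    # follow order of appearance so the spells keep their original order
    group_by(occupation = factor(occupation, levels = unique(occupation)),
             residence = factor(residence, levels = unique(residence)), .add = TRUE) %>%
    # years: spell length; relocated: 1 if some row in the spell has more
    # than n rows remaining and the person has more than one residence
    summarise(years = n(), relocated = +(any(n3 > n) & first(n4) > 1)) %>%
    group_by(id) %>%
    # experience: the person's first occupation, filled in for every spell
    # after the first when they held more than one occupation
    mutate(experience = if (n_distinct(occupation) > 1)
        c(NA_character_, rep(as.character(first(occupation)), n() - 1))
        else NA_character_)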
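
The code above derives `relocated` from each row's position within the person's record rather than from the residence of the following spell, so it may not match the desired table for a spell that ends with a change of occupation only (for example the Janitor spell of id 3). As a cross-check, here is a minimal alternative sketch, not the approach above. It assumes that a spell is a run of consecutive years with the same occupation and residence, that `relocated` should be 1 only when the next spell of the same person has a different residence, and that `experience` is the first occupation the person held (as in the code above). The names `new_spell`, `spell` and `spells` are made up for illustration.

```r
library(dplyr)

# Rebuild the example data from the question so the block is self-contained.
df <- tibble(
  id = rep(1:3, each = 6),
  year = rep(1990:1995, times = 3),
  occupation = c(rep("Barrister", 6), rep("Telephone salesman", 3), rep("Baker", 3),
                 rep("Janitor", 2), rep("Builder", 4)),
  residence = c(rep("London", 2), rep("Manchester", 2), rep("Glasgow", 2),
                rep("London", 6), rep("Liverpool", 4), rep("Luton", 2))
)

spells <- df %>%
  arrange(id, year) %>%
  group_by(id) %>%
  mutate(
    # Assumption: a new spell starts when occupation or residence changes
    # from one year to the next (the first row of a person always starts one).
    new_spell = row_number() == 1 |
      occupation != lag(occupation) | residence != lag(residence),
    spell = cumsum(new_spell)
  ) %>%
  group_by(id, spell) %>%
  summarise(
    years = n(),
    occupation = first(occupation),
    residence = first(residence),
    .groups = "drop_last"   # keep the grouping by id for the next step
  ) %>%
  mutate(
    # Event indicator: 1 if the person's next spell is in another residence,
    # 0 otherwise (including the last, right-censored spell).
    relocated = as.integer(!is.na(lead(residence)) & lead(residence) != residence),
    # Assumption: "experience" is the person's first occupation, shown only
    # on spells where the current occupation differs from it.
    experience = if_else(occupation != first(occupation),
                         first(occupation), NA_character_)
  ) %>%
  ungroup() %>%
  select(id, years, occupation, residence, relocated, experience)

spells
```

Numbering the spells with `cumsum()` instead of grouping directly on occupation and residence also keeps two separate spells apart when someone later returns to a place they lived in before. If a person can hold more than two occupations, `experience` would need to track the most recent different occupation rather than the first one.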

r

tidyverse

data-wrangling