Extra column based on paired data (mutate)
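I have a dataset with paired data (members of the same household).
idno is the individual identifier, and householdid is the household identifier shared by the two partners.
For each idno, I need to add an extra column holding the occupation of his or her partner.
My data looks like this: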
dta = rbind( c(1013661,101366, 'Never worked'),
c(1013662, 101366, 'Intermediate occs'),
c(1037552, 103755, 'Managerial & professional occs'),
c(1037551, 103755, 'Intermediate occs')
)
colnames(dta) = c('idno', 'householdid', 'occup')
dta
idno householdid occup
"1013661" "101366" "Never worked"
"1013662" "101366" "Intermediate occs"
"1037552" "103755" "Managerial & professional occs"
"1037551" "103755" "Intermediate occs"
What I need should look like this:
idno householdid occup occupPartner
"1013661" "101366" "Never worked" "Intermediate occs"
"1013662" "101366" "Intermediate occs" "Never worked"
"1037552" "103755" "Managerial & professional occs" "Intermediate occs"
"1037551" "103755" "Intermediate occs" "Managerial & professional occs"
I would like a solution using mutate, but I am not sure what the group_by should be.
Any ideas?
Try
library(dplyr)
dta1 <- as.data.frame(dta) %>%
group_by(householdid) %>%
mutate(occupPartner= rev(occup))
as.data.frame(dta1)
# idno householdid occup
#1 1013661 101366 Never worked
#2 1013662 101366 Intermediate occs
#3 1037552 103755 Managerial & professional occs
#4 1037551 103755 Intermediate occs
# occupPartner
#1 Intermediate occs
#2 Never worked
#3 Intermediate occs
#4 Managerial & professional occs
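Note that rev() within each group gives the partner's value only when every household has exactly two rows. A minimal sketch of a guarded variant, assuming any household of another size should get NA. It uses base R's ave() so it runs without extra packages; the helper name partner_or_na and the fifth, single-person row are hypothetical and were added only for illustration:

```r
# Example data as a data frame; the last row (household 104000) is a
# hypothetical single-person household added to show the guard.
dta <- data.frame(
  idno        = c("1013661", "1013662", "1037552", "1037551", "1040001"),
  householdid = c("101366", "101366", "103755", "103755", "104000"),
  occup       = c("Never worked", "Intermediate occs",
                  "Managerial & professional occs", "Intermediate occs",
                  "Never worked"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: swap the two values of a pair, otherwise return NA.
partner_or_na <- function(x) {
  if (length(x) == 2) rev(x) else rep(NA_character_, length(x))
}

# ave() applies the helper within each household and keeps the row order.
dta$occupPartner <- ave(dta$occup, dta$householdid, FUN = partner_or_na)
dta
```

The same guard could be placed inside mutate() after group_by(householdid); the base R form is shown here only to keep the sketch self-contained.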
If the data is already ordered so that the two partners sit in adjacent rows,
indx <- c(rbind(seq(2, nrow(dta), by=2), seq(1, nrow(dta), by=2)))
cbind(dta, occupPartner=dta[,3][indx])
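The index swap above depends on partners being in adjacent rows. If that is not guaranteed, one option is to sort by householdid first. The following is a sketch under the assumptions that every household has exactly two rows and that a reordered result is acceptable; the shuffled row order and the names sorted and result are made up for illustration:

```r
# Same example data, but with the rows deliberately shuffled.
dta <- rbind(c("1013661", "101366", "Never worked"),
             c("1037552", "103755", "Managerial & professional occs"),
             c("1013662", "101366", "Intermediate occs"),
             c("1037551", "103755", "Intermediate occs"))
colnames(dta) <- c("idno", "householdid", "occup")

# Sort so that both members of a household end up in adjacent rows.
sorted <- dta[order(dta[, "householdid"]), ]

# Swap each pair of row positions: 2, 1, 4, 3, ...
indx <- c(rbind(seq(2, nrow(sorted), by = 2), seq(1, nrow(sorted), by = 2)))
result <- cbind(sorted, occupPartner = sorted[, "occup"][indx])
result
```

Because the swap is purely positional, a household with one or three members would silently shift every later pair, so this approach is best kept for data known to be strictly paired.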
Another option, using data.table:
library(data.table)
out = as.data.table(dta)[, occupPartner := rev(occup), by = householdid]
#> out
# idno householdid occup
#1: 1013661 101366 Never worked
#2: 1013662 101366 Intermediate occs
#3: 1037552 103755 Managerial & professional occs
#4: 1037551 103755 Intermediate occs
# occupPartner
#1: Intermediate occs
#2: Never worked
#3: Intermediate occs
#4: Managerial & professional occs