Sequencing along two variables of interest in R
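I am trying to create a sequence along two different variables that describe how people move from one location to another. I have the following information: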
name <- c("John", "John", "John", "Sam", "Sam", "Robert", "Robert", "Robert")
location <- c("London", "London", "Newyork", "Houston", "Houston", "London", "Paris", "Paris")
start_yr <- c(2012, 2012, 2014, 2014, 2014, 2012, 2013, 2013)
end_yr <- c(2013, 2013, 2015, 2015, 2015, 2013, 2015, 2015)
df <- data.frame(name, location, start_yr, end_yr)
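I need to seq_along name and location, and create a transition variable by year that shows whether the person moved during that year. I tried the following, but it did not work very well. The results look odd: the sequence for a name sometimes does not start at 1. Any suggestions on how to approach this problem?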
ave(df$name,df$location, FUN = seq_along)
I would like:
name location move year
John London 1 2012
John London 0 2013
John Newyork 1 2014
John Newyork 0 2015
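If I understand correctly, you could expand the data frame with complete() so that each name and location combination has a row for every year from the minimum start_yr to the maximum end_yr, then group by name, arrange by start_yr, and use lag() to check whether the location changed: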
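library(dplyr)
library(tidyr)

df %>%
  group_by(name, location) %>%
  # add a row for every year between the earliest start_yr and the latest end_yr
  complete(start_yr = full_seq(min(start_yr):max(end_yr), 1)) %>%
  group_by(name) %>%
  # sort by name as well, since arrange() ignores grouping in recent dplyr versions
  arrange(name, start_yr) %>%
  # 1 if the location differs from the previous row for the same name, else 0
  mutate(move = +(lag(location) != location))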
This returns NA if there is no previous location for a given name, 0 if the location is the same, and 1 if it changed:
#Source: local data frame [14 x 5]
#Groups: name [3]
#
# name location start_yr end_yr move
# (fctr) (fctr) (dbl) (dbl) (int)
#1 John London 2012 2013 NA
#2 John London 2012 2013 0
#3 John London 2013 NA 0
#4 John Newyork 2014 2015 1
#5 John Newyork 2015 NA 0
#6 Robert London 2012 2013 NA
#7 Robert London 2013 NA 0
#8 Robert Paris 2013 2015 1
#9 Robert Paris 2013 2015 0
#10 Robert Paris 2014 NA 0
#11 Robert Paris 2015 NA 0
#12 Sam Houston 2014 2015 NA
#13 Sam Houston 2014 2015 0
#14 Sam Houston 2015 NA 0
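The output above still contains the repeated input rows and uses NA for each person's first row, whereas the question asks for one row per name, location and year, with move = 1 in the first year. The following is a minimal base R sketch of one way to get that layout, not a definitive solution. It assumes that repeated rows in df describe the same stay and can be dropped, and that a person's first observed year counts as a move. The names spells, expanded, same_person, same_location and result are introduced here for illustration only.

```r
df <- data.frame(
  name     = c("John", "John", "John", "Sam", "Sam", "Robert", "Robert", "Robert"),
  location = c("London", "London", "Newyork", "Houston", "Houston", "London", "Paris", "Paris"),
  start_yr = c(2012, 2012, 2014, 2014, 2014, 2012, 2013, 2013),
  end_yr   = c(2013, 2013, 2015, 2015, 2015, 2013, 2015, 2015),
  stringsAsFactors = FALSE
)

# Assumption: repeated rows describe the same stay, so keep one row per stay.
spells <- unique(df)

# Expand each stay into one row per year from start_yr to end_yr.
expanded <- do.call(rbind, lapply(seq_len(nrow(spells)), function(i) {
  data.frame(
    name     = spells$name[i],
    location = spells$location[i],
    year     = seq(spells$start_yr[i], spells$end_yr[i]),
    stringsAsFactors = FALSE
  )
}))

# Sort by person and year; order() leaves tied rows in their original order.
expanded <- expanded[order(expanded$name, expanded$year), ]

# Compare each row with the previous one.
# Assumption: a person's first row counts as a move, as in the desired output.
n <- nrow(expanded)
same_person   <- c(FALSE, expanded$name[-1] == expanded$name[-n])
same_location <- c(FALSE, expanded$location[-1] == expanded$location[-n])
expanded$move <- as.integer(!(same_person & same_location))

result <- expanded[, c("name", "location", "move", "year")]
rownames(result) <- NULL
print(result)
```

For John, this gives the four rows requested in the question. Where two stays overlap in a year (Robert in 2013), both locations appear for that year, with the earlier stay listed first.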