How to open new rows based on 2 date columns in R
I have the following sample dataset, taken from a larger dataset:
isin directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09
US6819771048 340769 1970 2003 2004 M 2003-01-09 2004-02-24
US6819771048 340769 1970 2004 2007 M 2004-02-24 2007-09-07
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21
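My question is:
I want to create new rows based on the variables ROLE_START and ROLE_END. The number of rows to create depends on the minimum ROLE_START and the maximum ROLE_END, assuming the data is grouped by isin and directorid. For example, for isin US6819771048 and directorid 340769, the minimum ROLE_START year is 1995 and the maximum ROLE_END year is 2007, so I need to open a row for every year from 1995 to 2007, and these years should be stored in a YEAR variable. Note that in the example above there are no breaks between 1995 and 2007, because ROLE_START and ROLE_END make it clear that all years are covered. If there are any breaks between years, those breaks should be excluded. For the sample dataset above, my expected dataset should look like this: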
isin directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole YEAR
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1995
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1996
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1997
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1998
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1999
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2000
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2001
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2002
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2003
US6819771048 340769 1970 2003 2004 M 2003-01-09 2004-02-24 2003
US6819771048 340769 1970 2003 2004 M 2003-01-09 2004-02-24 2004
US6819771048 340769 1970 2004 2007 M 2004-02-24 2007-09-07 2004
US6819771048 340769 1970 2004 2007 M 2004-02-24 2007-09-07 2005
US6819771048 340769 1970 2004 2007 M 2004-02-24 2007-09-07 2006
US6819771048 340769 1970 2004 2007 M 2004-02-24 2007-09-07 2007
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1986
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1987
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1988
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1989
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1990
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1991
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1992
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1993
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1994
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1995
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1996
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1997
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1998
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1999
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2000
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2001
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2002
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2003
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2004
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2005
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2006
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2007
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2008
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2009
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2010
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2011
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2012
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2013
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2014
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2015
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2016
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1976
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1977
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1978
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1979
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1980
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1981
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1982
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1983
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1984
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1985
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1986
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21 2016
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21 2017
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21 2018
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21 2019
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21 2020
You can create a sequence between ROLE_START and ROLE_END for each row, then unnest it so that each year gets its own row.
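library(dplyr)

df %>%
  # build one sequence ROLE_START..ROLE_END per row (stored as a list column)
  mutate(YEAR = purrr::map2(ROLE_START, ROLE_END, seq)) %>%
  # expand the list column so that each year becomes its own row
  tidyr::unnest(YEAR)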
# isin directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole YEAR
# <chr> <int> <int> <int> <int> <chr> <chr> <chr> <int>
# 1 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1995
# 2 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1996
# 3 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1997
# 4 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1998
# 5 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1999
# 6 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2000
# 7 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2001
# 8 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2002
# 9 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2003
#10 US6819771048 340769 1970 2003 2004 M 2003-01-09 2004-02-24 2003
# … with 52 more rows
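For readers who want to run the idea without any packages, here is a minimal base R sketch of the same row expansion. It assumes a data frame named df rebuilt by hand from the sample above with only four of its columns. The names years, expanded and per_director are made up for this illustration. The last step, which drops repeated boundary years per director, is an optional extra and is not part of the expected output shown in the question.

```r
# Sample data rebuilt from the question (only the columns needed here; an assumption)
df <- data.frame(
  isin       = c("US6819771048", "US6819771048", "US6819771048",
                 "US68243Q1067", "US68243Q1067", "US68243Q1067"),
  directorid = c(340769L, 340769L, 340769L, 86917L, 86917L, 327069L),
  ROLE_START = c(1995L, 2003L, 2004L, 1986L, 1976L, 2016L),
  ROLE_END   = c(2003L, 2004L, 2007L, 2016L, 1986L, 2020L),
  stringsAsFactors = FALSE
)

# One sequence of years per original row
years <- Map(seq, df$ROLE_START, df$ROLE_END)

# Repeat each original row once per year in its sequence, then attach the years
expanded <- df[rep(seq_len(nrow(df)), lengths(years)), ]
expanded$YEAR <- unlist(years)
rownames(expanded) <- NULL

# Optional: one row per isin/directorid/YEAR, dropping repeated boundary years
per_director <- unique(expanded[c("isin", "directorid", "YEAR")])

nrow(expanded)
nrow(per_director)
```

Because each role is expanded only over its own ROLE_START to ROLE_END range, any years that fall in a gap between two roles never appear in the result.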