如何根据 r 中的 2 个日期列打开新行

How to open new rows based on 2 date columns in r

我有以下大型数据集的示例数据集 -

     isin       directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole
 US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09
 US6819771048     340769      1970       2003     2004      M    2003-01-09  2004-02-24
 US6819771048     340769      1970       2004     2007      M    2004-02-24  2007-09-07
 US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30
 US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01
 US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21

我的问题是- 我想根据变量 ROLE_STARTROLE_END 创建 New Rows。的数量 要创建的行取决于 MINIMUM ROLE_STARTMAXIMUM ROLE_END 假设数据按 [=16= 分组] 和 directorid。例如 - 对于 isin US6819771048 和 directorid 340769,MINIMUM ROLE_START 年份是 1995 和 MAXIMUM ROLE_END 年是 2007 年。所以我需要为 1995-2007 年的每一年打开行,这些年份应该存储在 YEAR 变量中。 请注意,在上面的示例中,1995-2007 之间没有中断,因为从 ROLE_STARTROLE_END 可以清楚地看出所有年份都包括在内。如果任何年份之间有任何中断,则应排除这些中断。对于上面的样本数据集,我预期的数据集应该是这样的 -

     isin       directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole YEAR
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   1995
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   1996
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   1997
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   1998
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   1999
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   2000
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   2001
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   2002
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   2003
US6819771048     340769      1970       2003     2004      M    2003-01-09  2004-02-24   2003
US6819771048     340769      1970       2003     2004      M    2003-01-09  2004-02-24   2004
US6819771048     340769      1970       2004     2007      M    2004-02-24  2007-09-07   2004
US6819771048     340769      1970       2004     2007      M    2004-02-24  2007-09-07   2005
US6819771048     340769      1970       2004     2007      M    2004-02-24  2007-09-07   2006
US6819771048     340769      1970       2004     2007      M    2004-02-24  2007-09-07   2007
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1986
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1987
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1988
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1989
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1990
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1991
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1992
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1993
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1994
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1995
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1996
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1997
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1998
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1999
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2000
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2001
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2002
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2003
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2004
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2005
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2006
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2007
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2008
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2009
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2010
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2011
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2012
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2013
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2014
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2015
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2016
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1976
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1977
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1978
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1979
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1980
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1981
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1982
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1983
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1984
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1985
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1986
US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21   2016
US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21   2017
US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21   2018
US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21   2019
US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21   2020

您可以在 ROLE_STARTROLE_END 之间创建一个序列,并在不同的行中获取数据。

library(dplyr)
df %>%
  mutate(YEAR = purrr::map2(ROLE_START, ROLE_END, seq)) %>%
  tidyr::unnest(YEAR)


#   isin         directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole  YEAR
#   <chr>             <int>     <int>      <int>    <int> <chr>  <chr>         <chr>       <int>
# 1 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   1995
# 2 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   1996
# 3 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   1997
# 4 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   1998
# 5 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   1999
# 6 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   2000
# 7 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   2001
# 8 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   2002
# 9 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   2003
#10 US6819771048     340769      1970       2003     2004 M      2003-01-09    2004-02-24   2003
# … with 52 more rows