Map a date column to unique sequential numbers starting from 1 in R

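I have a column of dates in a data frame, where each date is typically repeated several times. Here is a sample of my data frame, which also has some sports team names in the other columns: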

dput(mydf)
structure(list(date_game = structure(c(15643, 15643, 15643, 15644, 
15644, 15644, 15646, 15646), class = "Date"), team_id = c("WAS", 
"CLE", "LAL", "SAC", "CHI", "DET", "BOS", "MIL"), fran_id = c("Wizards", 
"Cavaliers", "Lakers", "Kings", "Bulls", "Pistons", "Celtics", 
"Bucks")), .Names = c("date_game", "team_id", "fran_id"), row.names = c(1L, 
2L, 3L, 7L, 8L, 9L, 29L, 30L), class = "data.frame")

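In this case, mydf has 3 distinct dates, and one date is skipped. My full data frame has hundreds of unique dates. For this example, I would like to add a new column to the data frame (call it date_number) so that it looks like this: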

mydf
    date_game team_id   fran_id  date_number
1  2012-10-30     WAS   Wizards            1
2  2012-10-30     CLE Cavaliers            1
3  2012-10-30     LAL    Lakers            1
7  2012-10-31     SAC     Kings            2
8  2012-10-31     CHI     Bulls            2
9  2012-10-31     DET   Pistons            2
29 2012-11-02     BOS   Celtics            3
30 2012-11-02     MIL     Bucks            3

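As the title says, I want the date_number column to number the dates in increasing order, starting from 1. The key point is that the numbering is consecutive even when some dates are missing: although 11-01 is absent, 11-02 is still assigned 3, not 4.

Any ideas on how to do this would be greatly appreciated!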

You can do this with rleid from data.table:

library(data.table)

setDT(df)[, date_number := rleid(date_game)]

Result:

> df
    date_game team_id   fran_id date_number
1: 2012-10-30     WAS   Wizards           1
2: 2012-10-30     CLE Cavaliers           1
3: 2012-10-30     LAL    Lakers           1
4: 2012-10-31     SAC     Kings           2
5: 2012-10-31     CHI     Bulls           2
6: 2012-10-31     DET   Pistons           2
7: 2012-11-02     BOS   Celtics           3
8: 2012-11-02     MIL     Bucks           3
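
For readers wondering what rleid is doing here: it numbers consecutive runs of identical values. Below is a minimal base R sketch of the same idea for a single vector. The helper name run_id is hypothetical (it is not part of data.table), and the dates are copied from the question's sample.

```r
# Hypothetical helper: number consecutive runs of identical values,
# roughly what data.table::rleid does for one vector.
run_id <- function(x) {
  # rle() rejects classed objects such as Date, so convert to day counts first
  r <- rle(as.integer(x))
  rep(seq_along(r$lengths), r$lengths)
}

dates <- as.Date(c("2012-10-30", "2012-10-30", "2012-10-30",
                   "2012-10-31", "2012-10-31", "2012-10-31",
                   "2012-11-02", "2012-11-02"))
run_id(dates)
# 1 1 1 2 2 2 3 3
```

Because runs are what get numbered, a date that reappears further down in unsorted data would receive a new number, so it is worth sorting by date_game first if the rows are not already grouped by date.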

As @Mike H. noted, you can also borrow the rleid function from data.table without converting df:

df$date_number <- data.table::rleid(df$date_game)

Another option in base R:

df$date_number <- rep(seq_along(unique(df$date_game)), 
                      rle(as.integer(df$date_game))$lengths)
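
The snippet above relies on each date forming exactly one run, that is, on the rows being grouped by date; otherwise the number of unique dates and the number of runs differ and rep() fails. If that cannot be guaranteed, one alternative sketch (not from the original answer) is match() against unique(), which numbers dates by order of first appearance. The dates below are made up and deliberately unsorted.

```r
dates <- as.Date(c("2012-10-31", "2012-10-30", "2012-10-31", "2012-11-02"))

# Position of each date among the distinct dates, in order of first appearance
date_number <- match(dates, unique(dates))
date_number
# 1 2 1 3
```

Note that this numbers by first appearance rather than chronologically, so the two only agree when the data frame is sorted by date.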

You can use:

mydf$date_number = as.integer(as.factor(mydf$date_game))
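
This works because factor levels are the sorted unique values, so each date is numbered by its rank among the distinct dates, regardless of row order. A small illustration with made-up, deliberately unsorted dates:

```r
dates <- as.Date(c("2012-11-02", "2012-10-30", "2012-10-31", "2012-10-30"))

# Levels are the distinct dates in chronological order
levels(as.factor(dates))
# "2012-10-30" "2012-10-31" "2012-11-02"

# The integer codes are each date's position among those levels
date_number <- as.integer(as.factor(dates))
date_number
# 3 1 2 1
```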

Another, slightly more esoteric option:

# Needs dplyr::lag; base R's stats::lag does not shift the values of a plain vector
mydf$date_number <- cumsum(c(1, tail(!(mydf$date_game == dplyr::lag(mydf$date_game)), -1)))
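
The idea here is to flag each row whose date differs from the previous row and then take a running sum of those flags. A sketch of the same idea in base R using diff(), so that no lag() is needed (this variant is an illustration, not the original answer's code; the dates are copied from the question's sample):

```r
dates <- as.Date(c("2012-10-30", "2012-10-30", "2012-10-30",
                   "2012-10-31", "2012-10-31", "2012-10-31",
                   "2012-11-02", "2012-11-02"))

# diff() gives the gap in days between neighbouring rows;
# any non-zero gap marks the start of a new date
changed <- diff(as.integer(dates)) != 0

# Start at 1 and add 1 at every change
date_number <- cumsum(c(1, changed))
date_number
# 1 1 1 2 2 2 3 3
```

Like the run-based approaches, this assumes the rows are already ordered by date.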