Map date column to unique sequential numbers starting from 1 in R
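I have a column of dates in a data frame, and each date is usually repeated several times. Here is a sample of my data frame, which also has the names of some sports teams in other columns: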
dput(mydf)
structure(list(date_game = structure(c(15643, 15643, 15643, 15644,
15644, 15644, 15646, 15646), class = "Date"), team_id = c("WAS",
"CLE", "LAL", "SAC", "CHI", "DET", "BOS", "MIL"), fran_id = c("Wizards",
"Cavaliers", "Lakers", "Kings", "Bulls", "Pistons", "Celtics",
"Bucks")), .Names = c("date_game", "team_id", "fran_id"), row.names = c(1L,
2L, 3L, 7L, 8L, 9L, 29L, 30L), class = "data.frame")
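In this case, mydf has 3 distinct dates, and one date is skipped. My full data frame has hundreds of unique dates. For this example, I want to add a new column to the data frame (call it date_number) that looks like this: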
mydf
date_game team_id fran_id date_number
1 2012-10-30 WAS Wizards 1
2 2012-10-30 CLE Cavaliers 1
3 2012-10-30 LAL Lakers 1
7 2012-10-31 SAC Kings 2
8 2012-10-31 CHI Bulls 2
9 2012-10-31 DET Pistons 2
29 2012-11-02 BOS Celtics 3
30 2012-11-02 MIL Bucks 3
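As the title says, I want date_number to start at 1 and increase by one for each new date. The key part is that the numbering stays consecutive even when some dates are missing: although 11-01 does not appear, 11-02 is still numbered 3, not 4.
Any ideas on how to do this would be greatly appreciated!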
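You can do this with rleid from data.table:
library(data.table)
# rleid() numbers runs of consecutive equal values, so rows must be sorted by date
setDT(mydf)[, date_number := rleid(date_game)]
Result:
> mydf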
date_game team_id fran_id date_number
1: 2012-10-30 WAS Wizards 1
2: 2012-10-30 CLE Cavaliers 1
3: 2012-10-30 LAL Lakers 1
4: 2012-10-31 SAC Kings 2
5: 2012-10-31 CHI Bulls 2
6: 2012-10-31 DET Pistons 2
7: 2012-11-02 BOS Celtics 3
8: 2012-11-02 MIL Bucks 3
As @Mike H. noted, you can also borrow the rleid function from data.table without converting mydf to a data.table:
mydf$date_number <- data.table::rleid(mydf$date_game)
Another option in base R:
# assumes rows are sorted by date, so each date forms exactly one contiguous run
mydf$date_number <- rep(seq_along(unique(mydf$date_game)),
                        rle(as.integer(mydf$date_game))$lengths)
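The rep()/rle() answer relies on each date forming a single contiguous run, i.e. on the rows being sorted by date. The sketch below is a variant of the same idea that numbers the runs directly with inverse.rle() instead of pairing unique() with the run lengths. The helper name run_ids and the sample vectors are illustrative only (the dates are copied from the question). It also shows why sorting matters: on shuffled input, repeated dates land in separate runs.

```r
# Hypothetical helper: number consecutive runs of equal dates 1, 2, 3, ...
run_ids <- function(dates) {
  r <- rle(as.integer(dates))      # run-length encode the day counts
  r$values <- seq_along(r$values)  # replace each run's value with its run index
  inverse.rle(r)                   # expand back to one id per element
}

dates <- as.Date(c("2012-10-30", "2012-10-30", "2012-10-30",
                   "2012-10-31", "2012-10-31", "2012-10-31",
                   "2012-11-02", "2012-11-02"))
run_ids(dates)
# [1] 1 1 1 2 2 2 3 3

# On unsorted input every change of date starts a new run,
# so repeated dates get different numbers unless you sort first
shuffled <- dates[c(7, 1, 4, 2, 8, 5, 3, 6)]
run_ids(shuffled)
# [1] 1 2 3 4 5 6 7 8
run_ids(sort(shuffled))
# [1] 1 1 1 2 2 2 3 3
```

For the data frame itself, sorting the rows first (for example mydf <- mydf[order(mydf$date_game), ]) makes this and the run-based answers safe to use.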
You can also use:
mydf$date_number <- as.integer(as.factor(mydf$date_game))
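Unlike the run-based answers, the as.factor() approach does not depend on row order: the factor levels of a Date vector are sorted chronologically, so the integer codes are dense ranks of the dates. A small sketch on shuffled sample dates (the vector names are illustrative), comparing it with an equivalent match()/sort(unique()) formulation that avoids factors:

```r
# Sample dates from the question, deliberately out of order
dates <- as.Date(c("2012-11-02", "2012-10-30", "2012-10-31", "2012-10-30",
                   "2012-11-02", "2012-10-31", "2012-10-30", "2012-10-31"))

# Factor levels of a Date are sorted chronologically, so codes are dense ranks
via_factor <- as.integer(as.factor(dates))

# Same result without factors: position of each date among the sorted unique dates
via_match <- match(dates, sort(unique(dates)))

via_factor
# [1] 3 1 2 1 3 2 1 2
identical(via_factor, via_match)
# [1] TRUE
```

Note that the missing 2012-11-01 still does not consume a number, because only dates that actually occur become levels.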
Another, slightly more esoteric option (lag() here must be dplyr::lag(), not stats::lag()):
mydf$date_number <- cumsum(c(1, tail(!(mydf$date_game == dplyr::lag(mydf$date_game)), -1)))
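The lag()/cumsum() idea can also be written in base R without dplyr: flag every row whose date differs from the previous row, then take the cumulative sum of the flags. This is a sketch assuming the rows are already sorted by date (like rleid() and rle(), it counts runs); the sample dates are copied from the question and the variable names are illustrative.

```r
dates <- as.Date(c("2012-10-30", "2012-10-30", "2012-10-30",
                   "2012-10-31", "2012-10-31", "2012-10-31",
                   "2012-11-02", "2012-11-02"))

# TRUE for the first row and for every row whose date differs from the previous one;
# the two-day gap from 2012-10-31 to 2012-11-02 still counts as a single change
new_run <- c(TRUE, diff(as.integer(dates)) != 0)

# Each TRUE starts a new group, so the running count is the group number
date_number <- cumsum(new_run)
date_number
# [1] 1 1 1 2 2 2 3 3
```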