在 R 中,如何根据多列中的时间和日期值创建具有唯一值的列
In R how do you create a column with unique values based on time and date values in multiple columns
我有一个包含两列的数据框,见下文:
Date Time
2020-01-12 00:00:01
2020-01-12 00:00:02
2020-01-12 00:01:01
2020-01-12 00:01:02
2020-01-12 00:02:01
2020-01-12 00:02:02
2020-01-12 00:10:01
2020-01-12 00:10:02
2020-01-12 00:11:01
2020-01-12 00:11:02
2020-01-12 00:12:01
2020-01-12 00:12:02
2020-01-12 00:20:01
2020-01-12 00:20:02
2020-01-12 00:21:01
2020-01-12 00:21:02
2020-01-12 00:22:01
2020-01-12 00:22:02
2020-01-13 00:00:01
2020-01-13 00:00:02
2020-01-13 00:01:01
2020-01-13 00:01:02
2020-01-13 00:02:01
2020-01-13 00:02:02
2020-01-13 00:10:01
2020-01-13 00:10:02
2020-01-13 00:11:01
2020-01-13 00:11:02
2020-01-13 00:12:01
2020-01-13 00:12:02
2020-01-13 00:20:01
2020-01-13 00:20:02
2020-01-13 00:21:01
2020-01-13 00:21:02
2020-01-13 00:22:01
2020-01-13 00:22:02
我想根据前两列中的值创建第三列,其唯一值从 1 开始到 x。具体来说,我想先关注第二列中的分钟位置,然后再关注第一列中的日期位置,这样结果如下所示:
Date Time Value
2020-01-12 00:00:01 1
2020-01-12 00:00:02 1
2020-01-12 00:01:01 1
2020-01-12 00:01:02 1
2020-01-12 00:02:01 1
2020-01-12 00:02:02 1
2020-01-12 00:10:01 2
2020-01-12 00:10:02 2
2020-01-12 00:11:01 2
2020-01-12 00:11:02 2
2020-01-12 00:12:01 2
2020-01-12 00:12:02 2
2020-01-12 00:20:01 3
2020-01-12 00:20:02 3
2020-01-12 00:21:01 3
2020-01-12 00:21:02 3
2020-01-12 00:22:01 3
2020-01-12 00:22:02 3
2020-01-13 00:00:01 4
2020-01-13 00:00:02 4
2020-01-13 00:01:01 4
2020-01-13 00:01:02 4
2020-01-13 00:02:01 4
2020-01-13 00:02:02 4
2020-01-13 00:10:01 5
2020-01-13 00:10:02 5
2020-01-13 00:11:01 5
2020-01-13 00:11:02 5
2020-01-13 00:12:01 5
2020-01-13 00:12:02 5
2020-01-13 00:20:01 6
2020-01-13 00:20:02 6
2020-01-13 00:21:01 6
2020-01-13 00:21:02 6
2020-01-13 00:22:01 6
2020-01-13 00:22:02 6
这个有用吗:
library(dplyr)
library(data.table)
df %>% mutate(Hour = substr(Time, 4,4), Value = rleid(Hour)) %>% select(-Hour)
Date Time Value
1 2020-01-12 00:00:01 1
2 2020-01-12 00:00:02 1
3 2020-01-12 00:01:01 1
4 2020-01-12 00:01:02 1
5 2020-01-12 00:02:01 1
6 2020-01-12 00:02:02 1
7 2020-01-12 00:10:01 2
8 2020-01-12 00:10:02 2
9 2020-01-12 00:11:01 2
10 2020-01-12 00:11:02 2
11 2020-01-12 00:12:01 2
12 2020-01-12 00:12:02 2
13 2020-01-12 00:20:01 3
14 2020-01-12 00:20:02 3
15 2020-01-12 00:21:01 3
16 2020-01-12 00:21:02 3
17 2020-01-12 00:22:01 3
18 2020-01-12 00:22:02 3
19 2020-01-13 00:00:01 4
20 2020-01-13 00:00:02 4
21 2020-01-13 00:01:01 4
22 2020-01-13 00:01:02 4
23 2020-01-13 00:02:01 4
24 2020-01-13 00:02:02 4
25 2020-01-13 00:10:01 5
26 2020-01-13 00:10:02 5
27 2020-01-13 00:11:01 5
28 2020-01-13 00:11:02 5
29 2020-01-13 00:12:01 5
30 2020-01-13 00:12:02 5
31 2020-01-13 00:20:01 6
32 2020-01-13 00:20:02 6
33 2020-01-13 00:21:01 6
34 2020-01-13 00:21:02 6
35 2020-01-13 00:22:01 6
36 2020-01-13 00:22:02 6
在基础 R 中你可以这样做:
transform(df, Value = cumsum(c(1, diff(as.POSIXlt(paste(Date, Time)))>60)))
Date Time Value
1 2020-01-12 00:00:01 1
2 2020-01-12 00:00:02 1
3 2020-01-12 00:01:01 1
4 2020-01-12 00:01:02 1
5 2020-01-12 00:02:01 1
6 2020-01-12 00:02:02 1
7 2020-01-12 00:10:01 2
8 2020-01-12 00:10:02 2
9 2020-01-12 00:11:01 2
10 2020-01-12 00:11:02 2
11 2020-01-12 00:12:01 2
12 2020-01-12 00:12:02 2
13 2020-01-12 00:20:01 3
14 2020-01-12 00:20:02 3
15 2020-01-12 00:21:01 3
16 2020-01-12 00:21:02 3
17 2020-01-12 00:22:01 3
18 2020-01-12 00:22:02 3
19 2020-01-13 00:00:01 4
20 2020-01-13 00:00:02 4
21 2020-01-13 00:01:01 4
22 2020-01-13 00:01:02 4
23 2020-01-13 00:02:01 4
24 2020-01-13 00:02:02 4
25 2020-01-13 00:10:01 5
26 2020-01-13 00:10:02 5
27 2020-01-13 00:11:01 5
28 2020-01-13 00:11:02 5
29 2020-01-13 00:12:01 5
30 2020-01-13 00:12:02 5
31 2020-01-13 00:20:01 6
32 2020-01-13 00:20:02 6
33 2020-01-13 00:21:01 6
34 2020-01-13 00:21:02 6
35 2020-01-13 00:22:01 6
36 2020-01-13 00:22:02 6
我有一个包含两列的数据框,见下文:
Date Time
2020-01-12 00:00:01
2020-01-12 00:00:02
2020-01-12 00:01:01
2020-01-12 00:01:02
2020-01-12 00:02:01
2020-01-12 00:02:02
2020-01-12 00:10:01
2020-01-12 00:10:02
2020-01-12 00:11:01
2020-01-12 00:11:02
2020-01-12 00:12:01
2020-01-12 00:12:02
2020-01-12 00:20:01
2020-01-12 00:20:02
2020-01-12 00:21:01
2020-01-12 00:21:02
2020-01-12 00:22:01
2020-01-12 00:22:02
2020-01-13 00:00:01
2020-01-13 00:00:02
2020-01-13 00:01:01
2020-01-13 00:01:02
2020-01-13 00:02:01
2020-01-13 00:02:02
2020-01-13 00:10:01
2020-01-13 00:10:02
2020-01-13 00:11:01
2020-01-13 00:11:02
2020-01-13 00:12:01
2020-01-13 00:12:02
2020-01-13 00:20:01
2020-01-13 00:20:02
2020-01-13 00:21:01
2020-01-13 00:21:02
2020-01-13 00:22:01
2020-01-13 00:22:02
我想根据前两列中的值创建第三列,其唯一值从 1 开始到 x。具体来说,我想先关注第二列中的分钟位置,然后再关注第一列中的日期位置,这样结果如下所示:
Date Time Value
2020-01-12 00:00:01 1
2020-01-12 00:00:02 1
2020-01-12 00:01:01 1
2020-01-12 00:01:02 1
2020-01-12 00:02:01 1
2020-01-12 00:02:02 1
2020-01-12 00:10:01 2
2020-01-12 00:10:02 2
2020-01-12 00:11:01 2
2020-01-12 00:11:02 2
2020-01-12 00:12:01 2
2020-01-12 00:12:02 2
2020-01-12 00:20:01 3
2020-01-12 00:20:02 3
2020-01-12 00:21:01 3
2020-01-12 00:21:02 3
2020-01-12 00:22:01 3
2020-01-12 00:22:02 3
2020-01-13 00:00:01 4
2020-01-13 00:00:02 4
2020-01-13 00:01:01 4
2020-01-13 00:01:02 4
2020-01-13 00:02:01 4
2020-01-13 00:02:02 4
2020-01-13 00:10:01 5
2020-01-13 00:10:02 5
2020-01-13 00:11:01 5
2020-01-13 00:11:02 5
2020-01-13 00:12:01 5
2020-01-13 00:12:02 5
2020-01-13 00:20:01 6
2020-01-13 00:20:02 6
2020-01-13 00:21:01 6
2020-01-13 00:21:02 6
2020-01-13 00:22:01 6
2020-01-13 00:22:02 6
这个有用吗:
library(dplyr)
library(data.table)
df %>% mutate(Hour = substr(Time, 4,4), Value = rleid(Hour)) %>% select(-Hour)
Date Time Value
1 2020-01-12 00:00:01 1
2 2020-01-12 00:00:02 1
3 2020-01-12 00:01:01 1
4 2020-01-12 00:01:02 1
5 2020-01-12 00:02:01 1
6 2020-01-12 00:02:02 1
7 2020-01-12 00:10:01 2
8 2020-01-12 00:10:02 2
9 2020-01-12 00:11:01 2
10 2020-01-12 00:11:02 2
11 2020-01-12 00:12:01 2
12 2020-01-12 00:12:02 2
13 2020-01-12 00:20:01 3
14 2020-01-12 00:20:02 3
15 2020-01-12 00:21:01 3
16 2020-01-12 00:21:02 3
17 2020-01-12 00:22:01 3
18 2020-01-12 00:22:02 3
19 2020-01-13 00:00:01 4
20 2020-01-13 00:00:02 4
21 2020-01-13 00:01:01 4
22 2020-01-13 00:01:02 4
23 2020-01-13 00:02:01 4
24 2020-01-13 00:02:02 4
25 2020-01-13 00:10:01 5
26 2020-01-13 00:10:02 5
27 2020-01-13 00:11:01 5
28 2020-01-13 00:11:02 5
29 2020-01-13 00:12:01 5
30 2020-01-13 00:12:02 5
31 2020-01-13 00:20:01 6
32 2020-01-13 00:20:02 6
33 2020-01-13 00:21:01 6
34 2020-01-13 00:21:02 6
35 2020-01-13 00:22:01 6
36 2020-01-13 00:22:02 6
在基础 R 中你可以这样做:
transform(df, Value = cumsum(c(1, diff(as.POSIXlt(paste(Date, Time)))>60)))
Date Time Value
1 2020-01-12 00:00:01 1
2 2020-01-12 00:00:02 1
3 2020-01-12 00:01:01 1
4 2020-01-12 00:01:02 1
5 2020-01-12 00:02:01 1
6 2020-01-12 00:02:02 1
7 2020-01-12 00:10:01 2
8 2020-01-12 00:10:02 2
9 2020-01-12 00:11:01 2
10 2020-01-12 00:11:02 2
11 2020-01-12 00:12:01 2
12 2020-01-12 00:12:02 2
13 2020-01-12 00:20:01 3
14 2020-01-12 00:20:02 3
15 2020-01-12 00:21:01 3
16 2020-01-12 00:21:02 3
17 2020-01-12 00:22:01 3
18 2020-01-12 00:22:02 3
19 2020-01-13 00:00:01 4
20 2020-01-13 00:00:02 4
21 2020-01-13 00:01:01 4
22 2020-01-13 00:01:02 4
23 2020-01-13 00:02:01 4
24 2020-01-13 00:02:02 4
25 2020-01-13 00:10:01 5
26 2020-01-13 00:10:02 5
27 2020-01-13 00:11:01 5
28 2020-01-13 00:11:02 5
29 2020-01-13 00:12:01 5
30 2020-01-13 00:12:02 5
31 2020-01-13 00:20:01 6
32 2020-01-13 00:20:02 6
33 2020-01-13 00:21:01 6
34 2020-01-13 00:21:02 6
35 2020-01-13 00:22:01 6
36 2020-01-13 00:22:02 6