R: How can I find the most recent row with a certain value?
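Good evening,
I have a very large dataset in R and I'm trying to find the best way to loop through it to solve a problem. Think of the data as employees' historical working hours. It looks like this: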
rawTable:
Department Name Date Hours
Engineering Mary 2021-01-01 8
Engineering Mary 2021-01-02 8
Engineering Mary 2021-01-03 0
Engineering Mary 2021-01-04 6
Sales Barry 2021-01-01 0
Sales Barry 2021-01-02 12
Sales Barry 2021-01-03 12
Sales Barry 2021-01-04 12
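I have about 3,200 people on my list, with one row per person for every day of the year, so the table is obviously big.
I need to add two columns to the table:
The first is LDO, which shows (for each day) the last day that person had off.
The second is WSH, which shows how many hours that person has worked since their last day off. The result should look like this: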
rawTable:
Department Name Date Hours LDO WSH
Engineering Mary 2021-01-01 8 2020-12-31 8
Engineering Mary 2021-01-02 8 2020-12-31 16
Engineering Mary 2021-01-03 0 2021-01-03 0
Engineering Mary 2021-01-04 6 2021-01-03 6
Sales Barry 2021-01-01 0 2021-01-01 0
Sales Barry 2021-01-02 12 2021-01-01 12
Sales Barry 2021-01-03 12 2021-01-01 24
Sales Barry 2021-01-04 12 2021-01-01 36
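I tried using a for loop to apply the logic row by row. For each row, if Hours equals zero, then LDO = Date and WSH = 0. Otherwise, LDO = LDO from the previous row and WSH = WSH from the previous row plus this row's Hours. With a dataset this size, it takes forever and a half to run.
Next I wrote a function that, given a row, uses a copy of the big table and a `which` statement to give me the row number of the last day on or before that row's date on which the person worked 0 hours. That was also extremely slow, and I haven't even gotten to the WSH part. It looks like this: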
rawLU <- rawTable
LDO = function(x) {
  # row index of this person's latest zero-hour day on or before this row's date (0 if none)
  max(c(0, which((rawLU$Name == x["Name"]) &
                 (rawLU$Hours == 0) & (rawLU$Date <= x["Date"]))))
}
LastOff <- apply(rawTable, 1, LDO)
I know there's an easier way, but I can't seem to figure it out.
Can anyone help? Thanks in advance!
Mike
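Here is a possible solution using dplyr (with fill from tidyr). Where Hours = 0, take the Date value, then use fill to carry the previous non-working date down to the other rows. WSH can then be calculated with cumsum.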
library(dplyr)
library(tidyr)
rawTable %>%
  mutate(Date = as.Date(Date)) %>%
  group_by(Department, Name) %>%
  mutate(LDO = if_else(Hours == 0, Date, as.Date(NA))) %>%
  fill(LDO) %>%
  mutate(LDO = if_else(is.na(LDO), min(Date) - 1, LDO)) %>%
  group_by(LDO, .add = TRUE) %>%
  mutate(WSH = cumsum(Hours)) %>%
  ungroup()
# Department Name Date Hours LDO WSH
# <chr> <chr> <date> <int> <date> <int>
#1 Engineering Mary 2021-01-01 8 2020-12-31 8
#2 Engineering Mary 2021-01-02 8 2020-12-31 16
#3 Engineering Mary 2021-01-03 0 2021-01-03 0
#4 Engineering Mary 2021-01-04 6 2021-01-03 6
#5 Sales Barry 2021-01-01 0 2021-01-01 0
#6 Sales Barry 2021-01-02 12 2021-01-01 12
#7 Sales Barry 2021-01-03 12 2021-01-01 24
#8 Sales Barry 2021-01-04 12 2021-01-01 36
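To make the fill step concrete, here is a minimal base R sketch of the same "carry the last day off forward" idea for a single person. It swaps in `cummax` over row positions instead of `tidyr::fill`, and the variable names (`dates`, `hours`, `idx`, `ldo`) are illustrative rather than taken from the answer. As in the answer, it assumes the day before the first row counts as a day off when there is no earlier zero-hour day.

```r
# Mary's four rows from the question.
dates <- as.Date(c("2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"))
hours <- c(8L, 8L, 0L, 6L)

# Position of the most recent zero-hour row seen so far (0 means "none yet").
idx <- cummax(ifelse(hours == 0, seq_along(hours), 0L))

# Assumed fallback: the day before the first row, mirroring min(Date) - 1 above.
ldo <- rep(dates[1] - 1, length(dates))
ldo[idx > 0] <- dates[idx[idx > 0]]

print(idx)          # 0 0 3 3
print(format(ldo))  # "2020-12-31" "2020-12-31" "2021-01-03" "2021-01-03"
```

For the full table, this would have to be applied per person, which is what `group_by(Department, Name)` does in the dplyr version.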
Data
rawTable <- structure(list(Department = c("Engineering", "Engineering", "Engineering",
"Engineering", "Sales", "Sales", "Sales", "Sales"), Name = c("Mary",
"Mary", "Mary", "Mary", "Barry", "Barry", "Barry", "Barry"),
Date = c("2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04",
"2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"),
Hours = c(8L, 8L, 0L, 6L, 0L, 12L, 12L, 12L)), class = "data.frame", row.names = c(NA, -8L))
Another option is to group on a running count of the zero-hour rows:
library(dplyr)
rawTable %>%
  group_by(Department, Name, grp = cumsum(Hours == 0)) %>%
  mutate(Date = as.Date(Date),
         LDO = first(Date) - (first(Hours) > 0),
         WSH = cumsum(Hours))
# A tibble: 8 x 7
# Groups: Department, Name, grp [3]
#  Department Name Date Hours grp LDO WSH
#  <chr> <chr> <date> <int> <int> <date> <int>
#1 Engineering Mary 2021-01-01 8 0 2020-12-31 8
#2 Engineering Mary 2021-01-02 8 0 2020-12-31 16
#3 Engineering Mary 2021-01-03 0 1 2021-01-03 0
#4 Engineering Mary 2021-01-04 6 1 2021-01-03 6
#5 Sales Barry 2021-01-01 0 2 2021-01-01 0
#6 Sales Barry 2021-01-02 12 2 2021-01-01 12
#7 Sales Barry 2021-01-03 12 2 2021-01-01 24
#8 Sales Barry 2021-01-04 12 2 2021-01-01 36
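The grouping trick in this second answer may be easier to see in isolation. The sketch below uses base R only, with illustrative vector names (`name`, `hours`, `grp`, `wsh`) and `ave` standing in for the grouped `mutate`. It shows that `cumsum(hours == 0)` steps up by one on every day off, so all rows since the same day off share one id. Because the count runs over the whole table and not per person, it has to be combined with the name to form the groups.

```r
# The eight rows from the question: Mary's four days, then Barry's four days.
name  <- rep(c("Mary", "Barry"), each = 4)
hours <- c(8L, 8L, 0L, 6L, 0L, 12L, 12L, 12L)

# Running count of zero-hour rows across the whole table.
grp <- cumsum(hours == 0)
print(grp)  # 0 0 1 1 2 2 2 2

# Cumulative hours within each (name, grp) run.
wsh <- ave(hours, name, grp, FUN = cumsum)
print(wsh)  # 8 16 0 6 0 12 24 36
```

The `wsh` values match the WSH column in the output above.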