Create tidy data in R with time stamps based on time stamp "size"
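I am analysing the cycle time variation of each process related to the different products we produce. Our SAP data contains the workers' start and finish log entries, and the objective is to use this information to calculate cycle times.
However, SAP exports the start and finish time stamps in a single column, and there is no reference column saying which entry is the start time and which is the finish time. This makes it impossible to tidy the data with, for example, spread.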
Current data
- 1.6 million rows
- 150 operations
- 10,000 orders
A small sample of the data is shown below.
Order <- rep(c(1059866,1059891),each = 4)
Operation <- rep(c(1510,1550),4)
Timestamp <- c("30-11-2016 07:33:30", "30-11-2016 07:33:42", "30-11-2016 16:00:13", "30-11-2016 16:00:18", "30-11-2016 07:35:21", "30-11-2016 07:35:43", "30-11-2016 16:00:43", "30-11-2016 16:00:39")
df_current <- cbind(Order, Operation, Timestamp)
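This start and finish information is needed for every process step ("Operation").
Logically, the earliest time stamp is the start log entry and the latest time stamp is the finish log entry.
However, I do not know how to tell R to create a new column that correctly indicates, based on the time stamps, which entry is the start and which is the finish.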
Desired data
Order <- rep(c(1059866,1059891),each = 4)
Operation <- rep(c(1510,1550),4)
Timestamp <- c("30-11-2016 07:33:30", "30-11-2016 07:33:42", "30-11-2016 16:00:13", "30-11-2016 16:00:18", "30-11-2016 07:35:21", "30-11-2016 07:35:43", "30-11-2016 16:00:43", "30-11-2016 16:00:39")
Status <- c("Start", "Start", "Finish", "Finish", "Start", "Start", "Finish", "Finish")
df_desired <- cbind(Order, Operation, Timestamp, Status)
When the data looks like that, I can easily tidy it.
Thanks
Assuming you can convert the data to a data.frame instead of a matrix:
df_current <- data.frame(Order, Operation, Timestamp)
df.With.Status <- do.call(rbind, # rbind the per-group data frames into one big data frame
  lapply(split(df_current, list(df_current$Order, df_current$Operation)), # split by unique Order/Operation combination
    function(df) {
      # Convert to date-time so that it is sortable
      df$Timestamp <- as.POSIXct(as.character(df$Timestamp), format = "%d-%m-%Y %H:%M:%S")
      df <- df[order(df$Timestamp), ] # sort in case the rows are out of order
      df$Status <- c("Start", "Finish") # assumes exactly two rows per group
      return(df)
    }))
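The `c("Start", "Finish")` assignment above only works when every Order/Operation combination has exactly two log entries. With 1.6 million rows that may be worth checking first. A minimal base R sketch, using the sample data from the question; the names `n_entries` and `incomplete` are illustrative and not part of the original post:

```r
# Sample data from the question
Order <- rep(c(1059866, 1059891), each = 4)
Operation <- rep(c(1510, 1550), 4)
Timestamp <- c("30-11-2016 07:33:30", "30-11-2016 07:33:42",
               "30-11-2016 16:00:13", "30-11-2016 16:00:18",
               "30-11-2016 07:35:21", "30-11-2016 07:35:43",
               "30-11-2016 16:00:43", "30-11-2016 16:00:39")
df_current <- data.frame(Order, Operation, Timestamp)

# Number of log entries in each Order/Operation group, repeated on every row
df_current$n_entries <- ave(seq_len(nrow(df_current)),
                            df_current$Order, df_current$Operation,
                            FUN = length)

# Rows that belong to groups without a clean start/finish pair
incomplete <- df_current[df_current$n_entries != 2, ]
nrow(incomplete)
```

If `incomplete` is not empty, those groups would need to be handled separately before a fixed two-element label vector is assigned.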
And with dplyr:
library(dplyr)
df_current %>% as.data.frame() %>%
group_by(Operation, Order) %>%
mutate(Timestamp = as.POSIXct(Timestamp, format = "%d-%m-%Y %H:%M:%S"),
Status = case_when(Timestamp == min(Timestamp) ~ "Start",
TRUE ~ "Finish")) %>%
arrange(Order, Operation)
# A tibble: 8 x 4
# Groups: Operation, Order [4]
Order Operation Timestamp Status
<fct> <fct> <dttm> <chr>
1 1059866 1510 2016-11-30 07:33:30 Start
2 1059866 1510 2016-11-30 16:00:13 Finish
3 1059866 1550 2016-11-30 07:33:42 Start
4 1059866 1550 2016-11-30 16:00:18 Finish
5 1059891 1510 2016-11-30 07:35:21 Start
6 1059891 1510 2016-11-30 16:00:43 Finish
7 1059891 1550 2016-11-30 07:35:43 Start
8 1059891 1550 2016-11-30 16:00:39 Finish
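Since the stated goal is a cycle time per Order/Operation, the difference between the latest and earliest time stamp of each group can also be computed directly. This is a sketch in base R under the assumption that each group holds one start and one finish entry; `cycle`, `cycle_minutes` and the choice of `tz = "UTC"` are illustrative assumptions, not taken from the original post:

```r
# Sample data from the question
Order <- rep(c(1059866, 1059891), each = 4)
Operation <- rep(c(1510, 1550), 4)
Timestamp <- c("30-11-2016 07:33:30", "30-11-2016 07:33:42",
               "30-11-2016 16:00:13", "30-11-2016 16:00:18",
               "30-11-2016 07:35:21", "30-11-2016 07:35:43",
               "30-11-2016 16:00:43", "30-11-2016 16:00:39")

# Parse the time stamps; a fixed time zone is assumed here to keep differences unambiguous
ts <- as.POSIXct(Timestamp, format = "%d-%m-%Y %H:%M:%S", tz = "UTC")
secs <- as.numeric(ts) # seconds since the epoch

# Cycle time in minutes: latest minus earliest entry per Order/Operation
cycle <- aggregate(secs,
                   by = list(Order = Order, Operation = Operation),
                   FUN = function(s) (max(s) - min(s)) / 60)
names(cycle)[names(cycle) == "x"] <- "cycle_minutes"
cycle
```

This skips the intermediate Status column entirely, which may be convenient when only the duration is needed.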
Also, since your data is large, with data.table:
library(data.table)
dfc_2 <- as.data.frame(df_current)
dfc_2$Timestamp <- as.POSIXct(dfc_2$Timestamp, format = "%d-%m-%Y %H:%M:%S")
# Label the earliest entry of each Operation/Order group "Start", the rest "Finish"
setDT(dfc_2)[, Status := ifelse(Timestamp == min(Timestamp), "Start", "Finish"),
             keyby = .(Operation, Order)]
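Comparing against `min(Timestamp)` labels every row that is not the earliest as "Finish", which is fine for pairs but may mislabel groups that happen to contain extra log entries. A hedged base R sketch of a three-way labelling follows; the extra 12:00:00 row and the `"Other"` label are hypothetical and only added for illustration:

```r
# Sample data from the question plus one hypothetical extra entry (last element)
Order <- c(rep(c(1059866, 1059891), each = 4), 1059866)
Operation <- c(rep(c(1510, 1550), 4), 1510)
Timestamp <- c("30-11-2016 07:33:30", "30-11-2016 07:33:42",
               "30-11-2016 16:00:13", "30-11-2016 16:00:18",
               "30-11-2016 07:35:21", "30-11-2016 07:35:43",
               "30-11-2016 16:00:43", "30-11-2016 16:00:39",
               "30-11-2016 12:00:00")

# Work on numeric seconds so ave() can return group-wise min and max per row
secs <- as.numeric(as.POSIXct(Timestamp, format = "%d-%m-%Y %H:%M:%S", tz = "UTC"))
grp_min <- ave(secs, Order, Operation, FUN = min)
grp_max <- ave(secs, Order, Operation, FUN = max)

# Earliest entry is the start, latest is the finish, anything in between is flagged
Status <- ifelse(secs == grp_min, "Start",
                 ifelse(secs == grp_max, "Finish", "Other"))
data.frame(Order, Operation, Timestamp, Status)
```

Because `ave()` is vectorised over the whole column, this avoids splitting and re-binding the data frame.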