R: Unstack a column with dates
Suppose I have the following data frame:
df <- data.frame(
  Order = c("1234567", "1234567", "1234567", "456789", "456789"),
  Stage = c("Pipeline", "Proposal", "Closed", "Pipeline", "Lost"),
  StageChange = c("2008-01-01", "2008-01-02", "2008-01-03", "2008-01-10", "2008-01-12")
)
which gives:
head(df)
Order Stage StageChange
1 1234567 Pipeline 2008-01-01
2 1234567 Proposal 2008-01-02
3 1234567 Closed 2008-01-03
4 456789 Pipeline 2008-01-10
5 456789 Lost 2008-01-12
I need to unstack the "Stage" column and get a data frame like this:
Order Pipeline Proposal Closed Lost
1 1234567 2008-01-01 2008-01-02 2008-01-03 NA
2 456789 2008-01-10 NA NA 2008-01-12
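I read the documentation and tried different approaches with dplyr and tidyr, but my ignorance won out.
Any ideas on how to accomplish this?
My objective, to be clear, is to use this data to calculate the number of days a particular Order spends in a particular Stage. Some orders are lost and some are closed (won), which is why there are NA values. The same happens when an order never moves to a particular stage (an order can go from Pipeline to Lost without passing through the intermediate stages).
Thanks!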
You could use tidyr::pivot_wider, which is the newer replacement for the retired function spread:
# install.packages("tidyr")
library(tidyr)
df %>%
  pivot_wider(names_from = Stage, values_from = StageChange)
# # A tibble: 2 x 5
# Order Pipeline Proposal Closed Lost
# <fct> <fct> <fct> <fct> <fct>
# 1 1234567 2008-01-01 2008-01-02 2008-01-03 NA
# 2 456789 2008-01-10 NA NA 2008-01-12
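Using tidyr::spread (spread lives in tidyr, not dplyr; dplyr supplies select):
library(dplyr)
library(tidyr)
df %>%
  spread(Stage, StageChange) %>%
  # spread orders the new columns alphabetically, so restore the stage order
  select(Order, Pipeline, Proposal, Closed, Lost)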
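The dates will be of class factor (this is the case in R versions before 4.0.0, where data.frame() converts strings to factors by default):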
library(tidyverse)
df_wide <- df %>%
  tidyr::pivot_wider(names_from = Stage, values_from = StageChange)
df_wide
# A tibble: 2 x 5
Order Pipeline Proposal Closed Lost
<fct> <fct> <fct> <fct> <fct>
1 1234567 2008-01-01 2008-01-02 2008-01-03 NA
2 456789 2008-01-10 NA NA 2008-01-12
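The <fct> columns come from how the data frame was built rather than from pivot_wider. As a rough base R sketch (the names dates, df_fct and df_chr are made up for illustration), the stringsAsFactors argument of data.frame() decides whether the date strings arrive as factors or as plain character vectors, and either form can be turned into Date values:

```r
dates <- c("2008-01-01", "2008-01-02", "2008-01-03")

# Strings converted to factors (the default before R 4.0.0), requested explicitly.
df_fct <- data.frame(StageChange = dates, stringsAsFactors = TRUE)

# Strings kept as character (the default from R 4.0.0 on), requested explicitly.
df_chr <- data.frame(StageChange = dates, stringsAsFactors = FALSE)

class(df_fct$StageChange)
class(df_chr$StageChange)

# Going through as.character() makes the factor-to-Date conversion explicit.
dates_from_fct <- as.Date(as.character(df_fct$StageChange))
dates_from_chr <- as.Date(df_chr$StageChange)
```

On a recent R version the original data.frame() call would therefore most likely show <chr> instead of <fct> in the tibble header, but the conversion to Date works the same way.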
If you want to convert the dates to class Date:
df_wide_dates <- df %>%
  tidyr::pivot_wider(names_from = Stage, values_from = StageChange) %>%
  dplyr::mutate_at(vars(Pipeline, Proposal, Closed, Lost), as.Date)
df_wide_dates
# A tibble: 2 x 5
Order Pipeline Proposal Closed Lost
<fct> <date> <date> <date> <date>
1 1234567 2008-01-01 2008-01-02 2008-01-03 NA
2 456789 2008-01-10 NA NA 2008-01-12
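For the stated goal of counting how many days an order stays in each stage, one possible approach is to work on the long data before unstacking it. The following is only a sketch in base R: it assumes a stage lasts until the next recorded stage change for the same order, and the column name DaysInStage is invented for the example.

```r
# Sample data from the question; keep the dates as strings, then parse them.
df <- data.frame(
  Order = c("1234567", "1234567", "1234567", "456789", "456789"),
  Stage = c("Pipeline", "Proposal", "Closed", "Pipeline", "Lost"),
  StageChange = c("2008-01-01", "2008-01-02", "2008-01-03", "2008-01-10", "2008-01-12"),
  stringsAsFactors = FALSE
)
df$StageChange <- as.Date(df$StageChange)

# Sort so that each order's stage changes are in chronological order.
df <- df[order(df$Order, df$StageChange), ]

# Days in a stage = date of the next stage change minus this one, per order.
# The last stage of each order has no following change, so it stays NA.
df$DaysInStage <- ave(
  as.numeric(df$StageChange), df$Order,
  FUN = function(x) c(diff(x), NA)
)
df
```

Terminal stages such as Closed and Lost come out as NA because nothing follows them. The DaysInStage column can then be unstacked with pivot_wider in the same way as StageChange if a wide layout is preferred.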