Combine two data sets using date and time (sales data with employee attendance)
> sales.csv
  CustomerName        InvoiceDate_Time     InvoiceNo  InvoiceValue
1 Hendricks, Eric     30-09-2015 1:00 PM   10         5000
2 Baker, Mark         30-09-2015 3:00 PM   11         12000
3 Catalano, Robert    01-10-2015 10:00 AM  12         25000
4 Eaton, Jeffrey      01-10-2015 4:00 PM   13         4000
5 Watanuki, Cathy     02-10-2015 9:00 AM   14         80000
6 Fier, Marilyn       02-10-2015 3:30 PM   15         18000
7 O'Brien, Donna      03-10-2015 1:30 PM   16         25000
8 Perez, Barney       03-10-2015 4:10 PM   17         20000
9 Fitzgerald, Jackie  04-10-2015 11:10 AM  18         6000
> StaffAttendance.csv
   EmployeeName      Designation  AttendanceIn.DateTime  AttendanceOut.DateTime
 1 Page, Lisa        Sales Rep    30-09-2015 6:50 AM     30-09-2015 2:00 PM
 2 Taylor, Hector    Manager      30-09-2015 7:00 AM     30-09-2015 5:00 PM
 3 Dawson, Jonathan  Sales Rep    30-09-2015 1:55 PM     30-09-2015 7:00 PM
 4 Duran, Brian      Sales Rep    01-10-2015 6:50 AM     01-10-2015 7:00 PM
 5 Pratt, Erik       Manager      01-10-2015 7:20 AM     01-10-2015 5:10 PM
 6 Page, Lisa        Sales Rep    02-10-2015 6:55 AM     02-10-2015 6:45 PM
 7 Taylor, Hector    Manager      02-10-2015 7:10 AM     02-10-2015 5:20 AM
 8 Weber, Larry      Sales Rep    03-10-2015 6:50 AM     03-10-2015 6:55 PM
 9 Pratt, Erik       Manager      04-10-2015 7:20 AM     04-10-2015 5:10 PM
10 Duran, Brian      Sales Rep    04-10-2015 7:10 AM     04-10-2015 7:00 PM
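As shown above, I have two data tables (CSV files) that I want to merge using date and time.
How can I use the date and time to find which employees were working at the time of each customer sale?
How do I save the resulting table as a CSV file?
Please explain, step by step, the R commands to use.
Can I also do this in Tableau? If so, what are the steps?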
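OK, here is a potential dplyr / data.table / tidyr solution.
The general idea is to use the list variable feature that dplyr has offered since version 0.4.0. For each customer, we select the employees who were present at the time of the customer's visit (using the between() function of data.table) and store them in a list for that customer. We then unnest() the list variable (which replicates the customer entry once per matching employee) and merge the employee information back in. This results in a data frame of unique customer-employee combinations.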
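To make the list-column idea concrete before the full code, here is a minimal sketch on made-up toy data. The tables visits and shifts and all of their column names are hypothetical, and the interval test uses plain comparisons where the answer uses data.table::between(); it assumes dplyr 1.0 or later for rowwise().

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two customer visits and three staff shifts
visits <- tibble(
  customer = c("A", "B"),
  time = as.POSIXct(c("2015-09-30 13:00", "2015-09-30 15:00"), tz = "UTC")
)
shifts <- tibble(
  employee = c("X", "Y", "Z"),
  start = as.POSIXct(c("2015-09-30 07:00", "2015-09-30 07:00", "2015-09-30 14:00"), tz = "UTC"),
  end = as.POSIXct(c("2015-09-30 14:00", "2015-09-30 17:00", "2015-09-30 19:00"), tz = "UTC")
)

pairs <- visits %>%
  rowwise() %>%  # treat each visit separately
  # list column: names of the employees whose shift covers the visit time
  mutate(present = list(shifts$employee[shifts$start <= time & time <= shifts$end])) %>%
  ungroup() %>%
  unnest(cols = present)  # one row per visit-employee combination

print(pairs)
```

Each visit row is repeated once per employee on shift at that moment, which is the shape the full solution produces before the staff details are joined back.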
library(dplyr)
library(readr)
library(tidyr)
library(data.table)
#########
# For reproducibility: you can also download the .csv
# from these Dropbox links using the 'repmis' pkg
#
# customer <- repmis::source_DropboxData("customer.csv",
# "q0sf4uj13hpjz9v",
# sep = ",",
# header = TRUE)
#
# staff <- repmis::source_DropboxData("staff.csv",
# "q8p16hchsx8dzoa",
# sep = ",",
# header = TRUE)
##########
# One problem with the original .csv is the formatting of the time: the
# hour is given as a single digit rather than zero-padded. We therefore
# use '%k' instead of '%H' in the format passed to as.POSIXct():
# (Assumes customer.csv and staff.csv are in the working directory.)
customer <- read_csv("customer.csv") %>%
  mutate(date = as.POSIXct(date, format = "%d-%m-%Y %k:%M", tz = "Europe/Berlin"))
staff <- read_csv("staff.csv") %>%
  mutate(start = as.POSIXct(start, format = "%d-%m-%Y %k:%M", tz = "Europe/Berlin"),
         end   = as.POSIXct(end,   format = "%d-%m-%Y %k:%M", tz = "Europe/Berlin"))
# Now we group by customer and store, for each customer, the employees
# who were present at the time of the customer interaction:
staff_customer <- customer %>%
  group_by(c.name) %>%  # for each customer ...
  # ... select all employees who were present during the customer's visit
  # and store them in a list column
  mutate(employee = list(staff[data.table::between(date, staff$start, staff$end),
                               c("employee", "Record ID")])) %>%
  unnest(employee) %>%  # unnest this list column using tidyr
  left_join(., staff)   # copy the staff information back (if necessary)
This results in the following table (only the first 6 rows are shown):
Source: local data frame [6 x 7]
  RecordID  c.name                   date                 employee                   Record ID  start                end
1 826       NYLLYNYBCSNCYCKC/BL MRS  2014-10-07 06:35:00  Miss. A A D R ATHAPATTU    2612       2014-10-06 18:05:00  2014-10-07 08:27:00
2 826       NYLLYNYBCSNCYCKC/BL MRS  2014-10-07 06:35:00  Mr. K K R C CHATHURAPLA    2650       2014-10-06 18:05:00  2014-10-07 08:37:00
3 826       NYLLYNYBCSNCYCKC/BL MRS  2014-10-07 06:35:00  Mr. R P C P RAJAPAKRHA     2596       2014-10-06 18:05:00  2014-10-07 08:03:00
4 826       NYLLYNYBCSNCYCKC/BL MRS  2014-10-07 06:35:00  Mr. E A M LUDDHIKA         2699       2014-10-06 18:26:00  2014-10-07 08:31:00
5 826       NYLLYNYBCSNCYCKC/BL MRS  2014-10-07 06:35:00  Mr. R W P L RILVA          2673       2014-10-06 18:27:00  2014-10-07 08:26:00
6 826       NYLLYNYBCSNCYCKC/BL MRS  2014-10-07 06:35:00  Mrs. D.A.R.R. KARUPARATPE  2565       2014-10-06 18:31:00  2014-10-07 08:28:00
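The question's files use different column names and a 12-hour clock with AM/PM, and the question also asks how to save the result. Here is a minimal base-R sketch under those assumptions. It swaps in a cross join via merge() for the list-column approach used above. Only a few sample rows are typed in by hand, and the helper columns when, start and end and the output file name sales_with_staff.csv are hypothetical. Note that '%p' depends on the locale, so this assumes an English or C locale.

```r
# A few rows from the question, typed in by hand for illustration
sales <- data.frame(
  CustomerName = c("Hendricks, Eric", "Baker, Mark"),
  InvoiceDate_Time = c("30-09-2015 1:00 PM", "30-09-2015 3:00 PM"),
  InvoiceNo = c(10, 11),
  stringsAsFactors = FALSE
)
staff <- data.frame(
  EmployeeName = c("Page, Lisa", "Taylor, Hector", "Dawson, Jonathan"),
  AttendanceIn.DateTime = c("30-09-2015 6:50 AM", "30-09-2015 7:00 AM", "30-09-2015 1:55 PM"),
  AttendanceOut.DateTime = c("30-09-2015 2:00 PM", "30-09-2015 5:00 PM", "30-09-2015 7:00 PM"),
  stringsAsFactors = FALSE
)

# 12-hour clock: %I is the hour, %p the AM/PM marker (locale-dependent)
fmt <- "%d-%m-%Y %I:%M %p"
sales$when <- as.POSIXct(sales$InvoiceDate_Time, format = fmt, tz = "UTC")
staff$start <- as.POSIXct(staff$AttendanceIn.DateTime, format = fmt, tz = "UTC")
staff$end <- as.POSIXct(staff$AttendanceOut.DateTime, format = fmt, tz = "UTC")

# Cross join (by = NULL pairs every sale with every shift), then keep only
# the pairs where the invoice time falls inside the shift
all_pairs <- merge(sales, staff, by = NULL)
keep <- all_pairs$when >= all_pairs$start & all_pairs$when <= all_pairs$end
result <- all_pairs[keep, c("InvoiceNo", "CustomerName", "InvoiceDate_Time", "EmployeeName")]
result <- result[order(result$InvoiceNo, result$EmployeeName), ]

# Save the combined table as a CSV file (hypothetical file name)
write.csv(result, "sales_with_staff.csv", row.names = FALSE)
print(result)
```

A cross join is fine for tables of this size. For large files it builds every sale-shift pair in memory, so a rolling or non-equi join would be the better fit there.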
Unfortunately, I don't know how this would work in Tableau.