Combine two data sets using date & time (Sales data with employee attendance)

Question

> sales.csv
        CustomerName    InvoiceDate_Time InvoiceNo InvoiceValue
1    Hendricks, Eric  30-09-2015 1:00 PM        10         5000
2        Baker, Mark  30-09-2015 3:00 PM        11        12000
3   Catalano, Robert 01-10-2015 10:00 AM        12        25000
4     Eaton, Jeffrey  01-10-2015 4:00 PM        13         4000
5    Watanuki, Cathy  02-10-2015 9:00 AM        14        80000
6      Fier, Marilyn  02-10-2015 3:30 PM        15        18000
7     O'Brien, Donna  03-10-2015 1:30 PM        16        25000
8      Perez, Barney  03-10-2015 4:10 PM        17        20000
9 Fitzgerald, Jackie 04-10-2015 11:10 AM        18         6000


> StaffAttendance.csv
       EmployeeName Designation AttendanceIn.DateTime AttendanceOut.DateTime
1        Page, Lisa   Sales Rep    30-09-2015 6:50 AM     30-09-2015 2:00 PM
2    Taylor, Hector     Manager    30-09-2015 7:00 AM     30-09-2015 5:00 PM
3  Dawson, Jonathan   Sales Rep    30-09-2015 1:55 PM     30-09-2015 7:00 PM
4      Duran, Brian   Sales Rep    01-10-2015 6:50 AM     01-10-2015 7:00 PM
5       Pratt, Erik     Manager    01-10-2015 7:20 AM     01-10-2015 5:10 PM
6        Page, Lisa   Sales Rep    02-10-2015 6:55 AM     02-10-2015 6:45 PM
7    Taylor, Hector     Manager    02-10-2015 7:10 AM     02-10-2015 5:20 AM
8      Weber, Larry   Sales Rep    03-10-2015 6:50 AM     03-10-2015 6:55 PM
9       Pratt, Erik     Manager    04-10-2015 7:20 AM     04-10-2015 5:10 PM
10     Duran, Brian   Sales Rep    04-10-2015 7:10 AM     04-10-2015 7:00 PM

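As shown above, I have two data tables (CSV files), and I want to merge them using date and time. How can I use the date and time to find which employees were working at the time of each customer sale?

How do I save the resulting table as a CSV file?

Please explain step by step which R commands to use. Can I also do this in Tableau? If so, what are the steps?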

Answer 1

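OK, here is a potential dplyr / data.table / tidyr solution. The general idea is to use the list-variable feature that dplyr has had since version 0.4.0. For each customer, we select the employees who were present at the time of the visit (using data.table's between() function) and store them in a list for that customer. We then unnest() the list variable (which duplicates the customer entry once per matching employee) and merge the staff information back in. The result is a data frame of unique customer-employee combinations.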
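To make the list-column idea concrete, here is a minimal sketch on made-up data. The `visits` and `shifts` tables, their column names, and the numeric hours are all hypothetical and are not taken from the question or from the code in this answer. Plain comparisons stand in for between(), and rowwise() is used instead of group_by() so that each visit is handled on its own.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: visit times and shift bounds are plain hours.
visits <- tibble(customer = c("A", "B"), hour = c(10, 15))
shifts <- tibble(employee = c("x", "y", "z"),
                 start    = c(8, 12, 9),
                 end      = c(11, 18, 16))

matched <- visits %>%
  rowwise() %>%
  # For each visit, keep the shifts that contain the visit hour and
  # store that small table as one element of a list column.
  mutate(present = list(shifts[hour >= shifts$start & hour <= shifts$end, ])) %>%
  ungroup() %>%
  # Expand the list column: one output row per (visit, employee) pair.
  unnest(present)

print(matched)
```

Each visit ends up repeated once per employee whose shift covers it, which is the same shape the real code in this answer produces.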

library(dplyr)
library(readr)
library(tidyr)
library(data.table)

#########
# For reproducibility: you can also download the .csv 
# from these Dropbox links using the 'repmis' pkg 
#
# customer <- repmis::source_DropboxData("customer.csv",
#                            "q0sf4uj13hpjz9v",
#                            sep = ",",
#                            header = TRUE)
# 
# staff <- repmis::source_DropboxData("staff.csv",
#                                     "q8p16hchsx8dzoa",
#                                     sep = ",",
#                                     header = TRUE)
##########    

# One problem with the original .csv is the formatting of the time: the
# hour is given as a single digit rather than zero-padded. We therefore
# use '%k' in as.POSIXct() to parse the hour instead of %H:

customer <- read_csv("customer.csv") %>% 
  mutate(date = as.POSIXct(date, format = "%d-%m-%Y %k:%M", tz = "Europe/Berlin"))

staff <- read_csv("staff.csv")  %>% 
  mutate(start = as.POSIXct(start, format = "%d-%m-%Y %k:%M", tz = "Europe/Berlin"),
         end = as.POSIXct(end, format = "%d-%m-%Y %k:%M", tz = "Europe/Berlin"))

# Now we group by customer and, for each customer, store the employees
# who were present at the time of the customer interaction:

staff_customer <- customer %>% 
  group_by(c.name) %>% # for each customer ...
  # ... select all employees who were present during the customer's visit
  # and store them in a list column
  mutate(employee = list(staff[data.table::between(date, staff$start, staff$end),
                               c("employee", "Record ID")])) %>% 
  unnest(employee) %>% # unnest the list column using tidyr (one row per employee)
  left_join(staff, by = c("employee", "Record ID")) # copy the staff information back (if necessary)

The result is the following table (only the first 6 rows are shown):

Source: local data frame [6 x 7]

  RecordID                  c.name                date                  employee Record ID               start                 end
1      826 NYLLYNYBCSNCYCKC/BL MRS 2014-10-07 06:35:00   Miss. A A D R ATHAPATTU      2612 2014-10-06 18:05:00 2014-10-07 08:27:00
2      826 NYLLYNYBCSNCYCKC/BL MRS 2014-10-07 06:35:00   Mr. K K R C CHATHURAPLA      2650 2014-10-06 18:05:00 2014-10-07 08:37:00
3      826 NYLLYNYBCSNCYCKC/BL MRS 2014-10-07 06:35:00    Mr. R P C P RAJAPAKRHA      2596 2014-10-06 18:05:00 2014-10-07 08:03:00
4      826 NYLLYNYBCSNCYCKC/BL MRS 2014-10-07 06:35:00        Mr. E A M LUDDHIKA      2699 2014-10-06 18:26:00 2014-10-07 08:31:00
5      826 NYLLYNYBCSNCYCKC/BL MRS 2014-10-07 06:35:00         Mr. R W P L RILVA      2673 2014-10-06 18:27:00 2014-10-07 08:26:00
6      826 NYLLYNYBCSNCYCKC/BL MRS 2014-10-07 06:35:00 Mrs. D.A.R.R. KARUPARATPE      2565 2014-10-06 18:31:00 2014-10-07 08:28:00
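
The code above runs on differently named columns than the two files in the question, and it does not cover the CSV export. As a rough sketch of how the same matching could look on the question's data, the following uses base R only: a cross join followed by a filter, rather than the list-column approach. The rows are the first few from the question; the format string "%d-%m-%Y %I:%M %p" is inferred from those sample rows, and the helper `parse_dt`, the time zone "UTC", and the output file name `sales_with_staff.csv` are assumptions made for illustration.

```r
# First rows from the question; in practice use read.csv("sales.csv")
# and read.csv("StaffAttendance.csv") instead.
sales <- data.frame(
  CustomerName     = c("Hendricks, Eric", "Baker, Mark", "Catalano, Robert"),
  InvoiceDate_Time = c("30-09-2015 1:00 PM", "30-09-2015 3:00 PM",
                       "01-10-2015 10:00 AM"),
  InvoiceNo        = c(10, 11, 12),
  InvoiceValue     = c(5000, 12000, 25000),
  stringsAsFactors = FALSE
)
staff <- data.frame(
  EmployeeName = c("Page, Lisa", "Taylor, Hector", "Dawson, Jonathan",
                   "Duran, Brian", "Pratt, Erik"),
  Designation  = c("Sales Rep", "Manager", "Sales Rep", "Sales Rep", "Manager"),
  AttendanceIn.DateTime  = c("30-09-2015 6:50 AM", "30-09-2015 7:00 AM",
                             "30-09-2015 1:55 PM", "01-10-2015 6:50 AM",
                             "01-10-2015 7:20 AM"),
  AttendanceOut.DateTime = c("30-09-2015 2:00 PM", "30-09-2015 5:00 PM",
                             "30-09-2015 7:00 PM", "01-10-2015 7:00 PM",
                             "01-10-2015 5:10 PM"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 12-hour clock with AM/PM. Note that %p depends on
# the locale; "UTC" is an assumed time zone.
parse_dt <- function(x) as.POSIXct(x, format = "%d-%m-%Y %I:%M %p", tz = "UTC")

sales$InvoiceTime <- parse_dt(sales$InvoiceDate_Time)
staff$TimeIn      <- parse_dt(staff$AttendanceIn.DateTime)
staff$TimeOut     <- parse_dt(staff$AttendanceOut.DateTime)

# Cross join: pair every sale with every attendance record ...
pairs <- merge(sales, staff, by = NULL)
# ... then keep the pairs where the invoice time falls inside the shift.
on_duty <- pairs[pairs$InvoiceTime >= pairs$TimeIn &
                 pairs$InvoiceTime <= pairs$TimeOut, ]
on_duty <- on_duty[order(on_duty$InvoiceNo, on_duty$EmployeeName), ]

# Save the result as a CSV file (the file name is an assumption).
out_file <- "sales_with_staff.csv"
write.csv(on_duty[, c("InvoiceNo", "CustomerName", "InvoiceDate_Time",
                      "InvoiceValue", "EmployeeName", "Designation")],
          out_file, row.names = FALSE)
```

The cross join is simple but grows with the product of the two table sizes, so for large files a per-sale lookup such as the list-column approach above may scale better.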

Unfortunately, I don't know how this would work in Tableau.
