R - Pivot like data

Question

Using R and the tidyverse library, I am trying to achieve a pivot-like result. Here is a sample dataset:

zz <- "   Date          ParAB ParCD
1         2017-05-27    A     C
2         2017-05-27    B     D
3         2017-05-27    A     D     
4         2017-05-27    B     C     
5         2017-05-27    B     C     
6         2017-05-28    A     D     
7         2017-05-28    A     C     
8         2017-05-28    A     C
9         2017-05-28    A     D"

Data <- read.table(text=zz, header = TRUE)

I want to transform the data into the form below, showing the number of occurrences per day:

Date           A        B        C       D
2017-05-27     2        3        3       2
2017-05-28     4        0        2       2

I tried the spread function, which works well on the ParAB column.

Data %>%
  group_by(Date, ParAB, ParCD) %>%
  summarise(occr = n()) %>%
  spread(ParAB, occr, fill = 0) %>%
  mutate(occrCD = A+B)

So the result is:

# A tibble: 4 x 5
# Groups:   Date [2]
    Date        ParCD     A     B   occrCD
  <fctr>        <fctr> <dbl> <dbl>  <dbl>
1 2017-05-27      C     1     2      3
2 2017-05-27      D     1     1      2
3 2017-05-28      C     2     0      2
4 2017-05-28      D     2     0      2

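Then, when I try to spread a second time, it does not work as expected. For a given date, the values in columns A (and B) are not added up across the C and D rows. As a result I get wrong data.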

The code with both steps:

Data %>%
  group_by(Date, ParAB, ParCD) %>%
  summarise(occr = n()) %>%
  spread(ParAB, occr, fill = 0) %>% # first spread - result as expected
  mutate(occrCD = A+B) %>%
  spread(ParCD, occrCD, fill = 0) %>% # second spread, lost sum for A and B
  group_by(Date) %>%
  summarise_all(sum)

The result is not what I want. The error is visible because A+B should equal C+D, but for 2017-05-28 it does not. :(

# A tibble: 2 x 5
        Date     A     B     C     D
      <fctr> <dbl> <dbl> <dbl> <dbl>
1 2017-05-27     2     3     3     2
2 2017-05-28     2     0     2     2

I am sure this is trivial, but since I am new to this, I would really appreciate your help.

M

Answer 1

There is no reason to spread twice if you gather all the parameters into one column.

library(dplyr)
library(tidyr)

zz <- "   Date          ParAB ParCD
1         2017-05-27    A     C
2         2017-05-27    B     D
3         2017-05-27    A     D     
4         2017-05-27    B     C     
5         2017-05-27    B     C     
6         2017-05-28    A     D     
7         2017-05-28    A     C     
8         2017-05-28    A     C
9         2017-05-28    A     D"

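Data <- read.table(text = zz, header = TRUE, stringsAsFactors = FALSE)

Data %>%
  gather(v1, value, -Date) %>%   # stack ParAB and ParCD into a single value column
  count(Date, value) %>%         # count occurrences of each value per date
  spread(value, n, fill = 0)     # one column per value, 0 where a value is absent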

# # A tibble: 2 x 5
#         Date     A     B     C     D
# *      <chr> <dbl> <dbl> <dbl> <dbl>
# 1 2017-05-27     2     3     3     2
# 2 2017-05-28     4     0     2     2
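
As for why the two-step version undercounts: spread() treats every column other than the key and value columns as a row identifier. After the first spread, the two 2017-05-28 rows (ParCD = C and ParCD = D) both have A = 2 and B = 0, so the second spread merges them into a single row and A is counted only once. The sketch below reproduces that collapse and then repeats the single-reshape approach with pivot_longer() and pivot_wider(), which supersede gather() and spread() in newer tidyr releases. It assumes tidyr 1.1.0 or later (for the scalar values_fill) and a dplyr version whose count() accepts a name argument; the names step1, step2, result and param are only illustrative.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question, built directly as a data frame.
Data <- data.frame(
  Date  = rep(c("2017-05-27", "2017-05-28"), times = c(5, 4)),
  ParAB = c("A", "B", "A", "B", "B", "A", "A", "A", "A"),
  ParCD = c("C", "D", "D", "C", "C", "D", "C", "C", "D"),
  stringsAsFactors = FALSE
)

# State of the question's pipeline just before the second spread():
# one row per Date/ParCD pair, four rows in total.
step1 <- Data %>%
  count(Date, ParAB, ParCD, name = "occr") %>%
  spread(ParAB, occr, fill = 0) %>%
  mutate(occrCD = A + B)

# Second spread: Date, A and B now identify the rows. Both 2017-05-28 rows
# share A = 2 and B = 0, so they collapse into one row (three rows remain).
step2 <- step1 %>%
  spread(ParCD, occrCD, fill = 0)

# Single reshape with the newer verbs: stack both parameter columns,
# count each value per date, then widen to one column per value.
result <- Data %>%
  pivot_longer(cols = -Date, names_to = "param", values_to = "value") %>%
  count(Date, value) %>%
  pivot_wider(names_from = value, values_from = n, values_fill = 0)

print(step2)
print(result)
```

Comparing nrow(step1) with nrow(step2) makes the lost row visible, while result matches the gather/count/spread output above.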

r

tidyverse