R - Convert a column to row headers and populate the presence of that header as true/false for each record

I have a data frame that looks like this:

+-----------+------------+-----------+-----+----------------+
| Unique ID | First Name | Last Name | Age | Characteristic |
+-----------+------------+-----------+-----+----------------+
|         1 | Bob        | Smith     |  25 | Intelligent    |
|         1 | Bob        | Smith     |  25 | Funny          |
|         1 | Bob        | Smith     |  25 | Short          |
|         2 | Jim        | Murphy    |  62 | Tall           |
|         2 | Jim        | Murphy    |  62 | Funny          |
|         3 | Kelly      | Green     |  33 | Tall           |
+-----------+------------+-----------+-----+----------------+

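I want to turn the values of the "Characteristic" column into a header row (one column per characteristic) and, for each record, fill in 1 if that characteristic is present and 0 if it is not, so that I end up with only one row per record. My desired output looks like this: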

+-----------+------------+-----------+-----+-------------+-------+-------+------+
| Unique ID | First Name | Last Name | Age | Intelligent | Funny | Short | Tall |
+-----------+------------+-----------+-----+-------------+-------+-------+------+
|         1 | Bob        | Smith     |  25 |           1 |     1 |     1 |    0 |
|         2 | Jim        | Murphy    |  62 |           0 |     1 |     0 |    1 |
|         3 | Kelly      | Green     |  33 |           0 |     0 |     0 |    1 |
+-----------+------------+-----------+-----+-------------+-------+-------+------+

More consumable data, and a solution using dplyr and tidyr:

library(dplyr)
library(tidyr)
read.table(header=TRUE, stringsAsFactors=FALSE, text="
  Unique_ID   First_Name   Last_Name   Age   Characteristic  
          1   Bob          Smith        25   Intelligent     
          1   Bob          Smith        25   Funny           
          1   Bob          Smith        25   Short           
          2   Jim          Murphy       62   Tall            
          2   Jim          Murphy       62   Funny           
          3   Kelly        Green        33   Tall") %>%
  mutate(v = 1L) %>%
  tidyr::spread(Characteristic, v, fill=0L)
#   Unique_ID First_Name Last_Name Age Funny Intelligent Short Tall
# 1         1        Bob     Smith  25     1           1     1    0
# 2         2        Jim    Murphy  62     1           0     0    1
# 3         3      Kelly     Green  33     0           0     0    1

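Most of the work is done by spread. By default it leaves NA rather than 0 in every empty cell; passing fill = 0L, as in the call above, fills those cells with 0 instead. (Edited per @www's suggestion.)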
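To make the effect of fill concrete, here is a minimal sketch on a made-up three-row sample. The people data frame and its values are assumptions for illustration, not the question's data. It runs the same spread call with and without fill:

```r
library(dplyr)
library(tidyr)

# Hypothetical sample in the same shape as the question's data.
people <- data.frame(
  Unique_ID      = c(1L, 1L, 2L),
  First_Name     = c("Bob", "Bob", "Jim"),
  Characteristic = c("Funny", "Short", "Tall"),
  stringsAsFactors = FALSE
)

# Add a constant indicator column to spread out.
flagged <- people %>% mutate(v = 1L)

# Default behaviour: combinations that never occur are left as NA.
with_na <- spread(flagged, Characteristic, v)

# With fill = 0L the same cells are set to 0.
with_zero <- spread(flagged, Characteristic, v, fill = 0L)
```

In with_na, Bob's Tall cell is NA; in with_zero it is 0.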

Here is another tidyverse solution.

library(dplyr)
library(tidyr)

# df is built by the read.table call at the end of this answer
df %>%
  mutate(ind = 1L) %>%
  spread(Characteristic, ind, fill = 0L)

#   Unique.ID First.Name Last.Name Age Funny Intelligent Short Tall
# 1         1        Bob     Smith  25     1           1     1    0
# 2         2        Jim    Murphy  62     1           0     0    1
# 3         3      Kelly     Green  33     0           0     0    1
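As a side note, the tidyr documentation marks spread as superseded by pivot_wider (tidyr 1.0.0 and later). A rough equivalent might look like the sketch below, assuming a reasonably recent tidyr. The small people data frame is a hypothetical stand-in for df, not the question's data:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for df.
people <- data.frame(
  Unique_ID      = c(1L, 1L, 2L),
  First_Name     = c("Bob", "Bob", "Jim"),
  Characteristic = c("Funny", "Short", "Tall"),
  stringsAsFactors = FALSE
)

wide <- people %>%
  mutate(ind = 1L) %>%                 # constant indicator column
  pivot_wider(
    names_from  = Characteristic,      # values that become column names
    values_from = ind,                 # values that fill the new columns
    values_fill = list(ind = 0L)       # use 0 where a combination is absent
  )
```

One difference to be aware of: by default pivot_wider should keep the new columns in order of first appearance, whereas the spread output above lists them alphabetically.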

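You could also use reshape2, which accounts for cases where a characteristic occurs more than once per record; with fun.aggregate = length each cell holds a count.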

library(reshape2)
dcast(df, ...~Characteristic, fun.aggregate = length)
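A minimal sketch of what that means in practice, assuming a hypothetical df_dup in which one characteristic is repeated for the same person. The data frame, its values, and the explicit value.var argument are illustrative additions, not part of the original answer:

```r
library(reshape2)

# Hypothetical data: "Funny" is recorded twice for Unique_ID 1.
df_dup <- data.frame(
  Unique_ID      = c(1L, 1L, 1L, 2L),
  First_Name     = c("Bob", "Bob", "Bob", "Jim"),
  Characteristic = c("Funny", "Funny", "Short", "Tall"),
  stringsAsFactors = FALSE
)

# Count occurrences of each characteristic per record.
# value.var is given explicitly so dcast does not have to guess it.
counts <- dcast(df_dup, ... ~ Characteristic,
                fun.aggregate = length, value.var = "Characteristic")

# If strict 0/1 flags are wanted, collapse any count above 0 to 1.
flag_cols <- setdiff(names(counts), c("Unique_ID", "First_Name"))
flags <- counts
flags[flag_cols] <- lapply(counts[flag_cols], function(x) as.integer(x > 0))
```

Here counts holds 2 in Bob's Funny cell, while flags holds 1, which matches the 0/1 layout asked for in the question.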

Data

df <- read.table(text = "Unique ID | First Name | Last Name | Age | Characteristic 
         1 | Bob        | Smith     |  25 | Intelligent    
         1 | Bob        | Smith     |  25 | Funny          
         1 | Bob        | Smith     |  25 | Short          
         2 | Jim        | Murphy    |  62 | Tall           
         2 | Jim        | Murphy    |  62 | Funny          
         3 | Kelly      | Green     |  33 | Tall         ", sep = "|", header = TRUE, strip.white = TRUE, stringsAsFactors = FALSE)