R - Convert a column to row headers and populate the presence of that header to a true/false for each record
I have a data frame that looks like this:
+-----------+------------+-----------+-----+----------------+
| Unique ID | First Name | Last Name | Age | Characteristic |
+-----------+------------+-----------+-----+----------------+
| 1 | Bob | Smith | 25 | Intelligent |
| 1 | Bob | Smith | 25 | Funny |
| 1 | Bob | Smith | 25 | Short |
| 2 | Jim | Murphy | 62 | Tall |
| 2 | Jim | Murphy | 62 | Funny |
| 3 | Kelly | Green | 33 | Tall |
+-----------+------------+-----------+-----+----------------+
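I want to convert the "Characteristic" column into row headers and, for each record, fill in 1 if that characteristic is present and 0 if it is not, so that I end up with exactly one row per record. My desired output looks like this: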
+-----------+------------+-----------+-----+-------------+-------+-------+------+
| Unique ID | First Name | Last Name | Age | Intelligent | Funny | Short | Tall |
+-----------+------------+-----------+-----+-------------+-------+-------+------+
| 1 | Bob | Smith | 25 | 1 | 1 | 1 | 0 |
| 2 | Jim | Murphy | 62 | 0 | 1 | 0 | 1 |
| 3 | Kelly | Green | 33 | 0 | 0 | 0 | 1 |
+-----------+------------+-----------+-----+-------------+-------+-------+------+
More consumable data, and a solution using dplyr and tidyr:
library(dplyr)
library(tidyr)
read.table(header=TRUE, stringsAsFactors=FALSE, text="
Unique_ID First_Name Last_Name Age Characteristic
1 Bob Smith 25 Intelligent
1 Bob Smith 25 Funny
1 Bob Smith 25 Short
2 Jim Murphy 62 Tall
2 Jim Murphy 62 Funny
3 Kelly Green 33 Tall") %>%
mutate(v = 1L) %>%
tidyr::spread(Characteristic, v, fill=0L)
#   Unique_ID First_Name Last_Name Age Funny Intelligent Short Tall
# 1         1        Bob     Smith  25     1           1     1    0
# 2         2        Jim    Murphy  62     1           0     0    1
# 3         3      Kelly     Green  33     0           0     0    1
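Most of the work is done by spread. On its own it leaves NA rather than 0 in every empty cell; passing fill = 0L fills those cells with 0 instead. (Edited per @www's suggestion.)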
Here is another tidyverse solution.
df %>%
mutate(ind = 1L) %>%
spread(Characteristic, ind, fill = 0L)
#   Unique.ID First.Name Last.Name Age Funny Intelligent Short Tall
# 1         1        Bob     Smith  25     1           1     1    0
# 2         2        Jim    Murphy  62     1           0     0    1
# 3         3      Kelly     Green  33     0           0     0    1
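In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same reshape with the newer function, assuming a tidyr version recent enough (1.1.0 or later) to accept a scalar values_fill. The names people, v and wide are illustrative and not from the original answers; the data is rebuilt inline so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt here (the name `people` is an assumption).
people <- data.frame(
  Unique_ID      = c(1, 1, 1, 2, 2, 3),
  First_Name     = c("Bob", "Bob", "Bob", "Jim", "Jim", "Kelly"),
  Last_Name      = c("Smith", "Smith", "Smith", "Murphy", "Murphy", "Green"),
  Age            = c(25, 25, 25, 62, 62, 33),
  Characteristic = c("Intelligent", "Funny", "Short", "Tall", "Funny", "Tall"),
  stringsAsFactors = FALSE
)

wide <- people %>%
  mutate(v = 1L) %>%                      # marker: this characteristic is present
  pivot_wider(
    names_from  = Characteristic,         # one new column per characteristic
    values_from = v,
    values_fill = 0L                      # absent combinations become 0, not NA
  )

print(wide)
```

Unlike spread(), which orders the new columns alphabetically here, pivot_wider() creates them in order of first appearance by default, and it returns a tibble rather than a plain data frame.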
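You can also use reshape2, which accounts for cases where a record has more than one instance of the same characteristic. With fun.aggregate = length, each cell holds the number of matching rows.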
library(reshape2)
dcast(df, ...~Characteristic, fun.aggregate = length)
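Because length counts rows, a record that lists the same characteristic twice gets a 2 rather than a 1. If a strict 0/1 flag is wanted, one option is a custom aggregate function. This is only a sketch: the data frame people, its duplicated "Funny" row, and the helper present are hypothetical and added purely for illustration.

```r
library(reshape2)

# Sample data plus one hypothetical duplicate row (Bob is "Funny" twice).
people <- data.frame(
  Unique_ID      = c(1, 1, 1, 1, 2, 2, 3),
  First_Name     = c("Bob", "Bob", "Bob", "Bob", "Jim", "Jim", "Kelly"),
  Last_Name      = c("Smith", "Smith", "Smith", "Smith", "Murphy", "Murphy", "Green"),
  Age            = c(25, 25, 25, 25, 62, 62, 33),
  Characteristic = c("Intelligent", "Funny", "Funny", "Short", "Tall", "Funny", "Tall"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 1L if at least one row matched, otherwise 0L.
present <- function(x) as.integer(length(x) > 0)

counts <- dcast(people, ... ~ Characteristic,
                value.var = "Characteristic", fun.aggregate = length)
wide   <- dcast(people, ... ~ Characteristic,
                value.var = "Characteristic", fun.aggregate = present)

counts$Funny[counts$Unique_ID == 1]  # row count for Bob / Funny
wide$Funny[wide$Unique_ID == 1]      # presence flag for Bob / Funny
```

Setting value.var explicitly also avoids the message dcast prints when it has to guess the value column.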
Data
df <- read.table(text = "Unique ID | First Name | Last Name | Age | Characteristic
1 | Bob | Smith | 25 | Intelligent
1 | Bob | Smith | 25 | Funny
1 | Bob | Smith | 25 | Short
2 | Jim | Murphy | 62 | Tall
2 | Jim | Murphy | 62 | Funny
3 | Kelly | Green | 33 | Tall ", sep = "|", header = TRUE, strip.white = TRUE, stringsAsFactors = FALSE)