Create name-value pair data frame from input with delimiters
I have a data frame which can be created with this code:
input <- data.frame( 'ID'=c(1:3),
Destination=c("A\r\nB", "C", "D\r\nE\r\nF"),
Topic=c("W", "X", "Y\r\nZ") )
It looks like this:
ID Destination Topic
1 1 A\r\nB W
2 2 C X
3 3 D\r\nE\r\nF Y\r\nZ
I would like to create an output data frame that looks like this:
desiredOutput <- data.frame(
ID = c(1,1,1,2,2,3,3,3,3,3) ,
name=c( "Destination", "Destination", "Topic", "Destination", "Topic",
"Destination", "Destination", "Destination" , "Topic", "Topic"),
value=c("A","B", "W", "C", "X", "D", "E", "F", "Y", "Z") )
ID name value
1 1 Destination A
2 1 Destination B
3 1 Topic W
4 2 Destination C
5 2 Topic X
6 3 Destination D
7 3 Destination E
8 3 Destination F
9 3 Topic Y
10 3 Topic Z
Whenever the delimiter \r\n occurs, I would like to split the contents into separate rows, with the correct ID, the name of the column, and the corresponding value.
I can split a single column into a list using strsplit, but I don't know how to put the contents into a data frame as above apart from attempting to write a loop. I expect the tidyr package might be helpful.
strsplit(input$Destination, split = "\r\n")
How can this be done, ideally without a loop?
With tidyr, gather to long form, then use separate_rows to split the joined elements:
library(tidyr)
input %>% gather(name, value, -ID) %>% separate_rows(value)
## ID name value
## 1 1 Destination A
## 2 1 Destination B
## 3 2 Destination C
## 4 3 Destination D
## 5 3 Destination E
## 6 3 Destination F
## 7 1 Topic W
## 8 2 Topic X
## 9 3 Topic Y
## 10 3 Topic Z
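In current tidyr releases gather is superseded by pivot_longer. The following is a minimal sketch of the same idea, assuming tidyr 1.0.0 or later is installed; it also passes the delimiter explicitly via sep rather than relying on the default pattern of separate_rows. Because pivot_longer works row by row, the rows come out grouped by ID, as in the desired output. The name result is only illustrative.

```r
library(tidyr)

input <- data.frame(ID = 1:3,
                    Destination = c("A\r\nB", "C", "D\r\nE\r\nF"),
                    Topic = c("W", "X", "Y\r\nZ"),
                    stringsAsFactors = FALSE)

# Reshape every column except ID into name/value pairs,
# then split each value on the literal CRLF delimiter.
result <- input %>%
  pivot_longer(-ID, names_to = "name", values_to = "value") %>%
  separate_rows(value, sep = "\r\n")

result
```

The result is a tibble; wrap it in as.data.frame() if a plain data frame is needed.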
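Note: If your data are factors rather than character, tidyr will warn you, because it coerces to character in order to rearrange. It will work regardless, but if you hate warnings, coerce to character manually before reshaping.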
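One way to do that coercion up front, as a base R sketch. The data frame input_f and the helper vector is_fct are illustrative names, and stringsAsFactors = TRUE is set here only to reproduce the factor case:

```r
# Build the example with factor columns to mimic the situation described.
input_f <- data.frame(ID = 1:3,
                      Destination = c("A\r\nB", "C", "D\r\nE\r\nF"),
                      Topic = c("W", "X", "Y\r\nZ"),
                      stringsAsFactors = TRUE)

# Find the factor columns and convert only those to character,
# leaving the integer ID column untouched.
is_fct <- vapply(input_f, is.factor, logical(1))
input_f[is_fct] <- lapply(input_f[is_fct], as.character)

str(input_f)
```

After this step input_f can be reshaped in the same way as input.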
Here is an option using data.table:
library(data.table)
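# melt to long format, then split each value on whitespace (which covers \r\n)
# within every ID/name group; the backslash must be doubled in an R string
melt(setDT(input), id.var = "ID", variable.name = "name")[,
.(value = unlist(strsplit(value, "\\s+"))), .(ID, name)][order(ID)]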
# ID name value
#1: 1 Destination A
#2: 1 Destination B
#3: 1 Topic W
#4: 2 Destination C
#5: 2 Topic X
#6: 3 Destination D
#7: 3 Destination E
#8: 3 Destination F
#9: 3 Topic Y
#10: 3 Topic Z
Edit: @DavidArenburg commented a similar solution under another answer (which I hadn't seen earlier).
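For completeness, a base R sketch without any packages, building on the strsplit call from the question. It uses lapply over the non-ID columns, so it avoids an explicit for loop but is still an implicit one. The names value_cols, pieces and result are illustrative, and lengths() assumes R 3.2.0 or later.

```r
input <- data.frame(ID = 1:3,
                    Destination = c("A\r\nB", "C", "D\r\nE\r\nF"),
                    Topic = c("W", "X", "Y\r\nZ"),
                    stringsAsFactors = FALSE)

value_cols <- setdiff(names(input), "ID")

# For each column, split on the literal CRLF and repeat the ID
# once per resulting piece.
pieces <- lapply(value_cols, function(col) {
  parts <- strsplit(input[[col]], "\r\n", fixed = TRUE)
  data.frame(ID = rep(input$ID, lengths(parts)),
             name = col,
             value = unlist(parts),
             stringsAsFactors = FALSE)
})

# Stack the per-column pieces and sort by ID; order() keeps ties
# in their existing order, so Destination rows stay ahead of Topic rows.
result <- do.call(rbind, pieces)
result <- result[order(result$ID), ]
rownames(result) <- NULL

result
```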