Create name-value pair data frame from input with delimiters

Question

I have a data frame which can be created with this code:

input <- data.frame( 'ID'=c(1:3), 
                      Destination=c("A\r\nB", "C", "D\r\nE\r\nF"), 
                      Topic=c("W", "X", "Y\r\nZ") )

It looks like this:

  ID Destination  Topic
1  1      A\r\nB      W
2  2           C      X
3  3 D\r\nE\r\nF Y\r\nZ

I would like to create an output data frame that looks like this:

desiredOutput <- data.frame( 
   ID = c(1,1,1,2,2,3,3,3,3,3) , 
   name=c( "Destination", "Destination", "Topic", "Destination", "Topic",
           "Destination", "Destination", "Destination" , "Topic", "Topic"), 
   value=c("A","B", "W", "C", "X", "D", "E", "F", "Y", "Z") )

   ID        name value
1   1 Destination     A
2   1 Destination     B
3   1       Topic     W
4   2 Destination     C
5   2       Topic     X
6   3 Destination     D
7   3 Destination     E
8   3 Destination     F
9   3       Topic     Y
10  3       Topic     Z

Whenever the delimiter \r\n occurs, I would like to split the contents into separate rows, with the correct ID, the name of the column, and the corresponding value.

I can split a single column into a list using strsplit, but I don't know how to put the contents into a data frame as above apart from attempting to write a loop. I expect the tidyr package might be helpful.

strsplit(input$Destination, split = "\r\n")

How can this be done, ideally without a loop?

Answer 1

With tidyr, gather to long form, then use separate_rows to split the concatenated elements into separate rows:

library(tidyr)

input %>% gather(name, value, -ID) %>% separate_rows(value)
##    ID        name value
## 1   1 Destination     A
## 2   1 Destination     B
## 3   2 Destination     C
## 4   3 Destination     D
## 5   3 Destination     E
## 6   3 Destination     F
## 7   1       Topic     W
## 8   2       Topic     X
## 9   3       Topic     Y
## 10  3       Topic     Z
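
In current tidyr releases gather is superseded by pivot_longer, so the same idea can be sketched as below. This is a minimal sketch, not the original answer's code: it assumes a tidyr version that provides pivot_longer (1.0.0 or later), builds the question's input with stringsAsFactors = FALSE, and passes sep = "\r\n" explicitly so only the question's delimiter is split on. The names long and result are illustrative.

```r
library(tidyr)

# The question's input, built as character columns (assumption: no factors)
input <- data.frame(
  ID = 1:3,
  Destination = c("A\r\nB", "C", "D\r\nE\r\nF"),
  Topic = c("W", "X", "Y\r\nZ"),
  stringsAsFactors = FALSE
)

# Reshape every column except ID into name/value pairs
long <- pivot_longer(input, cols = -ID, names_to = "name", values_to = "value")

# Split each value on the CR LF delimiter, one piece per row
result <- separate_rows(long, value, sep = "\r\n")

print(result)
```

Because pivot_longer works row by row, the rows should come out grouped by ID first, which is the ordering shown in the question's desired output.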

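Note: If your data is factor instead of character, tidyr will warn you, because it coerces to character in order to rearrange. It will work regardless, but if you hate warnings, coerce to character manually before reshaping.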
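
A minimal base R sketch of that manual coercion follows. It assumes the factor case arises because the data frame was built with stringsAsFactors = TRUE (the default before R 4.0.0); input_fct and is_fct are hypothetical names used only for illustration.

```r
# Hypothetical copy of the question's data with factor columns
input_fct <- data.frame(
  ID = 1:3,
  Destination = c("A\r\nB", "C", "D\r\nE\r\nF"),
  Topic = c("W", "X", "Y\r\nZ"),
  stringsAsFactors = TRUE
)

# Find the factor columns and convert only those to character
is_fct <- vapply(input_fct, is.factor, logical(1))
input_fct[is_fct] <- lapply(input_fct[is_fct], as.character)

str(input_fct)
```

After this step the reshaping code can be run on input_fct without the coercion warning, since there are no factor columns left to convert.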

Answer 2

Here is an option using data.table:

library(data.table)
# The backslash must be doubled in an R string: "\\s+" matches runs of whitespace such as \r\n
melt(setDT(input), id.vars = "ID", variable.name = "name")[,
      .(value = unlist(strsplit(value, "\\s+"))), .(ID, name)][order(ID)]
#     ID        name value
#1:  1 Destination     A
#2:  1 Destination     B
#3:  1       Topic     W
#4:  2 Destination     C
#5:  2       Topic     X
#6:  3 Destination     D
#7:  3 Destination     E
#8:  3 Destination     F
#9:  3       Topic     Y
#10: 3       Topic     Z
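
If the values themselves may contain spaces, splitting on any whitespace would break them apart. A hedged variant, assuming character columns and the data.table package, splits on the literal "\r\n" only by using fixed = TRUE. The names long and result are illustrative, and as.data.table is used here instead of setDT so the original data frame is left unmodified.

```r
library(data.table)

# The question's input, built as character columns (assumption: no factors)
input <- data.frame(
  ID = 1:3,
  Destination = c("A\r\nB", "C", "D\r\nE\r\nF"),
  Topic = c("W", "X", "Y\r\nZ"),
  stringsAsFactors = FALSE
)

# Melt to long form; 'name' becomes a factor with levels in column order
long <- melt(as.data.table(input), id.vars = "ID",
             variable.name = "name", value.name = "value")

# Split on the exact CR LF sequence within each ID/name group, then sort
result <- long[, .(value = unlist(strsplit(value, "\r\n", fixed = TRUE))),
               by = .(ID, name)][order(ID, name)]

print(result)
```

Ordering by both ID and name makes the row order explicit rather than relying on the order in which groups happen to appear.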

Edit: @DavidArenburg posted a similar solution in a comment on the other answer, which I had not seen earlier.

r

tidyr