Split strings by commas into columns

I have a single column (yes, just one column) with 200 rows, whose elements are comma-separated strings.

Actual data in the data frame:

"A, B, C, D"
"1, 10, 13, 4"
"0, 1, 6, 1"
"9, 3, 3, 0"
...

From this column I want to produce the following data frame:

A   B   C   D

1   10  13  4
0   1   6   1
9   3   3   0
     ...

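where "A", "B", "C", "D" are the column headers of this data frame, and the remaining rows are likewise split on commas into the newly created columns. How can I do this in R, using read.table with the first row as the headers?

Here are a few different ways to extract the data with read.table().

I'll start with two sets of fake data. In the first, the column name carries no useful information (the real column names are in the first row).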

df1 <- data.frame("V1" = c("A,B,C,D", 
                           "AA,D,E,F3", 
                           "Car1,Car2,Car3,Car4",
                           "a,b,c,d",
                           "a1,b1,c1,d1"))
#                    V1
# 1             A,B,C,D
# 2           AA,D,E,F3
# 3 Car1,Car2,Car3,Car4
# 4             a,b,c,d
# 5         a1,b1,c1,d1 

In the second, the column name is itself the comma-separated string of would-be names.

df2 <- data.frame("A,B,C,D" = c("AA,D,E,F3", 
                                "Car1,Car2,Car3,Car4",
                                "a,b,c,d",
                                "a1,b1,c1,d1"), 
                  check.names = F)
#               A,B,C,D
# 1           AA,D,E,F3
# 2 Car1,Car2,Car3,Car4
# 3             a,b,c,d
# 4         a1,b1,c1,d1 
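
As a side note on why check.names is switched off in df2: by default data.frame() passes column names through make.names(), which replaces characters that are not valid in a syntactic name (such as commas) with dots. The small sketch below only illustrates that behaviour; the variable names `checked` and `unchecked` are made up for the example.

```r
# Default behaviour: the comma-separated name is sanitised by make.names()
checked <- data.frame("A,B,C,D" = "x")

# With check.names = FALSE the name is kept exactly as written
unchecked <- data.frame("A,B,C,D" = "x", check.names = FALSE)

names(checked)    # commas replaced by dots
names(unchecked)  # original string preserved, so it can still be split on ","
```

Without check.names = FALSE, the later strsplit(names(df2), ...) call would find no commas to split on.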

Extracting the comma-separated names and values when the would-be headers are in row 1 (using df1):

# single data.frame with headers concatenated in the first row
df.noHeader <- read.table(col.names = unlist(strsplit(df1[1,], 
                                                      split = "[,]")),
                          sep = ",",
                          skip = 1, # since the headers were in row 1
                          text = unlist(df1, use.names = F)) 
#      A    B    C    D
# 1   AA    D    E   F3
# 2 Car1 Car2 Car3 Car4
# 3    a    b    c    d
# 4   a1   b1   c1   d1 

For clarity, this is what works when the names are in the column name of the original data frame (using df2):

# splitting the original header when splitting the data
df.header <- read.table(col.names = unlist(strsplit(names(df2), 
                                                    split = "[,]")),
                        sep = ",", 
                        text = unlist(df2))
#      A    B    C    D
# 1   AA    D    E   F3
# 2 Car1 Car2 Car3 Car4
# 3    a    b    c    d
# 4   a1   b1   c1   d1 

If your headers are in some other row, just change the row index in the call to strsplit() and the skip value to match, like this:

# if the headers were in row 2
df.noHeader <- read.table(col.names = unlist(strsplit(df1[2,], # <- see 2 here
                                                      split = "[,]")),
                          sep = ",",
                          skip = 2,  # since the headers were in row 2
                          text = unlist(df1, use.names = F))
#     AA    D    E   F3
# 1 Car1 Car2 Car3 Car4
# 2    a    b    c    d
# 3   a1   b1   c1   d1 

# if the headers were in row 3
df.noHeader <- read.table(col.names = unlist(strsplit(df1[3,], # <- see 3 here
                                                      split = "[,]")),
                          sep = ",",
                          skip = 3, # since the headers were in row 3
                          text = unlist(df1, use.names = F))
#   Car1 Car2 Car3 Car4
# 1    a    b    c    d
# 2   a1   b1   c1   d1 
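
The three variants above differ only in the row index and the skip value, so they could be wrapped in one small function. This is only a sketch: `split_with_header` and its `header_row` argument are hypothetical names, not part of base R or of the answer above, and it assumes a one-column data frame whose rows all have the same number of comma-separated fields.

```r
# Hypothetical helper: split a one-column data frame on commas, taking the
# column names from row `header_row` and the data from the rows after it.
split_with_header <- function(df, header_row = 1) {
  lines <- as.character(df[[1]])  # the single column as plain strings
  read.table(text = lines,
             sep = ",",
             col.names = strsplit(lines[header_row], split = ",", fixed = TRUE)[[1]],
             skip = header_row,   # skip everything up to and including the header row
             stringsAsFactors = FALSE)
}

df1 <- data.frame(V1 = c("A,B,C,D",
                         "AA,D,E,F3",
                         "Car1,Car2,Car3,Car4",
                         "a,b,c,d",
                         "a1,b1,c1,d1"))

res_row2 <- split_with_header(df1, header_row = 2)
res_row2
```

As in the examples above, rows before the header row are simply discarded.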

For the sake of completeness, the very handy fread() function from the data.table package can be used here as a one-liner:

data.table::fread(text = paste0(df1$V1, collapse = "\n"))
       A     B     C     D
   <int> <int> <int> <int>
1:     1    10    13     4
2:     0     1     6     1
3:     9     3     3     0

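The only preparation needed is to collapse the one-column data.frame df1 into a single multi-line character string by calling paste0(df1$V1, collapse = "\n").

By default, fread() picks up the column headers from the first row.

Also by default, fread() returns a data.table object, but it can be told to return a data.frame instead: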

data.table::fread(text = paste0(df1$V1, collapse = "\n"), data.table = FALSE)
  A  B  C D
1 1 10 13 4
2 0  1  6 1
3 9  3  3 0

Data

df1 <- data.frame(V1 = c("A, B, C, D", 
                         "1, 10, 13, 4", 
                         "0, 1, 6, 1",
                         "9, 3, 3, 0"))

So df1 contains just one character column, V1.
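
Since the question asks specifically about read.table with the first row as headers, here is a minimal base R sketch for data shaped like the above. It assumes the header really is the first row and that the spaces after the commas should be discarded; `res` is just an illustrative variable name.

```r
df1 <- data.frame(V1 = c("A, B, C, D",
                         "1, 10, 13, 4",
                         "0, 1, 6, 1",
                         "9, 3, 3, 0"))

# header = TRUE takes the column names from the first line;
# strip.white = TRUE drops the blanks that follow each comma
res <- read.table(text = as.character(df1$V1),
                  sep = ",",
                  header = TRUE,
                  strip.white = TRUE)
str(res)
```

The as.character() call is only a safeguard for R versions before 4.0, where data.frame() would have turned V1 into a factor.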