Split strings by commas into columns
I have a single column (yes, just one column) with 200 rows, whose elements are comma-separated strings.
Actual data in the data frame:
"A, B, C, D"
"1, 10, 13, 4"
"0, 1, 6, 1"
"9, 3, 3, 0"
...
I want to produce the following data frame from this column:
A B C D
1 10 13 4
0 1 6 1
9 3 3 0
...
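where "A", "B", "C", "D" are the column headers of the data frame, and each row is likewise split on commas into the newly created columns. How can I use read.table in R to set the first row as the headers?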
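Here are a few different ways to extract the data using read.table().
I start with two sets of fake data. In the first, the column name carries no meaning (the real column names are in the first row).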
df1 <- data.frame("V1" = c("A,B,C,D",
                           "AA,D,E,F3",
                           "Car1,Car2,Car3,Car4",
                           "a,b,c,d",
                           "a1,b1,c1,d1"))
# V1
# 1 A,B,C,D
# 2 AA,D,E,F3
# 3 Car1,Car2,Car3,Car4
# 4 a,b,c,d
# 5 a1,b1,c1,d1
In the other, the column name is itself the comma-separated string of would-be names.
df2 <- data.frame("A,B,C,D" = c("AA,D,E,F3",
                                "Car1,Car2,Car3,Car4",
                                "a,b,c,d",
                                "a1,b1,c1,d1"),
                  check.names = FALSE)
# A,B,C,D
# 1 AA,D,E,F3
# 2 Car1,Car2,Car3,Car4
# 3 a,b,c,d
# 4 a1,b1,c1,d1
Extract the comma-separated names and values when the would-be headers are in row 1 (using df1).
# single data.frame with headers concatenated in the first row
df.noHeader <- read.table(col.names = unlist(strsplit(df1[1, ], split = "[,]")),
                          sep = ",",
                          skip = 1, # since the headers were in row 1
                          text = unlist(df1, use.names = FALSE))
# A B C D
# 1 AA D E F3
# 2 Car1 Car2 Car3 Car4
# 3 a b c d
# 4 a1 b1 c1 d1
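For the row-1 case there may be a shorter route: read.table() can take the column names from the first line of the text itself when header = TRUE is set, so col.names and skip are not needed. This is a minimal sketch rather than part of the approach above. It rebuilds df1 so the snippet is self-contained, and df.simple is just an illustrative name.

```r
# Same fake data as df1 above; stringsAsFactors = FALSE keeps the column
# as character on R versions older than 4.0 as well.
df1 <- data.frame(V1 = c("A,B,C,D",
                         "AA,D,E,F3",
                         "Car1,Car2,Car3,Car4",
                         "a,b,c,d",
                         "a1,b1,c1,d1"),
                  stringsAsFactors = FALSE)

# header = TRUE tells read.table() to use the first line of `text`
# as the column names instead of treating it as data.
df.simple <- read.table(text = df1$V1,
                        sep = ",",
                        header = TRUE,
                        stringsAsFactors = FALSE)

names(df.simple)
nrow(df.simple)
```

This only helps when the header really is the first row. For headers further down, the col.names plus skip combination is still needed.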
For clarity, this is what works when the names are in the column name of the original data frame.
# splitting the original header when splitting the data
df.header <- read.table(col.names = unlist(strsplit(names(df2), split = "[,]")),
                        sep = ",",
                        text = unlist(df2))
# A B C D
# 1 AA D E F3
# 2 Car1 Car2 Car3 Car4
# 3 a b c d
# 4 a1 b1 c1 d1
If your headers are in a different row, change the row index in the call to strsplit() and the value of skip to match, as shown below:
# if the headers were in row 2
df.noHeader <- read.table(col.names = unlist(strsplit(df1[2, ], split = "[,]")), # <- see 2 here
                          sep = ",",
                          skip = 2, # since the headers were in row 2
                          text = unlist(df1, use.names = FALSE))
# AA D E F3
# 1 Car1 Car2 Car3 Car4
# 2 a b c d
# 3 a1 b1 c1 d1
# if the headers were in row 3
df.noHeader <- read.table(col.names = unlist(strsplit(df1[3, ], split = "[,]")), # <- see 3 here
                          sep = ",",
                          skip = 3, # since the headers were in row 3
                          text = unlist(df1, use.names = FALSE))
# Car1 Car2 Car3 Car4
# 1 a b c d
# 2 a1 b1 c1 d1
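Since the row-2 and row-3 variants differ only in the row index and the skip value, the pattern could be wrapped in a small function. The sketch below is one way to do that. The function name split_with_header_row and its header_row argument are hypothetical, not something from base R, and it assumes the data frame has a single column of comma-separated strings with at least one data row below the header.

```r
# Hypothetical helper: `header_row` is the 1-based row of `df` that holds
# the comma-separated column names; rows above it are discarded.
split_with_header_row <- function(df, header_row = 1) {
  lines <- as.character(df[[1]])  # works for character or factor columns
  read.table(text = lines,
             sep = ",",
             col.names = strsplit(lines[header_row], split = ",", fixed = TRUE)[[1]],
             skip = header_row,   # skip the header row and everything above it
             stringsAsFactors = FALSE)
}

df1 <- data.frame(V1 = c("A,B,C,D",
                         "AA,D,E,F3",
                         "Car1,Car2,Car3,Car4",
                         "a,b,c,d",
                         "a1,b1,c1,d1"),
                  stringsAsFactors = FALSE)

# Headers taken from row 3, data from rows 4 and 5
res <- split_with_header_row(df1, header_row = 3)
names(res)
```

Tying skip to header_row inside the function keeps the two values from drifting apart, which is the easy mistake when editing the calls by hand.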
For the sake of completeness, the very handy fread() function from the data.table package can be used here as a one-liner:
data.table::fread(text = paste0(df1$V1, collapse = "\n"))
A B C D
<int> <int> <int> <int>
1: 1 10 13 4
2: 0 1 6 1
3: 9 3 3 0
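The only preparation needed is to collapse the single-column data.frame df1 into one multi-line character string by calling paste0(df1$V1, collapse = "\n").
By default, fread() detects that the first line holds the column headers and uses it for the column names.
Also by default, fread() returns a data.table object, but it can be told to return a data.frame instead: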
data.table::fread(text = paste0(df1$V1, collapse = "\n"), data.table = FALSE)
A B C D
1 1 10 13 4
2 0 1 6 1
3 9 3 3 0
Data
df1 <- data.frame(V1 = c("A, B, C, D",
                         "1, 10, 13, 4",
                         "0, 1, 6, 1",
                         "9, 3, 3, 0"))
So, df1 contains only a single character column, V1.
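The same data can also be read with base R only, which speaks to the original question about read.table. The sketch below assumes the sample data exactly as shown in the question, including the space after each comma. strip.white = TRUE is set so that the spaces do not end up inside the fields, and res is just an illustrative name.

```r
# The question's data: one character column, header in the first row,
# values separated by a comma followed by a space.
df1 <- data.frame(V1 = c("A, B, C, D",
                         "1, 10, 13, 4",
                         "0, 1, 6, 1",
                         "9, 3, 3, 0"),
                  stringsAsFactors = FALSE)

# header = TRUE uses the first line as column names;
# strip.white = TRUE trims the blanks around each comma-separated field.
res <- read.table(text = df1$V1,
                  sep = ",",
                  header = TRUE,
                  strip.white = TRUE)

str(res)
```

Because the remaining rows hold only whole numbers, read.table() converts the columns to numeric types, so no separate conversion step should be needed for this data.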