How to reshape a csv table in R?
I have this dataset:
   Group Group Group Cat Cat Cat   Betw
1      a               A           5.87
2      b           j   A           0.11
3      c               B       A   2.18
4      d               C   D       5.31
5      e               E   C       0.00
6      f               E         352.10
7      g               E           0.35
8      h               A   B       0.00
9      i     m         F           0.00
10     j               A   D      15.04
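I want to reshape it so that there are only 3 columns: Var1 (which will be 'Group' or 'Cat'), Var2 (which will hold the lowercase or uppercase letters), and Betw.
So, for example, c, B and A would all have the value 2.1892749: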
Var1 Var2 Betw
1 Group a 5.87
2 Cat A 5.87
3 Group b 0.11
4 Group j 0.11
5 Cat A 0.11
...
How can I do this in R?
You can use dplyr and tidyr. First we gather to long data, then strip the extra numbers that were appended to the column names, then drop the blanks:
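library(dplyr)
library(tidyr)

dat %>%
  gather(Var1, Var2, -Betw) %>%                   # wide to long, Betw stays as the id column
  mutate(Var1 = sub("\\.[0-9]+$", "", Var1)) %>%  # "Group.1" -> "Group", "Cat.2" -> "Cat"
  filter(Var2 != "")                              # drop the empty cells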
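In current tidyr releases gather() is marked as superseded in favour of pivot_longer(), so here is a minimal sketch of the same three steps with that function, assuming tidyr 1.0.0 or later. The data frame dat_small is a hypothetical stand-in for the first three rows of the data, with the column names already de-duplicated.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the first three rows of the data
dat_small <- data.frame(
  Group   = c("a", "b", "c"),
  Group.1 = c("", "", ""),
  Group.2 = c("", "j", ""),
  Cat     = c("A", "A", "B"),
  Cat.1   = c("", "", ""),
  Cat.2   = c("", "", "A"),
  Betw    = c(5.87, 0.11, 2.18),
  stringsAsFactors = FALSE
)

long <- dat_small %>%
  pivot_longer(-Betw, names_to = "Var1", values_to = "Var2") %>%  # wide to long
  mutate(Var1 = sub("\\.[0-9]+$", "", Var1)) %>%                  # strip ".1", ".2"
  filter(Var2 != "") %>%                                          # drop the empty cells
  select(Var1, Var2, Betw)                                        # column order from the question

long
```

Unlike gather(), pivot_longer() keeps the values that came from the same original row next to each other, which is the ordering shown in the question's expected output.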
Data used:
dat <- structure(list(Group = structure(1:10, .Label = c("a", "b", "c",
"d", "e", "f", "g", "h", "i", "j"), class = "factor"), Group.1 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L), .Label = c("", "m"), class = "factor"),
Group.2 = structure(c(1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("", "j"), class = "factor"), Cat = structure(c(1L,
1L, 2L, 3L, 4L, 4L, 4L, 1L, 5L, 1L), .Label = c("A", "B",
"C", "E", "F"), class = "factor"), Cat.1 = structure(c(1L,
1L, 1L, 4L, 3L, 1L, 1L, 2L, 1L, 4L), .Label = c("", "B",
"C", "D"), class = "factor"), Cat.2 = structure(c(1L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "A"), class = "factor"),
Betw = c(5.87, 0.11, 2.18, 5.31, 0, 352.1, 0.35, 0, 0, 15.04
)), .Names = c("Group", "Group.1", "Group.2", "Cat", "Cat.1",
"Cat.2", "Betw"), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10"))
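We can also use data.table. We convert the 'data.frame' to a 'data.table' (setDT(dat)), use melt to reshape it to long format, drop the rows where 'Var2' is blank, and remove from 'Var1' the substring that runs from the . to the end of the string (if there is one).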
library(data.table)  # v1.9.6+
melt(setDT(dat), id.vars = 'Betw', variable.name = 'Var1',
     value.name = 'Var2')[Var2 != ''][, Var1 := sub('\\..*', '', Var1)][]
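To make the three steps concrete (stack the columns, drop the blanks, strip the suffix), here is a rough base R sketch that needs no packages. It is only an illustration of the idea, not how melt is implemented; dat_small and measure are hypothetical names, and the columns are assumed to be character rather than factor.

```r
# Hypothetical stand-in for the first three rows of the data
dat_small <- data.frame(
  Group   = c("a", "b", "c"),
  Group.1 = c("", "", ""),
  Group.2 = c("", "j", ""),
  Cat     = c("A", "A", "B"),
  Cat.1   = c("", "", ""),
  Cat.2   = c("", "", "A"),
  Betw    = c(5.87, 0.11, 2.18),
  stringsAsFactors = FALSE
)

# Every column except the id column gets stacked
measure <- setdiff(names(dat_small), "Betw")

long <- data.frame(
  # column name without its ".1"/".2" suffix, repeated once per row
  Var1 = rep(sub("\\..*", "", measure), each = nrow(dat_small)),
  # the cell values, concatenated column by column
  Var2 = unlist(dat_small[measure], use.names = FALSE),
  # the id column, recycled once per stacked column
  Betw = rep(dat_small$Betw, times = length(measure)),
  stringsAsFactors = FALSE
)

# Drop the empty cells
long <- long[long$Var2 != "", ]
long
```

The result is ordered column by column (all of the first Group column, then the next, and so on), which is also the order melt and gather produce.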
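I suspect applying melt directly will not work for you, because the column names in your data frame are duplicated. So, following @akrun's approach, you could use something like this: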
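library(reshape2)  # assumed source of melt() for a plain data frame; the package is not named here
tmp <- data.frame(df, check.names = TRUE)          # makes names unique: Group, Group.1, Group.2, ...
tmp <- melt(tmp, id.vars = "Betw", variable.name = "Var1", value.name = "Var2")
tmp$Var1 <- gsub("(.*)\\.[0-9]", "\\1", tmp$Var1)  # strip the ".1"/".2" suffix again
df <- subset(tmp, Var2 != "")                      # drop the empty cells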
The data frame I used:
df <- data.frame(Group = c("a","b","c","d","e","f","g","h","i","j"),
                 Group = c("","","","","","","","","m",""),
                 Group = c("","j","","","","","","","",""),
                 Cat = c("A","A","B","C","E","E","E","A","F","A"),
                 Cat = c("","","","D","C","","","B","","D"),
                 Cat = c("","","A","","","","","","",""),
                 Betw = c(5.87,0.11,2.18,5.31,0,352.1,0.35,0,0,15.04),
                 check.names = FALSE)  # keep the duplicated column names as they are
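As a small illustration of where the .1 and .2 suffixes come from: with check.names = TRUE (which is also the default in read.csv), R passes the column names through make.names(unique = TRUE). The sketch below applies that function to the header from the question and then undoes it with the same regular expression as above; nms, unique_nms and restored are just illustrative names.

```r
# Header as it appears in the question, with duplicated names
nms <- c("Group", "Group", "Group", "Cat", "Cat", "Cat", "Betw")

# What check.names = TRUE does to duplicated names
unique_nms <- make.names(nms, unique = TRUE)
unique_nms
# "Group" "Group.1" "Group.2" "Cat" "Cat.1" "Cat.2" "Betw"

# Strip the numeric suffix again to recover the original labels
restored <- gsub("(.*)\\.[0-9]", "\\1", unique_nms)
restored
# "Group" "Group" "Group" "Cat" "Cat" "Cat" "Betw"
```

So if the CSV is read with read.csv and default settings, the columns should already arrive with unique names, and the suffix only needs to be removed after reshaping.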