Column disorder in merge()
I am trying to join the data frame bar to foo on a common column while preserving the original column order of foo:
> head(foo)
city course1
1 Aalborg JEMES
2 Aarhus EM-SANF
3 Aix-en-Provence EMLE
4 Almaty IMRCEES
5 Alnarp SUFONAMA
6 Amsterdam ATOSIM
> colnames(foo)
[1] "city" "course1"
> head(bar)
code website
1 4CITIES http://www.4cities.eu/
2 ACES http://www.sams.ac.uk/aces-erasmus
3 ADVANCES http://www.socialworkadvances.org/
4 AMASE http://www.amase-master.net/
5 ARCHMAT http://www.erasmusmundus-archmat.uevora.pt/
6 ASC http://www.master-asc.org/
> colnames(bar)
[1] "code" "website"
The join columns are course1 in foo and code in bar. I used the following expression:
test <- merge(x = foo, y = bar, by.x = "course1", by.y = "code", all.x=TRUE)[, union(names(foo), names(bar))]
This fails with the following error message:
Error in `[.data.frame`(merge(x = foo, y = bar, by.x = "course1", by.y = "code", :
undefined columns selected
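I found this solution here, but it does not work, even though none of the column names are duplicated. What could be the problem?
A plain join (without the reordering step) works, but it puts the join column first: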
> head(test)
course1 city website
1 JEMES Aalborg http://www.jemes-cisu.eu/
2 JEMES Aveiro http://www.jemes-cisu.eu/
3 JEMES Hamburg http://www.jemes-cisu.eu/
4 EM-SANF Aarhus http://www.emsanf.eu/UK/
5 EM-SANF Wageningen http://www.emsanf.eu/UK/
6 EM-SANF Debrecen http://www.emsanf.eu/UK/
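I tried adding sort = F and removing all.x = TRUE, but neither helped. My actual data frames have many more columns and will go through several joins, so I would like to preserve the order of all columns within a single function. Is there a known workaround, or a package that preserves column order in joins?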
> names(test)
[1] "course1" "city" "website"
> names(foo)
[1] "city" "course1"
> names(bar)
[1] "code" "website"
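Your reindexing step ([, union(names(foo), names(bar))]) is the culprit: names(bar) contains "code", which does not exist in the merged result because merge() names the key column after by.x (here course1), so you get an indexing error. Here is the corrected code: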
library(dplyr)  # assumption: recode() here is dplyr::recode()
allnames <- union(names(foo), recode(names(bar), code = "course1"))
merge(foo, bar, by.x = "course1", by.y = "code", all.x = TRUE)[, allnames]
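To see why the original indexing fails, it can help to compare the column names of the merged result with the names being requested. The snippet below is a minimal sketch on toy data (two rows reusing values from the question; the variable names merged and missing_cols are only illustrative):

```r
# Toy data with the same column layout as in the question
foo <- data.frame(city = c("Aalborg", "Aarhus"),
                  course1 = c("JEMES", "EM-SANF"),
                  stringsAsFactors = FALSE)
bar <- data.frame(code = c("JEMES", "EM-SANF"),
                  website = c("http://www.jemes-cisu.eu/",
                              "http://www.emsanf.eu/UK/"),
                  stringsAsFactors = FALSE)

merged <- merge(foo, bar, by.x = "course1", by.y = "code", all.x = TRUE)

# The key column keeps the name given in by.x and comes first
names(merged)

# Names requested by union(names(foo), names(bar)) that merged does not have
missing_cols <- setdiff(union(names(foo), names(bar)), names(merged))
missing_cols
```

Any name listed in missing_cols triggers "undefined columns selected" when used inside [ , ].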
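Since your example is not reproducible (the merge comes out empty because the two key columns have no values in common), I will demonstrate with modified structures: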
foo <- structure(list(city = c("Aalborg", "Aarhus", "Aix-en-Provence",
"Almaty", "Alnarp", "Amsterdam"), course1 = c("JEMES", "EM-SANF",
"EMLE", "IMRCEES", "SUFONAMA", "ATOSIM")), .Names = c("city",
"course1"), class = "data.frame", row.names = c(NA, -6L))
bar <- structure(list(code = c("4CITIES", "ACES", "ADVANCES", "AMASE",
"ARCHMAT", "ASC"), website = c("http://www.4cities.eu/", "http://www.sams.ac.uk/aces-erasmus",
"http://www.socialworkadvances.org/", "http://www.amase-master.net/",
"http://www.erasmusmundus-archmat.uevora.pt/", "http://www.master-asc.org/"
)), .Names = c("code", "website"), row.names = c(NA, 6L), class = "data.frame")
Now make sure that the values in bar$code appear in foo$course1:
set.seed(42)
bar$code <- sample(foo$course1)
Result:
allnames <- union(names(foo), recode(names(bar), code = "course1"))
merge(foo, bar, by.x = "course1", by.y = "code", all.x = TRUE)[,allnames]
# city course1 website
# 1 Amsterdam ATOSIM http://www.4cities.eu/
# 2 Aarhus EM-SANF http://www.socialworkadvances.org/
# 3 Aix-en-Provence EMLE http://www.amase-master.net/
# 4 Almaty IMRCEES http://www.erasmusmundus-archmat.uevora.pt/
# 5 Aalborg JEMES http://www.master-asc.org/
# 6 Alnarp SUFONAMA http://www.sams.ac.uk/aces-erasmus
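For the case of many columns and several successive joins, the same idea can be wrapped in a small base R helper that needs no renaming step. This is only a sketch: merge_keep_order is a hypothetical name, not a function from any package, and it assumes x and y share no non-key column names (otherwise merge() adds .x/.y suffixes and those suffixed columns end up at the end).

```r
# Hypothetical helper: merge, then put x's columns back in their original order,
# followed by the columns contributed by y.
merge_keep_order <- function(x, y, by.x, by.y = by.x, ...) {
  merged <- merge(x, y, by.x = by.x, by.y = by.y, ...)
  kept_x <- intersect(names(x), names(merged))  # x's columns, original order
  new_y  <- setdiff(names(merged), names(x))    # everything merge() added
  merged[, c(kept_x, new_y), drop = FALSE]
}

# Toy data reusing values from the question; EMLE has no match in bar
foo <- data.frame(city = c("Aalborg", "Aarhus", "Aix-en-Provence"),
                  course1 = c("JEMES", "EM-SANF", "EMLE"),
                  stringsAsFactors = FALSE)
bar <- data.frame(code = c("JEMES", "EM-SANF"),
                  website = c("http://www.jemes-cisu.eu/",
                              "http://www.emsanf.eu/UK/"),
                  stringsAsFactors = FALSE)

res <- merge_keep_order(foo, bar, by.x = "course1", by.y = "code", all.x = TRUE)
names(res)
```

Because the result starts with the columns of x in their original order, the helper can be applied repeatedly, feeding each result in as x for the next join. It only restores column order; row order is still whatever merge() returns.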