Correlate data from two data frames into one, retaining the originals' info
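I have to combine information from two data frames into a single matrix that is laid out specifically for my further analysis. I will start with a toy example of the kind of data I am working with.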
Game1 <- structure(list(Score1 = c(5, 9), Score2 = c(4.8, 12.8), Score3 = c(7.22, 2.3),
    Class = structure(2:1, .Label = c("Dwarf", "Paladin"), class = "factor"),
    Race = structure(1:2, .Label = c("Dwarf", "Elf"), class = "factor")),
    row.names = c("Stan", "Kyle"), class = "data.frame")
Game2 <- structure(list(Score1 = c(3, 8.1), Score2 = c(6.3, 6.6), Score3 = c(1.2, 10.3),
    Class = structure(2:1, .Label = c("Rouge", "Wizard"), class = "factor"),
    Race = structure(2:1, .Label = c("Gnome", "Human"), class = "factor")),
    row.names = c("Cartman", "Kenny"), class = "data.frame")
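I want to run a correlation analysis on the scores of the different players, ideally a Pearson correlation. I would also like to keep some of the remaining features from the two original data frames, as shown below.
The output I am hoping for is: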
             Correlation Game1_Class Game1_Race Game2_Class Game2_Race
Stan:Cartman      -0.815     Paladin      Dwarf      Wizard      Human
Kyle:Cartman       0.942     Fighter        Elf      Wizard      Human
Stan:Kenny         0.947      Wizard      Human      Ranger      Gnome
Kyle:Kenny        -0.998       Rouge      Gnome      Ranger      Gnome
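I used a generic correlation coefficient to compute the correlations above; I would prefer to use Pearson's or Spearman's.
The number of rows in each data frame (in my real data) differs considerably.
You can define a custom function to create such a table: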
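# All pairs of row names: one name from Game1, one from Game2
name_combn <- expand.grid(rownames(Game1), rownames(Game2), stringsAsFactors = FALSE)
cor_table <- function(names, df1, df2) {
  n1 <- as.character(names[1])
  n2 <- as.character(names[2])
  # 1:3 are the column positions holding the numeric scores.
  # cor() defaults to Pearson; pass method = "spearman" for Spearman.
  r <- cor(as.numeric(df1[n1, 1:3]), as.numeric(df2[n2, 1:3]))
  # One-row data frame: pair label, correlation, then the non-score columns of each input
  data.frame(names = paste(n1, ":", n2, sep = ""), cor = r,
             df1[n1, -c(1:3)],
             df2[n2, -c(1:3)], row.names = "")
}
# Apply the function to every pair and stack the one-row results
df <- do.call(rbind, apply(name_combn, 1, cor_table, df1 = Game1, df2 = Game2))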
#          names        cor   Class  Race Class.1 Race.1
#   Stan:Cartman -0.8154535 Paladin Dwarf  Wizard  Human
#   Kyle:Cartman  0.9472246   Dwarf   Elf  Wizard  Human
#     Stan:Kenny  0.9426604 Paladin Dwarf   Rouge  Gnome
#     Kyle:Kenny -0.9987835   Dwarf   Elf   Rouge  Gnome
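Since the real data frames reportedly have many more rows, a vectorized variant may be worth considering. The following is only a sketch, not part of the answer above: `pairwise_cor_table`, `score_cols` and `method` are hypothetical names introduced here, the toy data is re-created with plain `data.frame()` calls (assuming the trailing commas in the original factor labels were typos), and the score columns are assumed to be given as integer positions. It relies on `cor()` correlating the columns of two matrices, so transposing the score columns yields every player-by-player correlation in one call, and it prefixes the retained columns with `Game1_` / `Game2_` as in the desired output.

```r
# Toy data equivalent to the question's Game1 and Game2
# (assumption: the stray commas in the factor labels were typos).
Game1 <- data.frame(
  Score1 = c(5, 9), Score2 = c(4.8, 12.8), Score3 = c(7.22, 2.3),
  Class = c("Paladin", "Dwarf"), Race = c("Dwarf", "Elf"),
  row.names = c("Stan", "Kyle")
)
Game2 <- data.frame(
  Score1 = c(3, 8.1), Score2 = c(6.3, 6.6), Score3 = c(1.2, 10.3),
  Class = c("Wizard", "Rouge"), Race = c("Human", "Gnome"),
  row.names = c("Cartman", "Kenny")
)

# Hypothetical helper; score_cols must be integer column positions.
pairwise_cor_table <- function(df1, df2, score_cols = 1:3, method = "pearson") {
  # cor() correlates columns, so transpose to get one player per column.
  m1 <- t(as.matrix(df1[, score_cols, drop = FALSE]))
  m2 <- t(as.matrix(df2[, score_cols, drop = FALSE]))
  r <- cor(m1, m2, method = method)  # nrow(df1) x nrow(df2) matrix

  # Every (df1 row, df2 row) pair; the first index varies fastest,
  # which matches the column-major order of as.vector(r).
  idx <- expand.grid(i = seq_len(nrow(df1)), j = seq_len(nrow(df2)))

  # Non-score columns of each input, repeated to line up with the pairs.
  meta1 <- df1[idx$i, -score_cols, drop = FALSE]
  meta2 <- df2[idx$j, -score_cols, drop = FALSE]
  names(meta1) <- paste0("Game1_", names(meta1))
  names(meta2) <- paste0("Game2_", names(meta2))

  data.frame(
    Correlation = as.vector(r),
    meta1, meta2,
    row.names = paste(rownames(df1)[idx$i], rownames(df2)[idx$j], sep = ":")
  )
}

result <- pairwise_cor_table(Game1, Game2)
print(result)
```

On the toy data this should give the same four correlations as the function above, with the pair labels as row names. Calling `pairwise_cor_table(Game1, Game2, method = "spearman")` switches to Spearman's rank correlation.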