Correlation ratio for two columns in data frame based on group in R
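I am trying to find the correlation ratio (I think that is the correct term; I'm not good at statistics) between two columns in a data frame, computed separately for each unique value of another column. I'm not sure I am using the correct function. I would like the numbers highlighted in yellow below, but I can't seem to get what I am looking for. Any help is greatly appreciated.
Example data: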
test_df<-structure(list(stdate = c("2015-06-25", "2015-06-25", "2015-06-29",
"2015-06-29", "2008-05-05", "2008-05-05", "2015-06-30", "2015-06-30",
"2015-06-30", "2017-11-15", "2017-11-15", "2017-11-13", "2017-11-13",
"2015-08-31", "2015-08-31", "2008-05-01", "2008-05-01", "2017-02-14",
"2017-02-14", "2017-02-13"), sttime = c("10:30:00", "10:30:00",
"09:45:00", "09:45:00", "11:50:00", "11:50:00", "10:45:00", "10:45:00",
"09:00:00", "09:50:00", "09:50:00", "09:10:00", "09:10:00", "13:50:00",
"13:50:00", "09:30:00", "09:30:00", "10:30:00", "10:30:00", "08:30:00"
), locid = c("USGS-01388500", "USGS-01388500", "USGS-01464585",
"USGS-01464585", "USGS-01464515", "USGS-01464515", "USGS-01407330",
"USGS-01407330", "USGS-01466500", "USGS-01387500", "USGS-01387500",
"USGS-01395000", "USGS-01395000", "USGS-01400860", "USGS-01400860",
"USGS-01377000", "USGS-01377000", "USGS-01367625", "USGS-01367625",
"USGS-01398000"), Specific_conductance = c(525, 525, 184, 184,
226, 226, 203, 203, 41, 674, 674, 466, 466, 312, 312, 540, 540,
844, 844, 683), tds = c(294, 275, 119, 100, 155, 116, 155, 115,
43, 403, 382, 286, 274, 177, 173, 328, 277, 435, 440, 347)), .Names = c("stdate",
"sttime", "locid", "Specific_conductance", "tds"), row.names = c(NA,
20L), class = "data.frame")
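Code:
library(dplyr)  # provides %>%, group_by() and summarise()
correlation_df <- test_df %>%
  group_by(locid) %>%
  summarise(correl = cor(tds, Specific_conductance))
When I run this, I get a 1 x 1 data frame containing NA. I would like a value for each locid.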
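Have you tried running that code with your full data? In your test_df, each locid has at most two entries, and Specific_conductance is identical within every locid, so cor() has zero variance to work with and returns NA. (Two distinct points would only ever give 1 or -1 anyway.) If I make up a dummy data frame with more data, it works fine: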
library(dplyr)  # also re-exports tibble()
# Dummy data: 4 groups with 100 random observations each
test_df <- tibble(locid = rep(c("a", "b", "c", "d"), 100), tds = rnorm(400),
                  Specific_conductance = rnorm(400))
correlation_df <- test_df %>%
  group_by(locid) %>%
  summarise(correl = cor(tds, Specific_conductance))
correlation_df
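If some groups in the real data are still too small or have constant values, a guard like the one below may help make the NAs explicit. This is only a sketch under assumptions: demo_df and its values are made up for illustration, and safe_cor is a hypothetical helper, not part of dplyr or base R. The call is written as dplyr::summarise because a 1 x 1 result can also appear when another package (plyr is the usual suspect) masks summarise() and ignores the grouping; whether that is what happened in the question is a guess.

```r
library(dplyr)

# Made-up example data (hypothetical values, not taken from the question):
# "a" has constant conductance, "b" has a single row, "c" has real variation.
demo_df <- tibble(
  locid = c("a", "a", "b", "c", "c", "c"),
  tds = c(294, 275, 43, 100, 150, 210),
  Specific_conductance = c(525, 525, 41, 180, 260, 390)
)

# Hypothetical helper: return NA quietly when cor() is undefined,
# i.e. fewer than two pairs or zero variance in either vector.
safe_cor <- function(x, y) {
  if (length(x) < 2 || sd(x) == 0 || sd(y) == 0) return(NA_real_)
  cor(x, y)
}

correlation_df <- demo_df %>%
  group_by(locid) %>%
  # Explicit namespace so a masking summarise() from another package is not used
  dplyr::summarise(n = n(), correl = safe_cor(tds, Specific_conductance))

correlation_df
```

The n column shows how many rows each locid contributed, which makes it easy to see which groups cannot produce a meaningful correlation.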