Output for correlation in R

Question

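I'm hoping someone can help me make sense of a correlation matrix. Specifically, I'd like to understand the output and why the output is what it is.

My goal is to understand the correlation between two categorical (unordered nominal) variables. Before using other methods to obtain counts, I clean the data (below) to create factors from the nominal variables.

For example, I created a correlation matrix in R using dummy data: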

set.seed(1234)
randomCities<-c("Washington","Boston","Seattle","Portland","Oakland","Dallas","Miami")
randomYachts<-c("BigOl Yacht","Notsobig Yacht","Fancy Yacht","SuperFancy Yacht")
randomYears<-c(2019,2017,2016,2015,2018)
randomQuarters<-c(1,2,3,4)

dat1<-data.frame(city=sample(randomCities,400,replace = T),
                 yachts=sample(randomYachts,400,replace = T),
                 year = sample(randomYears,400,replace=T),
                 qtr = sample(randomQuarters,400,replace = T),
                 stringsAsFactors = F)

I then subset the data and convert the variables I want to examine to factors:

#store the vars as factors
fac.Yachts<-as.factor(dat1$yachts)
fac.City<-as.factor(dat1$city)

Using the gmodels package, I created a contingency table:

#Create contingency table
library(gmodels)
joint = CrossTable(fac.City,fac.Yachts,prop.chisq = F)
joint_counts = joint$t

joint_counts
            y
x            BigOl Yacht Fancy Yacht Notsobig Yacht
  Boston              19          12             10
  Dallas              12          18             15
  Miami               16          16             11
  Oakland              6          12             11
  Portland            14          16             14
  Seattle             12          19              9
  Washington          13          15             16

Finally, I created a correlation matrix using cor() and the Hmisc package:

cor1<-cor(joint_counts)

#cor() function

>cor(joint_counts)
                  BigOl Yacht  Fancy Yacht Notsobig Yacht SuperFancy Yacht
BigOl Yacht       1.000000000 -0.006586363    -0.09691724      -0.25682171
Fancy Yacht      -0.006586363  1.000000000     0.14098436       0.01312562
Notsobig Yacht   -0.096917240  0.140984364     1.00000000      -0.66337471
SuperFancy Yacht -0.256821708  0.013125623    -0.66337471       1.00000000


#Output from Hmisc
library(Hmisc)
res2<-rcorr(as.matrix(joint_counts))
>res2$r
                  BigOl Yacht  Fancy Yacht Notsobig Yacht SuperFancy Yacht
BigOl Yacht       1.000000000 -0.006586363    -0.09691724      -0.25682171
Fancy Yacht      -0.006586363  1.000000000     0.14098436       0.01312562
Notsobig Yacht   -0.096917240  0.140984364     1.00000000      -0.66337471
SuperFancy Yacht -0.256821708  0.013125623    -0.66337471       1.00000000

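Now, my question is: why does the correlation matrix produce this output? That is, my goal is to understand the relationship between Yacht and City, but the matrix (seemingly?) tells me how the levels of Yacht relate to one another.

Note: using the joint variable I created (output below), I do get some information; however, when I create the correlation matrix from it, I seem to get only the relationships among the yachts. Am I simply misreading the correlation matrix?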

joint = CrossTable(fac.City,fac.Yachts,prop.chisq = F)

$prop.row
            y
x            BigOl Yacht Fancy Yacht Notsobig Yacht SuperFancy Yacht
  Boston       0.3275862   0.2068966      0.1724138        0.2931034
  Dallas       0.2142857   0.3214286      0.2678571        0.1964286
  Miami        0.2909091   0.2909091      0.2000000        0.2181818
  Oakland      0.1224490   0.2448980      0.2244898        0.4081633
  Portland     0.2187500   0.2500000      0.2187500        0.3125000
  Seattle      0.1875000   0.2968750      0.1406250        0.3750000
  Washington   0.2407407   0.2777778      0.2962963        0.1851852

Answer 1

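Correlation is meaningful only for quantitative variables. Your code computes the correlations between the counts of each yacht type, that is, between the columns of the frequency matrix.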
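To see what that means in practice, here is a minimal sketch using a small made-up table (the `toy` matrix, its labels, and its values are assumptions for illustration, not your data). When cor() is given a matrix, it treats each column as a variable and each row as an observation, so the result has one row and one column per yacht type and nothing per city.

```r
# Hypothetical 3 x 2 table of counts (rows = cities, columns = yacht types).
# The numbers are made up purely for illustration.
toy <- matrix(c(10, 20, 30,   # first column:  counts of "Big"
                 5,  9, 16),  # second column: counts of "Fancy"
              ncol = 2,
              dimnames = list(city  = c("A", "B", "C"),
                              yacht = c("Big", "Fancy")))

# cor() on a matrix returns one coefficient per pair of COLUMNS
m <- cor(toy)
dim(m)   # 2 x 2: indexed by yacht type only, the cities act as observations

# The off-diagonal entry is simply the Pearson correlation of two count vectors
same <- all.equal(m["Big", "Fancy"], cor(toy[, "Big"], toy[, "Fancy"]))
same
```

So the matrix in the question answers "do cities with many BigOl Yachts also tend to have many Fancy Yachts?", which is a different question from whether yacht type and city are associated.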

There are analogues of correlation for qualitative variables: Cramér's V, phi, etc.

library(DescTools) 
counts <- table(dat1[,1:2])
CramerV(counts)  # 0.15
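If you would rather not depend on DescTools, Cramér's V can also be sketched in base R from the chi-squared statistic, as V = sqrt(chi2 / (n * (min(rows, cols) - 1))). The helper name `cramers_v` below is hypothetical, and it uses the plain (uncorrected) formula, so I would not assume it matches every option of `CramerV()`. The data are regenerated in a reduced form only to keep the example self-contained; depending on your R version's default for sample(), the numbers may differ from those in the question.

```r
# Hypothetical helper: Cramer's V for a two-way table of counts (base R only)
cramers_v <- function(tab) {
  # correct = FALSE switches off the continuity correction used for 2 x 2 tables
  chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
  n    <- sum(tab)        # total number of observations
  k    <- min(dim(tab))   # the smaller of (number of rows, number of columns)
  sqrt(chi2 / (n * (k - 1)))
}

# Reduced dummy data in the spirit of the question (assumed, not the original)
set.seed(1234)
city   <- sample(c("Washington", "Boston", "Seattle"), 400, replace = TRUE)
yachts <- sample(c("BigOl Yacht", "Fancy Yacht", "Notsobig Yacht"), 400, replace = TRUE)
counts_small <- table(city, yachts)

v <- cramers_v(counts_small)
v   # 0 means no association, 1 means perfect association

# The same table also gives the chi-squared test of independence
p <- chisq.test(counts_small)$p.value
p
```

Since both variables are sampled independently here, a value of V close to 0 is what one would expect; a single number for the whole table is also what distinguishes this from the yacht-by-yacht matrix that cor() returns.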
